我想打破下string一句话:
library(NLP) # NLP_0.1-7
string <- as.String("Mr. Brown comes. He says hello. i give him coffee.")
Run Code Online (Sandbox Code Playgroud)
我想展示两种不同的方式.一个来自包装openNLP:
library(openNLP) # openNLP_0.2-5
sentence_token_annotator <- Maxent_Sent_Token_Annotator(language = "en")
boundaries_sentences<-annotate(string, sentence_token_annotator)
string[boundaries_sentences]
[1] "Mr. Brown comes." "He says hello." "i give him coffee."
Run Code Online (Sandbox Code Playgroud)
第二个来自包装stringi:
library(stringi) # stringi_0.5-5
stri_split_boundaries( string , opts_brkiter=stri_opts_brkiter('sentence'))
[[1]]
[1] "Mr. " "Brown comes. "
[3] "He says hello. i give him coffee."
Run Code Online (Sandbox Code Playgroud)
在第二种方式之后,我需要准备句子以删除多余的空格或再次将新的字符串分解成句子.我可以调整stringi函数来提高结果的质量吗?
当它是一个大数据时,openNLP(非常)慢stringi.
有没有办法结合stringi( - >快速)和openNLP …