Question by SRR*_*sel

How can I detect sentence boundaries with openNLP and stringi?

I want to split the following string into sentences:

library(NLP) # NLP_0.1-7  
string <- as.String("Mr. Brown comes. He says hello. i give him coffee.")

I'd like to show two different approaches. The first one uses the openNLP package:

library(openNLP) # openNLP_0.2-5  

sentence_token_annotator <- Maxent_Sent_Token_Annotator(language = "en")  
boundaries_sentences<-annotate(string, sentence_token_annotator)  
string[boundaries_sentences]  

[1] "Mr. Brown comes."   "He says hello."     "i give him coffee."  

The second one uses the stringi package:

library(stringi) # stringi_0.5-5  

stri_split_boundaries( string , opts_brkiter=stri_opts_brkiter('sentence'))

[[1]]  
 [1] "Mr. "                              "Brown comes. "                    
 [3] "He says hello. i give him coffee."

After the second approach I still need to post-process the sentences, either removing the extra trailing whitespace or splitting the resulting strings into sentences again. Can I tune the stringi call to improve the quality of the result?
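The trailing whitespace that each piece keeps after the split can be removed with `stri_trim_right()`; a minimal sketch (the exact split points depend on the ICU library bundled with your stringi build):

```r
library(stringi)

s <- "Mr. Brown comes. He says hello. i give him coffee."

# Split on ICU sentence boundaries, then strip the trailing
# whitespace left on each piece.
pieces  <- stri_split_boundaries(s, opts_brkiter = stri_opts_brkiter("sentence"))[[1]]
trimmed <- stri_trim_right(pieces)
trimmed
```

Note that this only cleans up the whitespace; it does not merge the `"Mr."` fragment back into its sentence, because ICU's sentence break iterator has no abbreviation list the way openNLP's trained model does.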

On large data, openNLP is (much) slower than stringi.
Is there a way to combine stringi (-> fast) and openNLP …

Tags: regex, r, text-mining, opennlp, stringi

12 votes · 2 answers · 705 views
