Splitting text into sentences in R without splitting email IDs or decimal numbers

use*_*140 1 regex string split r strsplit

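I want to split a paragraph into sentences at each period (full stop). But when I do this, decimal numbers and email IDs are also broken apart at their periods and end up in separate pieces. Can anyone help me split the data into sentences?

For example: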

aa = "For Important Disclosure information, please visit our website at 0.5%  https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES. An organization. 0.5% have an analysis."

This should be split into:

  1. For Important Disclosure information, please visit our website at 0.5% https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES.
  2. An organization.
  3. 0.5% have an analysis

Code:

sentences = as.matrix(unlist(strsplit(aa,"\\.")))

Lyz*_*deR 5

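This seems to work. With fixed = TRUE the pattern '. ' is matched literally (a period followed by a space), so periods inside numbers and URLs are left alone: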

strsplit(aa, '. ', fixed = TRUE)
#[[1]]
#[1] "For Important Disclosure information, please visit our website at 0.5% https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES"
#[2] "An organization"                                                                                                                                          
#[3] "0.5% have an analysis." 
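Note that splitting on the literal '. ' drops the period from every sentence except the last one. If the periods should be kept, as in the expected output in the question, one possible variant is to split on the whitespace that follows a period, using a lookbehind. This is only a sketch and assumes that a sentence always ends with a period followed by whitespace; it will also split after abbreviations such as "e.g. ". The variable name sentences is just illustrative.

```r
# Same sample text as in the question.
aa <- "For Important Disclosure information, please visit our website at 0.5%  https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES. An organization. 0.5% have an analysis."

# Split on whitespace that directly follows a period. The lookbehind
# (?<=\\.) checks for the period without consuming it, so each sentence
# keeps its final "."; perl = TRUE is needed for lookbehind support.
sentences <- strsplit(aa, "(?<=\\.)\\s+", perl = TRUE)[[1]]

print(sentences)
```

On the sample string this should give the three sentences listed in the question, each still ending in its period, while 0.5%, the URL and 1.888.JEFFERIES stay intact because their periods are not followed by whitespace.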