How do I split a paragraph into sentences?

Pea*_*Gen 6 java regex string split text-segmentation

Please take a look at the following.

String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");

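This is how I tried to split a paragraph into sentences, but there is a problem. My paragraph includes dates like Jan. 13, 2014, words like U.S, and numbers like 2.2. They all get split by the code above. So basically, this code splits on many "dots", whether or not they are full stops.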

I also tried String[] sentenceHolder = titleAndBodyContainer.split(".\n"); and String[] sentenceHolder = titleAndBodyContainer.split("\\.");. Both failed.

How do I "properly" split a paragraph into sentences?
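
To make the failure concrete, here is a minimal sketch that runs the split pattern from the question on a short made-up sample. The class name NaiveSplitDemo, the method naiveSplit, and the sample text are illustrative only and not part of the original code.

```java
class NaiveSplitDemo {
    // Same pattern as in the question: split on a newline, or on any dot
    // that does not have a digit on both sides.
    static String[] naiveSplit(String text) {
        return text.split("\n|\\.(?!\\d)|(?<!\\d)\\.");
    }

    public static void main(String[] args) {
        // Hypothetical sample containing an abbreviation, "U.S", and a decimal.
        String sample = "On Jan. 13, 2014 the U.S grew 2.2 percent. Done.";
        for (String piece : naiveSplit(sample)) {
            // Brackets make leading spaces in each piece visible.
            System.out.println("[" + piece + "]");
        }
    }
}
```

On this sample the dots in "Jan." and "U.S" are treated as sentence ends, while the dot inside "2.2" is left alone because it has a digit on both sides.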

Ruc*_*era 14

You can try this:

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";

Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
while (reMatcher.find()) {
    System.out.println(reMatcher.group());
}

Output:

This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.

  • Hmm... where did the space after "Jan." go? How mysterious ;) (2 upvotes)
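
The comment's point can be checked with a small sketch. The pattern from this answer keeps a dot inside a sentence only when the dot is not followed by whitespace, so "Jan.13" survives but "Jan. 13" does not. The names SentenceRegexDemo and splitSentences, and both sample strings, are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceRegexDemo {
    // Pattern and flags copied unchanged from the answer above.
    private static final Pattern SENTENCE = Pattern.compile(
            "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
            Pattern.MULTILINE | Pattern.COMMENTS);

    // Collects every match of the pattern as one "sentence".
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // No space after "Jan.": the date stays inside one sentence.
        System.out.println(splitSentences("It happened on Jan.13, 2014. Next sentence."));
        // Space after "Jan.": the abbreviation is treated as a sentence end.
        System.out.println(splitSentences("It happened on Jan. 13, 2014. Next sentence."));
    }
}
```

The first call yields two sentences and the second yields three, with "It happened on Jan." split off on its own. For text that writes dates as "Jan. 13", the pattern would therefore need extra handling for abbreviations.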

小智 1

String[] sentenceHolder = titleAndBodyContainer.split("(?i)(?<=[.?!])\\S+(?=[a-z])");

Try this, it works for me.
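
Note that \\S+ matches runs of non-whitespace characters, so on normally spaced text the pattern in this answer is unlikely to match the gap between two sentences. Below is a minimal sketch of a variant, assuming the lowercase whitespace class \\s+ was intended. That is a guess and is not confirmed by the answer. The names WhitespaceSplitDemo and splitOnSentenceGaps and the sample text are illustrative.

```java
class WhitespaceSplitDemo {
    // Assumption: lowercase \\s+ (whitespace) instead of the \\S+ in the answer.
    // Splits on whitespace that follows '.', '?' or '!' and precedes a letter;
    // (?i) makes [a-z] match upper-case letters as well.
    static String[] splitOnSentenceGaps(String text) {
        return text.split("(?i)(?<=[.?!])\\s+(?=[a-z])");
    }

    public static void main(String[] args) {
        String sample = "My paragraph includes dates like Jan. 13, 2014, words like U.S "
                + "and numbers like 2.2. They all got split. But why?";
        for (String sentence : splitOnSentenceGaps(sample)) {
            System.out.println(sentence);
        }
    }
}
```

With this variant "Jan. 13" is not split, because the space after the dot is followed by a digit rather than a letter. An abbreviation followed by a word, such as "Mr. Smith", would still be split, so this remains a heuristic rather than a complete solution.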