Java Scanner: splitting a string into sentences

use*_*965 1 java regex java.util.scanner

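I am trying to split a piece of text into individual sentences based on punctuation, i.e. [.?!]. However, even though I have specified that pattern, the scanner also seems to split at the end of every line. How can I fix this? Thanks!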

this is a text file. yes the
deliminator works
no it does not. why not?

Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
  String line = scanner.next();
  System.out.println(line);
}

mrz*_*zli 5

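I don't believe the scanner is splitting on line breaks. Your "line" variable simply contains the line breaks from the input, and printing them is what produces that output. For example, you could replace those line breaks with spaces: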

(I am reading the same input text you provided from a file, so there is some extra file-reading code, but you will get the picture.)

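import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// ... inside a method:
try {
    File file = new File("assets/test.txt");
    Scanner scanner = new Scanner(file);
    // Tokens end at '.', '?' or '!'; line breaks are NOT delimiters here
    scanner.useDelimiter("[.?!]");
    while (scanner.hasNext()) {
        String sentence = scanner.next();
        // Replace line breaks inside the token with spaces
        sentence = sentence.replaceAll("\\r?\\n", " ");
        // uncomment for nicer output
        // sentence = sentence.trim();
        System.out.println(sentence);
    }
    scanner.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}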

This is the result:

this is a text file
 yes the deliminator works no it does not
 why not

If I uncomment the trim line, it looks a bit nicer:

this is a text file
yes the deliminator works no it does not
why not
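
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. It is only one way to do it: the class name SentenceSplitter and the method split are hypothetical names chosen for illustration, the input is passed as a String rather than read from assets/test.txt, and it collapses all runs of whitespace (not just line breaks) and drops empty tokens, which goes slightly beyond the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question or answer.
class SentenceSplitter {

    // Splits text on '.', '?' or '!' and returns the cleaned-up sentences.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Only sentence punctuation is a delimiter; line breaks stay inside tokens.
            scanner.useDelimiter("[.?!]");
            while (scanner.hasNext()) {
                // Collapse line breaks and other whitespace runs into single spaces,
                // then strip the leading space left over after the previous delimiter.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                // Skip tokens that held nothing but whitespace (e.g. a trailing newline).
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "this is a text file. yes the\ndeliminator works\nno it does not. why not?\n";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

Run against the sample text from the question, this prints the same three lines as the trimmed output above. Note that the delimiter characters themselves are consumed, so the punctuation is not part of the returned sentences.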