use*_*965 1 java regex java.util.scanner
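I'm trying to split a block of text into individual sentences based on punctuation, i.e. [.?!]. However, the scanner also seems to split at the end of every line, even though I've specified a particular pattern. How can I fix this? Thanks!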
this is a text file. yes the
deliminator works
no it does not. why not?
Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}
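I don't believe the scanner is splitting on line breaks. Your "line" variable simply contains the line breaks from the input, and that is why you get that output. For example, you can replace those line breaks with spaces:
(I'm reading the same input text you provided from a file, so there is some extra file-reading code, but you'll get the picture.)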
try {
File file = new File("assets/test.txt");
Scanner scanner = new Scanner(file);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
String sentence = scanner.next();
sentence = sentence.replaceAll("\\r?\\n", " ");
// uncomment for nicer output
//sentence = sentence.trim();
System.out.println(sentence);
}
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
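Here is the result (the second and third lines start with the space that followed each delimiter in the input):
this is a text file
 yes the deliminator works no it does not
 why not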
If I uncomment the trim line, it looks a bit nicer:
this is a text file
yes the deliminator works no it does not
why not
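As a possible variation on the same idea, the delimiter itself can absorb the whitespace that follows each punctuation mark, so the tokens come out without a leading space. This is only a sketch: the class name `SentenceSplitter` and the helper `splitSentences` are made up for illustration, and it reads from a String instead of a file so that it is self-contained. Line breaks inside a sentence are still part of the token, so they are collapsed to single spaces afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceSplitter {

    // Hypothetical helper (not from the original post): splits text into
    // sentences on '.', '?' or '!' and normalizes whitespace inside each one.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // One punctuation mark plus any whitespace that follows it.
            scanner.useDelimiter("[.?!]\\s*");
            while (scanner.hasNext()) {
                // The token may still contain line breaks from the input;
                // collapse every run of whitespace into a single space.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Same sample text as in the question, with explicit line breaks.
        String text = "this is a text file. yes the\n"
                + "deliminator works\n"
                + "no it does not. why not?\n";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

With the sample text from the question this should print three lines: "this is a text file", "yes the deliminator works no it does not" and "why not". To read from a file instead, pass a `File` to the `Scanner` constructor as in the snippet above and handle `FileNotFoundException`.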