My method reads a text file and counts how many lines contain the word "the". It works, but I only want lines that contain the word by itself, not as a substring of another word.
For example, I wouldn't want "therefore" to count: it contains "the", but not as a standalone word.
I'm trying to find a way to limit the count to lines that contain "the" as a word of exactly 3 characters, but I haven't been able to do that.
Here is my method right now:
public static long findThe(String filename) {
    long count = 0;
    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        count = lines.filter(w -> w.contains("the"))
                     .count();
    }
    catch (IOException x)
    {
        // TODO Auto-generated catch block
        System.out.println("File: " + filename + " not found");
    }
    System.out.println(count);
    return count;
}
For example, if a text file has four lines that contain "the" as a standalone word, the method should return 4.
Use a regular expression to enforce word boundaries:
count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();
Or, for the general case:
count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();
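One caveat with the general form: if the search term can contain regex metacharacters (a dot, a bracket, and so on), concatenating it directly into the pattern changes what it matches. A minimal sketch of one way around that, assuming the term should be taken literally; the class name `WholeWordFilter` and the helper `countLinesWithWord` are made up for illustration, while `Pattern.quote` is the standard JDK method:

```java
import java.util.List;
import java.util.regex.Pattern;

class WholeWordFilter {
    // Hypothetical helper: counts the lines that contain `search` as a whole word.
    static long countLinesWithWord(List<String> lines, String search) {
        // Pattern.quote makes any regex metacharacters in `search` match literally.
        String regex = "(?i).*\\b" + Pattern.quote(search) + "\\b.*";
        return lines.stream().filter(w -> w.matches(regex)).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "The cat sat.", "Therefore it left.", "Over the hill.",
                "a.b is here", "axb is not");
        System.out.println(countLinesWithWord(lines, "the")); // prints 2
        System.out.println(countLinesWithWord(lines, "a.b")); // prints 1 ("axb" is not matched)
    }
}
```

Without the quoting, the term "a.b" would also match "axb", because the dot would be read as "any character".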
Details:
\b means "word boundary".
(?i) means "ignore case".
Using word boundaries prevents "Therefore" from matching.
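To see what each piece contributes, here is a small comparison of the original `contains` check against the regex with and without `(?i)`. This is only an illustrative sketch; the class name `WordBoundaryDemo`, the method names, and the sample strings are invented for the demo:

```java
class WordBoundaryDemo {
    // The original check from the question: plain, case-sensitive substring test.
    static boolean containsSubstring(String s) {
        return s.contains("the");
    }

    // Whole word only, but still case-sensitive (no (?i) flag).
    static boolean wholeWordCaseSensitive(String s) {
        return s.matches(".*\\bthe\\b.*");
    }

    // Whole word, ignoring case: the pattern suggested above.
    static boolean wholeWord(String s) {
        return s.matches("(?i).*\\bthe\\b.*");
    }

    public static void main(String[] args) {
        String[] samples = {"therefore we stop", "The end", "over the hill"};
        for (String s : samples) {
            System.out.println(s + " -> substring=" + containsSubstring(s)
                    + ", wholeWordCaseSensitive=" + wholeWordCaseSensitive(s)
                    + ", wholeWord=" + wholeWord(s));
        }
    }
}
```

"therefore we stop" passes only the substring test, "The end" passes only the case-insensitive whole-word test, and "over the hill" passes all three.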
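Note that in Java, unlike in many other languages, String#matches() must match the whole string (not just find a match somewhere within it) to return true, hence the .* at either end of the regex.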
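If you would rather not pad the regex with `.*`, a precompiled `Pattern` with `Matcher#find()` looks for the match anywhere in the line. Below is a rough sketch of the whole method written that way. It is a variation, not the code from the question: it takes a `Path`, lets the `IOException` propagate instead of printing a message, and opens the stream in try-with-resources so the file is closed. The class name `FindTheSketch` and the temp-file demo in `main` are just for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class FindTheSketch {
    // Compiled once; CASE_INSENSITIVE plays the same role as the inline (?i) flag.
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Counts the lines of `file` that contain "the" as a standalone word.
    static long findThe(Path file) throws IOException {
        // Files.lines keeps the file open until the stream is closed.
        try (Stream<String> lines = Files.lines(file)) {
            // find() succeeds on a match anywhere in the line, so no ".*" padding is needed.
            return lines.filter(line -> THE.matcher(line).find()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("findThe", ".txt");
        Files.write(tmp, List.of("The cat sat.", "Therefore it left.", "Over the hill."));
        System.out.println(findThe(tmp)); // prints 2
        Files.delete(tmp);
    }
}
```

For contrast, `"over the hill".matches("\\bthe\\b")` is false, because `matches()` requires the entire string to be the word "the".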