How can I find the number of lines that contain a certain word in java using Java Stream?

Question

My method reads a text file and counts how many lines contain the word "the". It works, but I need to count only the lines that contain the word by itself, not as a substring of another word.

For example, I wouldn't want "therefore" to count: it contains "the", but not as a word by itself.

I'm trying to find a way to limit the lines to those that contain "the" as a word exactly 3 characters long, but I haven't been able to do that.

Here is my method right now:

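import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public static long findThe(String filename) {
    long count = 0;

    // try-with-resources closes the stream (and the underlying file) when done
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        // contains() also matches substrings such as "therefore"
        count = lines.filter(w -> w.contains("the"))
                .count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " not found");
    }

    System.out.println(count);
    return count;
}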


For example, for a text file in which four lines contain "the" as a standalone word, the method should return 4.

Answer 1

Boh*_*ian 6

Use a regular expression to enforce word boundaries:

count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();


Or, for the general case:

count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();


Details:

\b means "word boundary"
(?i) means "ignore case"

Using word boundaries prevents "Therefore" from matching.
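
To see the effect of the word boundaries, here is a small sketch that applies the same pattern to a few sample lines. The class name, the helper method, and the sample lines are made up for illustration and are not part of the original answer.

```java
class WordBoundaryDemo {
    // Same pattern as above: case-insensitive, "the" as a whole word
    static final String WHOLE_WORD_THE = "(?i).*\\bthe\\b.*";

    // Hypothetical helper: true if the line contains "the" as a standalone word
    static boolean containsWholeWordThe(String line) {
        return line.matches(WHOLE_WORD_THE);
    }

    public static void main(String[] args) {
        // Hypothetical sample lines
        String[] samples = {
            "The cat sat.",          // true: "The" is a whole word, case is ignored
            "Therefore it rained.",  // false: "The" is followed by a word character
            "Bathe the dog.",        // true: the standalone "the" matches, "Bathe" does not
            "Other themes"           // false: "the" appears only inside longer words
        };
        for (String s : samples) {
            System.out.println(containsWholeWordThe(s) + "  " + s);
        }
    }
}
```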

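Note that in Java, unlike many other languages, String#matches() must match the entire string (not just find a match somewhere within it) to return true, hence the .* at either end of the regex.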
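
As an alternative to wrapping the pattern in .*, a precompiled Pattern with Matcher#find() looks for a match anywhere in the line. The following is a minimal sketch of that variant, not the answerer's code; the class and method names are hypothetical. It also passes the search term through Pattern.quote() so that regex metacharacters in it are taken literally, and it assumes the term starts and ends with a letter or digit, since \b behaves differently next to punctuation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class WordLineCounter {
    // Hypothetical helper: counts the lines in which `word` occurs as a whole word
    static long countLines(Stream<String> lines, String word) {
        // Pattern.quote() makes the search term literal; \b anchors it at word boundaries
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // find() succeeds on a match anywhere in the line, so no leading/trailing .* is needed
        return lines.filter(line -> p.matcher(line).find()).count();
    }

    // Hypothetical file-based wrapper; the stream is closed by try-with-resources
    static long countLinesInFile(Path file, String word) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return countLines(lines, word);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        long n = countLines(
                Stream.of("The cat", "Therefore", "over the hill"), "the");
        System.out.println(n); // prints 2
    }
}
```

Compiling the pattern once avoids recompiling the regex for every line, which String#matches() does on each call.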
