Word frequency in a list of strings

Vei*_*pse 6 java string list arraylist

I have a list of strings:

List<String> terms = Arrays.asList("Coding is great", "Search Engines are great", "Google is a nice search engine");

How can I get the frequency of each word in the list? For example: {Coding:1, Search:2, Engines:1, engine:1, ....}

Here is my code:

    Map<String, Integer> wordFreqMap = new HashMap<>();
    for (String contextTerm : term.getContexTerms())
    {
        String[] wordsArr = contextTerm.split(" ");
        for (String word : wordsArr)
        {
            // this value appears to be reset every time I move on to a new context term
            Integer freq = wordFreqMap.get(word);
            freq = (freq == null) ? 1 : freq + 1;
            wordFreqMap.put(word, freq);
        }
    }

Mar*_*o13 10

An idiomatic solution using Java 8 streams:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplitWordCount
{
    public static void main(String[] args)
    {
        List<String> terms = Arrays.asList(
            "Coding is great",
            "Search Engines are great",
            "Google is a nice search engine");

        Map<String, Integer> result = terms.parallelStream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .collect(Collectors.toConcurrentMap(
                w -> w.toLowerCase(), w -> 1, Integer::sum));
        System.out.println(result);
    }
}

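Note that you may need to consider whether the case of the strings should matter. This version converts each word to lowercase and uses the lowercase form as the key in the resulting map. The output is: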

{coding=1, a=1, search=2, are=1, engine=1, engines=1, 
     is=2, google=1, great=2, nice=1}
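To make the case-sensitivity choice explicit, here is a minimal sketch of a variant where the caller decides whether case matters. The class name `WordFrequency`, the method `count`, and the `ignoreCase` flag are hypothetical names introduced for illustration, not part of the answer above. It also assumes that splitting on any run of whitespace is acceptable, and it uses a sequential stream with a `TreeMap` so the printed order is deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency
{
    // Hypothetical helper: counts words across all strings.
    // If ignoreCase is true, words are lowercased before counting.
    static Map<String, Long> count(List<String> terms, boolean ignoreCase)
    {
        Function<String, String> key = ignoreCase
            ? w -> w.toLowerCase(Locale.ROOT)
            : Function.identity();

        return terms.stream()
            // Assumption: words are separated by one or more whitespace characters
            .flatMap(s -> Arrays.stream(s.trim().split("\\s+")))
            // A blank input string yields one empty token; drop it
            .filter(w -> !w.isEmpty())
            .map(key)
            // TreeMap keeps the keys sorted, so the output order is stable
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args)
    {
        List<String> terms = Arrays.asList(
            "Coding is great",
            "Search Engines are great",
            "Google is a nice search engine");

        // Case-insensitive: "Search" and "search" share one key
        System.out.println(count(terms, true));
        // Case-sensitive: "Search" and "search" are counted separately
        System.out.println(count(terms, false));
    }
}
```

With `ignoreCase` set to `true`, `search` is counted twice; with `false`, `Search` and `search` each get a count of 1. Note that neither setting merges `engine` with `engines`; that would require stemming, which is outside the scope of this sketch.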