Sort words in order of frequency? (least to greatest)

use*_*781 4 java sorting

Does anyone know how to sort a list of words in order of frequency (least to greatest) using the built-in Collections.sort and the Comparator<String> interface?

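I already have a method that gets the count of a given word in a text file. Now I just need to create a method that compares the counts of the words and puts them in a list sorted from least frequent to most frequent.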

Any ideas and tips would be much appreciated. I'm having trouble getting started on this particular method.

public class Parser implements Comparator<String> {

    public Map<String, Integer> wordCount;

    void parse(String filename) throws IOException {
        File file = new File(filename);
        Scanner scanner = new Scanner(file);

        //mapping of string -> integer (word -> frequency)
        Map<String, Integer> wordCount = new HashMap<String, Integer>();

        //iterates through each word in the text file
        while(scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.next()==null) {
                wordCount.put(word, 1);
            }
            else {
                wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

    public int getCount(String word) {
        return wordCount.get(word);
    }

    public int compare(String w1, String w2) {
        return getCount(w1) - getCount(w2);
    } 

        //this method should return a list of words in order of frequency from least to   greatest
    public List<String> getWordsInOrderOfFrequency() {
        List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
        //this part is unfinished.. the part i'm having trouble sorting the word frequencies
        List<String> result = new ArrayList<String>();


    }
}

rod*_*ion 7

First, your use of scanner.next() looks incorrect. next() returns the next word and advances to the following one on every call, so the following code:

if(scanner.next() == null){ ... }

and

scanner.next().replaceAll("[^A-Za-z0-9]"," ");
scanner.next().toLowerCase();

will consume words and then simply throw them away. What you probably want to do is:

String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of the while loop, so that the changes to the word are saved in the word variable instead of being discarded.
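As a minimal sketch of that single-call pattern, the following reads each token exactly once and keeps the normalized result. The class name `ScannerNextDemo` and the method `readWords` are hypothetical, it scans a string rather than a file, and (an assumption that differs from the line above) it replaces punctuation with an empty string rather than a space so that no stray whitespace ends up in the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerNextDemo {

    // Reads every token exactly once and normalizes it before storing it.
    static List<String> readWords(String text) {
        List<String> words = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // One next() call per iteration; its result is kept in a variable.
            String word = scanner.next().replaceAll("[^A-Za-z0-9]", "").toLowerCase();
            // A token made only of punctuation becomes empty, so skip it.
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        scanner.close();
        return words;
    }

    public static void main(String[] args) {
        System.out.println(readWords("The cat, the DOG.")); // prints [the, cat, the, dog]
    }
}
```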

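Second, the use of the wordCount map is slightly broken. What you want to do is check whether word is already in the map, to decide what count to set. To do that, instead of checking scanner.next() == null, you should look in the map, for example: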

if(!wordCount.containsKey(word)){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, wordCount.get(word) + 1);
}

Or you can do this:

Integer count = wordCount.get(word);
if(count == null){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, count+1);
}

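I prefer this approach because it is a bit cleaner and does only one map lookup (get) per word before the put, whereas the first approach does two lookups (containsKey, then get) whenever the word is already in the map.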
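If Java 8 or later is available (an assumption, since the code in this thread uses pre-Java 8 syntax), the same counting step can be sketched with Map.merge, which handles the "absent" and "present" cases in one call. The class name `WordCountSketch` and the method `countWords` are hypothetical, and the sketch scans a string rather than a file to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCountSketch {

    // Builds a word -> frequency map from whitespace-separated text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase();
            // merge() stores 1 if the word is absent, otherwise adds 1 to the existing count.
            wordCount.merge(word, 1, Integer::sum);
        }
        scanner.close();
        return wordCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("a b A c b a");
        System.out.println(counts.get("a")); // prints 3
        System.out.println(counts.get("b")); // prints 2
        System.out.println(counts.get("c")); // prints 1
    }
}
```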

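Now, to get the list of words in descending order of frequency, you can first convert the map to a list and then apply Collections.sort() as suggested in this post. Here is a simplified version that fits your needs: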

static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

    // Convert map to list of <String,Integer> entries
    List<Map.Entry<String, Integer>> list = 
        new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

    // Sort list by integer values
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            // compare o2 to o1, instead of o1 to o2, to get descending freq. order
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });

    // Populate the result into a list
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : list) {
        result.add(entry.getKey());
    }
    return result;
}
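The question asks for least-to-greatest order, while the method above sorts in descending order. A minimal sketch of the ascending variant only needs the comparison flipped to compare o1 against o2. The class name `AscendingFrequency` and the method name are hypothetical, and the alphabetical tie-breaker is an addition of this sketch (not part of the original method) so that words with equal counts come out in a predictable order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AscendingFrequency {

    static List<String> getWordsInAscendingFreqOrder(Map<String, Integer> wordCount) {
        // Convert the map to a list of <String,Integer> entries.
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // Compare o1 to o2 to get ascending frequency order.
                int byCount = o1.getValue().compareTo(o2.getValue());
                // Tie-breaker (added in this sketch): alphabetical by word.
                return byCount != 0 ? byCount : o1.getKey().compareTo(o2.getKey());
            }
        });

        // Copy the sorted keys into the result list.
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("the", 5);
        counts.put("cat", 2);
        counts.put("sat", 1);
        System.out.println(getWordsInAscendingFreqOrder(counts)); // prints [sat, cat, the]
    }
}
```

Using compareTo on the Integer values, rather than subtracting one count from the other, also avoids any risk of integer overflow in the comparison.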

Hope this helps.

Edit: Changed the comparison function as suggested by @dragon66. Thanks.