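Does anyone know how to sort a list of words by frequency (from least to greatest) using the built-in Collections.sort and the Comparator<String> interface?
I already have a method that gets the count of a given word in a text file. Now I just need a method that compares the counts of the words and puts them in a list ordered from lowest frequency to highest.
Any ideas and tips would be much appreciated. I'm having trouble getting started on this particular method.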
public class Parser implements Comparator<String> {
    public Map<String, Integer> wordCount;

    void parse(String filename) throws IOException {
        File file = new File(filename);
        Scanner scanner = new Scanner(file);
        //mapping of string -> integer (word -> frequency)
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        //iterates through each word in the text file
        while(scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.next()==null) {
                wordCount.put(word, 1);
            }
            else {
                wordCount.put(word, wordCount.get(word) + 1);;
            }
        }
        scanner.next().replaceAll("[^A-Za-z0-9]"," ");
        scanner.next().toLowerCase();
    }

    public int getCount(String word) {
        return wordCount.get(word);
    }

    public int compare(String w1, String w2) {
        return getCount(w1) - getCount(w2);
    }

    //this method should return a list of words in order of frequency from least to greatest
    public List<String> getWordsInOrderOfFrequency() {
        List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
        //this part is unfinished.. the part i'm having trouble sorting the word frequencies
        List<String> result = new ArrayList<String>();
    }
}
First of all, your usage of scanner.next() looks incorrect. Each call to next() returns the next word and advances the scanner past it, so the following code:
if(scanner.next() == null){ ... }
and
scanner.next().replaceAll("[^A-Za-z0-9]"," ");
scanner.next().toLowerCase();
will consume words and then simply throw them away. What you probably want to do is:
String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();
at the beginning of your while loop, so that the changes to the word are saved in the word variable instead of being discarded.
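To make the token-consuming behaviour concrete, here is a small sketch. The class and method names (ScannerNextDemo, readAll, readWithExtraNext) are made up for illustration and do not come from the question; the second method only imitates the effect of an extra next() call inside the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerNextDemo {
    // Calls next() exactly once per iteration, so every token is kept.
    static List<String> readAll(String text) {
        List<String> words = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            words.add(scanner.next());
        }
        return words;
    }

    // Calls next() a second time per iteration (hypothetical imitation of the
    // question's loop); the second token is consumed and lost.
    static List<String> readWithExtraNext(String text) {
        List<String> words = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            words.add(scanner.next());
            if (scanner.hasNext()) {
                scanner.next(); // consumed and discarded
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a b c d"));           // [a, b, c, d]
        System.out.println(readWithExtraNext("a b c d")); // [a, c]
    }
}
```

Half of the words never reach the list in the second version, which is why the result of next() should be stored once and reused.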
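Secondly, your usage of the wordCount map is slightly broken. What you want to do is check whether word is already in the map, to decide what count to store. To do this, instead of checking scanner.next() == null (which is never true, since next() throws NoSuchElementException rather than returning null), you should look in the map, for example: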
if (!wordCount.containsKey(word)) {
    // no count registered for the word yet
    wordCount.put(word, 1);
} else {
    wordCount.put(word, wordCount.get(word) + 1);
}
Or alternatively you can do this:
Integer count = wordCount.get(word);
if (count == null) {
    // no count registered for the word yet
    wordCount.put(word, 1);
} else {
    wordCount.put(word, count + 1);
}
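I prefer this approach because it is a little cleaner and does only one map lookup per word before the update, whereas the first approach sometimes does two (containsKey followed by get).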
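Putting the two fixes together, a minimal sketch of the counting loop might look like the following. It assumes Java 8 or later, where Map.merge can replace the explicit null check. The class and method names (WordCounter, countWords) are hypothetical, and unlike the snippets above it strips punctuation entirely (replacing with "" rather than " ") and skips empty tokens; both choices are assumptions, not something the question specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCounter {
    // Counts normalized words from the scanner, one next() call per iteration.
    static Map<String, Integer> countWords(Scanner scanner) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        while (scanner.hasNext()) {
            // Assumption: drop non-alphanumeric characters instead of replacing them with " ".
            String word = scanner.next().replaceAll("[^A-Za-z0-9]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // the token was pure punctuation
            }
            // Java 8+: store 1 if the word is absent, otherwise add 1 to its count.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new Scanner("The cat and the dog. The end!"));
        System.out.println(counts.get("the")); // 3
        System.out.println(counts.get("dog")); // 1
    }
}
```

On older Java versions the explicit get-then-put form shown above does the same job.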
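Now, to get the list of words in descending order of frequency, you can first convert your map to a list and then apply Collections.sort() as suggested in this post. Below is a simplified version suited to your needs (for ascending order, as the question asks, compare o1 to o2 instead of o2 to o1):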
static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {
    // Convert map to list of <String,Integer> entries
    List<Map.Entry<String, Integer>> list =
        new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

    // Sort list by integer values
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            // compare o2 to o1, instead of o1 to o2, to get descending freq. order
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });

    // Populate the result into a list
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : list) {
        result.add(entry.getKey());
    }
    return result;
}
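Since the question asks for least-to-greatest order and a Comparator<String>, here is a hedged sketch of that variant: it sorts the words themselves and looks up each count in the map. The names FrequencySort and wordsInAscendingFreqOrder are invented for this example, and the alphabetical tie-break is an added assumption so that words with equal counts come out in a predictable order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencySort {
    // Returns the map's words ordered from lowest count to highest.
    static List<String> wordsInAscendingFreqOrder(final Map<String, Integer> wordCount) {
        List<String> words = new ArrayList<String>(wordCount.keySet());
        Collections.sort(words, new Comparator<String>() {
            public int compare(String w1, String w2) {
                // compareTo avoids the overflow risk of subtracting the counts
                int byCount = wordCount.get(w1).compareTo(wordCount.get(w2));
                // Assumption: break ties alphabetically for a deterministic result
                return byCount != 0 ? byCount : w1.compareTo(w2);
            }
        });
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("the", 3);
        counts.put("cat", 1);
        counts.put("and", 2);
        System.out.println(wordsInAscendingFreqOrder(counts)); // [cat, and, the]
    }
}
```

Sorting the keys this way does one map lookup per comparison, so for very large maps the entry-list approach above may be the cheaper option.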
Hope this helps.
Edit: Changed the compare function as suggested by @dragon66. Thanks.