Apache Lucene - Improving spell checker results

Question

COB*_*BOL 2 java apache lucene performance spell-checking

I recently implemented a spell checker using Apache Lucene. My code is as follows:

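// Imports assumed for Lucene 4.0
import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

private SpellChecker spellChecker;

// Builds the spell-check index in c:/spellchecker/ from a plain-text word list.
public void loadDictionary() {
    try {
        File dir = new File("c:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        spellChecker = new SpellChecker(directory);
        Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
        spellChecker.indexDictionary(dictionary, config, false);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Returns the top suggestion for the word, or the word itself if there is none.
// Note: it asks for suggestions even when the word is already in the dictionary.
public String performSpellCheck(String word) {
    try {
        String[] suggestions = spellChecker.suggestSimilar(word, 1);
        if (suggestions.length > 0) {
            return suggestions[0];
        } else {
            return word;
        }
    } catch (Exception e) {
        return "Error";
    }
}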

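The code above uses a dictionary of English words. My problem is accuracy. What I want is for it to suggest similar words for misspelled words (that is, words that do not appear in the dictionary in use). However, if I pass the word "post" to the performSpellCheck method, it returns "poet"; in other words, it is correcting words that do not need correcting (words that are present in the dictionary file).

Any suggestions on how to improve the results?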

Answer 1

blu*_*er9 5

I think you should use the SpellChecker.exist() method, and call suggestSimilar only when the word is not in the dictionary.
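
To make the suggestion concrete, here is a minimal sketch of the "check first, then suggest" logic. So that it compiles without Lucene on the classpath, it uses a hypothetical WordIndex interface (my own name, not part of Lucene) that mirrors the two SpellChecker calls involved, exist(String) and suggestSimilar(String, int). GuardedSpellCheck and correct are likewise illustrative names. In the question's code you would call the same two methods directly on the spellChecker field.

```java
import java.io.IOException;

// Hypothetical stand-in for the two Lucene SpellChecker methods used below.
// It is not a Lucene type; it only exists to keep this sketch self-contained.
interface WordIndex {
    boolean exist(String word) throws IOException;
    String[] suggestSimilar(String word, int numSug) throws IOException;
}

final class GuardedSpellCheck {
    private GuardedSpellCheck() {}

    // Returns the word unchanged when the dictionary contains it.
    // Otherwise returns the top suggestion, or the word itself if there is none.
    static String correct(WordIndex index, String word) throws IOException {
        if (index.exist(word)) {
            return word; // valid word such as "post": do not "correct" it
        }
        String[] suggestions = index.suggestSimilar(word, 1);
        return suggestions.length > 0 ? suggestions[0] : word;
    }
}
```

Applied to performSpellCheck, this amounts to adding one guard before the existing suggestSimilar call: if spellChecker.exist(word) is true, return word right away. Both Lucene methods declare IOException, so the existing try/catch still covers them.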
