How to stop a Java spell checker from correcting the same word more than once

ary*_*ary 3 java algorithm if-statement spell-checking

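I have implemented a program that does the following:

  1. Scans all the words on a web page into a string (using jsoup)
  2. Filters out all HTML tags and code
  3. Feeds these words into a spell checker, which offers suggestions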

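The spell checker loads the dictionary.txt file into a lookup table (a Hashtable in the full code below) and compares each input word against the words in the dictionary.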
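The dictionary lookup described above does not need a Hashtable whose keys and values are identical: a HashSet<String> gives the same constant-time membership test. The following is a minimal sketch, assuming dictionary.txt holds whitespace-separated words (one or more per line). DictionaryLoader, loadDictionary and isKnown are hypothetical names, and the example reads from a StringReader instead of the real file. It also reads until readLine() returns null rather than looping on ready(), because ready() only reports whether the next read would block, not whether the end of the file has been reached.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DictionaryLoader {

    // Hypothetical helper: reads whitespace-separated words, one or more per line,
    // and stores them lower-cased so that lookups are case-insensitive.
    static Set<String> loadDictionary(Reader source) {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) { // null means end of input
                for (String w : line.trim().split("\\s+")) {
                    if (!w.isEmpty()) { // an empty line splits into a single ""
                        words.add(w.toLowerCase(Locale.ROOT));
                    }
                }
            }
        } catch (IOException e) {
            // Wrap the checked exception so callers are not forced to handle it
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // A word is "known" if its lower-cased form is in the set
    static boolean isKnown(Set<String> dictionary, String word) {
        return dictionary.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // In the real program this would be new FileReader("dictionary.txt")
        Set<String> dict = loadDictionary(new StringReader("the program\nwas worst\n"));
        System.out.println(dict.size());          // 4
        System.out.println(isKnown(dict, "The")); // true
        System.out.println(isKnown(dict, "teh")); // false
    }
}
```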

My current problem is that when the input contains the same word more than once, for example "teh program was teh worst", the code prints:

You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?

Some websites repeat many words over and over, so the output can get messy.

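Ideally, the program would print each misspelled word once, together with the number of times it was misspelled, but simply limiting each word to being printed once would be enough.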
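Both goals in the paragraph above (print each misspelled word once, and report how often it occurred) can be met by counting before printing. This is a minimal sketch, assuming the words have already been extracted and normalized. MisspellingCounter and countMisspellings are hypothetical names, and a plain Set<String> stands in for the question's dictionary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MisspellingCounter {

    // Counts how many times each word that is NOT in the dictionary occurs.
    // LinkedHashMap keeps the words in the order they were first seen.
    static Map<String, Integer> countMisspellings(List<String> words, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            if (!dictionary.contains(word)) {
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("the", "program", "was", "worst"));
        List<String> words = Arrays.asList("teh", "program", "was", "teh", "worst");

        // Each misspelled word is reported once, with its number of occurrences
        for (Map.Entry<String, Integer> e : countMisspellings(words, dictionary).entrySet()) {
            System.out.println("'" + e.getKey() + "' misspelled " + e.getValue() + " time(s)");
        }
        // prints: 'teh' misspelled 2 time(s)
    }
}
```

In the question's SpellChecker(), suggest.correct(...) would then be called once per map entry instead of once per occurrence.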

My program has several methods and two classes, but the spell-checking method is as follows:

Note: the original code contained some "if" statements that strip punctuation, but I removed them for clarity.
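Since the punctuation-stripping "if" statements were removed, here is one possible replacement. It is a sketch, not the asker's original code. It splits the text on any run of whitespace rather than on a single space (split(" ") produces empty strings for consecutive spaces and leaves tabs or newlines attached to words), then strips leading and trailing punctuation with a regular expression. Tokenizer and tokenize are hypothetical names, and treating everything except letters and apostrophes as punctuation is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Tokenizer {

    // Splits text on any run of whitespace, trims punctuation from both ends
    // of each token and lower-cases it. Tokens that end up empty are dropped.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            // Remove leading/trailing characters that are not letters or apostrophes
            String word = raw.replaceAll("^[^\\p{L}']+|[^\\p{L}']+$", "");
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Teh program,  was\nteh worst!"));
        // [teh, program, was, teh, worst]
    }
}
```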

static boolean suggestWord;

public static String checkWord(String wordToCheck) {
    String wordCheck;
    String word = wordToCheck.toLowerCase();

    if ((wordCheck = (String) dictionary.get(word)) != null) {
        suggestWord = false; // no need to ask for suggestion for a correct word.
        return wordCheck;
    }

    // If after all of these checks a word could not be corrected, return as
    // a misspelled word.
    return word;
}

TEMPORARY EDIT: the full code, as requested:

Class 1:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

public class ParseCleanCheck {

    static Hashtable<String, String> dictionary; // To store all the words of the dictionary
    static boolean suggestWord; // To indicate whether the word is spelled correctly or not.

    static Scanner urlInput = new Scanner(System.in);
    public static String cleanString;
    public static String url = "";
    public static boolean correct = true;

    /**
     * PARSER METHOD
     */
    public static void PageScanner() throws IOException {
        System.out.println("Pick an English website to scan.");

        // This do-while loop allows the user to try again after a mistake
        do {
            try {
                System.out.println("Enter a URL, starting with http://");
                url = urlInput.nextLine();
                // This creates a document out of the HTML on the web page
                Document doc = Jsoup.connect(url).get();
                // This converts the document into a string to be cleaned
                String htmlToClean = doc.toString();
                cleanString = Jsoup.clean(htmlToClean, Whitelist.none());

                correct = false;
            } catch (Exception e) {
                System.out.println("Incorrect format for a URL. Please try again.");
            }
        } while (correct);
    }

    /**
     * SPELL CHECKER METHOD
     */
    public static void SpellChecker() throws IOException {
        dictionary = new Hashtable<String, String>();
        System.out.println("Searching for spelling errors ... ");

        try {
            // Read and store the words of the dictionary
            BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));

            while (dictReader.ready()) {
                String dictInput = dictReader.readLine();
                String[] dict = dictInput.split("\\s"); // create an array of dictionary words

                for (int i = 0; i < dict.length; i++) {
                    // key and value are identical
                    dictionary.put(dict[i], dict[i]);
                }
            }
            dictReader.close();
            String user_text = "";

            // Initializing a spelling suggestion object based on probability
            SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");

            // get user input for correction
            {
                user_text = cleanString;
                String[] words = user_text.split(" ");

                int error = 0;

                for (String word : words) {
                    if (!dictionary.contains(word)) {
                        checkWord(word);

                        dictionary.put(word, word);
                    }
                    suggestWord = true;
                    String outputWord = checkWord(word);

                    if (suggestWord) {
                        System.out.println("Suggestions for " + word + " are:  " + suggest.correct(outputWord) + "\n");
                        error++;
                    }
                }

                if (error == 0) {
                    System.out.println("No mistakes found");
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }

    /**
     * METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
     * METHOD THROUGH THE "WORD" STRING
     */
    public static String checkWord(String wordToCheck) {
        String wordCheck;
        String word = wordToCheck.toLowerCase();

        if ((wordCheck = (String) dictionary.get(word)) != null) {
            suggestWord = false; // no need to ask for suggestion for a correct word.
            return wordCheck;
        }

        // If after all of these checks a word could not be corrected, return as
        // a misspelled word.
        return word;
    }
}

There is a second class (SuggestSpelling.java) containing a probability calculator, but it is not relevant for now unless you plan to run the code yourself.

Kai*_*dul 5

Use a HashSet to detect duplicates:

Set<String> wordSet = new HashSet<>();

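Store every word of the input sentence in it. If a word is already in the HashSet when you are about to insert it, don't call checkWord(String wordToCheck) for that word. Something like this: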

String[] words = user_text.split(" "); // split the input sentence into words
for (String word : words) {
    if (!wordSet.contains(word)) {
        checkWord(word);
        // do stuff
        wordSet.add(word);
    }
}
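Set.add already reports whether the element was new: it returns false if the word was already present, so the contains check and the add call above can collapse into a single call. A minimal sketch; DuplicateFilter and firstOccurrences are hypothetical names, and the returned list stands in for the words on which checkWord would be called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateFilter {

    // Returns each word the first time it is seen; later repeats are skipped.
    static List<String> firstOccurrences(String[] words) {
        Set<String> seen = new HashSet<>(); // only for duplicate detection, separate from the dictionary
        List<String> toCheck = new ArrayList<>();
        for (String word : words) {
            if (seen.add(word)) {  // add() returns false if the word was already in the set
                toCheck.add(word); // in the question's code, checkWord(word) would go here
            }
        }
        return toCheck;
    }

    public static void main(String[] args) {
        String[] words = "teh program was teh worst".split(" ");
        System.out.println(firstOccurrences(words));
        // [teh, program, was, worst]
    }
}
```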

EDIT

// ....
{

    user_text = cleanString;
    String[] words = user_text.split(" ");
    Set<String> wordSet = new HashSet<>();

    int error = 0;

    for (String word : words) {
        // wordSet is a separate data structure. It's only for duplicate checking; don't mix it with dictionary
        if(!wordSet.contains(word)) {

            // put all your logic here

            wordSet.add(word);
        }
    }

    if (error == 0) {
        System.out.println("No mistakes found");
    }
}
// .... 

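You also have other errors: for example, you passed String wordCheck as the argument of checkWord and then redeclared it inside checkWord() with String wordCheck;, which is not right. Please check the other parts as well.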
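One way to avoid errors like this, along with the static suggestWord flag that checkWord sets as a side effect, is to have the check return its result directly. A minimal sketch, not the asker's code: WordChecker and isSpelledCorrectly are a hypothetical replacement for checkWord, and a Set<String> of lower-cased words stands in for the Hashtable dictionary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WordChecker {

    private final Set<String> dictionary;

    WordChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Returns true if the lower-cased word is in the dictionary.
    // No static flag is needed: the caller decides what to do with the result.
    boolean isSpelledCorrectly(String wordToCheck) {
        return dictionary.contains(wordToCheck.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        WordChecker checker = new WordChecker(new HashSet<>(Arrays.asList("the", "program")));
        System.out.println(checker.isSpelledCorrectly("The")); // true
        System.out.println(checker.isSpelledCorrectly("teh")); // false
    }
}
```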