Can I get faster performance for this loop?

Bra*_*rad 4 java performance loops

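I am reading a book and deleting a number of words from it. My problem is that the process takes a long time, and I want to make it perform better (take less time). For example: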

Vector<String> pages = new Vector<String>();  // Contains about 1500 pages; each page has about 1000 words.
Vector<String> wordsToDelete = new Vector<String>();  // Contains about 50000 words.

for( String page: pages ) {
    String pageInLowCase = page.toLowerCase();

    for( String wordToDelete: wordsToDelete ) {
        if( pageInLowCase.contains( wordToDelete ) )
            page = page.replaceAll( "(?i)\\b" + wordToDelete + "\\b" , "" );
    }

    // Do some stuff with the final page that does not take much time.
}

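This code takes about 3 minutes to execute. If I skip the replaceAll(...) loop, I save more than 2 minutes. So is there a way to run the same loop with faster performance?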

Nik*_*bak 12

Yes, you can process a page in a different way. The basic idea is the following:

for (String word : page) {
    if (!forbiddenWords.contains(word)) {
        pageResult.append(word);
    }
}

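Here forbiddenWords is a Set.
Also, for (String word : page) is shorthand for parsing the page into a list of words. Don't forget to add the whitespace to the result as well (I skipped that for clarity).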

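In the original version, the cost of processing one page is ~50000*1000 operations, while now it is only ~1000. (Checking whether a word is in a HashSet takes constant time.)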

Edit
Since I wanted to distract myself from work for ten minutes, here is the code :)

    String text = "This is a bad word, and this is very bad, terrible word.";
    Set<String> forbiddenWords = new HashSet<String>(Arrays.asList("bad", "terrible"));

    text += "|"; // mark end of text
    boolean readingWord = false;
    StringBuilder currentWord = new StringBuilder();
    StringBuilder result = new StringBuilder();

    for (int pos = 0; pos < text.length(); ++pos) {
        char c = text.charAt(pos);
        if (readingWord) {
            if (Character.isLetter(c)) {
                currentWord.append(c);
            } else {
                // finished reading a word
                readingWord = false;
                if (!forbiddenWords.contains(currentWord.toString().toLowerCase())) {
                    result.append(currentWord);
                }

                result.append(c);
            }
        } else {
            if (Character.isLetter(c)) {
                // start reading a new word
                readingWord = true;
                currentWord.setLength(0);
                currentWord.append(c);
            } else {
                // append punctuation marks and spaces to result immediately
                result.append(c); 
            }
        }
    }

    result.setLength(result.length() - 1); // remove end of text mark
    System.out.println(result);
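For comparison, the same set-lookup idea can also be sketched with java.util.regex instead of a hand-written scanner. This is only one possible variant, not the code from the answer: the class name ForbiddenWordFilter and the method removeWords are made up for illustration, and it assumes a "word" is a run of letters (\p{L}+), which is roughly what the Character.isLetter test above does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and structure are assumptions for illustration.
class ForbiddenWordFilter {
    // Assumption: a word is a maximal run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Returns text with every word found in forbiddenLowerCase removed.
    // Punctuation and whitespace between words are copied through unchanged.
    static String removeWords(String text, Set<String> forbiddenLowerCase) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String word = m.group();
            // One constant-time set lookup per word on the page.
            boolean drop = forbiddenLowerCase.contains(word.toLowerCase());
            m.appendReplacement(out, drop ? "" : Matcher.quoteReplacement(word));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> forbidden = new HashSet<>(Arrays.asList("bad", "terrible"));
        String text = "This is a bad word, and this is very bad, terrible word.";
        System.out.println(removeWords(text, forbidden));
    }
}
```

The set is built once and reused for every page, so each page costs about one lookup per word rather than one regex pass per word to delete.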


Boz*_*zho 5

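First of all, you can get rid of the contains(..) check. It adds unnecessary overhead, and it sometimes returns true when it shouldn't. For example, it returns true for "not" even if the page only contains "knot".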
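A small sketch of that false positive, assuming a sample sentence of my own. The class and method names are hypothetical, and Pattern.quote is an addition that is not in the question's code; it only guards against regex metacharacters in the word.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
class ContainsVersusWordBoundary {
    // Substring test, as in the question: true if the characters appear anywhere.
    static boolean substringHit(String page, String word) {
        return page.toLowerCase().contains(word);
    }

    // Whole-word test, using the same pattern shape as the replaceAll call.
    static boolean wholeWordHit(String page, String word) {
        return Pattern.compile("(?i)\\b" + Pattern.quote(word) + "\\b")
                      .matcher(page)
                      .find();
    }

    public static void main(String[] args) {
        String page = "Tie a knot in the rope.";
        System.out.println(substringHit(page, "not")); // true: "knot" contains "not"
        System.out.println(wholeWordHit(page, "not")); // false: no standalone "not"
    }
}
```

So the contains(..) guard lets the expensive replaceAll run on pages where it cannot change anything.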

Another thing: replace Vector with ArrayList.

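As Konrad pointed out in the comments, you are not changing the vector. String is immutable, so you are not modifying the objects. You will have to use set(..) (and maintain an iteration index).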
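A minimal sketch of the write-back, assuming an ArrayList and an index-based loop. The class and method names are hypothetical, and Pattern.quote is an addition that is not in the question. This only illustrates set(..); it keeps the slow inner loop from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
class WriteBackToList {
    // Removes each whole word from every page and stores the result back in the list.
    static void deleteWords(List<String> pages, List<String> wordsToDelete) {
        for (int i = 0; i < pages.size(); i++) {
            String page = pages.get(i);
            for (String word : wordsToDelete) {
                // replaceAll returns a new String; the old one is unchanged.
                page = page.replaceAll("(?i)\\b" + Pattern.quote(word) + "\\b", "");
            }
            // Without this call the list would still hold the original page.
            pages.set(i, page);
        }
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>(Arrays.asList("A bad day", "Not a knot"));
        deleteWords(pages, Arrays.asList("bad", "not"));
        System.out.println(pages);
    }
}
```

In the for-each version from the question, assigning to the loop variable page only rebinds a local reference, which is why the stored pages never change.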