同时搜索多个HashMaps

fuk*_*uri 6 java multithreading dictionary hashmap

tldr:如何同时在多个(只读)Java HashMaps中搜索条目?


长版:

我有几个不同大小的字典存储为HashMap< String, String >.一旦他们被读入,他们永远不会被改变(严格只读).我想检查是否以及哪个字典用我的密钥存储了一个条目.

我的代码最初正在寻找这样的密钥:

public DictionaryEntry getEntry(String key) {
    for (int i = 0; i < _numDictionaries; i++) {
        HashMap<String, String> map = getDictionary(i);
        if (map.containsKey(key))
             return new DictionaryEntry(map.get(key), i);
    }
    return null;
}
Run Code Online (Sandbox Code Playgroud)

然后它变得有点复杂:我的搜索字符串可能包含拼写错误,或者是存储条目的变体.就像,如果存储的密钥是​​"香蕉",我可能会查找"bannana"或"香蕉",但仍然希望"banana"的条目返回.使用Levenshtein-Distance,我现在遍历所有词典及其中的每个条目:

public DictionaryEntry getEntry(String key) {
    for (int i = 0; i < _numDictionaries; i++) {
        HashMap<String, String> map = getDictionary(i);
        for (Map.Entry entry : map.entrySet) {
            // Calculate Levenshtein distance, store closest match etc.
        }
    }
    // return closest match or null.
}    
Run Code Online (Sandbox Code Playgroud)

到目前为止一切正常,我得到了我想要的条目.不幸的是,我需要查找大约7000个字符串,在五个不同大小的字典中(约30-70k条目),这需要一段时间.从我的处理输出中,我的强烈印象是我的查找主导整个运行时.

我改善运行时的第一个想法是并行搜索所有字典.由于没有任何字典需要更改,并且不会有多个线程同时访问字典,因此我没有看到任何安全问题.

问题是:我该怎么做?我之前从未使用过多线程.我的搜索只提出了Concurrent HashMaps(但根据我的理解,我不需要这个)和Runnable类,我必须将我的处理放入方法中run().我想我可以重写我当前的类以适应Runnable,但我想知道是否有更简单的方法可以做到这一点(或者我怎么能用Runnable做到这一点,现在我的有限理解认为我必须重组很多).


因为我被要求分享Levenshtein-Logic:这真的没什么特别的,但是你走了:

private int _maxLSDistance = 10;
public Map.Entry getClosestMatch(String key) {
    Map.Entry _closestMatch = null;
    int lsDist;

    if (key == null) {
        return null;
    }

    for (Map.Entry entry : _dictionary.entrySet()) {
        // Perfect match
        if (entry.getKey().equals(key)) {
            return entry;
        }
        // Similar match
        else {
            int dist = StringUtils.getLevenshteinDistance((String) entry.getKey(), key);

            // If "dist" is smaller than threshold and smaller than distance of already stored entry
            if (dist < _maxLSDistance) {
                if (_closestMatch == null || dist < _lsDistance) {
                    _closestMatch = entry;
                    _lsDistance = dist;
                }
            }
        }
    }
    return _closestMatch
}
Run Code Online (Sandbox Code Playgroud)

Joh*_*hny 2

为了在您的情况下使用多线程,可能是这样的:

“监视器”类,主要存储结果并协调线程;

public class Results {

    private int nrOfDictionaries = 4; //

    private ArrayList<String> results = new ArrayList<String>();

    public void prepare() {
        nrOfDictionaries = 4;
        results = new ArrayList<String>();
    }

    public synchronized void oneDictionaryFinished() {
        nrOfDictionaries--;
        System.out.println("one dictionary finished");
        notifyAll();
    }

    public synchronized boolean isReady() throws InterruptedException {

        while (nrOfDictionaries != 0) {
            wait();
        }

        return true;
    }

    public synchronized void addResult(String result) {
        results.add(result);
    }

    public ArrayList<String> getAllResults() {
        return results;
    }
}
Run Code Online (Sandbox Code Playgroud)

Thread 是 self,可以设置为搜索特定的字典:

public class ThreadDictionarySearch extends Thread {

    // the actual dictionary
    private String dictionary;
    private Results results;

    public ThreadDictionarySearch(Results results, String dictionary) {
        this.dictionary = dictionary;
        this.results = results;
    }

    @Override
    public void run() {

        for (int i = 0; i < 4; i++) {
            // search dictionary;
            results.addResult("result of " + dictionary);
            System.out.println("adding result from " + dictionary);
        }

        results.oneDictionaryFinished();
    }

}
Run Code Online (Sandbox Code Playgroud)

以及演示的主要方法:

public static void main(String[] args) throws Exception {

    Results results = new Results();

    ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
    ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
    ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
    ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");

    threadA.start();
    threadB.start();
    threadC.start();
    threadD.start();

    if (results.isReady())
    // it stays here until all dictionaries are searched
    // because in "Results" it's told to wait() while not finished;

for (String string : results.getAllResults()) {
        System.out.println("RESULT: " + string);
    }
Run Code Online (Sandbox Code Playgroud)