fuk*_*uri 6 java multithreading dictionary hashmap
tldr:如何同时在多个(只读)Java HashMaps中搜索条目?
长版:
我有几个不同大小的字典存储为HashMap< String, String >
.一旦他们被读入,他们永远不会被改变(严格只读).我想检查是否以及哪个字典用我的密钥存储了一个条目.
我的代码最初正在寻找这样的密钥:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Run Code Online (Sandbox Code Playgroud)
然后它变得有点复杂:我的搜索字符串可能包含拼写错误,或者是存储条目的变体.就像,如果存储的密钥是"香蕉",我可能会查找"bannana"或"香蕉",但仍然希望"banana"的条目返回.使用Levenshtein-Distance,我现在遍历所有词典及其中的每个条目:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry entry : map.entrySet) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
Run Code Online (Sandbox Code Playgroud)
到目前为止一切正常,我得到了我想要的条目.不幸的是,我需要查找大约7000个字符串,在五个不同大小的字典中(约30-70k条目),这需要一段时间.从我的处理输出中,我的强烈印象是我的查找主导整个运行时.
我改善运行时的第一个想法是并行搜索所有字典.由于没有任何字典需要更改,并且不会有多个线程同时访问字典,因此我没有看到任何安全问题.
问题是:我该怎么做?我之前从未使用过多线程.我的搜索只提出了Concurrent HashMaps(但根据我的理解,我不需要这个)和Runnable类,我必须将我的处理放入方法中run()
.我想我可以重写我当前的类以适应Runnable,但我想知道是否有更简单的方法可以做到这一点(或者我怎么能用Runnable做到这一点,现在我的有限理解认为我必须重组很多).
因为我被要求分享Levenshtein-Logic:这真的没什么特别的,但是你走了:
private int _maxLSDistance = 10;
public Map.Entry getClosestMatch(String key) {
Map.Entry _closestMatch = null;
int lsDist;
if (key == null) {
return null;
}
for (Map.Entry entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
else {
int dist = StringUtils.getLevenshteinDistance((String) entry.getKey(), key);
// If "dist" is smaller than threshold and smaller than distance of already stored entry
if (dist < _maxLSDistance) {
if (_closestMatch == null || dist < _lsDistance) {
_closestMatch = entry;
_lsDistance = dist;
}
}
}
}
return _closestMatch
}
Run Code Online (Sandbox Code Playgroud)
为了在您的情况下使用多线程,可能是这样的:
“监视器”类,主要存储结果并协调线程;
public class Results {
private int nrOfDictionaries = 4; //
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
Run Code Online (Sandbox Code Playgroud)
Thread 是 self,可以设置为搜索特定的字典:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
Run Code Online (Sandbox Code Playgroud)
以及演示的主要方法:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
if (results.isReady())
// it stays here until all dictionaries are searched
// because in "Results" it's told to wait() while not finished;
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
1009 次 |
最近记录: |