Tags: python, lookup, performance, dictionary, startswith
I have a dictionary whose keys come in groups that share the same prefix, like this:
d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
"key2":"valD", "key2-22":"valE" }
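Given a query string, I need to look up all values whose keys start with that prefix. For example, for query="key1" I need to get ["valA", "valB", "valC"].
My implementation below works, but it is too slow for a large number of queries, because the dictionary d has about 30,000 keys and most keys are longer than 20 characters: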
result = [d[s] for s in d.keys() if s.startswith(query)]
Is there a faster or more efficient way to do this?
You can avoid building the intermediate list that dict.keys() creates (in Python 2.x) by iterating over the dictionary directly:
result = [d[key] for key in d if key.startswith(query)]
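But you most likely want to use a trie instead of a dictionary, so that you can find all values whose keys share a common prefix (a trie is a tree organized by key prefixes).
Here you can find several different trie implementations.
A trie for the keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (Source: Wikipedia)
Let's compare the timings of the different solutions: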
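To make the idea concrete, here is a minimal pure-Python sketch of a trie with prefix lookup. It is not the implementation used by PyTrie or datrie; the names `TrieNode`, `PrefixTrie`, and `values_with_prefix` are made up for illustration, and the code assumes Python 3 and string keys.

```python
class TrieNode:
    """One node per character; a value is stored only where a key ends."""
    __slots__ = ("children", "value", "has_value")

    def __init__(self):
        self.children = {}      # maps a single character to a child TrieNode
        self.value = None
        self.has_value = False  # distinguishes "no key ends here" from a None value


class PrefixTrie:
    """Hypothetical helper class, for illustration only."""

    def __init__(self, mapping=None):
        self.root = TrieNode()
        for key, value in (mapping or {}).items():
            self.insert(key, value)

    def insert(self, key, value):
        node = self.root
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.value = value
        node.has_value = True

    def values_with_prefix(self, prefix):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no key starts with this prefix
        # Collect every value stored in the subtree below that node.
        result = []
        stack = [node]
        while stack:
            current = stack.pop()
            if current.has_value:
                result.append(current.value)
            stack.extend(current.children.values())
        return result


d = {"key1": "valA", "key123": "valB", "key1XY": "valC",
     "key2": "valD", "key2-22": "valE"}
trie = PrefixTrie(d)
# The traversal order is not sorted, so sort before comparing or printing.
print(sorted(trie.values_with_prefix("key1")))
```

The lookup cost depends on the length of the prefix plus the size of the matching subtree, not on the total number of keys, which is why a trie can beat a full scan of the dictionary. A pure-Python version like this carries noticeable per-node overhead, so a library implementation is probably the better choice in practice.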
# create a dictionary with 30k entries
d = {str(x):str(x) for x in xrange(1, 30001)}
query = '108'
# dict with keys()
%timeit [d[s] for s in d.keys() if s.startswith(query)]
100 loops, best of 3: 8.87 ms per loop
# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]
100 loops, best of 3: 7.83 ms per loop
# 11.72% improvement
# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)
%timeit [pt[s] for s in pt.iterkeys(query)]
1000 loops, best of 3: 320 µs per loop
# 96.36% improvement
# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
dt[unicode(key)] = val
%timeit [dt[s] for s in dt.keys(unicode(query))]
10000 loops, best of 3: 162 µs per loop
# 98.17% improvement
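The measurements above use Python 2 (xrange, iteritems, unicode) and IPython's %timeit. As a rough sketch, the two dictionary-based variants could be re-measured on Python 3 with the standard-library timeit module, as shown below. The function names `scan_with_keys` and `scan_direct` are made up for illustration, and no timings are claimed here because they depend on the machine and interpreter.

```python
import timeit

# Same test data as above: 30k string keys, query prefix '108'.
d = {str(x): str(x) for x in range(1, 30001)}
query = "108"


def scan_with_keys():
    # In Python 3, d.keys() returns a view, so no intermediate list is built.
    return [d[s] for s in d.keys() if s.startswith(query)]


def scan_direct():
    return [d[s] for s in d if s.startswith(query)]


# Both variants must return the same values before timing means anything.
assert scan_with_keys() == scan_direct()

for fn in (scan_with_keys, scan_direct):
    # Best of 3 runs, 20 calls per run, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print("{}: {:.3f} ms per loop".format(fn.__name__, best * 1e3))
```

Because keys() is a view in Python 3, the gap between these two variants may largely disappear there. Both still scan all 30,000 keys per query, which is the cost the trie-based approaches avoid.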