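I do understand that querying a defaultdict for a key that doesn't exist, the way I do here, adds an item to the defaultdict. That is why I think it is fair to compare my second code snippet with my first in terms of performance.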
import numpy as num
from collections import defaultdict
topKeys = range(16384)
keys = range(8192)
table = dict((k,defaultdict(int)) for k in topKeys)
dat = num.zeros((16384,8192), dtype="int32")
print("looping begins")
# How much memory should this use? I think it shouldn't use more than a few
# times the memory required to hold (16384*8192) int32's (512 MB), but
# it uses 11 GB!
for k in topKeys:
    for j in keys:
        dat[k,j] = table[k][j]
print("done")
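To make the behaviour I'm describing concrete, here is a scaled-down sketch of the same loop. The helper name `fill_by_lookup`, the variable `probe`, and the tiny sizes (4 by 8) are made up for illustration and are not part of my real script. It only shows that every lookup of a missing key stores an entry, while `.get()` does not; it is not a measurement of the 11 GB.

```python
import sys
from collections import defaultdict

def fill_by_lookup(n_top, n_keys):
    """Scaled-down version of the loop above (hypothetical helper)."""
    table = dict((k, defaultdict(int)) for k in range(n_top))
    for k in range(n_top):
        for j in range(n_keys):
            _ = table[k][j]  # only a read, yet it stores j -> 0 in table[k]
    return table

table = fill_by_lookup(4, 8)
stored = sum(len(inner) for inner in table.values())
print(stored)  # one stored entry per (k, j) pair that was read

# .get() returns a default without calling the factory, so nothing is stored.
probe = defaultdict(int)
probe.get(7, 0)
print(len(probe))

# Size of the int32 array at full scale, in MiB (4 bytes per element).
array_bytes = 16384 * 8192 * 4
print(array_bytes // 2**20)

# Size of one inner dict's own hash table; this varies by Python version and
# platform, and does not include the key and value objects it refers to.
print(sys.getsizeof(table[0]))
```

At full scale that is 16384 * 8192 stored entries spread over the inner dicts, each carrying hash-table overhead on top of the 4 bytes the array needs per value, which I assume is at least part of where the memory goes.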
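What is going on here? Furthermore, this similar script takes much longer to run than the first one, and it also uses an absurd amount of memory.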
topKeys = range(16384)
keys = range(8192)
table …