Fastest way to convert a NumPy array to a sparse dictionary?

saf*_*fsd 5 python performance numpy

I'm interested in converting a NumPy array to a sparse dictionary as quickly as possible. Let me elaborate:

Given the array:

numpy.array([12,0,0,0,3,0,0,1])

I would like to produce the dictionary:

{0:12, 4:3, 7:1}

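As you can see, we are simply converting the sequence type into an explicit mapping from the indices of the non-zero values to their values.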
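As a minimal sketch of that mapping (the helper name `to_sparse_dict` is hypothetical, and the code assumes Python 3 and a 1-D array), the conversion could look like this:

```python
import numpy as np

def to_sparse_dict(vector):
    """Map the index of each non-zero entry to its value (hypothetical helper)."""
    idx = np.flatnonzero(vector)  # indices of the non-zero entries
    # tolist() converts NumPy scalars to plain Python ints before building the dict
    return dict(zip(idx.tolist(), vector[idx].tolist()))

vector = np.array([12, 0, 0, 0, 3, 0, 0, 1])
sparse = to_sparse_dict(vector)
```

Here `sparse` holds the indices 0, 4 and 7 as keys. Calling `tolist()` is a design choice, not a requirement; it yields plain Python integers rather than NumPy scalar types as keys and values.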

To make this a bit more interesting, I offer the following test harness for trying out alternatives:

from timeit import Timer

if __name__ == "__main__":
  s = "import numpy; from itertools import izip; from numpy import nonzero, flatnonzero; vector =         numpy.random.poisson(0.1, size=10000);"

  ms = [ "f = flatnonzero(vector); dict( zip( f, vector[f] ) )"
             , "f = flatnonzero(vector); dict( izip( f, vector[f] ) )"
             , "f = nonzero(vector); dict( izip( f[0], vector[f] ) )"
             , "n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(izip(i,v))"
             , "i = flatnonzero(vector); v = vector[vector > 0]; dict(izip(i,v))"
             , "dict( zip( flatnonzero(vector), vector[flatnonzero(vector)] ) )"
             , "dict( zip( flatnonzero(vector), vector[nonzero(vector)] ) )"
             , "dict( (i, x) for i,x in enumerate(vector) if x > 0);"
             ]
  for m in ms:
    print "  %.2fs" % Timer(m, s).timeit(1000), m

I'm using a Poisson distribution to simulate the kind of arrays I am interested in converting.

Here are my results so far:

   0.78s f = flatnonzero(vector); dict( zip( f, vector[f] ) )
   0.73s f = flatnonzero(vector); dict( izip( f, vector[f] ) )
   0.71s f = nonzero(vector); dict( izip( f[0], vector[f] ) )
   0.67s n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(izip(i,v))
   0.81s i = flatnonzero(vector); v = vector[vector > 0]; dict(izip(i,v))
   1.01s dict( zip( flatnonzero(vector), vector[flatnonzero(vector)] ) )
   1.03s dict( zip( flatnonzero(vector), vector[nonzero(vector)] ) )
   4.90s dict( (i, x) for i,x in enumerate(vector) if x > 0);
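
The harness above targets Python 2 (`izip`, the `print` statement). As a rough sketch for Python 3, assuming the built-in lazy `zip` is an acceptable stand-in for `izip`, a subset of the statements could be timed as follows; the function name `run_benchmarks` and the constant names are hypothetical, and the timings listed above were not produced by this version:

```python
from timeit import Timer

# Setup string executed once per Timer; mirrors the original setup minus izip.
SETUP = (
    "import numpy; from numpy import nonzero, flatnonzero; "
    "vector = numpy.random.poisson(0.1, size=10000)"
)

# A subset of the original statements, with izip replaced by the built-in zip.
STATEMENTS = [
    "f = flatnonzero(vector); dict(zip(f, vector[f]))",
    "f = nonzero(vector); dict(zip(f[0], vector[f]))",
    "n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(zip(i, v))",
    "dict((i, x) for i, x in enumerate(vector) if x > 0)",
]

def run_benchmarks(number=1000):
    """Return (seconds, statement) pairs; `number` is the repeat count per statement."""
    return [(Timer(stmt, SETUP).timeit(number), stmt) for stmt in STATEMENTS]
```

Calling `run_benchmarks()` and printing each pair with `print("  %.2fs %s" % (seconds, stmt))` reproduces the output layout used above.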

As you can see, the fastest solution I have found is

n = vector > 0
i = numpy.arange(len(vector))[n]
v = vector[n]
dict(izip(i,v))

Is there a faster way?

Edit: The step

i = numpy.arange(len(vector))[n]

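seems particularly clumsy: it generates an entire index array before selecting some of its elements, even though we know that only about 1/10 of the elements are likely to be selected. I think there is still room for improvement here.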
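
To illustrate the concern, the sketch below compares the `arange`-then-mask step with `numpy.flatnonzero` applied to the same mask, which returns only the selected indices. It assumes Python 3 and a NumPy version that provides `numpy.random.default_rng`; the seed and variable names are arbitrary. It only checks that the two give the same indices and makes no claim about which is faster:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
vector = rng.poisson(0.1, size=10000)

n = vector > 0                         # boolean mask of the non-zero entries
i_arange = np.arange(len(vector))[n]   # builds all 10000 indices, then filters
i_direct = np.flatnonzero(n)           # returns only the indices where n is True

same = np.array_equal(i_arange, i_direct)
```

Whether avoiding the full `arange` actually helps would need to be measured with the test harness.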

小智 -1

Have you tried this?

from numpy import where

i = where(vector > 0)[0]
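
The snippet above only yields the indices. A possible way to finish the conversion, sketched here under the assumption of Python 3 and the example array from the question, is:

```python
import numpy as np
from numpy import where

vector = np.array([12, 0, 0, 0, 3, 0, 0, 1])

# where() with a single argument returns a tuple holding one index array
# per dimension; [0] picks the only array for a 1-D input.
i = where(vector > 0)[0]

# Pair each selected index with its value.
sparse = dict(zip(i.tolist(), vector[i].tolist()))
```

This has not been timed against the alternatives in the question.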