我试图numpy.array根据给定的密钥翻译a的每个元素:
例如:
a = np.array([[1,2,3],
[3,2,4]])
my_dict = {1:23, 2:34, 3:36, 4:45}
Run Code Online (Sandbox Code Playgroud)
我想得到:
array([[ 23., 34., 36.],
[ 36., 34., 45.]])
Run Code Online (Sandbox Code Playgroud)
我可以看到如何使用循环:
def loop_translate(a, my_dict):
new_a = np.empty(a.shape)
for i,row in enumerate(a):
new_a[i,:] = map(my_dict.get, row)
return new_a
Run Code Online (Sandbox Code Playgroud)
是否有更高效和/或纯粹的numpy方式?
编辑:
我计时了,np.vectorizeDSM提出的方法对于更大的数组要快得多:
In [13]: def loop_translate(a, my_dict):
....: new_a = np.empty(a.shape)
....: for i,row in enumerate(a):
....: new_a[i,:] = map(my_dict.get, row)
....: return new_a
....:
In [14]: def vec_translate(a, my_dict):
....: return np.vectorize(my_dict.__getitem__)(a)
....:
In [15]: a = np.random.randint(1,5, (4,5))
In [16]: a
Out[16]:
array([[2, 4, 3, 1, 1],
[2, 4, 3, 2, 4],
[4, 2, 1, 3, 1],
[2, 4, 3, 4, 1]])
In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop
In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop
In [19]: a = np.random.randint(1, 5, (500,500))
In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop
In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop
In [22]: %timeit loop_translate(a, my_dict)
Run Code Online (Sandbox Code Playgroud)
DSM*_*DSM 66
我不知道有效的,但你可以使用np.vectorize的.get字典的方法:
>>> a = np.array([[1,2,3],
[3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
[36, 34, 45]])
Run Code Online (Sandbox Code Playgroud)
Joh*_*ard 11
这是另一种方法,使用numpy.unique:
>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
[3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
[33, 22, 11]])
Run Code Online (Sandbox Code Playgroud)
Joh*_*ard 10
我认为最好遍历字典,并“一次”在所有行和列中设置值:
>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
[3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.iteritems():
... a[a == k] = v
...
>>> a
array([[11, 22, 33],
[33, 22, 11]])
Run Code Online (Sandbox Code Playgroud)
编辑:
虽然它可能不会像性感为帝斯曼(真的很好)的答案用numpy.vectorize,我的所有建议的方法的实验表明,该方法(使用@ jamylak的建议)实际上是一个有点快:
from __future__ import division
import numpy as np
a = np.random.randint(1, 5, (500,500))
d = {1 : 11, 2 : 22, 3 : 33, 4 : 44}
def unique_translate(a,d):
u,inv = np.unique(a,return_inverse = True)
return np.array([d[x] for x in u])[inv].reshape(a.shape)
def vec_translate(a, d):
return np.vectorize(d.__getitem__)(a)
def loop_translate(a,d):
n = np.ndarray(a.shape)
for k in d:
n[a == k] = d[k]
return n
def orig_translate(a, d):
new_a = np.empty(a.shape)
for i,row in enumerate(a):
new_a[i,:] = map(d.get, row)
return new_a
if __name__ == '__main__':
import timeit
n_exec = 100
print 'orig'
print timeit.timeit("orig_translate(a,d)",
setup="from __main__ import np,a,d,orig_translate",
number = n_exec) / n_exec
print 'unique'
print timeit.timeit("unique_translate(a,d)",
setup="from __main__ import np,a,d,unique_translate",
number = n_exec) / n_exec
print 'vec'
print timeit.timeit("vec_translate(a,d)",
setup="from __main__ import np,a,d,vec_translate",
number = n_exec) / n_exec
print 'loop'
print timeit.timeit("loop_translate(a,d)",
setup="from __main__ import np,a,d,loop_translate",
number = n_exec) / n_exec
Run Code Online (Sandbox Code Playgroud)
输出:
orig
0.222067718506
unique
0.0472617006302
vec
0.0357889199257
loop
0.0285375618935
Run Code Online (Sandbox Code Playgroud)
该numpy_indexed包(免责声明:我是它的作者)提供了一个优雅和高效的矢量化解决方案,这种类型的问题:
import numpy_indexed as npi
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()))
Run Code Online (Sandbox Code Playgroud)
实现的方法与John Vinyard提到的方法相似,但更为通用。例如,数组的项不必是整数,而可以是任何类型,甚至是nd-subarrays本身。
如果将可选的“ missing” kwarg设置为“ raise”(默认为“ ignore”),则性能会稍好一些,并且如果键中并非所有“ a”元素都存在,则会出现KeyError。
| 归档时间: |
|
| 查看次数: |
20614 次 |
| 最近记录: |