Translate every element in a numpy array according to a key

Question

I am trying to translate every element of a numpy.array according to a given key.

For example:

a = np.array([[1,2,3],
              [3,2,4]])

my_dict = {1:23, 2:34, 3:36, 4:45}

I want to get:

array([[ 23.,  34.,  36.],
       [ 36.,  34.,  45.]])

I can see how to do this with a loop:

def loop_translate(a, my_dict):
    new_a = np.empty(a.shape)
    for i, row in enumerate(a):
        # list(...) is needed on Python 3, where map returns an iterator
        new_a[i, :] = list(map(my_dict.get, row))
    return new_a

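Is there a more efficient and/or pure numpy way to accomplish this?

Edit:

I timed it, and the np.vectorize method proposed by DSM is considerably faster for larger arrays: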

In [13]: def loop_translate(a, my_dict):
   ....:     new_a = np.empty(a.shape)
   ....:     for i,row in enumerate(a):
   ....:         new_a[i,:] = map(my_dict.get, row)
   ....:     return new_a
   ....: 

In [14]: def vec_translate(a, my_dict):    
   ....:     return np.vectorize(my_dict.__getitem__)(a)
   ....: 

In [15]: a = np.random.randint(1,5, (4,5))

In [16]: a
Out[16]: 
array([[2, 4, 3, 1, 1],
       [2, 4, 3, 2, 4],
       [4, 2, 1, 3, 1],
       [2, 4, 3, 4, 1]])

In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop

In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop

In [19]: a = np.random.randint(1, 5, (500,500))

In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop

In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop

Answer 1

DSM*_*DSM 66

I don't know about efficient, but you could use np.vectorize on the .get method of dictionaries:

>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])

+1. If the OP knows that every element of `a` will be present as a key in `my_dict`, then `my_dict.__getitem__` would be a better choice. (4 upvotes)
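To make the difference between the two lookups concrete, here is a minimal sketch. `my_dict.get` returns None for a missing key, which does not fit into a numeric array, whereas `my_dict.__getitem__` raises a KeyError. The helper `translate_with_default` and the array `with_gap` are hypothetical names added for illustration; they are not part of the answer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# Strict lookup: raises KeyError for any element that is not a key.
strict = np.vectorize(my_dict.__getitem__)
translated = strict(a)

def translate_with_default(arr, mapping, default):
    """Hypothetical helper: map elements without an entry to `default`."""
    # otypes fixes the output dtype so it does not depend on the first element.
    lookup = np.vectorize(lambda k: mapping.get(k, default), otypes=[float])
    return lookup(arr)

with_gap = np.array([[1, 2, 9]])  # 9 is not a key of my_dict
filled = translate_with_default(with_gap, my_dict, np.nan)
```

Which variant is appropriate depends on whether a missing key should be treated as an error or as missing data.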

Answer 2

Joh*_*ard 11

Here's another approach, using numpy.unique:

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
       [33, 22, 11]])

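This is truly a genius solution. I used it to colorize a grayscale image (`a` here), mapping 1-D pixel values to RGB colors with a lookup dictionary (`d` here). I tried `numpy.vectorize` and `pandas.DataFrame.apply` (which, by the way, is faster than vectorize), but this was the fastest. Thanks! (2 upvotes)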
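The colorizing use case described in this comment might look roughly like the following sketch. The tiny `gray` image and the `palette` dictionary are made-up illustration data, and the commenter's actual code is not shown in the thread.

```python
import numpy as np

# Hypothetical 2x2 "grayscale" image and a value -> RGB lookup dictionary.
gray = np.array([[0, 1],
                 [2, 1]])
palette = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 0, 255)}

# Translate each unique pixel value once, then expand via the inverse index.
u, inv = np.unique(gray, return_inverse=True)
lut = np.array([palette[x] for x in u])  # shape (n_unique, 3)

# ravel() keeps this independent of the shape np.unique gives to `inv`.
rgb = lut[inv.ravel()].reshape(gray.shape + (3,))
```

The dictionary is consulted only once per unique value, so the Python-level work stays small even for large images.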

Answer 3

Joh*_*ard 10

I think it would be better to iterate over the dictionary and set the values in all the rows and columns "at once":

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.items():
...     a[a == k] = v
... 
>>> a
array([[11, 22, 33],
       [33, 22, 11]])
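
One caveat with the in-place loop above: if a value in `d` is also a key, elements that were already translated can be translated again by a later iteration. A minimal sketch of the effect, and of a variant that writes into a copy instead, follows; the `mask_translate` helper and the `chained` dictionary are hypothetical names used for illustration only.

```python
import numpy as np

def mask_translate(a, d):
    """Hypothetical helper: same idea as the loop above, but writes into a copy."""
    out = a.copy()
    for k, v in d.items():
        # The mask is computed from the untouched input, so earlier
        # assignments cannot be picked up by later keys.
        out[a == k] = v
    return out

a = np.array([[1, 2, 3], [3, 2, 1]])
chained = {1: 2, 2: 3, 3: 1}  # every value is also a key

result = mask_translate(a, chained)

# In-place version for comparison: 1 -> 2 -> 3 -> 1 collapses everything.
in_place = a.copy()
for k, v in chained.items():
    in_place[in_place == k] = v
```

With the example dictionary from the answer (keys 1 to 3, values 11 to 33) keys and values do not overlap, so both versions give the same result there.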

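Edit:

Though it may not be as sexy as DSM's (really good) answer using numpy.vectorize, my tests of all the proposed methods show that this approach (using @jamylak's suggestion) is actually a bit faster: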

import numpy as np
a = np.random.randint(1, 5, (500,500))
d = {1 : 11, 2 : 22, 3 : 33, 4 : 44}

def unique_translate(a,d):
    # Translate only the unique values, then expand back to a's shape.
    u,inv = np.unique(a,return_inverse = True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):
    return np.vectorize(d.__getitem__)(a)

def loop_translate(a,d):
    # One boolean-mask assignment per dictionary key.
    n = np.empty(a.shape)
    for k in d:
        n[a == k] = d[k]
    return n

def orig_translate(a, d):
    new_a = np.empty(a.shape)
    for i,row in enumerate(a):
        # list(...) is needed on Python 3, where map returns an iterator.
        new_a[i,:] = list(map(d.get, row))
    return new_a


if __name__ == '__main__':
    import timeit
    n_exec = 100
    # Each figure printed below is the average time per call, in seconds.
    print('orig')
    print(timeit.timeit("orig_translate(a,d)",
                        setup="from __main__ import np,a,d,orig_translate",
                        number = n_exec) / n_exec)
    print('unique')
    print(timeit.timeit("unique_translate(a,d)",
                        setup="from __main__ import np,a,d,unique_translate",
                        number = n_exec) / n_exec)
    print('vec')
    print(timeit.timeit("vec_translate(a,d)",
                        setup="from __main__ import np,a,d,vec_translate",
                        number = n_exec) / n_exec)
    print('loop')
    print(timeit.timeit("loop_translate(a,d)",
                        setup="from __main__ import np,a,d,loop_translate",
                        number = n_exec) / n_exec)

Output (average seconds per call):

orig
0.222067718506
unique
0.0472617006302
vec
0.0357889199257
loop
0.0285375618935

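I found vectorize to be faster in my case, where `a` has shape `(50, 50, 50)`, `d` has 5000 keys, and the data is `numpy.uint32`. And it wasn't particularly close: ~0.1 s vs. ~1.4 s. Flattening the array didn't help. :/ (4 upvotes)
How fast this approach is depends on how many unique keys exist in the mapping. In your case the number of keys is far smaller than the dimensions of the 2D array, which is why the performance is close to the vectorized solution. If the number of keys is comparable to the dimensions of the array, vectorize becomes faster. (4 upvotes)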
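
The dependence on the number of keys comes from the loop making one full pass over the array (`a == k`) per key. As an alternative that is not proposed anywhere in this thread, a lookup table indexed directly by the array avoids the per-key pass. The sketch below assumes the keys are small non-negative integers and that every element of `a` is a key; `lut_translate` is a hypothetical name.

```python
import numpy as np

def lut_translate(a, d):
    """Hypothetical helper: lookup-table translation.

    Assumes small non-negative integer keys. Elements larger than the
    biggest key raise IndexError, and negative elements would wrap around,
    so the input has to be validated separately if that can happen.
    """
    keys = np.fromiter(d.keys(), dtype=np.intp, count=len(d))
    values = np.fromiter(d.values(), dtype=float, count=len(d))
    # Slots without a key stay NaN, which makes gaps visible in the result.
    table = np.full(keys.max() + 1, np.nan)
    table[keys] = values
    return table[a]  # one fancy-indexing pass, independent of len(d)

a = np.array([[1, 2, 3],
              [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
translated = lut_translate(a, my_dict)
```

The table needs memory proportional to the largest key, so this only makes sense when the key range is reasonably dense.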

Answer 4

Eel*_*orn 5

The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:

import numpy_indexed as npi
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()))

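The method implemented is similar to the approach mentioned by John Vinyard, but even more general. For instance, the items of the array do not need to be integers; they can be of any type, even nd-subarrays themselves.

If you set the optional 'missing' kwarg to 'raise' (the default is 'ignore'), performance will be slightly better, and you will get a KeyError if not all elements of 'a' are present in the keys.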
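
For readers without the package installed, the 'raise' versus 'ignore' distinction can be sketched in plain NumPy. This is only an illustration of the idea, not numpy_indexed's actual implementation: `remap_sorted` is a hypothetical helper, it assumes a non-empty set of sortable scalar keys, and it assumes that 'ignore' means unmatched elements are left unchanged.

```python
import numpy as np

def remap_sorted(a, keys, values, missing="ignore"):
    """Hypothetical helper: sorted-key lookup via np.searchsorted."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_values = values[order]

    flat = np.asarray(a).ravel()
    # Candidate position of each element among the sorted keys.
    pos = np.searchsorted(sorted_keys, flat)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    found = sorted_keys[pos] == flat

    if missing == "raise":
        if not found.all():
            raise KeyError("some elements of the input are not in keys")
        out = sorted_values[pos]
    else:
        # "ignore": keep elements that have no matching key as they are.
        out = np.where(found, sorted_values[pos], flat)
    return out.reshape(np.shape(a))

a = np.array([[1, 2, 3],
              [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
remapped = remap_sorted(a, list(my_dict.keys()), list(my_dict.values()))
```

In this sketch the 'raise' branch skips the `np.where` merge, which is one plausible reason a strict mode can be slightly cheaper.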

Archived:	12 years, 4 months ago
Viewed:	20614 times
Last activity:	7 years, 9 months ago