How do I find the most frequent value in a numpy ndarray?

oop*_*ops 19 python numpy multidimensional-array

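I have a numpy ndarray of shape (30,480,640). The 1st and 2nd axes represent locations (latitude and longitude), and the 0th axis contains the actual data points. I want to take the most frequent value along the 0th axis at each location, that is, to construct a new array of shape (1,480,640). For example: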

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

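The data points will contain negative and positive floating-point numbers. How can I perform such a calculation? Thanks a lot!

I tried numpy.unique, but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'". I have numpy version 1.2.1 installed on Unix, and it doesn't support return_inverse. I also tried mode, but it takes forever to process such a large amount of data. Is there another way to get the most frequent values? Thanks again.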

eca*_*mur 20

To find the most frequent value of a flat array, use unique, bincount and argmax:

import numpy as np

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
# u: sorted distinct values; indices: position in u of each element of arr
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.bincount(indices))]
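To make the three steps easier to follow, here is a small sketch that prints the intermediate arrays for the same input. The variable names `counts` and `most_common` are only illustrative and are not part of the original answer.

```python
import numpy as np

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])

# u holds the sorted distinct values; indices maps each element of arr
# to its position in u.
u, indices = np.unique(arr, return_inverse=True)
print(u)        # [-6 -2 -1  0  1  4  5]
print(indices)  # [6 5 1 4 1 3 5 5 0 2]

# bincount counts how often each position in u occurs.
counts = np.bincount(indices)
print(counts)   # [1 2 1 1 1 3 1]

# argmax picks the position with the highest count (the first one on ties).
most_common = u[np.argmax(counts)]
print(most_common)  # 4
```

Because unique compares values exactly, the same steps should also work for floating-point data, as long as "equal" values really are bit-for-bit equal.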

To work with a multidimensional array, unique needs no special handling (it flattens its input), but bincount has to be applied with apply_along_axis:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

With your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(data, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(data.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
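The same steps can be wrapped in a small helper so the axis is a parameter. This is only a sketch: the function name `mode_along_axis` and the tiny float array are made up for illustration, and ties resolve to the smallest value because argmax returns the first maximum.

```python
import numpy as np

def mode_along_axis(arr, axis=0):
    """Return the most frequent value along `axis` (hypothetical helper)."""
    arr = np.asarray(arr)
    # unique flattens its input, so restore the original shape afterwards.
    u, indices = np.unique(arr, return_inverse=True)
    indices = indices.reshape(arr.shape)
    # Count occurrences of every distinct value along the chosen axis;
    # minlength=u.size keeps all count vectors the same length.
    counts = np.apply_along_axis(np.bincount, axis, indices, None, u.size)
    return u[np.argmax(counts, axis=axis)]

# Small float example: axis 0 holds the samples for each location.
data = np.array([[[0.5, -1.0], [2.0, 3.5]],
                 [[0.5, -1.0], [2.0, 3.5]],
                 [[9.0,  9.0], [-7.25, 8.0]]])

result = mode_along_axis(data, axis=0)
print(result.shape)              # (2, 2)
print(result[np.newaxis].shape)  # (1, 2, 2), the layout asked for in the question
```

Note that the intermediate `counts` array has one slot per distinct value at every location, so with many distinct floats over a (30,480,640) array it may need a lot of memory.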

NumPy 1.2, really? You can emulate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it's an additional O(n log n), so it shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)
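As a quick sanity check, the sketch below compares the searchsorted fallback with return_inverse on a NumPy version that supports both. The names `fallback` and `inverse` are illustrative only.

```python
import numpy as np

arr = np.array([[5, 4, -2, 1, -2],
                [0, 4, 4, -6, -1]])

# Fallback: u is sorted, so searchsorted finds each element's position in u.
u = np.unique(arr)
fallback = np.searchsorted(u, arr.ravel())

# Reference: the inverse indices returned by unique itself.
u_ref, inverse = np.unique(arr, return_inverse=True)

print(np.array_equal(u, u_ref))                    # True
print(np.array_equal(fallback, inverse.ravel()))   # True
```

This works because every element of arr is present in u, so searchsorted returns the index of an exact match.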


Tar*_*ato 7

Use SciPy's mode function:

import numpy as np
from scipy.stats import mode

data = np.array([[[ 0,  1,  2,  3,  4],
                  [ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14],
                  [15, 16, 17, 18, 19]],

                 [[ 0,  1,  2,  3,  4],
                  [ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14],
                  [15, 16, 17, 18, 19]],

                 [[40, 40, 42, 43, 44],
                  [45, 46, 47, 48, 49],
                  [50, 51, 52, 53, 54],
                  [55, 56, 57, 58, 59]]])

print(data)

# find mode along the zero-th axis; the return value is a tuple of the
# modes and their counts.
print(mode(data, axis=0))