Most frequent element per row or column of a 2D numpy array

Question

Ers*_* Er 2 python numpy multidimensional-array

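I am trying to find the most frequent element in a 2D numpy array, either per row or per column. I searched the documentation and the web but could not find what I was looking for. Let me explain with an example. Suppose I have arr as follows: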

import numpy as np
arr = np.random.randint(0, 2, size=(5, 2))
arr

# Output
array([[1, 1],
       [0, 0],
       [0, 1],
       [1, 1],
       [1, 0]])

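The expected output is an array containing the most frequent element of each column or each row, depending on the given axis argument. I know that np.unique() can return the count of each unique value in the input array. When axis is given, however, it counts the unique rows or columns of the 2D array: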

np.unique(arr, return_counts=True, axis=0)

# Output
(array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]]), array([1, 1, 1, 2]))

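This shows that the unique rows [0, 0], [0, 1], and [1, 0] each occur once, while [1, 1] occurs twice in arr. That does not help me, because I need the most frequent element within each row (or column). So my expected output is as follows: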

array([[1, 1],    # --> 1
       [0, 0],    # --> 0
       [0, 1],    # --> 0 or 1 since they have same frequency
       [1, 1],    # --> 1
       [1, 0]])   # --> 0 or 1 since they have same frequency

So the result can be array([1, 0, 0, 1, 0]) or array([1, 0, 1, 1, 1]), with shape (5,).
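
To make the expected output concrete, here is a minimal sketch for the 0/1 example only. It assumes the array holds nothing but 0 and 1, so a row's most frequent value is 1 exactly when more than half of its entries are 1. The names row_mode and col_mode are illustrative, and ties resolve to 0 here, which is one of the two acceptable outcomes described above. It does not generalize to arrays with other values.

```python
import numpy as np

# The 0/1 example array from the question, written out explicitly.
arr = np.array([[1, 1],
                [0, 0],
                [0, 1],
                [1, 1],
                [1, 0]])

# Assumption: values are only 0 or 1. The mean of a row is then the
# fraction of ones, so "mean > 0.5" means 1 is the majority value.
row_mode = (arr.mean(axis=1) > 0.5).astype(int)  # one value per row
col_mode = (arr.mean(axis=0) > 0.5).astype(int)  # one value per column

print(row_mode)  # shape (5,)
print(col_mode)  # shape (2,)
```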

P.S.

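I know a solution can be found by iterating over the columns or rows and using np.unique() to find the most frequent element in each one, but I want the most efficient approach. numpy is typically used for vectorized computation on large arrays, and in my case the input array arr has a very large number of elements, so a for loop would be computationally expensive.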

Edit:

To make this clearer, I have added a loop-based solution. Since arr can contain values other than 0 and 1, I decided to use a different random arr:

arr = np.random.randint(1, 4, size=(10, 3)) * 10

# arr:
array([[30, 30, 30],
       [10, 20, 30],
       [30, 30, 30],
       [30, 10, 20],
       [20, 20, 10],
       [20, 30, 20],
       [20, 30, 10],
       [10, 30, 10],
       [20, 10, 10],
       [20, 30, 30]])

most_freq_elem_in_rows = []
for row in arr:
  elements, counts = np.unique(row, return_counts=True)
  most_freq_elem_in_rows.append(elements[np.argmax(counts)])

# most_freq_elem_in_rows:
# [30, 10, 30, 10, 20, 20, 10, 10, 10, 30]

most_freq_elem_in_cols = []
for col in arr.T:
  elements, counts = np.unique(col, return_counts=True)
  most_freq_elem_in_cols.append(elements[np.argmax(counts)])

# most_freq_elem_in_cols:
# [20, 30, 10]

Then most_freq_elem_in_rows and most_freq_elem_in_cols can simply be converted to numpy arrays with np.array().
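
The two loops above can be folded into one helper that takes an axis argument, which may serve as a reference baseline when checking faster approaches. This is only a sketch: most_frequent_1d and most_frequent_loop are hypothetical names, and np.apply_along_axis still calls the Python function once per slice, so it is not expected to be faster than the explicit loops.

```python
import numpy as np

def most_frequent_1d(values):
    """Most frequent element of a 1-D array; the smallest value wins ties."""
    elements, counts = np.unique(values, return_counts=True)
    return elements[np.argmax(counts)]

def most_frequent_loop(arr, axis):
    # axis=1 summarizes each row, axis=0 summarizes each column.
    # Note: this is a per-slice Python call, not a vectorized computation.
    return np.apply_along_axis(most_frequent_1d, axis, arr)

# The example array from the question, written out explicitly.
arr = np.array([[30, 30, 30],
                [10, 20, 30],
                [30, 30, 30],
                [30, 10, 20],
                [20, 20, 10],
                [20, 30, 20],
                [20, 30, 10],
                [10, 30, 10],
                [20, 10, 10],
                [20, 30, 30]])

per_row = most_frequent_loop(arr, axis=1)
per_col = most_frequent_loop(arr, axis=0)
print(per_row)
print(per_col)
```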

Answer 1

FBr*_*esi 6

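If you can add a scipy dependency, scipy.stats.mode does this. The output below keeps the reduced axis as a dimension of length 1; newer SciPy versions drop that axis unless keepdims=True is passed: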

import numpy as np
from scipy.stats import mode

arr = np.random.randint(0, 2, size=(5, 2))

mode(arr, 0)

# Output
ModeResult(mode=array([[0, 0]]), count=array([[3, 3]]))

mode(arr, 1)

# Output
ModeResult(mode=array([[0],
                       [1],
                       [0],
                       [0],
                       [0]]),
           count=array([[1],
                        [2],
                        [2],
                        [2],
                        [1]]))

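If adding scipy is not an option, one possible NumPy-only alternative is sketched below. It is not part of the answer above: most_frequent_along_axis is a hypothetical helper that maps every value to an integer code with np.unique, shifts the codes of each row into their own range, and counts everything with a single np.bincount call. It assumes a 2D input and allocates a counts table of size (number of slices) x (number of distinct values), so it is likely to suit arrays with a modest number of distinct values.

```python
import numpy as np

def most_frequent_along_axis(arr, axis=1):
    """Most frequent element per row (axis=1) or per column (axis=0).

    Sketch for 2-D input only; the smallest value wins ties.
    """
    arr = np.asarray(arr)
    if arr.ndim != 2 or axis not in (0, 1):
        raise ValueError("expected a 2-D array and axis 0 or 1")
    # Arrange the data so that each row of `rows` is one slice to summarize.
    rows = arr if axis == 1 else arr.T
    n_rows, n_cols = rows.shape

    # Replace every value by its index in the sorted array of distinct values.
    values, codes = np.unique(rows, return_inverse=True)
    # The shape of the inverse differs between NumPy versions, so normalize it.
    codes = codes.reshape(n_rows, n_cols)

    # Shift each row's codes into a separate range, then count in one pass.
    n_vals = values.size
    offsets = np.arange(n_rows)[:, None] * n_vals
    counts = np.bincount((codes + offsets).ravel(),
                         minlength=n_rows * n_vals).reshape(n_rows, n_vals)

    # argmax returns the first maximum, i.e. the smallest most frequent value.
    return values[counts.argmax(axis=1)]

# The second example array from the question, written out explicitly.
arr = np.array([[30, 30, 30],
                [10, 20, 30],
                [30, 30, 30],
                [30, 10, 20],
                [20, 20, 10],
                [20, 30, 20],
                [20, 30, 10],
                [10, 30, 10],
                [20, 10, 10],
                [20, 30, 30]])

print(most_frequent_along_axis(arr, axis=1))
print(most_frequent_along_axis(arr, axis=0))
```

Ties are broken the same way as in the loop from the question, because np.unique returns the distinct values in sorted order and argmax picks the first maximum.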
