Efficiently counting the occurrences of unique subarrays in NumPy?

Wil*_*ill 8 python arrays numpy counting

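I have an array of shape (128, 36, 8), and I would like to find the number of occurrences of each unique subarray of length 8 along the last dimension.

I am aware of np.unique and np.bincount, but those appear to operate on individual elements rather than subarrays. I have seen this question, but it is about finding the first occurrence of a particular subarray, not the counts of all unique subarrays.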

Div*_*kar 4

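The question states that the input array has shape (128, 36, 8) and that we are interested in finding the unique subarrays of length 8 along the last dimension. So I am assuming that uniqueness is evaluated across the first two dimensions merged together. Let us assume A is the input 3D array.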
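To make the merging assumption concrete, here is a minimal sketch of the reshape step on an array of the shape given in the question. The random data and the seed are placeholders for illustration only, not the asker's actual data.

```python
import numpy as np

# Placeholder input with the shape from the question (values are arbitrary)
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(128, 36, 8))

# Merge the first two dimensions: each row is one length-8 subarray
Ar = A.reshape(-1, A.shape[-1])
print(Ar.shape)  # (4608, 8)
```

Each of the 128 * 36 = 4608 rows of Ar is then treated as one candidate subarray.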

Getting the number of unique subarrays

import numpy as np

# Reshape the 3D array to a 2D array, merging the first two dimensions
Ar = A.reshape(-1, A.shape[2])

# Lexsort the rows so that identical rows end up next to each other
sorted_idx = np.lexsort(Ar.T)
sorted_Ar = Ar[sorted_idx, :]

# A sorted row that differs from the row before it starts a new unique
# subarray; count those rows and add 1 for the very first row
unq_out = np.any(np.diff(sorted_Ar, axis=0), axis=1).sum() + 1

Sample run -

In [159]: A # A is (2,2,3)
Out[159]: 
array([[[0, 0, 0],
        [0, 0, 2]],

       [[0, 0, 2],
        [2, 0, 1]]])

In [160]: unq_out
Out[160]: 3
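As a cross-check, the lexsort approach can be compared against np.unique with axis=0, which treats each row as one item (this assumes NumPy 1.13 or newer, where the axis argument is available). The helper name count_unique_rows_lexsort is made up for this sketch, and neighbouring rows are compared with != rather than np.diff, which is a small substitution and not the code from the answer.

```python
import numpy as np

def count_unique_rows_lexsort(A):
    """Hypothetical helper: count unique last-axis subarrays via lexsort."""
    Ar = A.reshape(-1, A.shape[-1])
    sorted_Ar = Ar[np.lexsort(Ar.T)]
    # True wherever a sorted row differs from the row before it
    changed = (sorted_Ar[1:] != sorted_Ar[:-1]).any(axis=1)
    return int(changed.sum()) + 1

# Same sample array as in the run above
A = np.array([[[0, 0, 0], [0, 0, 2]],
              [[0, 0, 2], [2, 0, 1]]])

n_lexsort = count_unique_rows_lexsort(A)
n_unique = np.unique(A.reshape(-1, A.shape[-1]), axis=0).shape[0]
print(n_lexsort, n_unique)  # 3 3
```

The helper assumes a non-empty input; an empty array would need a separate guard.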

Getting the counts of occurrences of the unique subarrays

import numpy as np

# Reshape the 3D array to a 2D array, merging the first two dimensions
Ar = A.reshape(-1, A.shape[2])

# Lexsort the rows so that identical rows end up next to each other
sorted_idx = np.lexsort(Ar.T)
sorted_Ar = Ar[sorted_idx, :]

# Assign a group ID to each sorted row; the ID increases by one whenever
# a row differs from the row before it
ids = np.append([0], np.any(np.diff(sorted_Ar, axis=0), axis=1).cumsum())

# Count the rows in each group to get the final output
unq_count = np.bincount(ids)

Sample run -

In [64]: A
Out[64]: 
array([[[0, 0, 2],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 2, 0]]])

In [65]: unq_count
Out[65]: array([1, 2, 1], dtype=int64)
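The counts alone do not say which subarray each count belongs to. The sketch below pairs every count with its row and cross-checks the result against np.unique with axis=0 and return_counts=True (again assuming NumPy 1.13 or newer). The variable names is_new, unq_rows, lexsort_result and unique_result are introduced here for illustration only.

```python
import numpy as np

# Same sample array as in the run above
A = np.array([[[0, 0, 2], [1, 1, 1]],
              [[1, 1, 1], [1, 2, 0]]])
Ar = A.reshape(-1, A.shape[-1])

# Lexsort approach: mark the first row of every group of identical rows
sorted_Ar = Ar[np.lexsort(Ar.T)]
is_new = np.concatenate(([True], (sorted_Ar[1:] != sorted_Ar[:-1]).any(axis=1)))
unq_rows = sorted_Ar[is_new]                  # one representative per group
unq_count = np.bincount(is_new.cumsum() - 1)  # rows per group

# Built-in alternative: unique rows and their counts in a single call
rows_np, counts_np = np.unique(Ar, axis=0, return_counts=True)

# Compare as row -> count mappings, because the two methods order groups differently
lexsort_result = {tuple(r): int(c) for r, c in zip(unq_rows.tolist(), unq_count)}
unique_result = {tuple(r): int(c) for r, c in zip(rows_np.tolist(), counts_np)}
print(lexsort_result == unique_result)  # True
```

The comparison goes through dictionaries because np.lexsort(Ar.T) uses the last column as the primary sort key, whereas np.unique orders rows starting from the first column, so the same counts can come back in a different order.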