Remove rows by duplicate column values

Question

use*_*538 3 python numpy matrix multidimensional-array python-3.x

I have a large dataset in a numpy.ndarray that looks like this:

array([[ -4,   5,   9,  30,  50,  80],
       [  2,  -6,   9,  34,  12,   7],
       [ -4,   5,   9,  98, -21,  80],
       [  5,  -9,   0,  32,  18,   0]])

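I want to remove duplicate rows, where two rows count as duplicates when their columns 0, 1, 2 and 5 are equal. For the matrix above, the result would be: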

-4, 5, 9, 30, 50, 80
2, -6, 9, 34, 12, 7
5, -9, 0, 32, 18, 0

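numpy.unique does something very similar, but it only finds duplicates across all columns (along the whole axis). I only want to compare specific columns. How can I solve this with numpy? I couldn't find any decent numpy algorithm for this. Is there a better module?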
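
To illustrate the limitation described above, here is a minimal sketch using the sample matrix. The variable names `all_cols_unique` and `key_cols_unique` are chosen here for illustration only. Rows 0 and 2 agree in columns 0, 1, 2 and 5 but differ in columns 3 and 4, so comparing whole rows drops nothing, while comparing only the key columns leaves three distinct combinations.

```python
import numpy as np

a = np.array([[-4,  5, 9, 30,  50, 80],
              [ 2, -6, 9, 34,  12,  7],
              [-4,  5, 9, 98, -21, 80],
              [ 5, -9, 0, 32,  18,  0]])

# Whole-row comparison: rows 0 and 2 differ in columns 3 and 4,
# so all four rows are reported as unique.
all_cols_unique = np.unique(a, axis=0)
print(all_cols_unique.shape[0])

# Comparing only the key columns finds three distinct combinations,
# but the result has lost columns 3 and 4.
key_cols_unique = np.unique(a[:, [0, 1, 2, 5]], axis=0)
print(key_cols_unique.shape[0])
```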

Answer 1

Div*_*kar 5

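Use np.unique on the sliced array with the return_index parameter and axis=0. This gives us the indices of the unique rows, treating each row as a single entity. These indices can then be used to row-index the original array to get the desired output.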

Hence, with a as the input array, it would be:

a[np.unique(a[:,[0,1,2,5]],return_index=True,axis=0)[1]]

Sample run to break down the steps and hopefully make things clear:

In [29]: a
Out[29]: 
array([[ -4,   5,   9,  30,  50,  80],
       [  2,  -6,   9,  34,  12,   7],
       [ -4,   5,   9,  98, -21,  80],
       [  5,  -9,   0,  32,  18,   0]])

In [30]: a_slice = a[:,[0,1,2,5]]

In [31]: _, unq_row_indices = np.unique(a_slice,return_index=True,axis=0)

In [32]: final_output = a[unq_row_indices]

In [33]: final_output
Out[33]: 
array([[-4,  5,  9, 30, 50, 80],
       [ 2, -6,  9, 34, 12,  7],
       [ 5, -9,  0, 32, 18,  0]])
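
One detail worth noting: np.unique returns the unique rows in sorted order, so the indices from return_index follow the sort order of the key columns rather than the original row order. In the sample above the two orders happen to coincide. The sketch below assumes you want to keep the rows in their original order. The helper name `drop_duplicate_rows`, its `keep_order` parameter, and the array `b` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def drop_duplicate_rows(a, key_cols, keep_order=True):
    """Keep the first row for each distinct combination of values in key_cols.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # return_index gives the index of the first occurrence of each unique
    # key row, ordered by the sorted key rows.
    _, first_idx = np.unique(a[:, key_cols], axis=0, return_index=True)
    if keep_order:
        # Sorting the indices restores the original row order.
        first_idx = np.sort(first_idx)
    return a[first_idx]

# Same kind of data as above, arranged so that sorted order and
# original order differ.
b = np.array([[ 5, -9, 0, 32,  18,  0],
              [-4,  5, 9, 30,  50, 80],
              [ 5, -9, 0, 11,  22,  0],
              [-4,  5, 9, 98, -21, 80]])

sorted_by_key = drop_duplicate_rows(b, [0, 1, 2, 5], keep_order=False)
original_order = drop_duplicate_rows(b, [0, 1, 2, 5], keep_order=True)
print(sorted_by_key)
print(original_order)
```

With `keep_order=False` the helper behaves like the one-liner in this answer. With `keep_order=True` the surviving rows stay in the order they appeared in the input.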
