How to sort a numpy array by row sums and extract the top N rows

Hef*_*efe 1 python arrays indexing numpy

For example, given the matrix

array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])

and top_n=3, it should return

array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])

Given an input 2D matrix arr, this function should return an np.ndarray of shape (top_n, arr.shape[-1]).

Here is what I tried:

def select_rows(arr, top_n):
    """
    This function selects the top_n rows that have the largest sum of entries
    """
    sel_rows = np.argsort(-arr,axis=1)[:top_n]
    
    return sel_rows

I also tried:

sel_rows = (-arr).argsort(axis=-1)[:, :top_n]

Neither worked.

Jul*_*ien 5

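You can use this simple one-liner:

a[np.argsort(a.sum(axis=1))[:-top_n-1:-1]]

a.sum(axis=1) sums along axis 1, giving one total per row.

np.argsort(...) returns the indices that sort those row sums in ascending order (the sums form a 1-D array, so the default axis=-1 is its only axis and can be omitted).

...[:-top_n-1:-1] selects the last top_n indices in reverse order, i.e. the largest sums first.

a[...] then grabs the corresponding rows.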
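Wrapped into the select_rows function from the question, the one-liner could be arranged roughly as follows. This is a minimal sketch that assumes arr is a 2-D array and 1 <= top_n <= arr.shape[0]; the intermediate variable names are illustrative only.

```python
import numpy as np

def select_rows(arr, top_n):
    """Return the top_n rows of arr with the largest row sums, largest first."""
    row_sums = arr.sum(axis=1)        # one total per row
    order = np.argsort(row_sums)      # row indices, ascending by total
    top = order[:-top_n - 1:-1]       # last top_n indices, reversed
    return arr[top]                   # fancy-index the selected rows

# The matrix from the question
arr = np.array([[ 6,  7,  8,  9, 10, 11],
                [12, 13, 14, 15, 16, 17],
                [18, 19, 20, 21, 22, 23],
                [ 0,  1,  2,  3,  4,  5],
                [24, 25, 26, 27, 28, 29]])

result = select_rows(arr, 3)
print(result)
```

On the example matrix the row sums are 51, 87, 123, 15 and 159, so the selected rows are the ones starting with 24, 18 and 12, in that order, and the result has shape (top_n, arr.shape[-1]).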

%%timeit comparison:

# data sample
a = np.random.randint(0, 101, (100000, 1000))

%%timeit
a[np.argsort(a.sum(axis=1))[:-3-1:-1]]
[out]:
9.73 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
a[np.argsort(-a.sum(axis=1))[:3]]
[out]:
9.9 ms ± 303 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
sorted(a, key=lambda x: sum(x))[:-3-1:-1]
[out]:
1.04 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
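As a sanity check, the three timed expressions can be compared on the small matrix from the question. This is only a sketch: the names v1, v2 and v3 are made up for illustration, and the check covers this one input, where all row sums are distinct. When several rows share the same sum, the variants may return the tied rows in a different order.

```python
import numpy as np

# Rebuild the matrix from the question by reordering the rows of arange(30)
a = np.arange(30).reshape(5, 6)[[1, 2, 3, 0, 4]]
top_n = 3

# Variant 1: ascending argsort, then take the last top_n indices reversed
v1 = a[np.argsort(a.sum(axis=1))[:-top_n - 1:-1]]

# Variant 2: argsort the negated sums, then take the first top_n indices
v2 = a[np.argsort(-a.sum(axis=1))[:top_n]]

# Variant 3: pure-Python sorted() over the rows (returns a list of 1-D arrays)
v3 = np.array(sorted(a, key=lambda x: sum(x))[:-top_n - 1:-1])

print(np.array_equal(v1, v2), np.array_equal(v1, v3))
```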