How to sort a NumPy string array by its last column

Var*_*lor 1 python numpy scipy

Is there a way to sort the rows of an array by their last element, which in this case is the cell ID? The cell ID is built as follows: "CellID_NumberOfCell".

import numpy as np

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

So after sorting it should look like this:

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['65.0', '30.0', '20.0', '0.0', '0_10'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158']])

Edit:

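Is it also possible to remove the number after the underscore once the array is sorted, so that I only have the ID? That is, instead of 0_0 just 0.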

Edit 2:

After sorting by ID, the rows should also be sorted by time, so that, for example, all rows with ID 0 are additionally ordered by time 0, 1 ... 9999, and so on.

P. *_*eri 5

np.argsort(arr[:, -1]) gives you the permutation that sorts the elements of the last column of arr.

Then arr[np.argsort(arr[:, -1])] reorders the rows of arr according to that permutation.
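A minimal sketch of those two steps on the question's array; order and sorted_arr are illustrative names, not from the answer. The last line shows that plain string comparison puts '0_10' before '0_2'.

```python
import numpy as np

# The array from the question (every entry is a string)
arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

# Permutation that sorts the last column (string comparison)
order = np.argsort(arr[:, -1])
# Fancy indexing with that permutation reorders whole rows
sorted_arr = arr[order]

print(order)              # [0 6 4 1 2 3 5]
print(sorted_arr[:, -1])  # ['0_0' '0_10' '0_203' '1_0' '2_0' '7_203' '8_158']

# Strings compare character by character, so '0_10' sorts before '0_2'
print(np.sort(np.array(['0_2', '0_10'])))  # ['0_10' '0_2']
```

For the question's data this already matches the desired output, because no ID has a single-digit and a multi-digit suffix that would be compared against each other.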

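Note that because your data consists of strings, lexicographic order is used, so 0_10 comes before 0_2. If that is not what you want, you should split the last column; I suggest using a pandas.DataFrame: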

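import pandas as pd

df = pd.DataFrame(arr)
# Split the last column ("ID_time") once at "_" into two new string columns
df[['Cell', 'CellIndex']] = df[df.columns[-1]].str.split('_', n=1, expand=True)
# Convert both parts to int so they sort numerically, not lexicographically
df['Cell'] = df['Cell'].astype(int)
df['CellIndex'] = df['CellIndex'].astype(int)
# sort_values returns a new DataFrame: sort by ID, then by the number after "_"
df = df.sort_values(['Cell', 'CellIndex'])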

pandas really is the way to go for manipulating this kind of data.
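If you would rather stay in NumPy, one option (a sketch, not part of the answer above) is to split the last column into integer parts yourself and sort with np.lexsort, which treats the last key it is given as the primary one. That yields the ID-then-time order asked for in Edit 2, and the bare IDs can then be written back into the last column for Edit 1. The extra '0_2' row is hypothetical, added only so the numeric order differs from the string order; parts, cell_id, cell_time and ids_only are illustrative names.

```python
import numpy as np

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10'],
                ['1.0', '1.0', '1.0', '0.0', '0_2']])  # hypothetical extra row

# Split each "ID_time" string once at the underscore -> (n, 2) string array
parts = np.array([s.split('_', 1) for s in arr[:, -1]])
cell_id = parts[:, 0].astype(int)
cell_time = parts[:, 1].astype(int)

# np.lexsort uses the LAST key as the primary key: sort by ID, then by time
order = np.lexsort((cell_time, cell_id))
sorted_arr = arr[order]
print(sorted_arr[:, -1])  # ['0_0' '0_2' '0_10' '0_203' '1_0' '2_0' '7_203' '8_158']

# Edit 1: keep only the ID in the last column (copy so sorted_arr keeps full IDs)
ids_only = sorted_arr.copy()
ids_only[:, -1] = cell_id[order].astype(str)
print(ids_only[:, -1])  # ['0' '0' '0' '0' '1' '2' '7' '8']
```

np.lexsort performs a stable sort, so rows with the same ID and time keep their original relative order.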