Jua*_*los 2 python sorting nan dataframe pandas
I am trying to sort the following Pandas DataFrame:
         RHS  age  height  shoe_size  weight
0     weight  NaN     0.0        0.0     1.0
1  shoe_size  NaN     0.0        1.0     NaN
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
I want the rows that have more NaN columns to come first. More precisely, in the df above, the row with index 1 (2 NaNs) should come before the row with index 0 (1 NaN).
What I am doing now is:
df.sort_values(by=['age', 'height', 'shoe_size', 'weight'], na_position="first")
cs9*_*s95 10
Use df.sort_values together with loc-based access.
df = df.loc[df.isnull().sum(axis=1).sort_values(ascending=False).index]
print(df)
         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
2  shoe_size  3.0     0.0        0.0     NaN
0     weight  NaN     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
3     weight  3.0     0.0        0.0     1.0
df.isnull().sum(1) counts the NaNs in each row; the rows are then selected in the order given by sorting those counts.
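To make the intermediate step concrete, here is a minimal sketch that rebuilds the example frame (values copied from the question) and shows the per-row NaN counts before sorting. Negating the counts and requesting a stable sort with kind="mergesort" is an addition of this sketch, not part of the one-liner. With it, rows that have the same NaN count keep their original relative order, which the default sort algorithm does not guarantee.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question.
df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

# Number of NaNs in each row (axis=1 sums across the columns).
nan_counts = df.isnull().sum(axis=1)
print(nan_counts.tolist())  # [1, 2, 1, 0, 0]

# Sort the negated counts in ascending order, i.e. most NaNs first.
# mergesort is stable, so tied rows stay in their original order.
order = (-nan_counts).sort_values(kind="mergesort").index

# The sorted Series carries index labels, so select rows by label.
result = df.loc[order]
print(result.index.tolist())  # [1, 0, 2, 3, 4]
```

If the order of tied rows does not matter, the stable sort can be dropped.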
@ayhan offered a nice improvement to the solution above, using pd.Series.argsort:
df = df.iloc[df.isnull().sum(axis=1).mul(-1).argsort()]
print(df)
         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
0     weight  NaN     0.0        0.0     1.0
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
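A point in favour of the argsort variant is that argsort returns integer positions, which is exactly what iloc expects, so it does not depend on the index being the default 0..n-1. The sketch below illustrates this with the same data under string labels. The labels "a" to "e" and the variable names are hypothetical, and kind="mergesort" is added here only to make the order of tied rows deterministic.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with string labels instead of 0..4
# (the labels are hypothetical and chosen only for illustration).
df_labeled = pd.DataFrame(
    {
        "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
        "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
        "height": [0.0, 0.0, 0.0, 0.0, 0.0],
        "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
        "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
    },
    index=list("abcde"),
)

# NaN count per row, negated so that an ascending argsort puts
# the rows with the most NaNs first.
nan_counts = df_labeled.isnull().sum(axis=1)
positions = nan_counts.mul(-1).argsort(kind="mergesort").to_numpy()

# positions holds integer row positions, so iloc is the right accessor.
result_labeled = df_labeled.iloc[positions]
print(result_labeled.index.tolist())  # ['b', 'a', 'c', 'd', 'e']
```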