Mah*_*a M 5 split nan python-3.x pandas
Most efficient way to split a pandas DataFrame into rows that contain NaN and rows that do not.
Input:
ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
6 Female 2 4500 Graduate Yes
Expected output for rows without NaN:
ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
6 Female 2 4500 Graduate Yes
Expected output for rows with NaN:
ID Gender Dependants Income Education Married
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
Use boolean indexing: check for missing values with isnull(), then use any(axis=1) to test whether each row has at least one True:
mask = df.isnull().any(axis=1)
df1 = df[~mask]
df2 = df[mask]
print (df1)
ID Gender Dependants Income Education Married
0 1 Male 2.0 500.0 Graduate Yes
5 6 Female 2.0 4500.0 Graduate Yes
print (df2)
ID Gender Dependants Income Education Married
1 2 NaN 4.0 2500.0 Graduate No
2 3 Female 3.0 NaN NaN Yes
3 4 Male NaN 7000.0 Graduate Yes
4 5 Female 4.0 500.0 Graduate NaN
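For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (using np.nan for the missing cells is my assumption about how the data was loaded), and split_by_nan is a hypothetical helper name, not something pandas provides.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; np.nan marks the missing cells.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", np.nan, "Female", "Male", "Female", "Female"],
    "Dependants": [2, 4, 3, np.nan, 4, 2],
    "Income": [500, 2500, np.nan, 7000, 500, 4500],
    "Education": ["Graduate", "Graduate", np.nan, "Graduate", "Graduate", "Graduate"],
    "Married": ["Yes", "No", "Yes", "Yes", np.nan, "Yes"],
})


def split_by_nan(frame):
    """Hypothetical helper: return (complete_rows, rows_with_nan)."""
    # True for every row that has at least one missing value.
    mask = frame.isnull().any(axis=1)
    return frame[~mask], frame[mask]


df1, df2 = split_by_nan(df)
print(df1)
print(df2)
```

The mask is computed once and reused for both halves, so the frame is scanned for missing values only one time.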
Details:
print (df.isnull())
ID Gender Dependants Income Education Married
0 False False False False False False
1 False True False False False False
2 False False False True True False
3 False False True False False False
4 False False False False False True
5 False False False False False False
print (mask)
0 False
1 True
2 True
3 True
4 True
5 False
dtype: bool
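The row mask above can be cross-checked by counting the missing cells per row: a row is flagged exactly when its count is greater than zero. A small sketch on a made-up three-row frame (the data and the name nan_counts are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative frame: row 0 is complete, row 1 has one NaN, row 2 has two.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Income": [500, np.nan, np.nan],
    "Education": ["Graduate", "Graduate", np.nan],
})

# Number of missing cells in each row.
nan_counts = df.isnull().sum(axis=1)

# Row-wise "at least one missing value" mask, as in the answer.
mask = df.isnull().any(axis=1)

print(nan_counts)
print(mask)
```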
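You can also write the previous code in a more readable way, without inverting the mask, by checking that every value in a row is present:
# all(axis=1), not any(axis=1): a row is complete only if every value is non-missing
mask = df.notna().all(axis=1)
df1 = df[mask]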
This gives exactly the same result.
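A quick way to verify that claim, as a sketch on a small made-up frame (the data and variable names are illustrative). The non-inverted mask has to use all(axis=1): a row is complete only when all of its values are present, whereas notna().any(axis=1) is True for any row with at least one value and would select every row here. For the complete rows alone, df.dropna() should give the same frame.

```python
import numpy as np
import pandas as pd

# Illustrative frame: rows 1 and 2 each contain one NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Gender": ["Male", np.nan, "Female", "Female"],
    "Income": [500.0, 2500.0, np.nan, 4500.0],
})

# Inverted form: "not (any value missing)".
inverted = ~df.isnull().any(axis=1)

# Direct form: "all values present".
direct = df.notna().all(axis=1)

complete = df[direct]

# dropna() with default arguments drops rows containing any NaN.
same_as_dropna = complete.equals(df.dropna())

# For contrast: any() instead of all() keeps every row of this frame.
any_mask = df.notna().any(axis=1)

print(inverted.equals(direct), same_as_dropna, any_mask.all())
```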