Get all rows with and without NaN in a pandas DataFrame

Mah*_*a M 5 split nan python-3.x pandas

What is the most efficient way to split a pandas DataFrame into the rows that contain NaN and the rows that do not?

input :- ID    Gender    Dependants   Income   Education  Married
         1     Male      2            500      Graduate   Yes
         2     NaN       4            2500     Graduate   No
         3     Female    3            NaN      NaN        Yes
         4     Male      NaN          7000     Graduate   Yes
         5     Female    4            500      Graduate   NaN
         6     Female    2            4500     Graduate   Yes

The expected output for rows without NaN is:

ID    Gender    Dependants    Income    Education    Married
1     Male      2             500       Graduate     Yes
6     Female    2             4500      Graduate     Yes

The expected output for rows with NaN is:

ID    Gender    Dependants    Income    Education    Married
2     NaN       4             2500      Graduate     No
3     Female    3             NaN       NaN          Yes
4     Male      NaN           7000      Graduate     Yes
5     Female    4             500       Graduate     NaN 

jez*_*ael 8

Use boolean indexing: check for missing values with isnull, then use any with axis=1 to test whether each row contains at least one True:

mask = df.isnull().any(axis=1)

df1 = df[~mask]
df2 = df[mask]
print (df1)
   ID  Gender  Dependants  Income Education Married
0   1    Male         2.0   500.0  Graduate     Yes
5   6  Female         2.0  4500.0  Graduate     Yes

print (df2)
   ID  Gender  Dependants  Income Education Married
1   2     NaN         4.0  2500.0  Graduate      No
2   3  Female         3.0     NaN       NaN     Yes
3   4    Male         NaN  7000.0  Graduate     Yes
4   5  Female         4.0   500.0  Graduate     NaN
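
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, so the column dtypes are an assumption and may differ from the asker's real data.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", np.nan, "Female", "Male", "Female", "Female"],
    "Dependants": [2, 4, 3, np.nan, 4, 2],
    "Income": [500, 2500, np.nan, 7000, 500, 4500],
    "Education": ["Graduate", "Graduate", np.nan, "Graduate", "Graduate", "Graduate"],
    "Married": ["Yes", "No", "Yes", "Yes", np.nan, "Yes"],
})

# True for every row that holds at least one missing value.
mask = df.isnull().any(axis=1)

df1 = df[~mask]  # rows without any NaN
df2 = df[mask]   # rows with at least one NaN

print(df1["ID"].tolist())  # [1, 6]
print(df2["ID"].tolist())  # [2, 3, 4, 5]
```

Building the mask once and reusing it for both halves keeps the two results consistent and avoids scanning the frame twice.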

Details

print (df.isnull())
     ID  Gender  Dependants  Income  Education  Married
0  False   False       False   False      False    False
1  False    True       False   False      False    False
2  False   False       False    True       True    False
3  False   False        True   False      False    False
4  False   False       False   False      False     True
5  False   False       False   False      False    False

print (mask)
0    False
1     True
2     True
3     True
4     True
5    False
dtype: bool
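
The axis argument is what turns the per-cell table into one flag per row. A small sketch on hypothetical toy data (the columns x and y are not from the question) shows the difference between reducing across columns and down columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the second row contains a missing value.
df = pd.DataFrame({"x": [1.0, np.nan], "y": [2.0, 3.0]})
nulls = df.isnull()

per_row = nulls.any(axis=1)  # one flag per row: does this row hold any NaN?
per_col = nulls.any(axis=0)  # one flag per column (the default axis)

print(per_row.tolist())  # [False, True]
print(per_col.tolist())  # [True, False]
```

Only the axis=1 result has the same index as the rows of df, which is why it can be used directly as a row mask.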

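You can also write this in a more readable way, without inverting the mask, by testing that every value in the row is present:

mask = df.notna().all(axis=1)
df1 = df[mask]

This gives exactly the same df1 as before. Note that it has to be all here: notna().any(axis=1) would keep every row that has at least one non-missing value, which in this example is all six rows.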
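
To check that a mask built with notna().all(axis=1) really is the complement of isnull().any(axis=1), here is a minimal sketch on hypothetical data (columns a and b are made up for illustration). It also compares against dropna(), which with its default arguments drops every row containing a missing value.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's table).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", "y", None],
})

has_nan = df.isnull().any(axis=1)    # at least one missing value in the row
complete = df.notna().all(axis=1)    # every value in the row is present

# The two masks are exact complements of each other.
assert (complete == ~has_nan).all()

# Selecting the complete rows matches dropna() with default arguments.
assert df[complete].equals(df.dropna())

# The remaining rows are the ones flagged by the original mask.
assert df[~complete].equals(df[has_nan])
```

dropna() is convenient when only the complete rows are needed; the explicit mask is still useful when both halves are wanted.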