Python pandas: remove duplicate rows that have NaN column values

Lam*_*llo 1 python duplicates pandas

I need to remove rows that contain NaN values but are also duplicates. For example, this table:

    A   B   C
0   foo 2   3
1   foo nan nan
2   foo 1   4
3   bar nan nan
4   foo nan nan

should become this:

    A   B   C
0   foo 2   3
2   foo 1   4
3   bar nan nan

How can I do this?

jez*_*ael 6

Use boolean indexing:

df = df[~df['A'].duplicated(keep=False) | df[['B','C']].notnull().any(axis=1)]
print (df)
     A    B    C
0  foo  2.0  3.0
2  foo  1.0  4.0
3  bar  NaN  NaN
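
The one-liner above assumes `df` already exists. As a minimal, self-contained sketch, the table from the question can be rebuilt as follows; the construction with `np.nan` and the names `mask` and `result` are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question (construction is assumed).
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar", "foo"],
    "B": [2, np.nan, 1, np.nan, np.nan],
    "C": [3, np.nan, 4, np.nan, np.nan],
})

# Keep a row if its value in A occurs only once,
# or if at least one of B and C holds a non-missing value.
mask = ~df["A"].duplicated(keep=False) | df[["B", "C"]].notnull().any(axis=1)
result = df[mask]
print(result)
```

Assigning the filtered frame to a new name keeps the original `df` available, which is what the step-by-step masks in the explanation are computed on.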

Explanation

Test which values in column A are not duplicated, using duplicated with keep=False; ~ inverts the boolean mask:

print (~df['A'].duplicated(keep=False))
0    False
1    False
2    False
3     True
4    False
Name: A, dtype: bool

Check for non-missing values in columns B and C:

print (df[['B','C']].notnull())
       B      C
0   True   True
1  False  False
2   True   True
3  False  False
4  False  False

Then check for at least one True per row with DataFrame.any:

print (df[['B','C']].notnull().any(axis=1))
0     True
1    False
2     True
3    False
4    False
dtype: bool

Chain the two masks together with | (bitwise OR):

print (~df['A'].duplicated(keep=False) | df[['B','C']].notnull().any(axis=1))
0     True
1    False
2     True
3     True
4    False
dtype: bool
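
The steps above can be wrapped into a small reusable function. This is only a sketch: `drop_empty_duplicates` and its parameter names are hypothetical and not part of pandas, and `notna` is used as the equivalent alias of `notnull`.

```python
import numpy as np
import pandas as pd


def drop_empty_duplicates(frame, key, value_cols):
    """Drop rows whose value columns are all NaN when the key occurs more than once.

    Hypothetical helper for illustration; not a pandas function.
    """
    # True where the key value appears exactly once in the whole column.
    key_is_unique = ~frame[key].duplicated(keep=False)
    # True where at least one of the value columns is not missing.
    has_any_value = frame[value_cols].notna().any(axis=1)
    return frame[key_is_unique | has_any_value]


df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar", "foo"],
    "B": [2, np.nan, 1, np.nan, np.nan],
    "C": [3, np.nan, 4, np.nan, np.nan],
})

cleaned = drop_empty_duplicates(df, "A", ["B", "C"])
print(cleaned.index.tolist())  # [0, 2, 3]

# Edge case: a key that occurs only in all-NaN rows, more than once.
only_empty = pd.DataFrame({"A": ["bar", "bar"], "B": [np.nan, np.nan], "C": [np.nan, np.nan]})
print(len(drop_empty_duplicates(only_empty, "A", ["B", "C"])))  # 0
```

One consequence of this mask worth knowing: when a key appears several times and every one of its rows is empty, all of those rows are dropped, so the key disappears from the result. If one such row should survive, the mask would need an extra condition.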