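When I use the drop_duplicates() method, it removes the duplicates but also collapses all NaN entries into a single row. How can I drop duplicates while keeping every row that has an empty entry (such as np.nan, None or '')?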
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['one','two',np.nan,np.nan,np.nan,'two','two']})
df
Out[]:
col
0 one
1 two
2 NaN
3 NaN
4 NaN
5 two
6 two
df.drop_duplicates(['col'])
Out[]:
col
0 one
1 two
2 NaN
use*_*666 12
Try:
df[(~df.duplicated()) | (df['col'].isnull())]
The result is:
col
0 one
1 two
2 NaN
3 NaN
4 NaN
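The mask above only treats NaN and None as empty, while the question also mentions ''. A minimal sketch of one way to extend it, assuming that empty strings should be kept like missing values; the sample data and the names is_empty and result are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Illustrative data containing all three kinds of "empty" entries.
df = pd.DataFrame({'col': ['one', 'two', np.nan, None, '', '', 'two', 'two']})

# Rows counted as empty: NaN/None (detected by isnull) or an empty string.
is_empty = df['col'].isnull() | df['col'].eq('')

# Keep the first occurrence of each value, plus every empty row.
result = df[~df.duplicated(subset=['col']) | is_empty]
print(result)
```

Passing subset=['col'] restricts the duplicate check to that column, which matters once the frame has more than one column.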
Well, a not-so-pretty workaround is to save the NaN rows first and then put them back:
temp = df[df.isnull().any(axis=1)]
asd = df.drop_duplicates('col')
pd.merge(temp, asd, how='outer')
Out[81]:
col
0 one
1 two
2 NaN
3 NaN
4 NaN
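The same save-and-restore idea can probably also be written with pd.concat instead of a merge. This is a sketch under the assumption that only 'col' needs to be checked; the names missing, deduped and result are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

# Boolean mask of the rows whose value is missing.
missing = df['col'].isnull()

# Deduplicate only the non-missing rows, then add the missing rows back.
deduped = df[~missing].drop_duplicates('col')
result = pd.concat([deduped, df[missing]]).sort_index()
print(result)
```

Unlike the merge, this keeps the original index labels, so sort_index() restores the original row order.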