alp*_*ric 5 python dataframe pandas
When I drop the duplicate John rows, passing 'name' as the column to check:
import pandas as pd
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')
pandas drops every matching row except the first (top-most) one:
   age   name
0   21   Bill
1   28  Steve
2   22   John
Instead, I want to keep the John row with the highest age (in this example, 30). How can I achieve this?
Try this:
In [75]: df
Out[75]:
   age   name
0   21   Bill
1   28  Steve
2   22   John
3   30   John
4   29   John
In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
   age   name
0   21   Bill
1   28  Steve
3   30   John
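As a self-contained version of the session above, the following sketch rebuilds the frame from the question and applies the same sort-then-deduplicate step. The names `oldest` and `alt` are only illustrative. The final `sort_index()` call is an optional addition that restores the original row order, and the `groupby(...).idxmax()` line is a different technique, not part of the answer above, included here only as a cross-check.

```python
import pandas as pd

data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'],
        'age': [21, 28, 22, 30, 29]}
df = pd.DataFrame(data)

# Sort ascending by age so that, within each name, the highest age comes last,
# then keep only the last row seen for each name.
oldest = df.sort_values('age').drop_duplicates('name', keep='last')

# Optional: put the surviving rows back in their original order.
oldest = oldest.sort_index()

# Alternative (assumption: not from the answer above): look up the index label
# of the maximum age within each name group and select those rows.
alt = df.loc[df.groupby('name')['age'].idxmax()].sort_index()

print(oldest)
```

If two rows for the same name share the maximum age, either approach keeps just one of them, so which tied row survives should not be relied on.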
Or, depending on your goal, this, which simply keeps the last occurrence of each name regardless of age:
In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
   age   name
0   21   Bill
1   28  Steve
4   29   John
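To make the difference between the variants concrete, here is a small sketch comparing the three values that the `keep` parameter of `drop_duplicates` accepts, using the same data as the question. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bill', 'Steve', 'John', 'John', 'John'],
                   'age': [21, 28, 22, 30, 29]})

# keep='first' (the default): the first occurrence of each name survives.
first = df.drop_duplicates('name', keep='first')

# keep='last': the last occurrence of each name survives.
last = df.drop_duplicates('name', keep='last')

# keep=False: every row whose name is duplicated is dropped.
no_dupes = df.drop_duplicates('name', keep=False)

print(first, last, no_dupes, sep='\n\n')
```

None of these looks at `age` on its own, which is why the sort by `age` is needed when the goal is the row with the highest age.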