J. *_*ams 4 python duplicates dataframe pandas
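I have created a DataFrame and now need to number each repeated value in a column (for example df['Gender']). Suppose 'Male' occurs twice and 'Female' occurs three times; I need this column: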
Gender Occurrence
Male 1
Male 2
Female 1
Female 2
Female 3
Is there a way to do this with pandas?
Use the cumcount method after grouping by Gender:
import pandas as pd

df = pd.DataFrame({'Gender':['Male','Male','Female','Female','Female']})
df['Occurrence'] = df.groupby('Gender').cumcount() + 1
print(df)
Gender Occurrence
0 Male 1
1 Male 2
2 Female 1
3 Female 2
4 Female 3
cumcount starts counting at 0, so I added + 1 to get occurrence numbers that start at 1.
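To make the off-by-one visible, here is a small sketch that prints the raw cumcount result next to the shifted one. The second frame, with interleaved rows, is a hypothetical variation that is not part of the question; it is included only to suggest that the numbering follows each group's own order of appearance, so the rows do not have to be sorted by Gender first.

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Male', 'Female', 'Female', 'Female']})

# cumcount() numbers the rows inside each group, starting at 0
zero_based = df.groupby('Gender').cumcount()
# shifting by 1 gives the 1-based occurrence number from the question
df['Occurrence'] = zero_based + 1

print(zero_based.tolist())        # [0, 1, 0, 1, 2]
print(df['Occurrence'].tolist())  # [1, 2, 1, 2, 3]

# Hypothetical variation: the same values, but interleaved
mixed = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Female']})
mixed['Occurrence'] = mixed.groupby('Gender').cumcount() + 1

print(mixed['Occurrence'].tolist())  # [1, 1, 2, 2, 3]
```

If the first occurrence should be 0 rather than 1, the + 1 can simply be dropped.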