izh*_*hak 4 python transform pandas pandas-groupby
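I have a Python pandas DataFrame in which some teams have winning streaks over several periods, and I want to number those streaks in chronological order within each team. What I have is: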
import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
                     'team_id': ['A']*6 + ['B']*5,
                     'win': [1,1,1,0,1,1,1,0,0,1,1],
                     'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)
What I want is:
result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
                       'team_id': ['A']*6 + ['B']*5,
                       'win': [1,1,1,0,1,1,1,0,0,1,1],
                       'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
                       'streak_id': [1,1,1,None,2,2,1,None,None,2,2]})
print(result)
I tried grouping by team_id and streak_length and summing, but streak lengths can repeat, so I don't think that will work. Any help is appreciated!
Create consecutive groups with Series.shift, Series.ne and Series.cumsum, keep only the rows where win is 1, and use GroupBy.transform with factorize in a lambda function:
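# Mask of winning rows
m = data['win'].eq(1)
# Label each run of consecutive equal 'win' values
g = data['win'].ne(data['win'].shift()).cumsum()
# Renumber the winning runs 1, 2, ... within each team; losing rows stay NaN
data['streak_id'] = g[m].groupby(data['team_id']).transform(
    lambda x: pd.factorize(x)[0] + 1
)
print(data)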
    period team_id  win  streak_length  streak_id
0        1       A    1              1        1.0
1        2       A    1              2        1.0
2        3       A    1              3        1.0
3        4       A    0              0        NaN
4        5       A    1              1        2.0
5        6       A    1              2        2.0
6        1       B    1              1        1.0
7        2       B    0              0        NaN
8        3       B    0              0        NaN
9        4       B    1              1        2.0
10       5       B    1              2        2.0
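
The streak_id column comes out as float because the losing rows are filled with NaN. If integer ids are preferred, the same steps can be wrapped in a small helper that casts to pandas' nullable Int64 dtype. This is only a sketch: the name add_streak_id is made up for illustration and is not part of pandas, and it assumes the rows are already sorted by team_id and period.

```python
import pandas as pd

def add_streak_id(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a per-team streak_id column."""
    out = df.copy()
    # True on rows where the team won.
    wins = out['win'].eq(1)
    # A new run label starts whenever 'win' differs from the previous row.
    run = out['win'].ne(out['win'].shift()).cumsum()
    # Keep winning rows only and renumber their run labels 1, 2, ... per team.
    ids = run[wins].groupby(out['team_id']).transform(
        lambda s: pd.factorize(s)[0] + 1
    )
    # Losing rows are absent from ids, so they become <NA> after alignment.
    out['streak_id'] = ids.astype('Int64')
    return out

data = pd.DataFrame({'period': list(range(1, 7)) + list(range(1, 6)),
                     'team_id': ['A'] * 6 + ['B'] * 5,
                     'win': [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1]})
result = add_streak_id(data)
print(result)
```

Note that the run labels are computed over the whole frame, so a run can straddle the boundary between two teams (here team A ends on a win and team B starts on one). That is harmless because factorize renumbers the labels separately inside each team_id group.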