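I have a Python pandas DataFrame in which some teams have winning streaks over several time periods, and I want to identify these streaks in chronological order. So, what I have is: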
import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)
What I want is:
result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
'team_id': ['A']*6 + ['B']*5,
'win': [1,1,1,0,1,1,1,0,0,1,1],
'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
'streak_id': [1,1,1,None,2,2,1,None,None,2,2]})
print(result)
I tried grouping by team_id and streak_length and summing, but streak lengths can repeat, so I don't think that will work. Any help is appreciated!
I have the following data (data_current):
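One possible approach, offered as a sketch rather than a definitive answer: flag every row where a streak starts (a win whose previous row for the same team was not a win), take a per-team cumulative sum of those flags, and blank out the rows that are not wins. The helper name `add_streak_id` is made up for illustration, and the sketch assumes `win` only contains 0 and 1 and that each team has one row per period.

```python
import pandas as pd

def add_streak_id(df):
    """Return a copy of df with a per-team, chronological streak_id column."""
    # Sort so that "previous row" really means "previous period" within a team.
    out = df.sort_values(['team_id', 'period']).copy()
    is_win = out['win'].eq(1)
    # Previous result within the same team; the first period has no
    # predecessor, so treat it as "not a win".
    prev_win = out.groupby('team_id')['win'].shift().fillna(0).eq(1)
    # A streak starts on a win that does not follow a win.
    starts = (is_win & ~prev_win).astype(int)
    # Count the streak starts seen so far within each team.
    streak_no = starts.groupby(out['team_id']).cumsum()
    # Keep the counter on winning rows only; other rows become NaN.
    out['streak_id'] = streak_no.where(is_win)
    return out

data = pd.DataFrame({'period': list(range(1, 7)) + list(range(1, 6)),
                     'team_id': ['A'] * 6 + ['B'] * 5,
                     'win': [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
                     'streak_length': [1, 2, 3, 0, 1, 2, 1, 0, 0, 1, 2]})
with_ids = add_streak_id(data)
print(with_ids)
```

Because of the missing values, streak_id comes back as a float column; casting it with `astype('Int64')` should give nullable integers if that is preferred. If the existing streak_length column can be trusted, `streak_length == 1` could stand in for the start-of-streak flag.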
import pandas as pd
import numpy as np
data_current=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']})
data_current
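What I would like to do is transpose one of the columns, so that instead of having multiple rows for the same medicine with different diseases, I have one row per medicine and several columns for the diseases. It is also important to keep the index as simple as possible, e.g. 0, 1, 2, ..., i.e. I don't want to make 'medicine' the index column, because I will be merging on other keys. So, I need data_needed:
data_needed=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':[np.nan,np.nan,'lupus']})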
data_needed
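A minimal sketch of one way to get there, assuming each (medicine, disease) pair appears only once: number the diseases within each medicine with `groupby(...).cumcount()`, pivot on that counter, and reset the index so 'medicine' is an ordinary column again. The helper name `widen_diseases` and the temporary column `_n` are hypothetical names chosen for this example.

```python
import pandas as pd

def widen_diseases(df):
    """One row per medicine, with disease_1, disease_2, ... columns."""
    # Running number (1, 2, ...) of each disease within its medicine.
    n = df.groupby('medicine').cumcount() + 1
    wide = df.assign(_n=n).pivot(index='medicine', columns='_n', values='disease')
    # Turn the integer column labels into disease_1, disease_2, ...
    wide.columns = [f'disease_{i}' for i in wide.columns]
    # pivot sorts the medicines; restore the order of first appearance.
    order = df['medicine'].drop_duplicates().to_numpy()
    # reset_index moves 'medicine' back to a column and leaves a plain 0..n-1 index.
    return wide.reindex(order).rename_axis('medicine').reset_index()

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})
data_wide = widen_diseases(data_current)
print(data_wide)
```

Medicines with fewer diseases than the maximum get NaN in the unused columns, which matches the layout of data_needed.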