her*_*lla 6 python numpy dataframe python-3.x pandas
Given the following df:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4],
     [1, 5, 2, 2, -20], [2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
    columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Based on this answer, I created a function that counts streaks (up and down).
def streaks(df, column):
    # Create sign column: 1 for positive values, 0 for zero or negative values
    df['sign'] = 0
    df.loc[df[column] > 0, 'sign'] = 1
    df.loc[df[column] < 0, 'sign'] = 0
    # Downstreak: running count of down rows minus the count at the last up row
    df['d_streak2'] = (df['sign'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['d_streak'] = df['d_streak2'] - df['cumsum']
    df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
    # Upstreak: running count of up rows minus the count at the last down row
    df['u_streak2'] = (df['sign'] == 1).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['u_streak'] = df['u_streak2'] - df['cumsum']
    df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
    del df['sign']
    return df
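The function works, but it is long. I'm sure there is a better way to write it. I tried another answer, but it did not work well.
This is the desired output: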
streaks(df, 'E')
    A  B  C   D   E  d_streak  u_streak
1   1  5  2   8   2       0.0       1.0
2   2  4  4  20   2       0.0       2.0
3   3  3  1  20   2       0.0       3.0
4   4  2  2   1   3       0.0       4.0
5   5  1  4  -5  -4       1.0       0.0
6   1  5  2   2 -20       2.0       0.0
7   2  4  4   3  -8       3.0       0.0
8   3  3  1  -1  -1       4.0       0.0
9   4  2  2   0  12       0.0       1.0
10  5  1  4  20  -2       1.0       0.0
You can simplify the function as follows:
def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign != sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s > 0, 0.0), d_streak=s.where(s < 0, 0.0).abs())
Using it:
streaks(df, 'E')
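First, compute the sign of every cell in the column under consideration using np.sign. This assigns +1 to positive numbers, -1 to negative numbers, and 0 to zeros.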
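As a small illustration of this first step, here is a sketch that applies np.sign to column E from the question. The Series name e is just a placeholder for this example. Note that zeros map to 0, so a zero row would get 0 in both streak columns with this approach, whereas the question's original function counts a zero as part of a down streak; column E has no zeros, so the two agree here.

```python
import numpy as np
import pandas as pd

# Column E from the question's DataFrame, with the same 1..10 index.
e = pd.Series([2, 2, 2, 3, -4, -20, -8, -1, 12, -2], index=range(1, 11), name="E")

# +1 for positive values, -1 for negative values, 0 for zeros.
sign = np.sign(e)
print(sign.tolist())  # [1, 1, 1, 1, -1, -1, -1, -1, 1, -1]
```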
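Next, use sign != sign.shift() to flag every row whose sign differs from the previous row (shift() moves the values down by one, so each cell is compared with the one before it, not the one after). The cumulative sum of that mask gives every run of adjacent equal-sign values its own label, which is used as the key in the grouping step.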
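To make the run labelling concrete, the following sketch starts from the signs of column E written out by hand. The names changed and run_id are introduced only for this example and do not appear in the answer's code.

```python
import pandas as pd

# Signs of column E from the question, written out literally.
sign = pd.Series([1, 1, 1, 1, -1, -1, -1, -1, 1, -1], index=range(1, 11))

# True where the sign differs from the previous row; the first row compares
# against NaN and is therefore always True.
changed = sign != sign.shift()

# Each True starts a new run, so the cumulative sum labels the runs 1, 2, 3, ...
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 4]
```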
Perform a groupby with these labels as the key, and take the cumulative sum again, this time over the elements of each subgroup.
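A short sketch of the grouped cumulative sum, again on the signs of column E: within each run, summing +1 or -1 values yields a signed counter that restarts whenever the sign flips. The name run_id is a placeholder for this example.

```python
import pandas as pd

# Signs of column E from the question, written out literally.
sign = pd.Series([1, 1, 1, 1, -1, -1, -1, -1, 1, -1], index=range(1, 11))

# Label each run of equal signs, then accumulate the signs within each run.
run_id = (sign != sign.shift()).cumsum()
s = sign.groupby(run_id).cumsum()
print(s.tolist())  # [1, 2, 3, 4, -1, -2, -3, -4, 1, -1]
```

Positive entries in s are up-streak lengths and negative entries are down-streak lengths with the sign flipped.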
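Finally, assign the positive cumsum values to u_streak and the negative ones (after taking their absolute value) to d_streak; all other entries are set to 0.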
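As a sanity check, the sketch below runs the simplified function on column E alone and compares the result with the streak values in the question's desired output. The single-column DataFrame is an assumption made to keep the example short. Because assign adds its keyword arguments in order, u_streak comes before d_streak here, the reverse of the column order in the question's output.

```python
import numpy as np
import pandas as pd

def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign != sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s > 0, 0.0), d_streak=s.where(s < 0, 0.0).abs())

# Only column E of the question's DataFrame, with the same 1..10 index.
df = pd.DataFrame({"E": [2, 2, 2, 3, -4, -20, -8, -1, 12, -2]}, index=range(1, 11))
out = streaks(df, "E")

# Streak values copied from the desired output in the question.
expected_d = [0, 0, 0, 0, 1, 2, 3, 4, 0, 1]
expected_u = [1, 2, 3, 4, 0, 0, 0, 0, 1, 0]
print(out["d_streak"].tolist() == expected_d)  # True
print(out["u_streak"].tolist() == expected_u)  # True
```

Unlike the original function, this version returns a new DataFrame and leaves the input df unmodified.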