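I'm trying to get the difference from the previous row in a DataFrame, computed within each group of the `group` column. There are several similar questions, but I can't get any of them to work.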
          date group  value
0   2020-01-01     A    808
1   2020-01-01     B    331
2   2020-01-02     A    612
3   2020-01-02     B   1391
4   2020-01-03     A    234
5   2020-01-04     A    828
6   2020-01-04     B    820
6   2020-01-05     A   1075
8   2020-01-07     B    572
9   2020-01-10     B    736
10  2020-01-10     A   1436
df.sort_values(['group','date'], inplace=True)
df['diff'] = df['value'].diff()
print(df)
          date  value group   diff
1   2020-01-03    234     A    NaN
8   2020-01-01    331     B   97.0
2   2020-01-07    572     B  241.0
9   2020-01-02    612     A   40.0
5   2020-01-10    736     B  124.0
17  2020-01-01    808     A   72.0
14  2020-01-04    820     B   12.0
4   2020-01-04    828     A    8.0
18  2020-01-05   1075     A  247.0
7   2020-01-02   1391     B  316.0
10  2020-01-10   1436     A   45.0
This is the result I need:
          date group  value  diff
0   2020-01-01     A    808   NaN
2   2020-01-02     A    612  -196
4   2020-01-03     A    234  -378
5   2020-01-04     A    828   594
6   2020-01-05     A   1075   247
10  2020-01-10     A   1436   361
1   2020-01-01     B    331   NaN
3   2020-01-02     B   1391  1060
6   2020-01-04     B    820  -571
8   2020-01-07     B    572  -248
9   2020-01-10     B    736   164
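Shift the values within each group to create a helper column holding the previous row's value. Then subtract that column from the original `value` column to get the difference column.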
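# Sort so each group's rows are contiguous and in date order
df.sort_values(['group','date'], ascending=[True,True], inplace=True)
# Previous row's value within the same group (NaN for each group's first row)
df['shift'] = df.groupby('group')['value'].shift()
df['diff'] = df['value'] - df['shift']
# Keep only the requested columns, dropping the helper column
df = df[['date','group','value','diff']]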
df
          date group  value    diff
0   2020-01-01     A    808     NaN
2   2020-01-02     A    612  -196.0
4   2020-01-03     A    234  -378.0
5   2020-01-04     A    828   594.0
6   2020-01-05     A   1075   247.0
10  2020-01-10     A   1436   361.0
1   2020-01-01     B    331     NaN
3   2020-01-02     B   1391  1060.0
6   2020-01-04     B    820  -571.0
8   2020-01-07     B    572  -248.0
9   2020-01-10     B    736   164.0
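As a possible shortcut, the shift-and-subtract steps can probably be collapsed into a single grouped `diff()` call. The sketch below is an assumption-laden illustration, not the answer's own code: it rebuilds the question's sample data by hand and uses a default integer index rather than the index labels shown in the question.

```python
import pandas as pd

# Sample data retyped from the question (default 0..10 index, an assumption).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
        "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-05",
        "2020-01-07", "2020-01-10", "2020-01-10",
    ]),
    "group": ["A", "B", "A", "B", "A", "A", "B", "A", "B", "B", "A"],
    "value": [808, 331, 612, 1391, 234, 828, 820, 1075, 572, 736, 1436],
})

# Sort so that "previous row" means "previous date within the same group".
df = df.sort_values(["group", "date"])

# A grouped diff() restarts at every group boundary, so the first row
# of each group is NaN instead of a difference against another group.
df["diff"] = df.groupby("group")["value"].diff()

print(df)
```

The plain `df['value'].diff()` from the question ignores the grouping entirely, which is why its first `B` row would get a difference against the last `A` row; grouping first avoids that.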