I have the following DataFrame in pandas:
ID date no start end
1 01-01-2019 10 101.23 112.23
2 02-01-2019 10 112.23 120.43
3 03-01-2019 10 121.23 130.23
4 04-01-2019 10 130.23 140.43
5 01-01-2019 11 101 112
6 02-01-2019 11 112 120
7 03-01-2019 11 130 140
8 04-01-2019 11 140 150.43
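I want to compare each row's end value with the next row's start value, grouped by no. If they differ, I want to set a flag and calculate the difference.
Below is the DataFrame I want: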
ID date no start end flag diff
1 01-01-2019 10 101.23 112.23 0 0
2 02-01-2019 10 112.23 120.43 0 0
3 03-01-2019 10 121.23 130.23 1 1
4 04-01-2019 10 130.23 140.43 0 0
5 01-01-2019 11 101 112 0 0
6 02-01-2019 11 112 120 0 0
7 03-01-2019 11 130 140 1 10
8 04-01-2019 11 140 150.43 0 0
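How can I do this in pandas?
You can create a Series with DataFrameGroupBy.shift and replace the first NaN of each group with Series.fillna, then compare it with Series.ne and convert the boolean mask to integers. For the other column, subtract to get the difference: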
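# previous row's end within each 'no' group; a group's first row falls back to its own start
s = df.groupby('no')['end'].shift().fillna(df['start'])
# 1 where start differs from the previous end, otherwise 0
df['flag'] = df['start'].ne(s).astype(int)
# size of the gap between start and the previous end
df['diff'] = df['start'] - s
print(df)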
ID date no start end flag diff
0 1 01-01-2019 10 101.23 112.23 0 0.0
1 2 02-01-2019 10 112.23 120.43 0 0.0
2 3 03-01-2019 10 121.23 130.23 1 0.8
3 4 04-01-2019 10 130.23 140.43 0 0.0
4 5 01-01-2019 11 101.00 112.00 0 0.0
5 6 02-01-2019 11 112.00 120.00 0 0.0
6 7 03-01-2019 11 130.00 140.00 1 10.0
7 8 04-01-2019 11 140.00 150.43 0 0.0
Run Code Online (Sandbox Code Playgroud)
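For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand, and it assumes the rows are already ordered by date within each no group, since shift works on row order. The name prev_end and the round(2) call are my own additions for readability, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "date": ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019"] * 2,
    "no": [10, 10, 10, 10, 11, 11, 11, 11],
    "start": [101.23, 112.23, 121.23, 130.23, 101, 112, 130, 140],
    "end": [112.23, 120.43, 130.23, 140.43, 112, 120, 140, 150.43],
})

# End value of the previous row in the same group; the first row of each
# group has no predecessor, so it falls back to its own start (gap of 0)
prev_end = df.groupby("no")["end"].shift().fillna(df["start"])

# 1 where the row does not start where the previous one ended
df["flag"] = df["start"].ne(prev_end).astype(int)

# Rounding is an assumption here: it hides float noise such as 0.7999999...
df["diff"] = (df["start"] - prev_end).round(2)

print(df)
```

Note that the gap for ID 3 comes out as 0.8 (121.23 - 120.43), not the 1 shown in the desired output of the question.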
Details:
print(s)
0 101.23
1 112.23
2 120.43
3 130.23
4 101.00
5 112.00
6 120.00
7 140.00
Name: end, dtype: float64
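One caveat with Series.ne is that it compares floats exactly. That is fine for the sample data, but if start and end come out of earlier arithmetic, tiny rounding errors may raise the flag where no real gap exists. A possible variant, sketched below, uses numpy.isclose with an absolute tolerance instead. The helper name flag_gaps, the atol default, and the demo frame are all hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def flag_gaps(df, atol=1e-9):
    """Hypothetical helper: flag rows whose start is not within atol of the previous end."""
    # Same shift/fillna step as in the answer above
    prev_end = df.groupby("no")["end"].shift().fillna(df["start"])
    out = df.copy()
    # Tolerance-based comparison instead of exact inequality
    out["flag"] = (~np.isclose(out["start"], prev_end, rtol=0, atol=atol)).astype(int)
    # Keep the difference only for flagged rows, report 0.0 elsewhere
    out["diff"] = (out["start"] - prev_end).where(out["flag"].eq(1), 0.0)
    return out

# Demo data (made up): 0.1 + 0.2 is not exactly 0.3 in floating point
demo = pd.DataFrame({"no": [1, 1], "start": [0.0, 0.3], "end": [0.1 + 0.2, 0.5]})

# Exact comparison, as in the answer above, for contrast
exact_prev = demo.groupby("no")["end"].shift().fillna(demo["start"])
exact_flag = demo["start"].ne(exact_prev).astype(int)

result = flag_gaps(demo)
print(exact_flag.tolist(), result["flag"].tolist())
```

With the demo data, the exact comparison flags the second row while the tolerance-based one does not. Which behaviour is appropriate depends on how precise your inputs are, so pick atol to match your data.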