I have this dataframe:
import pandas as pd
df = pd.DataFrame([{"state": "CA", "total": 2, "week": 10}, {"state": "UT", "total": 7, "week": 10}, {"state": "CA", "total": 14, "week": 11}, {"state": "UT", "total": 18, "week": 11}, {"state": "CA", "total": 21, "week": 12}, {"state": "UT", "total": 30, "week": 12}])
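The total field is cumulative, and I want the week-over-week difference for each state. So I'd like to end up with something like this: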
state,total,week,diff
CA,2,10,NaN
UT,7,10,NaN
CA,14,11,12
UT,18,11,11
CA,21,12,7
UT,30,12,12
How do I get from here to there? I could do it by iterating over the rows, but I don't know where to start doing it in pandas.
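For reference, the row-by-row approach mentioned above could look like the sketch below. It remembers the last cumulative total seen for each state in a plain dict (the names last_total and diffs are just for illustration) and subtracts it from the current row. It should produce the same numbers as a vectorized pandas solution, only more slowly on large frames, and it assumes the rows are already in week order.

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# Most recent cumulative total seen so far, keyed by state
last_total = {}
diffs = []
for row in df.itertuples(index=False):
    prev = last_total.get(row.state)
    # A state's first week has no previous total to subtract, so use NaN
    diffs.append(float("nan") if prev is None else row.total - prev)
    last_total[row.state] = row.total

df["diff"] = diffs
print(df)
```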
You can do it like this:
# diff() subtracts the previous row's total within each state group
df['diff'] = df.groupby('state')['total'].diff()
df
Out:
state total week diff
0 CA 2 10 NaN
1 UT 7 10 NaN
2 CA 14 11 12.0
3 UT 18 11 11.0
4 CA 21 12 7.0
5 UT 30 12 12.0
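Keep in mind that the grouped diff() subtracts the previous row of the same state in the frame's current row order, so it only gives a week-over-week change when rows are ordered by week. The sketch below assumes the data might arrive in a different order (here simply reversed), sorts by week first, and then checks the result against the equivalent "total minus the previous total of the same state" formulation using groupby shift(). The variable name alt is just for illustration.

```python
import pandas as pd

rows = [
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
]
df = pd.DataFrame(rows[::-1])  # deliberately out of week order

# Put the weeks in chronological order before differencing
df = df.sort_values("week", kind="stable")
df["diff"] = df.groupby("state")["total"].diff()

# Same thing spelled out: current total minus the previous total of the same state
alt = df["total"] - df.groupby("state")["total"].shift()

print(df.sort_values(["week", "state"]))
```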
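Since pandas 0.24 you can also use the nullable integer type (not yet commonly used), which keeps the diff column as integers instead of turning it into floats because of the missing values: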
# cast the float result to pandas' nullable integer dtype; missing values show as <NA>
df['diff'] = df.groupby('state')['total'].diff().astype(pd.Int64Dtype())
df
Out:
state total week diff
0 CA 2 10 <NA>
1 UT 7 10 <NA>
2 CA 14 11 12
3 UT 18 11 11
4 CA 21 12 7
5 UT 30 12 12
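The string alias "Int64" refers to the same extension dtype as pd.Int64Dtype(), so the cast can be written either way. A short check, assuming a reasonably recent pandas (nullable integers arrived in 0.24; the <NA> marker shown in the output above comes from pd.NA, added in 1.0):

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# "Int64" (capital I) is the nullable integer dtype, same as pd.Int64Dtype()
df["diff"] = df.groupby("state")["total"].diff().astype("Int64")

print(df["diff"].dtype)
print(df)
```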