Pandas: calculating the difference between rows

Ric*_*ard 1 python pandas

I have this DataFrame:

import pandas as pd
df = pd.DataFrame([{"state": "CA", "total": 2, "week": 10}, {"state": "UT", "total": 7, "week": 10}, {"state": "CA", "total": 14, "week": 11}, {"state": "UT", "total": 18, "week": 11}, {"state": "CA", "total": 21, "week": 12}, {"state": "UT", "total": 30, "week": 12}])

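The total field is cumulative, and I want the week-over-week difference for each state. So I would like to end up with something like this: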

state,total,week,diff
CA,2,10,NaN
UT,7,10,NaN
CA,14,11,12
UT,18,11,11
CA,21,12,7
UT,30,12,12

How do I get from here to there? I could do it by looping over the rows, but I don't know where to start doing this in pandas.

Mic*_*sny 5

You can do it like this:

df['diff'] = df.groupby('state')['total'].diff()
df

Out:

  state  total  week  diff
0    CA      2    10   NaN
1    UT      7    10   NaN
2    CA     14    11  12.0
3    UT     18    11  11.0
4    CA     21    12   7.0
5    UT     30    12  12.0
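Note that diff() subtracts the previous row within each state, so it relies on the rows of each state already being in week order. The following is only a sketch, assuming the real data might not be sorted: it sorts first, and also shows that the result matches the current total minus the previous total of the same state (the name by_shift is just for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "UT", "CA", "UT", "CA", "UT"],
    "total": [2, 7, 14, 18, 21, 30],
    "week": [10, 10, 11, 11, 12, 12],
})

# Assumption: the rows may arrive in any order, so put each state's
# weeks in ascending order before differencing.
df = df.sort_values(["state", "week"])

# Same computation spelled out: current total minus the previous
# total of the same state (NaN for each state's first week).
by_shift = df["total"] - df.groupby("state")["total"].shift()

df["diff"] = df.groupby("state")["total"].diff()

# Restore the original row order.
df = df.sort_index()
by_shift = by_shift.sort_index()
```

With the sample data this gives the same diff column as above; the sort only matters when the input is not already ordered by week.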

Since pandas 0.24 you can also use the nullable integer types, although they are not commonly used:

df['diff'] = df.groupby('state')['total'].diff().astype(pd.Int64Dtype())
df

Out:

  state  total  week  diff
0    CA      2    10  <NA>
1    UT      7    10  <NA>
2    CA     14    11    12
3    UT     18    11    11
4    CA     21    12     7
5    UT     30    12    12
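The float column in the first version appears because NaN is itself a float value, so an integer column containing it is upcast. Here is a small self-contained sketch of the nullable variant, using the string alias "Int64" (capital I), which is equivalent to pd.Int64Dtype(); the variable names diff and missing are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "UT", "CA", "UT", "CA", "UT"],
    "total": [2, 7, 14, 18, 21, 30],
    "week": [10, 10, 11, 11, 12, 12],
})

# diff() returns float64 because of the NaN values; casting to the
# nullable "Int64" dtype keeps whole numbers and turns NaN into <NA>.
diff = df.groupby("state")["total"].diff().astype("Int64")

# Missing entries are still detected with isna().
missing = diff.isna()

df["diff"] = diff
```

Here missing is True only for the first week of each state, and the remaining values stay integers rather than floats.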