相当于 SQL 窗口函数和行范围的 Pandas

Jos*_*Jos 6 range window-functions pandas google-bigquery

考虑最小的例子

customer   day  purchase
Joe        1       5
Joe        1      10
Joe        2       5
Joe        2       5       
Joe        4      10
Joe        7       5
Run Code Online (Sandbox Code Playgroud)

在 BigQuery 中,可以做类似的事情来获取客户在过去 2 天内每天花费的金额:

SELECT customer, day
, sum(purchase) OVER (PARTITION BY customer ORDER BY day ASC RANGE between 2 preceding and 1 preceding)
FROM table
Run Code Online (Sandbox Code Playgroud)

熊猫中的等价物是什么?即,预期结果

customer   day  purchase    amount_last_2d
Joe        1       5             null  -- spent days [-,-]
Joe        1      10             null  -- spent days [-,-]
Joe        2       5               15  -- spent days [-,1]
Joe        2       5               15  -- spent days [-,1]
Joe        4      10               10  -- spent days [2,3]
Joe        7       5                0  -- spent days [5,6]
Run Code Online (Sandbox Code Playgroud)

WeN*_*Ben 3

尝试然后返回groupbyshiftreindex

df['new'] = df.groupby(['customer','day']).purchase.sum().shift().reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[259]: 
  customer  day  purchase   new
0      Joe    1         5   NaN
1      Joe    1        10   NaN
2      Joe    2        10  15.0
3      Joe    2         5  15.0
4      Joe    4        10  15.0
Run Code Online (Sandbox Code Playgroud)

更新

s = df.groupby(['customer','day']).apply(lambda x : df.loc[df.customer.isin(x['customer'].tolist()) & (df.day.isin(x['day']-1)|df.day.isin(x['day']-2)),'purchase'].sum())
df['new'] = s.reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[271]: 
  customer  day  purchase  new
0      Joe    1         5    0
1      Joe    1        10    0
2      Joe    2         5   15
3      Joe    2         5   15
4      Joe    4        10   10
5      Joe    7         5    0
Run Code Online (Sandbox Code Playgroud)