Jos*_*Jos 6 range window-functions pandas google-bigquery
考虑最小的例子
customer day purchase
Joe 1 5
Joe 1 10
Joe 2 5
Joe 2 5
Joe 4 10
Joe 7 5
Run Code Online (Sandbox Code Playgroud)
在 BigQuery 中,可以做类似的事情来获取客户在过去 2 天内每天花费的金额:
SELECT customer, day
, sum(purchase) OVER (PARTITION BY customer ORDER BY day ASC RANGE between 2 preceding and 1 preceding)
FROM table
Run Code Online (Sandbox Code Playgroud)
熊猫中的等价物是什么?即,预期结果
customer day purchase amount_last_2d
Joe 1 5 null -- spent days [-,-]
Joe 1 10 null -- spent days [-,-]
Joe 2 5 15 -- spent days [-,1]
Joe 2 5 15 -- spent days [-,1]
Joe 4 10 10 -- spent days [2,3]
Joe 7 5 0 -- spent days [5,6]
Run Code Online (Sandbox Code Playgroud)
尝试然后返回groupby
shift
reindex
df['new'] = df.groupby(['customer','day']).purchase.sum().shift().reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[259]:
customer day purchase new
0 Joe 1 5 NaN
1 Joe 1 10 NaN
2 Joe 2 10 15.0
3 Joe 2 5 15.0
4 Joe 4 10 15.0
Run Code Online (Sandbox Code Playgroud)
更新
s = df.groupby(['customer','day']).apply(lambda x : df.loc[df.customer.isin(x['customer'].tolist()) & (df.day.isin(x['day']-1)|df.day.isin(x['day']-2)),'purchase'].sum())
df['new'] = s.reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[271]:
customer day purchase new
0 Joe 1 5 0
1 Joe 1 10 0
2 Joe 2 5 15
3 Joe 2 5 15
4 Joe 4 10 10
5 Joe 7 5 0
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
181 次 |
最近记录: |