use*_*761 6 python sql python-3.x pandas pandas-groupby
我有这块代码,我想把它写成SQL.有谁知道等效的SQL代码会是什么样子?
lags = range(1, 5)
df = df.assign(**{
'{}{}'.format('lag', t): df.groupby('article_id').num_views.shift(t) for t in lags
})
Run Code Online (Sandbox Code Playgroud)
更新:
我正在寻找SQL标准方言.这是一个数据集示例(部分前10行):
article_id section time num_views comments
0 abc111b A 00:00 15 0
1 abc111b A 01:00 36 0
2 abc111b A 02:00 36 0
3 bbbddd222hf A 03:00 41 0
4 bbbddd222hf B 04:00 44 0
5 nnn678www B 05:00 39 0
6 nnn678www B 06:00 38 0
7 nnn678www B 07:00 66 0
8 nnn678www C 08:00 65 0
9 nnn678www C 09:00 87 1
Run Code Online (Sandbox Code Playgroud)
你可以使用属于SQL-99 ANSI标准的LAG()函数"windowing functions"
:
select
article_id, section, time, num_views, comments,
lag(num_views, 1, 0) over(partition by article_id order by article_id, time) as lag1,
lag(num_views, 2, 0) over(partition by article_id order by article_id, time) as lag2,
lag(num_views, 3, 0) over(partition by article_id order by article_id, time) as lag3,
lag(num_views, 4, 0) over(partition by article_id order by article_id, time) as lag4
from tab;
Run Code Online (Sandbox Code Playgroud)
PS 请注意,并非所有的RDBMS系统实现"windowing functions"