cgn*_*utt 5 sql aggregate-functions window-functions google-bigquery
Given a table in Google BigQuery:
User Timestamp
A TIMESTAMP(12/05/2015 12:05:01.8023)
B TIMESTAMP(9/29/2015 12:15:01.0323)
B TIMESTAMP(9/29/2015 13:05:01.0233)
A TIMESTAMP(9/29/2015 14:05:01.0432)
C TIMESTAMP(8/15/2015 5:05:01.0000)
B TIMESTAMP(9/29/2015 14:06:01.0233)
A TIMESTAMP(9/29/2015 14:06:01.0432)
Is there a simple way to compute:
User Maximum_Number_of_Events_this_User_Had_in_One_Hour
A 2
B 3
C 1
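with the one-hour time window being a parameter?
I tried to work this out myself with LAG and partition functions, building on these two questions:
BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)
But I found those posts too dissimilar: I am not looking for the number of people per time window, but for the maximum number of events per person within a time window.
Here is an efficient and succinct way to do it, which takes advantage of the ordered structure of the timestamps.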
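SELECT
  user,
  MAX(per_hour) AS max_event_per_hour
FROM
(
  SELECT
    user,
    -- Count this user's events in the hour ending at the current row.
    -- The RANGE bound is in microseconds: 60 * 60 * 1000000 = one hour.
    COUNT(*) OVER (PARTITION BY user ORDER BY timestamp RANGE BETWEEN 60 * 60 * 1000000 PRECEDING AND CURRENT ROW) AS per_hour,
    timestamp
  FROM
    [dataset_example_in_question_user_timestamps]
)
GROUP BY user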
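To sanity-check the query's logic outside BigQuery, here is a minimal Python sketch of the same trailing-window count, with the window length as a parameter. It is not BigQuery code. The function name `max_events_per_window` and the helper `ts` are hypothetical, and it assumes the sample timestamps are in month/day/year order. Note that on the sample rows as written, B's three events span about 1 hour 51 minutes, so a strict one-hour window gives 2 for B. B reaches 3 only with a wider window, such as two hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def max_events_per_window(events, window=timedelta(hours=1)):
    """Return {user: max number of events inside any trailing window}."""
    by_user = defaultdict(list)
    for user, stamp in events:
        by_user[user].append(stamp)

    result = {}
    for user, stamps in by_user.items():
        stamps.sort()
        left = 0
        best = 0
        for right, current in enumerate(stamps):
            # Slide the left edge forward until every remaining event lies
            # within `window` of the current one (both bounds inclusive,
            # mirroring RANGE BETWEEN n PRECEDING AND CURRENT ROW).
            while current - stamps[left] > window:
                left += 1
            best = max(best, right - left + 1)
        result[user] = best
    return result


def ts(text):
    # Hypothetical helper: parses the sample rows, assuming month/day/year.
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f")


events = [
    ("A", ts("12/05/2015 12:05:01.8023")),
    ("B", ts("9/29/2015 12:15:01.0323")),
    ("B", ts("9/29/2015 13:05:01.0233")),
    ("A", ts("9/29/2015 14:05:01.0432")),
    ("C", ts("8/15/2015 5:05:01.0000")),
    ("B", ts("9/29/2015 14:06:01.0233")),
    ("A", ts("9/29/2015 14:06:01.0432")),
]

print(max_events_per_window(events))                      # {'A': 2, 'B': 2, 'C': 1}
print(max_events_per_window(events, timedelta(hours=2)))  # {'A': 2, 'B': 3, 'C': 1}
```

The two-pointer scan relies on the same property the window function uses: once each user's timestamps are sorted, the events inside a trailing window form a contiguous run, so each user's list is traversed once after sorting.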