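I am writing a query to get the cumulative distinct count of uids for each day.
Example: say 2 uids (100, 200) appeared on 2016-11-01, and they appeared again the next day, 2016-11-02, together with a new uid 300 (100, 200, 300). At that point I want the stored cumulative count to be 3, not 5, because uids 100 and 200 already appeared on the previous day.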
Input table:
date uid
2016-11-01 100
2016-11-01 200
2016-11-01 300
2016-11-01 400
2016-11-02 100
2016-11-02 200
2016-11-03 300
2016-11-03 400
2016-11-03 500
2016-11-03 600
2016-11-04 700
Expected query result:
date daily_cumulative_count
2016-11-01 4
2016-11-02 4
2016-11-03 6
2016-11-04 7
So far I can get a cumulative sum of the per-day distinct counts, but it counts uids again that were already seen on previous days.
SELECT
    date,
    SUM(count) OVER (
        ORDER BY date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
FROM (
    SELECT
        date,
        COUNT(DISTINCT uid) AS count
    FROM sample_table
    GROUP BY 1
)
ORDER BY date DESC;
Any kind of help would be greatly appreciated.
cak*_*aww 15
WITH firstseen AS (
SELECT uid, MIN(date) date
FROM sample_table
GROUP BY 1
)
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) daily_cumulative_count
FROM firstseen
ORDER BY 1
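SELECT DISTINCT is used because each (date, COUNT(uid)) pair would otherwise be repeated once for every uid first seen on that date.
Explanation: for each date dt, it counts the uids first seen from the earliest date up to dt, because we specify ORDER BY date and, in standard SQL, the window frame then defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes every row sharing the current date.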
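To try the first-seen query on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions); the in-memory database and the variable names are only for illustration, and GROUP BY 1 / ORDER BY 1 are written out by column name. Note that on this data no uid is first seen on 2016-11-02, so the result has no row for that date.

```python
import sqlite3

# Sample rows from the question: (date, uid)
rows = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)

# Step 1: earliest date per uid. Step 2: running count of uids over those dates.
query = """
WITH firstseen AS (
    SELECT uid, MIN(date) AS date
    FROM sample_table
    GROUP BY uid
)
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) AS daily_cumulative_count
FROM firstseen
ORDER BY date
"""
result = conn.execute(query).fetchall()
for date, count in result:
    print(date, count)
```

If every calendar date must appear in the output, the result would still need to be joined back to the list of dates with the last count carried forward.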
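You can use exists to check whether a uid was already present on any earlier date. Then take the running sum of the first-time rows and find the max for each date group, which gives you the daily cumulative distinct count.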
select dt, max(col) as daily_cumulative_count
from (select t1.*,
             -- count a row only if the same uid has no row on an earlier date
             sum(case when not exists (select 1 from t where t.dt < t1.dt and t.uid = t1.uid) then 1 else 0 end) over (order by dt) as col
      from t t1) x
group by dt
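The exists approach can be checked the same way. The sketch below is a variation rather than the exact query above: the first-time flag (is_new, a name made up here) is computed in its own subquery before the running sum, and the table t with columns dt and uid is created in memory purely for illustration. It again assumes Python's sqlite3 with SQLite 3.25 or newer, and that a uid appears at most once per date.

```python
import sqlite3

# Sample rows from the question, loaded into a table t(dt, uid)
rows = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, uid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
SELECT dt, MAX(col) AS daily_cumulative_count
FROM (
    -- running sum of first-time rows, ordered by date
    SELECT dt, SUM(is_new) OVER (ORDER BY dt) AS col
    FROM (
        -- is_new = 1 when this uid has no row on an earlier date
        SELECT t1.dt,
               CASE WHEN NOT EXISTS (
                   SELECT 1 FROM t WHERE t.dt < t1.dt AND t.uid = t1.uid
               ) THEN 1 ELSE 0 END AS is_new
        FROM t t1
    ) flagged
) x
GROUP BY dt
ORDER BY dt
"""
result = conn.execute(query).fetchall()
for dt, count in result:
    print(dt, count)
```

Because every input row contributes to the running sum, dates with no new uid (such as 2016-11-02) still appear in the output with the count carried over.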
小智 6
The simplest way:
SELECT *, count(*) over (order by fst_date ) cum_uids
FROM (
SELECT uid, min(date) fst_date FROM t GROUP BY uid
) t
Or something like that.