Tho*_*aux 5 sql apache-spark apache-spark-sql
I am using Spark 1.6.2.
Here is the data:
day | visitorID
-------------
1 | A
1 | B
2 | A
2 | C
3 | A
4 | A
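I want to count, for each day, how many distinct visitors there have been on that day and all previous days combined, i.e. a cumulative distinct count (I don't know the exact term, sorry).
This should give: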
day | visitors
--------------
1 | 2 (A+B)
2 | 3 (A+B+C)
3 | 3
4 | 3
You should be able to do something like this:
select day, max(visitors) as visitors
from (select day,
             count(distinct visitorId) over (order by day) as visitors
      from t
     ) d
group by day;
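
To make the intent of this query concrete, here is a minimal Python sketch of what `count(distinct visitorId) over (order by day)` is meant to compute: a running set of every visitor seen up to and including each day. The function name `cumulative_distinct` and the in-memory list of `(day, visitorID)` rows are assumptions made for illustration and are not part of any Spark API. Some engines reject `distinct` inside a window function, so it is worth checking whether the SQL above runs on your Spark version.

```python
from itertools import groupby
from operator import itemgetter

def cumulative_distinct(rows):
    """Return [(day, visitors)], where visitors is the number of distinct
    visitor IDs seen on that day or on any earlier day."""
    seen = set()      # every visitor ID encountered so far
    result = []
    # Sort by day so that groupby sees each day as one contiguous run
    ordered = sorted(rows, key=itemgetter(0))
    for day, group in groupby(ordered, key=itemgetter(0)):
        seen.update(visitor for _, visitor in group)
        result.append((day, len(seen)))
    return result

# Sample data from the question (hypothetical in-memory form)
rows = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (4, "A")]
print(cumulative_distinct(rows))
```

The running set grows with the number of distinct visitors, which is fine for a sketch but is one reason to prefer an approach that counts each visitor only once.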
Actually, I think a better approach is to record each visitor only on the first day he/she appears:
select startday, sum(count(*)) over (order by startday) as visitors
from (select visitorId, min(day) as startday
      from t
      group by visitorId
     ) t
group by startday
order by startday;
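
The same first-appearance idea can be sketched in plain Python: find each visitor's first day, count how many visitors are new on each day, then take a running sum. One thing to note is that the query above returns a row only for days on which some visitor appears for the first time, so with the sample data days 3 and 4 would be missing. The sketch below carries the running total over every day present in the data, on the assumption that those rows are wanted. The function name `cumulative_by_first_day` and the list-of-tuples input are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def cumulative_by_first_day(rows):
    """Return [(day, visitors)] using each visitor's first day only."""
    # min(day) per visitor, like the inner "group by visitorId" subquery
    first_day = {}
    for day, visitor in rows:
        first_day[visitor] = min(day, first_day.get(visitor, day))
    # count(*) per startday: how many visitors are new on each day
    new_per_day = Counter(first_day.values())
    # Running sum over every day in the data, so days with no new
    # visitors still get a row (assumption: the caller wants them)
    days = sorted({day for day, _ in rows})
    totals = accumulate(new_per_day.get(d, 0) for d in days)
    return list(zip(days, totals))

# Sample data from the question (hypothetical in-memory form)
rows = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (4, "A")]
print(cumulative_by_first_day(rows))
```

In SQL, the equivalent of filling in the missing days would be to join the per-startday counts back to the distinct list of days before taking the running sum.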