How to group by week in Cloudera Impala

Lin*_*lin 1 cloudera impala

How can I group Impala query results by week? The data looks like this:

    userguid                 eventtime
0   66AB1405446C74F2992016E5 2014-08-01T16:43:05Z
1   66AB1405446C74F2992016E5 2014-08-02T20:12:12Z
2   4097483F53AB3C170A490D44 2014-08-03T18:08:50Z
3   4097483F53AB3C170A490D44 2014-08-04T18:10:08Z
4   4097483F53AB3C170A490D44 2014-08-05T18:14:51Z
5   4097483F53AB3C170A490D44 2014-08-06T18:15:29Z
6   4097483F53AB3C170A490D44 2014-08-07T18:17:15Z
7   4097483F53AB3C170A490D44 2014-08-08T18:18:09Z
8   4097483F53AB3C170A490D44 2014-08-09T18:18:18Z
9   4097483F53AB3C170A490D44 2014-08-10T18:23:30Z

The expected result is:

date                    count of distinct userguid
2014-08-01~2014-08-07   40
2014-08-08~2014-08-15   20
2014-08-16~2014-08-23   10

Thanks.

Jef*_*her 5

If eventtime is stored as a timestamp:

SELECT TRUNC(eventtime, "D"), COUNT(DISTINCT userguid)
FROM your_table
GROUP BY TRUNC(eventtime, "D")
ORDER BY TRUNC(eventtime, "D");

Otherwise, if eventtime is stored as a string:

SELECT TRUNC(CAST(eventtime AS TIMESTAMP), "D"), COUNT(DISTINCT userguid)
FROM your_table
GROUP BY TRUNC(CAST(eventtime AS TIMESTAMP), "D")
ORDER BY TRUNC(CAST(eventtime AS TIMESTAMP), "D");
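In Impala, the "D" unit of TRUNC truncates to the first day of the week (Monday), so each group above is labeled only by its week-start timestamp. If a "start~end" label like the one in the question is wanted, something along these lines might work. This is a sketch, assuming eventtime is a TIMESTAMP; the aliases week_start, week_range, and distinct_users are illustrative names, not from the original answer.

```sql
-- Sketch: label each Monday-aligned week as 'yyyy-MM-dd~yyyy-MM-dd'.
-- Assumes eventtime is a TIMESTAMP column; wrap it in CAST(... AS TIMESTAMP) if it is a STRING.
SELECT
  CONCAT(TO_DATE(week_start), '~', TO_DATE(DATE_ADD(week_start, 6))) AS week_range,
  COUNT(DISTINCT userguid) AS distinct_users
FROM (
  -- TRUNC(..., 'D') returns the Monday that starts the week containing eventtime
  SELECT userguid, TRUNC(eventtime, 'D') AS week_start
  FROM your_table
) t
GROUP BY week_start
ORDER BY week_start;
```

Note that these weeks run Monday through Sunday, so they will not line up with ranges that begin on 2014-08-01 (a Friday).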

For more information on the TRUNC function, see the date and time functions section of the Cloudera Impala documentation.
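The ranges in the question start on 2014-08-01 rather than on a Monday. If the 7-day buckets need to be anchored at an arbitrary date, one possible approach is to bucket by the number of days since that date. This is only a sketch: the anchor date is taken from the question's expected output, eventtime is assumed to be a TIMESTAMP, and week_no, week_start, and distinct_users are illustrative names.

```sql
-- Sketch: 7-day buckets counted from a fixed anchor date instead of calendar weeks.
-- Assumes eventtime is a TIMESTAMP column; the anchor '2014-08-01' comes from the question.
SELECT
  TO_DATE(DATE_ADD(CAST('2014-08-01' AS TIMESTAMP), CAST(week_no * 7 AS INT))) AS week_start,
  COUNT(DISTINCT userguid) AS distinct_users
FROM (
  -- DATEDIFF compares only the date parts; DIV is integer division
  SELECT userguid,
         DATEDIFF(eventtime, CAST('2014-08-01' AS TIMESTAMP)) DIV 7 AS week_no
  FROM your_table
  -- rows before the anchor are excluded so the buckets start at week 0
  WHERE eventtime >= CAST('2014-08-01' AS TIMESTAMP)
) t
GROUP BY week_no
ORDER BY week_no;
```

Each bucket here covers exactly seven days (for example 2014-08-01 through 2014-08-07, then 2014-08-08 through 2014-08-14).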