How to group by time period in ClickHouse and fill missing data with nulls / 0s

sim*_*Pod 5 sql clickhouse

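Suppose I have a given time range. For the sake of explanation, let's consider something simple, such as the whole of 2018. I want to query data from ClickHouse as a sum per quarter, so the result should be 4 rows.

The problem is that I only have data for two quarters, so with GROUP BY quarter only two rows are returned.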

SELECT
     toStartOfQuarter(created_at) AS time,
     sum(metric) metric
 FROM mytable
 WHERE
     created_at >= toDate(1514761200) AND created_at >= toDateTime(1514761200)
    AND
     created_at <= toDate(1546210800) AND created_at <= toDateTime(1546210800)
 GROUP BY time
 ORDER BY time

1514761200 = 2018-01-01
1546210800 = 2018-12-31

This returns:

time       metric
2018-01-01 345
2018-04-01 123

I need:

time       metric
2018-01-01 345
2018-04-01 123
2018-07-01 0
2018-10-01 0

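This is a simplified example. In the real use case the aggregation would be over, e.g., 5 minutes instead of a quarter, and the GROUP BY would have at least one more attribute, like GROUP BY attribute1, time, so the desired result is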

time        metric  attribute1
2018-01-01  345     1
2018-01-01  345     2
2018-04-01  123     1
2018-04-01  123     2
2018-07-01  0       1
2018-07-01  0       2
2018-10-01  0       1
2018-10-01  0       2

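Is there a way to fill the whole given interval? Something like the fill argument that InfluxDB has for GROUP BY, or TimescaleDB's time_bucket() function combined with generate_series(). I tried searching the ClickHouse documentation and GitHub issues, and it seems this is not implemented yet, so the question is perhaps whether there is any workaround.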

mik*_*ail 5

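You can generate zero values with the numbers function. Then combine your query with the zero values using UNION ALL, and run the GROUP BY over the combined data.

So your query will look like this: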

SELECT SUM(metric),
       time
  FROM (
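        -- 12 steps of 30 days touch every quarter of 2018 without running past it
        -- (assumes the server time zone maps 1514761200 to 2018-01-01, as in the question)
        SELECT toStartOfQuarter(toDate(1514761200+number*30*24*3600)) AS time,
               toUInt16(0) AS metric
          FROM numbers(12)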

     UNION ALL 

          SELECT toStartOfQuarter(created_at) AS time,
               metric
          FROM mytable
         WHERE created_at >= toDate(1514761200)
           AND created_at >= toDateTime(1514761200)
           AND created_at <= toDate(1546210800)
           AND created_at <= toDateTime(1546210800)
       )
 GROUP BY time
 ORDER BY time

Note the toUInt16(0): the zero values must be of the same type as metric.
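
For the case described in the question (5-minute buckets plus a second grouping column), the same UNION ALL idea might be extended by pairing every bucket with every attribute value. This is only a sketch: it assumes mytable has the columns created_at (DateTime), attribute1 and metric, and the one-hour window and the bucket count of 12 are made-up values for illustration.

```sql
-- Sketch only: assumes mytable(created_at DateTime, attribute1, metric)
-- and an illustrative one-hour window starting at 1514761200.
SELECT
    time,
    attribute1,
    sum(metric) AS metric
FROM
(
    -- Zero rows: every 5-minute bucket paired with every attribute value.
    SELECT
        toDateTime(1514761200 + number * 300) AS time,
        attribute1,
        toUInt64(0) AS metric          -- cast to the type of mytable.metric
    FROM numbers(12) AS n              -- 12 buckets * 300 s = 1 hour
    CROSS JOIN
    (
        SELECT DISTINCT attribute1
        FROM mytable
    ) AS attrs

    UNION ALL

    -- Real rows, truncated to the same 5-minute boundaries.
    SELECT
        toStartOfInterval(created_at, INTERVAL 5 MINUTE) AS time,
        attribute1,
        metric
    FROM mytable
    WHERE created_at >= toDateTime(1514761200)
      AND created_at < toDateTime(1514761200 + 12 * 300)
)
GROUP BY time, attribute1
ORDER BY time, attribute1
```

The start timestamp has to sit on a bucket boundary (1514761200 is a multiple of 300), otherwise the generated zero rows and the truncated real rows would not land in the same groups.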


小智 5

Starting with ClickHouse 19.14, you can use the WITH FILL clause. It can fill the quarters this way:

WITH
    (
        SELECT toRelativeQuarterNum(toDate('1970-01-01'))
    ) AS init
SELECT
    -- build the date from the relative quarter number
    toDate('1970-01-01') + toIntervalQuarter(q - init) AS time,
    metric
FROM
(
    SELECT
        toRelativeQuarterNum(created_at) AS q,
        sum(rand()) AS metric
    FROM
    (
        -- generate some dates and metrics values with gaps
        SELECT toDate(arrayJoin(range(1514761200, 1546210800, ((60 * 60) * 24) * 180))) AS created_at
    )
    GROUP BY q
    ORDER BY q ASC WITH FILL FROM toRelativeQuarterNum(toDate(1514761200)) TO toRelativeQuarterNum(toDate(1546210800)) STEP 1
)

┌───────time─┬─────metric─┐
│ 2018-01-01 │ 2950782089 │
│ 2018-04-01 │ 2972073797 │
│ 2018-07-01 │          0 │
│ 2018-10-01 │  179581958 │
└────────────┴────────────┘
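
For the 5-minute buckets mentioned in the question, WITH FILL can, as far as I know, be applied to a DateTime column directly, with a numeric STEP interpreted as seconds. A minimal sketch with generated data (the timestamps and metric values are illustrative, not taken from the question):

```sql
SELECT
    toStartOfInterval(created_at, INTERVAL 5 MINUTE) AS time,
    sum(metric) AS metric
FROM
(
    -- Illustrative data: one row every 20 minutes, so most buckets stay empty.
    SELECT
        toDateTime(1514761200 + number * 1200) AS created_at,
        toUInt64(1) AS metric
    FROM numbers(3)
)
GROUP BY time
ORDER BY time ASC WITH FILL
    FROM toDateTime(1514761200)
    TO toDateTime(1514761200 + 3600)   -- TO is exclusive
    STEP 300                           -- seconds, because time is a DateTime
```

This should return one row per 5-minute bucket of the hour, with metric 0 in the buckets that had no data. WITH FILL works on the sorted result as a whole, so I would not rely on it to create a zero row for every attribute1 value; the UNION ALL approach is more predictable for that case.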