Gap-filling OHLCV data in TimescaleDB

Nyx*_*nyx 5 sql postgresql time-series timescaledb ohlc

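I have some OHLCV data stored in TimescaleDB, with data missing for certain time ranges. The data needs to be resampled to a different period (e.g. 1 day) and must consist of continuous, ordered time buckets.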

TimescaleDB provides the time_bucket_gapfill function for this. My current query is:

SELECT 
    time_bucket_gapfill(
        '1 day', 
        "timestamp",
        '2017-07-25 00:00', 
        '2018-01-01 00:00'
    ) as date,
    FIRST(open, "timestamp") as open,
    MAX(high) as high,
    MIN(low) as low,
    LAST(close, "timestamp") as close,
    SUM(volume) as volume
FROM ohlcv
WHERE "timestamp" > '2017-07-25'
GROUP BY date ORDER BY date ASC LIMIT 10

Result:

date                    open        high        low         close       volume
2017-07-25 00:00:00+00                  
2017-07-26 00:00:00+00                  
2017-07-27 00:00:00+00  0.00992     0.010184    0.009679    0.010039    65553.5299999999
2017-07-28 00:00:00+00  0.00999     0.010059    0.009225    0.009248    43049.93
2017-07-29 00:00:00+00  
2017-07-30 00:00:00+00  0.009518    0.0098      0.009286    0.009457    40510.0599999999

...
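To see why only the date column comes back filled, here is a small plain-Python sketch that approximates what the query above computes (it illustrates the semantics; it is not TimescaleDB's implementation). Every 1-day bucket in the range gets a row, but a bucket with no source rows has nothing to aggregate, so FIRST/MAX/MIN/LAST/SUM all yield NULL (None here). The tick tuple layout and the name gapfill_daily are hypothetical, and the range is assumed to be half-open, [start, finish).

```python
from datetime import datetime, timedelta


def gapfill_daily(ticks, start, finish):
    """Approximate time_bucket_gapfill('1 day', ...) with FIRST/MAX/MIN/LAST/SUM.

    ticks: hypothetical tuples (timestamp, open, high, low, close, volume).
    Every day in [start, finish) (assumed half-open) gets a row; days
    without ticks get None for every aggregate, like the NULLs in SQL.
    """
    buckets = {}
    for ts, o, h, l, c, v in sorted(ticks):  # sorted by timestamp first
        day = datetime(ts.year, ts.month, ts.day)  # truncate to the 1-day bucket
        if day not in buckets:
            buckets[day] = [o, h, l, c, v]  # FIRST(open) comes from the earliest tick
        else:
            b = buckets[day]
            b[1] = max(b[1], h)  # MAX(high)
            b[2] = min(b[2], l)  # MIN(low)
            b[3] = c             # LAST(close), ordered by timestamp
            b[4] += v            # SUM(volume)

    rows = []
    day = start
    while day < finish:
        # An empty bucket still produces a row, but every aggregate is None
        agg = buckets.get(day, [None] * 5)
        rows.append((day, *agg))
        day += timedelta(days=1)
    return rows
```

Filling the value columns therefore has to be requested explicitly per column; the bucketing alone only guarantees the row exists.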

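Question: it looks like only the date column gets gap-filled. Is it possible, by modifying the SQL statement, to also gap-fill the open, high, low, close and volume columns, so that we get this result: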

date                    open        high        low         close       volume
2017-07-25 00:00:00+00  0           0           0           0           0               
2017-07-26 00:00:00+00  0           0           0           0           0               
2017-07-27 00:00:00+00  0.00992     0.010184    0.009679    0.010039    65553.5299999999
2017-07-28 00:00:00+00  0.00999     0.010059    0.009225    0.009248    43049.93
2017-07-29 00:00:00+00  0.009248    0.009248    0.009248    0.009248    0   
2017-07-30 00:00:00+00  0.009518    0.0098      0.009286    0.009457    40510.0599999999

...

Or is it better to do this imputation after receiving the query results, for example in Python/Nodejs?
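For comparison, the client-side route could look roughly like the plain-Python sketch below (no pandas; the same loop translates directly to Nodejs). It assumes the rows arrive in ascending date order, as the query's ORDER BY date ASC returns them, and that an empty bucket has None in every value column. It reproduces the desired table: an empty bucket after real data repeats the previous close for open/high/low/close with volume 0, and leading empty buckets become all zeros. The name fill_ohlcv is hypothetical.

```python
def fill_ohlcv(rows):
    """Fill empty buckets in gap-filled OHLCV rows on the client side.

    rows: list of (date, open, high, low, close, volume) in ascending date
    order; empty buckets are assumed to have None in every value column.
    """
    filled = []
    prev_close = None
    for date, o, h, l, c, v in rows:
        if c is None:
            # Empty bucket: repeat the last known close, or 0 before any data
            p = prev_close if prev_close is not None else 0
            filled.append((date, p, p, p, p, 0))
        else:
            filled.append((date, o, h, l, c, v))
            prev_close = c
    return filled
```

Whether this is faster or slower than doing it in the database depends on row counts and the application, so it is worth measuring before choosing.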


Example of how this is done with Python/pandas

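I would prefer to have TimescaleDB do this gap-filling/imputation rather than my Nodejs application, because doing it in Nodejs would be much slower, and I don't want to introduce Python into the application just for this processing.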


Ant*_*huk 5

SELECT "tickerId",
       "ts",
       -- empty buckets: fall back to the carried-forward close
       coalesce("open", "close")  "open",
       coalesce("high", "close")  "high",
       coalesce("low", "close")   "low",
       "close",                   -- already filled by locf() in the subquery
       coalesce("volume", 0)      "volume",
       coalesce("count", 0)       "count"
FROM (
     SELECT "tickerId",
            time_bucket_gapfill('1 hour', at)   "ts",
            first(price, "eId")                 "open",
            MAX(price)                          "high",
            MIN(price)                          "low",
            locf(last(price, "eId"))            "close",  -- last observation carried forward
            SUM(volume)                         "volume",
            COUNT(1)                            "count"
     FROM "PublicTrades"
     -- with no start/finish arguments, gapfill takes its range from these bounds
     WHERE at >= date_trunc('day', now() - INTERVAL '1 year')
       AND at < NOW()
     GROUP BY "tickerId", "ts"
     ORDER BY "tickerId", "ts" DESC
     LIMIT 100
 ) AS P

Note: eId is the exchange's public trade ID.
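The column logic of this query can be mirrored in plain Python to check what each gap-filled row ends up as. The sketch below handles a single ticker, assumes buckets in ascending time order, and uses None for the NULL aggregates of an empty bucket; locf and answer_style_fill are hypothetical names. Close is carried forward, open/high/low fall back to that close, and volume/count default to 0. One consequence shows up in the sketch: buckets before the first trade have nothing to carry forward, so their OHLC stays NULL rather than 0 as in the question's desired output. TimescaleDB's locf() documents an optional prev argument for seeding that value; check the locf reference for your version.

```python
def locf(values):
    """Last observation carried forward: replace None with the latest non-None value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def answer_style_fill(buckets):
    """Mirror the answer's outer query for one ticker.

    buckets: (ts, open, high, low, close, volume, count) tuples in ascending
    time order, with None for every aggregate in an empty bucket.
    """
    closes = locf([b[4] for b in buckets])  # locf(last(price, "eId"))
    result = []
    for (ts, o, h, l, _, vol, cnt), c in zip(buckets, closes):
        result.append((
            ts,
            o if o is not None else c,        # coalesce("open", "close")
            h if h is not None else c,        # coalesce("high", "close")
            l if l is not None else c,        # coalesce("low", "close")
            c,                                # "close" (already carried forward)
            vol if vol is not None else 0,    # coalesce("volume", 0)
            cnt if cnt is not None else 0,    # coalesce("count", 0)
        ))
    return result
```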

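  • Also note: I found that using NOW() makes queries against the hypertable slower. If you pass pre-generated dates as strings into such a query, it runs faster. I haven't investigated why. Tested on Postgres 12 and TimescaleDB 1.7.0 (3 upvotes)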
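Following that comment, the bounds can be computed in the application and passed in as literal strings instead of calling now() inside the query. Below is a minimal sketch, assuming the session time zone is UTC (date_trunc on a timestamptz truncates in the session time zone) and mimicking PostgreSQL's clamping of Feb 29 minus one year to Feb 28. The name one_year_bounds is hypothetical, and whether this reproduces the commenter's speed-up on other versions is not verified here.

```python
from datetime import datetime, timezone


def one_year_bounds(now=None):
    """Client-side equivalent of date_trunc('day', now() - INTERVAL '1 year') and now().

    Returns (start, end) as ISO-8601 strings, assuming a UTC session time zone.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        year_ago = now.replace(year=now.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; clamp to Feb 28 like PostgreSQL
        year_ago = now.replace(year=now.year - 1, day=28)
    start = year_ago.replace(hour=0, minute=0, second=0, microsecond=0)
    return start.isoformat(), now.isoformat()
```

The two strings would then replace the now()-based expressions in the WHERE clause, for example as bound query parameters.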

Mik*_*man 2

You need to specify for each column how the gap should be filled. My guess is that you probably want to use locf. See:

https://docs.timescale.com/latest/api#time_bucket_gapfill
https://docs.timescale.com/latest/api#locf