How do I handle a datetime range in a table with separate date and time columns?

Mar*_*777 4 postgresql datetime postgresql-9.4

In a table that stores events, the date and the time are two separate columns:

CREATE TABLE events (
    pk serial, 
    detail text, 
    ev_date date, 
    ev time without time zone
);

If I filter like this:

ev_date BETWEEN start_date::date AND end_date::date AND 
ev_time BETWEEN start_date::time AND end_date::time 

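I miss every event that falls between the two dates but outside the daily time window.

What I want is for the start time to apply only to the start date, and the end time only to the end date.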
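To make the problem concrete, here is a minimal sketch with made-up sample rows and an assumed range of 2016-02-19 09:00 to 2016-02-20 17:00. The `sample` CTE and its values are hypothetical and only illustrate the filter above:

```sql
-- Hypothetical rows; all three lie inside 2016-02-19 09:00 .. 2016-02-20 17:00.
WITH sample(ev_date, ev_time) AS (
    VALUES (date '2016-02-19', time '10:00'),
           (date '2016-02-19', time '18:00'),
           (date '2016-02-20', time '08:00')
)
SELECT ev_date, ev_time,
       (ev_date BETWEEN date '2016-02-19' AND date '2016-02-20'
        AND ev_time BETWEEN time '09:00' AND time '17:00') AS matched
FROM sample;
-- Only the 10:00 row matches. The 18:00 and 08:00 rows are dropped
-- because the time window is applied to every day independently.
```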

Does anyone have a suggestion for doing this efficiently across millions of events?

ype*_*eᵀᴹ 8

You can use a simple condition:

WHERE (ev_date, ev_time) BETWEEN (start_date, start_time) 
                             AND (end_date, end_time) 

Or this one:

WHERE ev_date + ev_time BETWEEN start_date + start_time
                            AND end_date + end_time

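Which of the two is better depends on the indexes you have on the table. If you have an index on (ev_date, ev_time), use the first. If you can add an expression index on (ev_date + ev_time), use the second.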

*: I assume all of these are parameters: start_date, start_time, end_date, end_time.
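One way to pass those four parameters is a prepared statement. This is only a sketch: the name `events_in_range` is made up, and the time column is written as `ev_time` to match the conditions above (the table in the question calls it `ev`).

```sql
-- Hypothetical prepared statement: $1/$2 are the start date and time,
-- $3/$4 are the end date and time.
PREPARE events_in_range(date, time, date, time) AS
SELECT pk, detail, ev_date, ev_time
FROM events
WHERE (ev_date, ev_time) BETWEEN ($1, $2) AND ($3, $4);

-- Example call with illustrative values.
EXECUTE events_in_range('2016-02-19', '04:00', '2016-02-20', '02:00');
```

The same parameter list works for the `ev_date + ev_time` form, since adding a date and a time yields a timestamp.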

  • This is especially useful because Postgres, unlike some other RDBMS, can use such an index for *row values*: [SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'](http://stackoverflow.com/a/32982895/939860) (3 upvotes)
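As a rough illustration of what a row-value comparison means: the elements are compared left to right, so, NULLs aside, `(ev_date, ev_time) >= (d, t)` behaves like `ev_date > d OR (ev_date = d AND ev_time >= t)`. The sample rows below are made up for the sketch:

```sql
-- Hypothetical rows compared against the lower bound 2016-02-19 04:00.
WITH sample(ev_date, ev_time) AS (
    VALUES (date '2016-02-19', time '03:00'),
           (date '2016-02-19', time '05:00'),
           (date '2016-02-20', time '01:00')
)
SELECT ev_date, ev_time,
       (ev_date, ev_time) >= (date '2016-02-19', time '04:00') AS row_form,
       (ev_date > date '2016-02-19'
        OR (ev_date = date '2016-02-19' AND ev_time >= time '04:00')) AS expanded_form
FROM sample;
-- Both columns agree on every row: false, true, true.
```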

shx*_*shx 5

You can use the + operator.

SELECT pk,ev_date,ev FROM events;

 pk |  ev_date   |    ev    
----+------------+----------
  1 | 2016-02-19 | 01:00:00
  2 | 2016-02-19 | 02:00:00
  3 | 2016-02-19 | 05:00:00
  4 | 2016-02-19 | 12:00:00
  5 | 2016-02-19 | 18:00:00
  6 | 2016-02-19 | 23:00:00
  7 | 2016-02-20 | 01:00:00
  8 | 2016-02-20 | 05:00:00
  9 | 2016-02-20 | 12:00:00
 10 | 2016-02-20 | 18:00:00
(10 rows)

SELECT pk, ev_date, ev 
FROM events 
WHERE (ev_date + ev) 
    BETWEEN ('2016-02-19 04:00:00') 
        AND ('2016-02-20 02:00:00');

 pk |  ev_date   |    ev    
----+------------+----------
  3 | 2016-02-19 | 05:00:00
  4 | 2016-02-19 | 12:00:00
  5 | 2016-02-19 | 18:00:00
  6 | 2016-02-19 | 23:00:00
  7 | 2016-02-20 | 01:00:00
(5 rows)

Don't forget to create the following index:

CREATE INDEX events_ts_idx ON events ((ev_date + ev));
ANALYZE events;

I inserted many dummy rows; here is the EXPLAIN output:

EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events  WHERE (ev_date + ev) 
    BETWEEN ('2016-02-19 23:50:00') 
        AND ('2016-02-20 00:01:00');
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using events_ts_idx on events  (cost=0.29..8.52 rows=8 width=16) (actual time=0.014..0.029 rows=42 loops=1)
   Index Cond: (((ev_date + ev) >= '2016-02-19 23:50:00'::timestamp without time zone) AND ((ev_date + ev) <= '2016-02-20 00:01:00'::timestamp without time zone))
 Planning time: 0.082 ms
 Execution time: 0.053 ms
(4 rows)

For comparison, I created another index and tried the other form:

CREATE INDEX events_ts2_idx ON events (ev_date,ev);
ANALYZE events;

EXPLAIN ANALYZE SELECT pk, ev_date, ev FROM events  WHERE (ev_date,ev) 
    BETWEEN ('2016-02-19','23:50:00') 
        AND ('2016-02-20','0:01:00');
                             QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=189.50..511.36 rows=7143 width=16) (actual time=0.027..0.042 rows=42 loops=1)
   Recheck Cond: ((ROW(ev_date, ev) >= ROW('2016-02-19'::date, '23:50:00'::time without time zone)) AND (ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
   Heap Blocks: exact=7
   ->  Bitmap Index Scan on events_ts2_idx  (cost=0.00..187.72 rows=7143 width=0) (actual time=0.019..0.019 rows=42 loops=1)
         Index Cond: ((ROW(ev_date, ev) >= ROW('2016-02-19'::date, '23:50:00'::time without time zone)) AND (ROW(ev_date, ev) <= ROW('2016-02-20'::date, '00:01:00'::time without time zone)))
 Planning time: 0.079 ms
 Execution time: 0.071 ms
(7 rows)

In my tests, my approach (using the + operator) performed better. I suggest comparing both approaches on your own machine.
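If you want to reproduce the comparison, one possible way to generate test data is sketched below. The date range, the one-minute spacing, and the `'dummy'` detail text are arbitrary assumptions, not the data behind the plans above:

```sql
-- Hypothetical test data: one event per minute over roughly ten weeks.
INSERT INTO events (detail, ev_date, ev)
SELECT 'dummy',
       ts::date,   -- date part of the generated timestamp
       ts::time    -- time-of-day part
FROM generate_series(timestamp '2016-01-01 00:00',
                     timestamp '2016-03-10 00:00',
                     interval '1 minute') AS s(ts);

-- Refresh planner statistics before timing anything.
ANALYZE events;
```

After loading the rows, run each `EXPLAIN ANALYZE` statement several times so that caching effects do not skew the timings.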

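  • You don't really need the `date + time` construct in the `BETWEEN` expression, do you? (3 upvotes)
  • I have added the `EXPLAIN ANALYZE` results. The first approach reads only a few rows of the table "events", because the index "events_ts_idx" points directly at the target tuples, so it does not need to read many index entries either. The second approach is somewhat more involved because it has to build a bitmap. (2 upvotes)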