Efficient PostgreSQL query on timestamp using index or bitmap index scan?

Question

Don*_*ter 4 sql postgresql indexing sql-execution-plan postgresql-performance

In PostgreSQL, I have an index on a date field on my tickets table. When I compare the field against now(), the query is pretty efficient:

# explain analyze select count(1) as count from tickets where updated_at > now();
                                                             QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1)
   ->  Index Scan using tickets_updated_at_idx on tickets  (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1)
         Index Cond: (updated_at > now())
Total runtime: 33.271 ms

It goes downhill if I compare the field against now() minus an interval instead: the planner then uses a bitmap heap scan.

# explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval);
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1)
   ->  Bitmap Heap Scan on tickets  (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1)
         Recheck Cond: (updated_at > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on tickets_updated_at_idx  (cost=0.00..20847.74 rows=897368 width=0) (actual time=238.799..238.799 rows=924699 loops=1)
               Index Cond: (updated_at > (now() - '24:00:00'::interval))
Total runtime: 543.952 ms

Is there a more efficient way to query using date arithmetic?

Answer 1

Erw*_*ter 5

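The first query expected to find rows=74, but actually found rows=40250.
The second query expected to find rows=897368 and actually found rows=924699.

Processing roughly 23 times as many rows naturally takes considerably more time, so your actual timings are not surprising.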

The statistics for data with updated_at > now() are outdated. Run:

ANALYZE tickets;

and repeat your queries. Do you really have data with updated_at > now()? That sounds wrong.

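It is not surprising, however, that statistics are outdated for the most recently changed data. That is in the nature of things. If your query depends on current statistics, you have to run ANALYZE right before you run the query.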
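To check how stale the statistics are and what refreshing them changes, a minimal sketch along these lines may help. It assumes the table and column names from the question (tickets, updated_at) and uses the standard pg_stat_user_tables view; the numbers you see will differ.

```sql
-- When were statistics for the table last gathered, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tickets';

-- Refresh planner statistics for this table only.
ANALYZE tickets;

-- Re-run the query and compare the estimated "rows=" with the actual "rows=".
EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > now() - interval '24 hours';
```

If the estimate and the actual row count are now close, the earlier mismatch was most likely due to stale statistics.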

Also test with (in your session only):

SET enable_bitmapscan = off;

and repeat your second query to see the timing without a bitmap index scan.
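An alternative way to keep that experiment contained is SET LOCAL inside a transaction, so the setting reverts automatically when the transaction ends. This is only a sketch that reuses the query from the question; a plain SET followed by RESET enable_bitmapscan would work as well.

```sql
BEGIN;

-- Discourage bitmap scans for this transaction only.
SET LOCAL enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT count(1) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

-- The setting is discarded together with the transaction.
ROLLBACK;
```

Note that turning off enable_bitmapscan does not forbid the plan type outright; it makes the planner strongly prefer the alternatives.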

Why a bitmap index scan for more rows?

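A plain index scan fetches rows from the heap one by one, in the order they are found in the index. That is simple, dumb and without overhead. It is fast for few rows, but with a growing number of rows it may end up more expensive than a bitmap index scan.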

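A bitmap index scan collects the matching rows from the index before visiting the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance that a bitmap index scan will save time.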

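For even more rows (around 5% of the table, though this depends heavily on the actual data), the planner switches to a sequential scan of the table and does not use the index at all.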
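To see roughly where a given predicate falls relative to that threshold, you could measure the fraction of the table it selects. This is only a sketch: it reads the whole table, so it is not free on a large one, and the 5% figure is a rule of thumb rather than a fixed cutoff.

```sql
-- Percentage of rows in tickets updated within the last 24 hours.
-- NULLIF avoids a division by zero on an empty table (the result is then NULL).
SELECT round(100.0 * sum(CASE WHEN updated_at > now() - interval '24 hours'
                              THEN 1 ELSE 0 END)
             / NULLIF(count(*), 0), 2) AS pct_recent
FROM   tickets;
```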

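The optimum would be an index-only scan, introduced with Postgres 9.2. That is only possible if some preconditions are met: if all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on a data page are visible to all transactions, then that page does not have to be fetched from the heap (the table) and the information in the index is enough.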
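The count query in the question references only updated_at, so the existing index tickets_updated_at_idx already contains every column it needs. A hedged sketch of how one might check for an index-only scan on Postgres 9.2 or later, assuming the index is a default B-tree; whether the planner actually picks it depends on your data and settings.

```sql
-- VACUUM updates the visibility map, which index-only scans rely on;
-- ANALYZE refreshes the planner statistics in the same pass.
VACUUM ANALYZE tickets;

EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > now() - interval '24 hours';
-- If chosen, the plan contains an "Index Only Scan using tickets_updated_at_idx"
-- node; its "Heap Fetches" line shows how many rows still had to be checked
-- in the heap.
```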

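The decision depends on your statistics (how many rows Postgres expects to find and how they are distributed) and on the cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.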
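To inspect those settings and try out a different value without affecting other sessions, something like the following could be used. The value 2.0 is an arbitrary example for experimentation, not a recommendation.

```sql
-- Current values of the cost settings named above.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('random_page_cost', 'cpu_index_tuple_cost', 'effective_cache_size');

BEGIN;

-- Example value only: a lower random_page_cost makes index access look cheaper.
SET LOCAL random_page_cost = 2.0;

EXPLAIN
SELECT count(1) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

-- Reverts the setting together with the transaction.
ROLLBACK;
```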
