Why does this LIMIT make the Postgres planner use a slower index scan instead of a faster bitmap heap/index scan?

Question

I have a table of about 700k rows in Postgres 9.3.3 with the following structure:

Columns:
 content_body  - text                        
 publish_date  - timestamp without time zone 
 published     - boolean       

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS LAST, id DESC)

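My queries are full-text searches with a LIMIT, like the ones below.

When I search for a string that is in my index, with a LIMIT and an ORDER BY, the query is fast: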

explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') order by id limit 10;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual time=2.073..9.837 rows=10 loops=1)
   ->  Index Scan using articles_pkey on articles  (cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711 rows=10 loops=1)
         Filter: (article_text @@ '''in_index'''::tsquery)
         Rows Removed by Filter: 611
 Total runtime: 9.952 ms

However, if the string is not in the index, it takes much longer:

explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id limit 10;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual time=5633.684..5633.684 rows=0 loops=1)
   ->  Index Scan using articles_pkey on articles  (cost=0.42..462150.49 rows=3573 width=1298) (actual time=5633.672..5633.672 rows=0 loops=1)
         Filter: (article_text @@ '''not_in_index'''::tsquery)
         Rows Removed by Filter: 796146
 Total runtime: 5633.745 ms

If I remove the ORDER BY clause, however, it is fast in both cases:

explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index')  limit 10;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual time=7.748..7.853 rows=10 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573 width=1298) (actual time=7.735..7.781 rows=10 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80 rows=3573 width=0) (actual time=5.977..5.977 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 7.952 ms


explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index')  limit 10;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual time=0.083..0.083 rows=0 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573 width=1298) (actual time=0.065..0.065 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80 rows=3573 width=0) (actual time=0.047..0.047 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.163 ms

Removing the LIMIT clause has the same effect, although the in-index query becomes noticeably slower:

explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index') order by id;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual time=106.347..140.481 rows=8910 loops=1)
   Sort Key: id
   Sort Method: external merge  Disk: 12288kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573 width=1298) (actual time=5.618..50.329 rows=8910 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80 rows=3573 width=0) (actual time=4.243..4.243 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 170.987 ms

explain analyze select * from "articles" where article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual time=0.067..0.067 rows=0 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573 width=1298) (actual time=0.044..0.044 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80 rows=3573 width=0) (actual time=0.026..0.026 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.148 ms

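What I can infer is that, overall, a bitmap index scan + bitmap heap scan works better for my queries than an index scan. How can I tell the query planner to use it?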

Answer 1

dez*_*zso 5

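The big difference in the first two queries is this: the first one walks along the index backing the table's primary key (which also serves the ORDER BY clause) and then filters out the rows that do not satisfy the WHERE condition. You can see that it has to visit about 621 rows (the 10 returned plus the 611 filtered out) before it is done.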

The second one uses the same logic, but it does not find a single match (let alone 10), so it has to walk the entire index and discard every row (Rows Removed by Filter: 796146).
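Why does the planner choose this plan in the first place? In PostgreSQL's costing (general planner behaviour, not something stated in the question), a Limit node's cost is interpolated linearly between its input's startup cost S and total cost T, scaled by the fraction N/R of the R estimated input rows that are actually needed. Plugging in the numbers from the plans above:

$$
\text{cost}_{\text{Limit}} = S + (T - S)\,\frac{N}{R}
$$

$$
\begin{aligned}
\text{Index Scan on articles\_pkey:}\quad & 0.42 + (462150.49 - 0.42)\cdot\frac{10}{3573} \approx 1293.9 \\
\text{Bitmap Heap Scan, no ORDER BY:}\quad & 55.69 + (12390.60 - 55.69)\cdot\frac{10}{3573} \approx 90.2
\end{aligned}
$$

Both results match the Limit costs in the plans (1293.88 and 90.22), up to rounding of the displayed figures. With ORDER BY id, however, the bitmap plan cannot stop early: every matching row must be fetched and sorted first, so its estimate can be no lower than the 12390.60 of the full Bitmap Heap Scan, against roughly 1294 for walking articles_pkey. The interpolation assumes that the 3573 estimated matches are spread evenly along id. For 'not_in_index' the real number is 0, so the scan never collects 10 rows and ends up visiting all 796146.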

The second pair, without the ORDER BY, gets a different plan, which in this case happens to be more efficient for returning 0 rows :)
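A common way to keep ORDER BY ... LIMIT but still get that bitmap plan is to hide the ordering from the primary key index. The sketch below shows two variants, assuming PostgreSQL 9.3 and that id is an integer column (its type is not listed in the question); the CTE name matches is only illustrative. In 9.3 a WITH query is always materialized and planned on its own, without the outer LIMIT; from PostgreSQL 12 on it may be inlined instead unless written as `WITH matches AS MATERIALIZED (...)`.

```sql
-- Variant 1 (assumes PostgreSQL 9.3, where a CTE is an optimization fence):
-- the inner query is planned without ORDER BY/LIMIT, so it can use the
-- bitmap scan on article_text_gin; the sort happens afterwards.
WITH matches AS (
    SELECT *
    FROM articles
    WHERE article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index')
)
SELECT *
FROM matches
ORDER BY id
LIMIT 10;

-- Variant 2 (assumes id is an integer type): ordering by an expression
-- rather than the bare column stops the planner from using articles_pkey
-- to produce the sort order.
SELECT *
FROM articles
WHERE article_text @@ plainto_tsquery('pg_catalog.simple', 'not_in_index')
ORDER BY id + 0
LIMIT 10;
```

The trade-off: for a frequent term such as 'in_index', every match (8910 rows in the plans above) has to be fetched before the top 10 are picked, so that case will probably be slower than the current 9.9 ms, while the rare-term case should stay close to the sub-millisecond timings seen without ORDER BY.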

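In the third pair, the planner knows it has to return many rows (it estimates 3573 rather than 10) and once again picks a different plan: the same bitmap index scan feeding a bitmap heap scan as in the second pair, but with a Sort node on top instead of a Limit. The time difference between the two queries comes mostly from that Sort node: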

Sort Method: external merge  Disk: 12288kB

If you raise work_mem to a higher value (say, 100 MB), I would guess this difference largely goes away.
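A minimal sketch of trying that suggestion, using the answer's 100 MB purely as an example value rather than a tuned setting. SET changes work_mem only for the current session, and SET LOCAL only until the end of the current transaction, so nothing server-wide is affected:

```sql
-- Session-wide: affects every later query on this connection.
SET work_mem = '100MB';

EXPLAIN ANALYZE
SELECT * FROM articles
WHERE article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index')
ORDER BY id;

-- Undo the session-level change.
RESET work_mem;

-- Transaction-scoped: reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL work_mem = '100MB';
EXPLAIN ANALYZE
SELECT * FROM articles
WHERE article_text @@ plainto_tsquery('pg_catalog.simple', 'in_index')
ORDER BY id;
COMMIT;
```

If the sort now fits in memory, the plan should report `Sort Method: quicksort` with a `Memory:` figure instead of `external merge  Disk: 12288kB`. Keep in mind that work_mem applies per sort or hash operation, not per connection, so large values can add up under concurrent load.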

Archived: 11 years, 6 months ago
Views: 1574
Last activity: 11 years, 6 months ago