PostgreSQL - How does a multicolumn B-tree index handle ORDER BY on the first column and an IN lookup on the second?

ALZ*_*ALZ 5 postgresql performance index postgresql-9.5 postgresql-performance

I created a table like this (similar to the example at http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability):

CREATE TABLE scale_data (
   section NUMERIC NOT NULL,
   id1     NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
   id2     NUMERIC NOT NULL -- a kind of Type
);

Populated it:

INSERT INTO scale_data
SELECT sections.sections, sections.sections*10000 + gen.gen
     , CEIL(RANDOM()*100) 
  FROM GENERATE_SERIES(1, 300)     sections,
       GENERATE_SERIES(1, 90000) gen
 WHERE gen <= sections * 300;

This generated 13545000 rows.
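The row count can be cross-checked with a little arithmetic: each section s from 1 to 300 keeps the rows with gen <= s * 300, and the cap of 90000 on gen is exactly 300 * 300, so it never cuts anything off. A minimal sketch of that check (the alias s is only illustrative):

```sql
-- Each section s contributes s * 300 rows, so the total is
-- 300 * (1 + 2 + ... + 300) = 300 * 45150.
SELECT sum(s * 300) AS expected_rows
  FROM generate_series(1, 300) AS s;
-- expected_rows = 13545000
```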

Then I created a composite index:

CREATE INDEX id1_id2_idx
  ON public.scale_data
  USING btree
  (id1, id2);

And select #1:

select id2 from scale_data 
where id2 in (50)
order by id1 desc
limit 500

EXPLAIN ANALYZE:

"Limit  (cost=0.56..1177.67 rows=500 width=11) (actual time=0.046..5.124 rows=500 loops=1)"
"  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)"
"        Index Cond: (id2 = '50'::numeric)"
"        Heap Fetches: 0"
"Planning time: 0.103 ms"
"Execution time: 5.177 ms"

Select #2, with more values in IN (the plan changed):

select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500

EXPLAIN ANALYZE #2:

"Limit  (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)"
"  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)"
"        Filter: (id2 = ANY ('{50,52}'::numeric[]))"
"        Rows Removed by Filter: 25030"
"        Heap Fetches: 0"
"Planning time: 0.153 ms"
"Execution time: 8.771 ms"

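Why do the plans differ? Why does #1 report the predicate as an Index Cond, while #2 reports it as a Filter together with the number of index entries it scanned and discarded? Doesn't query #1 traverse the index in the same way the EXPLAIN output for query #2 shows?

On the real/production database, #2 runs much slower, even though searching for each of the two keys separately is fast.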

PG 9.5

Eva*_*oll 0

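I wouldn't let this bother you. In this case, I believe Filter just means there is more than one condition on the index (which is what the IN array operation gets translated into, as far as I know). Either way, both are done inside an Index Only Scan Backward. It works the same way as OR, as the plan for the OR form shows, followed by the plan for the IN form: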

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..1219.95 rows=500 width=11) (actual time=0.061..16.159 rows=500 loops=1)
   ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..679161.56 rows=278484 width=11) (actual time=0.060..16.086 rows=500 loops=1)
         Filter: ((id2 = '50'::numeric) OR (id2 = '52'::numeric))
         Rows Removed by Filter: 24673
         Heap Fetches: 25173
 Planning time: 0.206 ms
 Execution time: 16.235 ms
(7 rows)

test=# EXPLAIN ANALYZE select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500
;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..1153.17 rows=500 width=11) (actual time=0.072..18.604 rows=500 loops=1)
   ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..645299.05 rows=279930 width=11) (actual time=0.070..18.506 rows=500 loops=1)
         Filter: (id2 = ANY ('{50,52}'::numeric[]))
         Rows Removed by Filter: 24673
         Heap Fetches: 25173
 Planning time: 0.187 ms
 Execution time: 18.695 ms
(7 rows)
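
One way to check whether both queries really walk the index in the same way is to ask EXPLAIN for buffer counts as well. The sketch below assumes the scale_data table and id1_id2_idx index from the question; the BUFFERS option is available in PostgreSQL 9.5. Comparing the shared hit/read figures of the two plans would show how many pages each scan touched, whichever label (Index Cond or Filter) the predicate gets. No output is shown here because it has not been measured.

```sql
-- Query #1: single value, predicate reported as Index Cond
EXPLAIN (ANALYZE, BUFFERS)
SELECT id2 FROM scale_data
 WHERE id2 IN (50)
 ORDER BY id1 DESC
 LIMIT 500;

-- Query #2: two values, predicate reported as Filter
EXPLAIN (ANALYZE, BUFFERS)
SELECT id2 FROM scale_data
 WHERE id2 IN (50, 52)
 ORDER BY id1 DESC
 LIMIT 500;
```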