Is there any performance benefit to using a unique index instead of a non-unique index?

D-S*_*hih 2 postgresql indexing unique-index sql-execution-plan

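I know that if the data is unique, a unique index should in theory be faster than a non-unique index.

That is because a unique index provides more information, which lets the query optimizer choose a more efficient execution plan.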

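I am running some tests to show that a unique index can produce a better execution plan than a non-unique index, but the results show that the plans are the same...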

CREATE TABLE T3(
    ID INT NOT NULL,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

CREATE INDEX IX_ID_T3 ON T3 (ID);
CREATE UNIQUE INDEX UIX_ID_T3 ON T3 (ID);

INSERT INTO T3
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

vacuum ANALYZE T3;

I created a table and two indexes (IX_ID_T3 is not unique, UIX_ID_T3 is unique), then inserted 1,000,000 sample rows.

After inserting the data I ran VACUUM ANALYZE T3;

--drop index IX_ID_T3 

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT DISTINCT a1.ID
FROM T3 a1 INNER JOIN T3 a2
ON a1.id = a2.id
WHERE a1.id <= 300000

With the first query, I tried to compare UIX_ID_T3 and IX_ID_T3 under a Merge Join.

Neither the Buffers: shared hit counts nor the execution plans differ.

Here are my execution plans:

-- UIX_ID_T3 
"Unique  (cost=0.85..41457.94 rows=298372 width=4) (actual time=0.030..267.207 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40712.01 rows=298372 width=4) (actual time=0.030..200.412 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using uix_id_t3 on t3 a1  (cost=0.42..8501.93 rows=298372 width=4) (actual time=0.017..49.237 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using uix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.010..40.170 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.171 ms"
"Execution Time: 282.919 ms"

---IX_ID_T3 
"Unique  (cost=0.85..41420.43 rows=297587 width=4) (actual time=0.027..230.256 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40676.46 rows=297587 width=4) (actual time=0.027..173.308 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using ix_id_t3 on t3 a1  (cost=0.42..8476.20 rows=297587 width=4) (actual time=0.015..41.606 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using ix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.009..34.019 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.195 ms"
"Execution Time: 243.711 ms"

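There is another question, are-unique-indexes-better-for-column-search-performance-pgsql-mysql, that is relevant to this topic.

I also tried the query from that question's answer, but the execution plans show no difference either.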

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT  id
FROM    T3
ORDER BY
        id
LIMIT 10;

-- using UIX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.034..0.036 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using uix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.033..0.034 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.052 ms"
"Execution Time: 0.047 ms"

-- using IX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.026..0.029 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using ix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.025..0.027 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.075 ms"
"Execution Time: 0.043 ms"

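I have read many different articles, but I cannot demonstrate through execution plans that a unique index is better than a non-unique index.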

Postgres unique constraint vs index

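Question:

Can anyone demonstrate that a unique index is better than a non-unique index in an execution plan, and show us the query and the plans?

As far as I know, in SQL Server a unique index is not only a constraint but also performs better than a non-unique index.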

The Many Mysteries of Merge Join

Lau*_*lbe 7

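A unique index is not faster to scan than a non-unique index. The only potential benefit for query execution speed is that the optimizer can make certain deductions from uniqueness, such as removing unnecessary joins.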

The main use of a unique index is to implement a table constraint, not to provide a performance advantage over a non-unique index.
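
A minimal sketch of that constraint role, assuming a throwaway table named uniq_demo and an index named uix_uniq_demo (both hypothetical, not taken from the question). The second INSERT is expected to be rejected with a unique-violation error (SQLSTATE 23505), which a non-unique index on the same column would not do:

```sql
-- Hypothetical demo objects; the names are assumptions for illustration only.
CREATE TABLE uniq_demo (id int NOT NULL);
CREATE UNIQUE INDEX uix_uniq_demo ON uniq_demo (id);

INSERT INTO uniq_demo VALUES (1);  -- succeeds
INSERT INTO uniq_demo VALUES (1);  -- expected to fail: duplicate key in the unique index
```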

Here is an example:

CREATE TABLE parent (pid bigint PRIMARY KEY);

CREATE TABLE child (
   cid bigint PRIMARY KEY,
   pid bigint UNIQUE REFERENCES parent
);

EXPLAIN (COSTS OFF)
SELECT parent.pid FROM parent LEFT JOIN child USING (pid);

     QUERY PLAN     
════════════════════
 Seq Scan on parent
(1 row)

Without the unique constraint on child.pid (which is implemented by a unique index), the join could not be removed.
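
For contrast, here is a sketch of the same query against a variant of the child table whose pid column only has a plain index. The table name child_nonunique is hypothetical, and the parent table from the example is assumed to exist. Because one parent row may now match several child rows, the LEFT JOIN can change the number of result rows, so the plan should keep a join node. Which join type the planner picks depends on statistics and settings, so no plan output is shown here.

```sql
-- Hypothetical variant of child: pid is indexed but NOT unique.
CREATE TABLE child_nonunique (
   cid bigint PRIMARY KEY,
   pid bigint REFERENCES parent
);
CREATE INDEX ON child_nonunique (pid);

-- Same shape of query as in the example; the join can no longer be removed.
EXPLAIN (COSTS OFF)
SELECT parent.pid FROM parent LEFT JOIN child_nonunique USING (pid);
```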
