Sjo*_*jon 3 postgresql performance postgresql-performance
我有一个中等大小的“functionCalls”表(约 4M 行),由 2 列组成,input
并且function
(另一个表的两个 id):
Column | Type | Collation | Nullable | Default \n----------+---------+-----------+----------+---------\n input | integer | | not null | \n function | integer | | not null | \nIndexes:\n "functionCall_pkey" PRIMARY KEY, btree (input, function) CLUSTER\n "functionCallSearch" btree (function, input)\nForeign-key constraints:\n "fkey1" FOREIGN KEY (function) REFERENCES function(id) ON UPDATE CASCADE ON DELETE CASCADE\n "fkey2" FOREIGN KEY (input) REFERENCES input(id)\n\n
Run Code Online (Sandbox Code Playgroud)\n\n我想找到与某个函数匹配的所有行,这就是我添加索引的原因functionCallSearch
。这是我的查询:
Column | Type | Collation | Nullable | Default \n----------+---------+-----------+----------+---------\n input | integer | | not null | \n function | integer | | not null | \nIndexes:\n "functionCall_pkey" PRIMARY KEY, btree (input, function) CLUSTER\n "functionCallSearch" btree (function, input)\nForeign-key constraints:\n "fkey1" FOREIGN KEY (function) REFERENCES function(id) ON UPDATE CASCADE ON DELETE CASCADE\n "fkey2" FOREIGN KEY (input) REFERENCES input(id)\n\n
Run Code Online (Sandbox Code Playgroud)\n\n这需要永远(当前〜20秒),因为pg拒绝使用索引,并决定做一个Index Only Scan Backward
对主键执行 a 操作:
Limit (cost=0.71..2178.97 rows=25 width=4) (actual time=12903.294..19142.568 rows=8 loops=1)\n Output: c.input\n Buffers: shared hit=59914 read=26193 written=54\n -> Nested Loop (cost=0.71..135662.48 rows=1557 width=4) (actual time=12903.292..19142.561 rows=8 loops=1)\n Output: c.input\n Inner Unique: true\n Join Filter: (c.function = function.id)\n Rows Removed by Join Filter: 3649900\n Buffers: shared hit=59914 read=26193 written=54\n -> Index Only Scan Backward using "functionCall_pkey" on public."functionCall" c (cost=0.43..80906.80 rows=3650225 width=8) (actual time=0.040..17083.489 rows=3649908 loops=1)\n Output: c.input, c.function\n Heap Fetches: 3649909\n Buffers: shared hit=59911 read=26193 written=54\n -> Materialize (cost=0.28..2.30 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=3649908)\n Output: function.id\n Buffers: shared hit=3\n -> Index Scan using function_text on public.function (cost=0.28..2.30 rows=1 width=4) (actual time=0.023..0.026 rows=1 loops=1)\n Output: function.id\n Index Cond: ((function.text)::text = \'getmyinode\'::text)\n Buffers: shared hit=3\n Planning Time: 0.392 ms\n Execution Time: 19143.967 ms\n
Run Code Online (Sandbox Code Playgroud)\n\n当我删除LIMIT
这个查询时,速度非常快:
Sort (cost=5247.53..5251.42 rows=1557 width=4) (actual time=3.762..3.763 rows=8 loops=1)\n Output: c.input\n Sort Key: c.input DESC\n Sort Method: quicksort Memory: 25kB\n Buffers: shared hit=6 read=4\n -> Nested Loop (cost=0.71..5164.97 rows=1557 width=4) (actual time=0.099..3.739 rows=8 loops=1)\n Output: c.input\n Buffers: shared hit=6 read=4\n -> Index Scan using function_text on public.function (cost=0.28..2.30 rows=1 width=4) (actual time=0.054..0.056 rows=1 loops=1)\n Output: function.id\n Index Cond: ((function.text)::text = \'getmyinode\'::text)\n Buffers: shared hit=2 read=1\n -> Index Only Scan using "functionCallSearch" on public."functionCall" c (cost=0.43..5103.71 rows=5897 width=8) (actual time=0.039..3.670 rows=8 loops=1)\n Output: c.function, c.input\n Index Cond: (c.function = function.id)\n Heap Fetches: 8\n Buffers: shared hit=4 read=3\n Planning Time: 0.514 ms\n Execution Time: 3.819 ms\n
Run Code Online (Sandbox Code Playgroud)\n\n到目前为止我尝试过的事情:
\n\n在阅读Postgres 有时使用劣质索引 WHERE a IN (\xe2\x80\xa6) ORDER BY b LIMIT N我已经检查过n_distinct
- 但并不是那么遥远, pg_stats 说623
while SELECT COUNT(*) FROM (SELECT DISTINCT function FROM "functionCall")
returns 1065
。
我已将表增加到SET STATISTICS
10k 并运行ANALYZE
。这将时间减少了一半(9 秒),但仍然不会使用我创建的索引。
为什么是这样?我该如何解决这个问题?
\nPostgreSQL 估计将有 1557 行满足条件,因此它认为如果避免显式排序,而是按顺序扫描行ORDER BY
并执行嵌套循环连接,直到找到足够的行,速度会更快。
不幸的是,这根本行不通,PostgreSQL 必须以这种方式扫描整个表,因为总共只有 8 个匹配项。
问题似乎是估计值相当偏离:它认为public."functionCall"
for的索引扫描function
将产生 5897 行,而实际上只有 8 行。
作为第一个措施,尝试计算表上的新分布统计信息:
ANALYZE public."functionCall";
Run Code Online (Sandbox Code Playgroud)
如果仅此不能改善估计,请增加粒度:
ALTER TABLE public."functionCall" ALTER function SET STATISTICS 1000;
ANALYZE public."functionCall";
Run Code Online (Sandbox Code Playgroud)
这应该会改善估计。您也可以尝试使用高于 1000 的值。
如果全部失败,请明确告诉 PostgreSQL 不要使用该策略
ORDER BY c.input + 0 DESC
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
2029 次 |
最近记录: |