brg*_*eek 2 postgresql performance index execution-plan postgresql-performance
I'm having a performance issue with a query, depending on the user_id I use in the WHERE clause.
This question describes a very similar problem, but not exactly the same one:
This is my query:
select inlineview.user_search_id,
       inlineview.user_id,
       to_char(timezone('UTC', to_timestamp(date_part('epoch', inlineview.last_access_date))),
               'YYYY-MM-DD"T"HH24:MI:SS.MSZ') last_access_date,
       to_char(timezone('UTC', to_timestamp(date_part('epoch', inlineview.sys_creation_date))),
               'YYYY-MM-DD"T"HH24:MI:SS.MSZ') sys_creation_date,
       b.product_id,
       to_char(timezone('UTC', to_timestamp(date_part('epoch', b.last_prov_change))),
               'YYYY-MM-DD"T"HH24:MI:SS.MSZ') last_prov_change,
       b.product_type,
       b.tlc_id,
       b.prod_history_id,
       coalesce(d.address_id, -1) address_id,
       coalesce(d.locality_name, '') locality_name,
       coalesce(d.post_code_zone, '') post_code_zone,
       coalesce(d.road_name_concat, '') road_name_concat,
       coalesce(d.street_num_concat, '') street_num_concat,
       coalesce(d.address_name, '') address_name,
       coalesce(d.town_name, '') town_name
from (select a.user_search_id, a.user_id, a.last_access_date,
             a.sys_creation_date, a.product_id
      from linetest.user_search a
      where a.user_id = '818901'
      order by a.last_access_date desc
      limit 10) inlineview,
     linetest.prod_history b
     left outer join linetest.address d on d.tlc_id = b.tlc_id
where b.product_id = inlineview.product_id
  and b.prod_history_id = (select c.prod_history_id
                           from linetest.prod_history c
                           where c.product_id = b.product_id
                           group by c.prod_history_id
                           order by c.prod_history_id desc
                           limit 1)
order by inlineview.last_access_date desc,
         inlineview.user_search_id desc;
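With this particular user_id ('818901') in the WHERE clause, the query takes about 2 minutes to run. If I change the user_id value in the WHERE clause to any other value, the query runs instantly.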
Now, the most interesting part to me is that the query plan for the "problematic" user_id is identical to the plan for any other user_id.
Normal/expected performance, running the query with any other value (0.5 seconds):
Sort  (cost=2522.41..2522.41 rows=1 width=136)
  Sort Key: a.last_access_date DESC, a.user_search_id DESC
  ->  Nested Loop Left Join  (cost=572.42..2522.40 rows=1 width=136)
        ->  Nested Loop  (cost=572.00..2517.96 rows=1 width=70)
              ->  Limit  (cost=571.57..571.60 rows=10 width=42)
                    ->  Sort  (cost=571.57..571.97 rows=159 width=42)
                          Sort Key: a.last_access_date DESC
                          ->  Bitmap Heap Scan on user_search a  (cost=5.66..568.14 rows=159 width=42)
                                Recheck Cond: ((user_id)::text = '601401'::text)
                                ->  Bitmap Index Scan on user_search_idx  (cost=0.00..5.62 rows=159 width=0)
                                      Index Cond: ((user_id)::text = '601401'::text)
              ->  Index Scan using prod_history_idx on prod_history b  (cost=0.42..194.62 rows=1 width=38)
                    Index Cond: ((product_id)::text = (a.product_id)::text)
                    Filter: (prod_history_id = (SubPlan 1))
                    SubPlan 1
                      ->  Limit  (cost=27.81..27.81 rows=1 width=8)
                            ->  Group  (cost=27.81..27.84 rows=6 width=8)
                                  Group Key: c.prod_history_id
                                  ->  Sort  (cost=27.81..27.82 rows=6 width=8)
                                        Sort Key: c.prod_history_id DESC
                                        ->  Index Scan using prod_history_idx on prod_history c  (cost=0.42..27.73 rows=6 width=8)
                                              Index Cond: ((product_id)::text = (b.product_id)::text)
        ->  Index Scan using address_idx on address d  (cost=0.42..4.41 rows=1 width=75)
              Index Cond: ((tlc_id)::text = (b.tlc_id)::text)
Slow performance, running the query with the problematic user_id (130+ seconds):
Sort  (cost=2522.41..2522.41 rows=1 width=136)
  Sort Key: a.last_access_date DESC, a.user_search_id DESC
  ->  Nested Loop Left Join  (cost=572.42..2522.40 rows=1 width=136)
        ->  Nested Loop  (cost=572.00..2517.96 rows=1 width=70)
              ->  Limit  (cost=571.57..571.60 rows=10 width=42)
                    ->  Sort  (cost=571.57..571.97 rows=159 width=42)
                          Sort Key: a.last_access_date DESC
                          ->  Bitmap Heap Scan on user_search a  (cost=5.66..568.14 rows=159 width=42)
                                Recheck Cond: ((user_id)::text = '818901'::text)
                                ->  Bitmap Index Scan on user_search_idx  (cost=0.00..5.62 rows=159 width=0)
                                      Index Cond: ((user_id)::text = '818901'::text)
              ->  Index Scan using prod_history_idx on prod_history b  (cost=0.42..194.62 rows=1 width=38)
                    Index Cond: ((product_id)::text = (a.product_id)::text)
                    Filter: (prod_history_id = (SubPlan 1))
                    SubPlan 1
                      ->  Limit  (cost=27.81..27.81 rows=1 width=8)
                            ->  Group  (cost=27.81..27.84 rows=6 width=8)
                                  Group Key: c.prod_history_id
                                  ->  Sort  (cost=27.81..27.82 rows=6 width=8)
                                        Sort Key: c.prod_history_id DESC
                                        ->  Index Scan using prod_history_idx on prod_history c  (cost=0.42..27.73 rows=6 width=8)
                                              Index Cond: ((product_id)::text = (b.product_id)::text)
        ->  Index Scan using address_idx on address d  (cost=0.42..4.41 rows=1 width=75)
              Index Cond: ((tlc_id)::text = (b.tlc_id)::text)
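I have tried playing with indexes to force a change in the query plan, but in all tests the cost is the same for both values, while the actual time to run the query still differs greatly depending on the value.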
"in all tests the cost is the same for both values, but the actual time to run the query still differs greatly depending on the value."
That is typically the case if the problematic user has many more (or many fewer) rows than all other users: irregular data distribution.
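One way to check for that kind of skew is to count rows directly. The queries below are only a sketch, using the table and column names from the question. The LIMIT of 10 is arbitrary, and the second query is my own assumption that the skew might also sit in prod_history rather than in user_search.

```sql
-- Row counts per user_id, largest first, to spot outliers
SELECT user_id, count(*) AS row_count
FROM   linetest.user_search
GROUP  BY user_id
ORDER  BY row_count DESC
LIMIT  10;

-- History rows per product, for the products the slow user searched for
SELECT product_id, count(*) AS history_rows
FROM   linetest.prod_history
WHERE  product_id IN (SELECT product_id
                      FROM   linetest.user_search
                      WHERE  user_id = '818901')
GROUP  BY product_id
ORDER  BY history_rows DESC
LIMIT  10;
```

If '818901' or one of its products stands far apart from the rest, the identical cost estimates in both plans are hiding very different actual row counts.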
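In many cases, Postgres can switch to a different query plan (like using a sequential scan on user_search instead of the bitmap index scan on user_search_idx we are currently seeing). Hard to tell without more information. You could drop the index for testing and retry the expensive query to see if it gets faster. (The index suggested below should cover all cases, though.)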
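A hedged sketch of such a test follows. DDL is transactional in Postgres, so the DROP can be rolled back. I am assuming the index lives in the linetest schema, and I use only the inner subquery for brevity; substitute the full query as needed. Note that DROP INDEX locks the table exclusively until the transaction ends, so do not try this on a busy production system.

```sql
BEGIN;

-- Undone by the ROLLBACK below; blocks other sessions on user_search meanwhile.
-- Schema qualification of the index is an assumption.
DROP INDEX linetest.user_search_idx;

-- ANALYZE executes the query and reports actual row counts and timings,
-- unlike the plain EXPLAIN output in the question, which only shows estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_search_id, user_id, last_access_date, sys_creation_date, product_id
FROM   linetest.user_search
WHERE  user_id = '818901'
ORDER  BY last_access_date DESC
LIMIT  10;

ROLLBACK;  -- the index is restored
```

Running the same EXPLAIN (ANALYZE, BUFFERS) for a fast and the slow user_id without dropping anything is also worthwhile, because it shows where estimated and actual rows diverge.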
In particular, your cost settings and autovacuum settings are essential. Increasing the statistics target for linetest.user_search.user_id might help. See:
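For illustration, raising the per-column statistics target could look like the sketch below. The value 1000 is only an example, not a recommendation from the original answer. The default is 100 unless default_statistics_target has been changed.

```sql
-- Collect more detailed statistics for this one column (1000 is illustrative)
ALTER TABLE linetest.user_search
  ALTER COLUMN user_id SET STATISTICS 1000;

-- The new target only takes effect the next time the table is analyzed
ANALYZE linetest.user_search;

-- Inspect what the planner now knows about the column
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  schemaname = 'linetest'
AND    tablename  = 'user_search'
AND    attname    = 'user_id';
```

If '818901' shows up in most_common_vals afterwards, the planner has a frequency for it and may estimate its row count differently from that of other users.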
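You have two subqueries with ORDER BY ... LIMIT n. These are particularly delicate constructs with irregular value frequencies. That includes the one on linetest.user_search with where a.user_id = '818901'.
Related: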
Most important advice: create a multicolumn index on (user_id, last_access_date desc) to match your query. Even your "fast" queries should get faster:
CREATE INDEX foo ON linetest.user_search (user_id, last_access_date desc)
If last_access_date is not defined NOT NULL, you may want to add NULLS LAST to both the query and the index. See:
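To illustrate the NULLS LAST variant: in Postgres, a descending sort puts NULL values first by default, so rows without a last_access_date would otherwise come out on top. The sketch below assumes the column is nullable, and the index name is hypothetical.

```sql
-- Index name is hypothetical; sort order matches the query below exactly
CREATE INDEX user_search_user_id_last_access_idx
ON linetest.user_search (user_id, last_access_date DESC NULLS LAST);

-- The ORDER BY must spell out NULLS LAST as well to match the index
SELECT user_search_id, user_id, last_access_date, sys_creation_date, product_id
FROM   linetest.user_search
WHERE  user_id = '818901'
ORDER  BY last_access_date DESC NULLS LAST
LIMIT  10;
```

With a matching index, Postgres can read the ten newest rows for one user straight from the index instead of fetching all of that user's rows and sorting them.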
And another index on prod_history:
CREATE INDEX bar ON linetest.prod_history (product_id, prod_history_id DESC)
Also, you mix explicit and implicit joins in the FROM clause, which can work against you. Compare:
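As a sketch of what that means here: an explicit JOIN binds more tightly than a comma in the FROM list, so in the original query prod_history b LEFT OUTER JOIN address d forms one unit that is only then combined with inlineview. Writing every join explicitly makes the intended order visible. The select list below is shortened for brevity; everything else follows the original query.

```sql
SELECT inlineview.user_search_id, b.product_id, b.prod_history_id, d.address_id
FROM  (
   SELECT a.user_search_id, a.last_access_date, a.product_id
   FROM   linetest.user_search a
   WHERE  a.user_id = '818901'
   ORDER  BY a.last_access_date DESC
   LIMIT  10
   ) inlineview
-- The join condition moves from the WHERE clause into an explicit JOIN ... ON
JOIN   linetest.prod_history b ON b.product_id = inlineview.product_id
LEFT   JOIN linetest.address d ON d.tlc_id = b.tlc_id
WHERE  b.prod_history_id = (SELECT c.prod_history_id
                            FROM   linetest.prod_history c
                            WHERE  c.product_id = b.product_id
                            ORDER  BY c.prod_history_id DESC
                            LIMIT  1)
ORDER  BY inlineview.last_access_date DESC, inlineview.user_search_id DESC;
```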
Some other parts might be improved as well. The group by, for instance, adds cost without gain. This equivalent query should be faster:
SELECT ...
FROM  (
   SELECT user_search_id, user_id, last_access_date, sys_creation_date, product_id
   FROM   linetest.user_search
   WHERE  user_id = '818901'         -- why the quotes? type?
   ORDER  BY last_access_date DESC   -- column is NOT NULL?
   LIMIT  10
   ) i
JOIN   linetest.prod_history b USING (product_id)
JOIN   LATERAL (
   SELECT prod_history_id
   FROM   linetest.prod_history
   WHERE  product_id = b.product_id
   -- GROUP BY prod_history_id       -- pointless!
   ORDER  BY prod_history_id DESC    -- make sure column is NOT NULL
   LIMIT  1
   ) c USING (prod_history_id)
LEFT   JOIN linetest.address d USING (tlc_id)
ORDER  BY i.last_access_date DESC, i.user_search_id DESC;