phl*_*egx 3 postgresql performance index index-tuning postgresql-9.4 postgresql-performance
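I am using PostgreSQL 9.4 and have a large table named foo. I want to search it, but execution time is very long when the search text is short (e.g. "v") or long (e.g. "this is a search example using gin on table foo%"). In those cases my index is ignored. Here is my search query: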
EXPLAIN (ANALYZE, TIMING)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT 100;
This is my index:
CREATE INDEX index_foo_on_name_de_gin ON foo USING gin(f_unaccent(name) gin_trgm_ops) WHERE locale = 'de';
Why is the index ignored in favor of a seq scan and/or a bitmap heap scan? What other index could I add to fix this?
Why does it recheck?
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
The function f_unaccent:
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
Query plan:
Limit (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
Buffers: shared hit=1 read=749976
-> Bitmap Heap Scan on foo (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
Rows Removed by Index Recheck: 5416739
Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
Rows Removed by Filter: 2196
Heap Blocks: exact=749172
Buffers: shared hit=1 read=749976
-> Bitmap Index Scan on index_foo_on_name_de_gin (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
Buffers: shared hit=1 read=804
Planning time: 0.767 ms
Execution time: 21838.549 ms
Table definition:
Column | Type | Modifiers | Storage | Stats target | Description
---------------+-----------------------------+--------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('foo_id_seq'::regclass) | plain | |
locale | character varying | not null | extended | |
name | character varying | not null | extended | |
configuration | jsonb | not null default '{}'::jsonb | extended | |
Indexes:
    "index_foo_on_configuration" gin (configuration)
    "index_foo_on_name_de_gin" gin (f_unaccent(name::text) gin_trgm_ops) WHERE locale::text = 'de'::text
Without the foo.configuration filter the query is very fast (1.021 ms), but I need this filter. Here is the query without the filter:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
LIMIT 100;
Results with the changed f_unaccent function:
CREATE INDEX index_foo_on_name_de ON foo (f_unaccent(name) text_pattern_ops) WHERE locale = 'de';
CREATE INDEX index_foo_on_configuration ON foo USING gin(configuration jsonb_path_ops);
A) Query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"0":["s"]}'
LIMIT 100;
A) Query plan:
Limit (cost=0.00..121248.83 rows=100 width=301) (actual time=16319.267..16319.267 rows=0 loops=1)
Buffers: shared hit=262079 read=1449294
-> Seq Scan on foo (cost=0.00..12842675.96 rows=10592 width=301) (actual time=16319.261..16319.261 rows=0 loops=1)
Filter: (((locale)::text = 'de'::text) AND ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb) AND (f_unaccent((name)::text) ~~* 'v%'::text))
Rows Removed by Filter: 41227048
Buffers: shared hit=262079 read=1449294
Planning time: 0.765 ms
Execution time: 16319.313 ms and more!!!
B) Query without the configuration filter:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%') LIMIT 100;
B) Query plan:
Limit (cost=0.00..119.31 rows=100 width=301) (actual time=0.227..2.912 rows=100 loops=1)
Buffers: shared read=31
-> Seq Scan on foo (cost=0.00..12636540.72 rows=10591544 width=301) (actual time=0.221..2.864 rows=100 loops=1)
Filter: (((locale)::text = 'de'::text) AND (f_unaccent((name)::text) ~~* 'v%'::text))
Rows Removed by Filter: 691
Buffers: shared read=31
Planning time: 0.501 ms
Execution time: 2.985 ms
C) Query without the configuration filter and without LIMIT:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%');
C) Query plan:
Bitmap Heap Scan on foo (cost=346203.46..4864616.26 rows=10591544 width=301) (actual time=23526.443..30050.008 rows=2196 loops=1)
Recheck Cond: ((locale)::text = 'de'::text)
Rows Removed by Index Recheck: 14094842
Filter: (f_unaccent((name)::text) ~~* 'v%'::text)
Rows Removed by Filter: 10781095
Heap Blocks: exact=572873 lossy=847868
Buffers: shared read=1494015
-> Bitmap Index Scan on index_foo_on_name_de (cost=0.00..343555.58 rows=10592603 width=0) (actual time=1788.454..1788.454 rows=10783291 loops=1)
Buffers: shared read=73274
Planning time: 0.528 ms
Execution time: 30050.168 ms
f_unaccent()
It seems you are using the function I defined in another answer.
Note the update I just made there. This version is better:
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1) -- schema-qualify function and dictionary
$func$ LANGUAGE sql IMMUTABLE;
The detailed explanation is in that answer.
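A quick way to confirm the wrapper resolves correctly after the change, as a minimal sketch. It assumes the unaccent extension lives in schema public, which is what the schema-qualified function body expects:

```sql
-- Assumption: the unaccent extension is (or should be) installed in schema public
CREATE EXTENSION IF NOT EXISTS unaccent SCHEMA public;

-- Sanity check of the wrapper function
SELECT f_unaccent('Hôtel');  -- 'Hotel' with the stock unaccent rules
```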
Why does it recheck?
The "Recheck Cond:" line is always present in the EXPLAIN output for bitmap index scans. Don't worry about it.
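The recheck does cost real work when the bitmap turns lossy. Plan C in the question reports Heap Blocks: exact=572873 lossy=847868, which means the bitmap did not fit into work_mem, so Postgres tracked whole pages and had to recheck every row on them. A minimal sketch for testing whether more memory helps; the value 256MB is an arbitrary assumption, not a recommendation:

```sql
BEGIN;
-- Assumed value for the experiment only; SET LOCAL reverts at the end of the transaction
SET LOCAL work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%');

ROLLBACK;
```

If the lossy count in "Heap Blocks" shrinks or disappears, the bitmap now fits in memory. Rows removed by the recheck of a trigram index itself are not affected by this setting.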
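Why is the index ignored
That is a misunderstanding. Your index is clearly not ignored. If Postgres expects to find enough rows that it would have to visit some data pages of the main relation more than once (obviously the case with rows=10591544), it switches from an index scan to a bitmap index scan, followed by a "Bitmap Heap Scan" to fetch the actual tuples.
What makes this query really expensive is a combination of several unfortunate factors:
Neither the index (Buffers: shared hit=1 read=804) nor the table (Buffers: shared hit=1 read=749976) was cached. If you repeat the query right away, it will be much faster, because everything is cached by then. This is the worst possible case.
The search pattern f_unaccent('v%') - or just 'v%' - is a very bad case for a trigram index. It is not very selective, but still selective enough for Postgres to use the index rather than an actual sequential scan. A text_pattern_ops index would be much faster for this. See below.
More selective patterns (longer strings) would also be faster.
You have LIMIT 100, so Postgres starts out optimistic, hoping to find 100 rows quickly. But the query returns 0 rows (rows=0). That means Postgres has to walk through all candidate rows without success. Another worst case. Your second predicate is the culprit here: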
AND foo.configuration->'bar' @> '{"is":["a"]}'
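Postgres only has very limited statistics for jsonb columns. It does not know how selective this condition is going to be. If you have many queries on configuration->'bar', another expression index could improve the situation considerably, possibly even a multicolumn index.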
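A sketch of what such indexes might look like. The index names are hypothetical, and whether the planner picks either index depends on your data, so verify with EXPLAIN:

```sql
-- Hypothetical name. Expression index on the jsonb sub-document the filter targets.
-- jsonb_path_ops supports the @> containment operator.
CREATE INDEX index_foo_on_configuration_bar_de ON foo
USING gin ((configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';

-- Hypothetical multicolumn variant: both predicates in one GIN index.
-- Requires pg_trgm, which the existing trigram index already depends on.
CREATE INDEX index_foo_on_name_and_bar_de ON foo
USING gin (f_unaccent(name) gin_trgm_ops, (configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';
```

The existing index on the whole configuration column cannot serve a predicate written against configuration->'bar'. Rewriting the predicate as configuration @> '{"bar": {"is": ["a"]}}' should be equivalent when "bar" holds an object, and would let that existing index apply.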
text_pattern_ops
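For left-anchored patterns (wildcard only at the right end) you can do without a trigram index. But a plain btree index won't do if your database uses any locale other than "C" (effectively "no locale"). In that case you need the special operator class that ignores the locale. Like: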
CREATE INDEX index_foo_name_pattern_ops_de ON foo (f_unaccent(name) text_pattern_ops)
WHERE locale = 'de';
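One caveat worth checking: according to the PostgreSQL documentation, a B-tree index can serve ILIKE only when the pattern starts with a non-alphabetic character, so the ILIKE 'v%' query from the question cannot use this index for the pattern itself. That is consistent with plan C in the question, where the bitmap index scan carries no condition on the name. A sketch of one way around it, assuming case-insensitive matching is actually required; the index name is hypothetical:

```sql
-- Hypothetical index: fold case inside the indexed expression
CREATE INDEX index_foo_lower_name_pattern_ops_de ON foo (lower(f_unaccent(name)) text_pattern_ops)
WHERE locale = 'de';

-- Use LIKE against the same expression so the planner can match the index
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND lower(f_unaccent(foo.name)) LIKE lower(f_unaccent('v%'))
LIMIT 100;
```

If case-sensitive matching is acceptable, switching the original query from ILIKE to LIKE should be enough for the index defined above.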