Using ILIKE with unaccent and a right-end wildcard

phl*_*egx 3 postgresql performance index index-tuning postgresql-9.4 postgresql-performance

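I use PostgreSQL 9.4 and have a big table named foo. I want to search it, but execution time is very long when the search text is short (e.g. "v") or long (e.g. "this is a search example using gin on table foo%"). In these cases my index is ignored. Here is my search query: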

EXPLAIN (ANALYZE, TIMING)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT 100;

Here is my index:

CREATE INDEX index_foo_on_name_de_gin ON foo USING gin(f_unaccent(name) gin_trgm_ops) WHERE locale = 'de';

Why is the index ignored, and why is a Seq Scan and/or a Bitmap Heap Scan used instead? How can I add another index to fix this?

Why does it recheck?

Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))

The function f_unaccent:

CREATE OR REPLACE FUNCTION f_unaccent(text)
         RETURNS text AS
         $func$
         SELECT unaccent('unaccent', $1)
         $func$  LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;

Query plan:

 Limit  (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
   Buffers: shared hit=1 read=749976
   ->  Bitmap Heap Scan on foo  (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
         Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
         Rows Removed by Index Recheck: 5416739
         Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
         Rows Removed by Filter: 2196
         Heap Blocks: exact=749172
         Buffers: shared hit=1 read=749976
         ->  Bitmap Index Scan on index_foo_on_name_de_gin  (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
               Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
               Buffers: shared hit=1 read=804
 Planning time: 0.767 ms
 Execution time: 21838.549 ms

Table definition:

    Column     |            Type             |                          Modifiers                           | Storage  | Stats target | Description 
---------------+-----------------------------+--------------------------------------------------------------+----------+--------------+-------------
 id            | integer                     | not null default nextval('foo_id_seq'::regclass)             | plain    |              | 
 locale        | character varying           | not null                                                     | extended |              | 
 name          | character varying           | not null                                                     | extended |              | 
 configuration | jsonb                       | not null default '{}'::jsonb                                 | extended |              | 

"index_foo_on_configuration" gin (configuration)
"index_foo_on_name_de_gin" gin (f_unaccent(name::text) gin_trgm_ops) WHERE locale::text = 'de'::text

Without the foo.configuration filter the query is very fast (1.021 ms). But I need this filter. Here is the query without the filter:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
LIMIT 100;

Results after changes

Updated the f_unaccent function
Added a btree index: CREATE INDEX index_foo_on_name_de ON foo (f_unaccent(name) text_pattern_ops) WHERE locale = 'de';
Added a gin index on configuration: CREATE INDEX index_foo_on_configuration ON foo USING gin(configuration jsonb_path_ops);
Dropped the old indexes

A) Query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"0":["s"]}' 
LIMIT 100;

A) Query plan:

 Limit  (cost=0.00..121248.83 rows=100 width=301) (actual time=16319.267..16319.267 rows=0 loops=1)
   Buffers: shared hit=262079 read=1449294
   ->  Seq Scan on foo  (cost=0.00..12842675.96 rows=10592 width=301) (actual time=16319.261..16319.261 rows=0 loops=1)
         Filter: (((locale)::text = 'de'::text) AND ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb) AND (f_unaccent((name)::text) ~~* 'v%'::text))
         Rows Removed by Filter: 41227048
         Buffers: shared hit=262079 read=1449294
 Planning time: 0.765 ms
 Execution time: 16319.313 ms and more!!!

B) Query without the configuration filter:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%') LIMIT 100;

B) Query plan:

 Limit  (cost=0.00..119.31 rows=100 width=301) (actual time=0.227..2.912 rows=100 loops=1)
   Buffers: shared read=31
   ->  Seq Scan on foo  (cost=0.00..12636540.72 rows=10591544 width=301) (actual time=0.221..2.864 rows=100 loops=1)
         Filter: (((locale)::text = 'de'::text) AND (f_unaccent((name)::text) ~~* 'v%'::text))
         Rows Removed by Filter: 691
         Buffers: shared read=31
 Planning time: 0.501 ms
 Execution time: 2.985 ms

C) Query without the configuration filter and without LIMIT:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%');

C) Query plan:

 Bitmap Heap Scan on foo  (cost=346203.46..4864616.26 rows=10591544 width=301) (actual time=23526.443..30050.008 rows=2196 loops=1)
   Recheck Cond: ((locale)::text = 'de'::text)
   Rows Removed by Index Recheck: 14094842
   Filter: (f_unaccent((name)::text) ~~* 'v%'::text)
   Rows Removed by Filter: 10781095
   Heap Blocks: exact=572873 lossy=847868
   Buffers: shared read=1494015
   ->  Bitmap Index Scan on index_foo_on_name_de  (cost=0.00..343555.58 rows=10592603 width=0) (actual time=1788.454..1788.454 rows=10783291 loops=1)
         Buffers: shared read=73274
 Planning time: 0.528 ms
 Execution time: 30050.168 ms

1. `f_unaccent()`

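You seem to be using the function I defined here:

Does PostgreSQL support "accent insensitive" collations?

Note the update I just made over there. This version is better: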

CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1)  -- schema-qualify function and dictionary
$func$  LANGUAGE sql IMMUTABLE;

Detailed explanation over there.

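2. Recheck

Why does it recheck?

The "Recheck Cond:" line is always present in the EXPLAIN output of a bitmap index scan. Nothing to worry about. Detailed explanation:

"Recheck Cond:" line in query plans with a bitmap index scan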

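3. Index and query plan

Why is the index ignored

That is a misunderstanding. Your index is obviously not ignored. If Postgres expects to find so many rows that it would have to visit some data pages of the main relation more than once (obviously the case with rows=10591544), it switches from an index scan to a bitmap index scan, followed by a "Bitmap Heap Scan" to fetch the actual tuples.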
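
To see what the planner would choose if the bitmap scan were not available, one option is to disable that plan type for a single transaction and compare the plans. This is only a diagnostic sketch reusing the query from the question. `enable_bitmapscan` is a standard planner setting, but it is not meant to be changed permanently.

```sql
BEGIN;

-- Diagnostic only: discourage bitmap scans for this transaction so the
-- alternative plan and its cost estimate become visible.
SET LOCAL enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT foo.*
FROM   foo
WHERE  foo.locale = 'de'
AND    f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND    foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT  100;

ROLLBACK;  -- SET LOCAL is reverted at the end of the transaction anyway
```

Comparing the estimated cost and the actual time of both plans shows whether the planner's preference for the bitmap scan was justified for this data.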

What makes this query really expensive is a combination of several unfortunate factors:

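- Neither the index (Buffers: shared hit=1 read=804) nor the table (Buffers: shared hit=1 read=749976) is cached. If you repeat the query right away, it will be much faster, because everything is cached by then. This is the worst possible case.
- The search pattern f_unaccent('v%'), or just 'v%', is a very bad case for a trigram index. It is not very selective, but still selective enough to be used instead of an actual sequential scan. A text_pattern_ops index would be much faster for this. See below.
- More selective patterns (longer strings) would also be faster.
- You have LIMIT 100, so Postgres starts out optimistically, hoping to find 100 rows quickly. But the query returns 0 rows (rows=0). That means Postgres has to walk through all candidate rows without success. Another worst case. Your second predicate is the culprit here: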
```
AND foo.configuration->'bar' @> '{"is":["a"]}'
```
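Postgres has only very limited statistics for jsonb columns. It does not know how selective this condition is going to be. If you run many queries on configuration->'bar', you can improve the situation a lot with another expression index ...
- Index for finding an element in a JSON array
Possibly even a multicolumn index.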
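
A minimal sketch of such an expression index, assuming most queries filter on configuration->'bar' with the containment operator @> (the index name is hypothetical):

```sql
-- GIN expression index on just the 'bar' sub-document, restricted to the
-- same rows as the name index. jsonb_path_ops supports only @>.
CREATE INDEX index_foo_on_configuration_bar_de ON foo
USING gin ((configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';

-- The predicate has to use the same expression as the index to match it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT foo.*
FROM   foo
WHERE  foo.locale = 'de'
AND    f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND    foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT  100;
```

Whether the planner actually picks this index still depends on its estimates. The existing GIN index on the whole configuration column can only be considered if the predicate is written against the column itself, for example `configuration @> '{"bar":{"is":["a"]}}'`, which should be equivalent for this filter.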

4. `text_pattern_ops`

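For left-anchored patterns ("right-end wildcard") you can do without a trigram index. But if your database runs with any locale other than "C" (effectively "no locale"), a plain btree index is no good. You need the special operator class to ignore locale rules. Like: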

CREATE INDEX index_foo_name_pattern_ops_de ON foo (f_unaccent(name) text_pattern_ops)
WHERE locale = 'de';
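
Note that, per the PostgreSQL documentation on index types, a btree index can serve ILIKE only when the pattern starts with a non-alphabetic character, so for a pattern like 'v%' the index above helps LIKE rather than ILIKE. A minimal sketch of one way around this, assuming the case-insensitive match can be expressed by lower-casing both sides (the index name is hypothetical):

```sql
-- Index the lower-cased, unaccented name so that a case-sensitive LIKE
-- can stand in for ILIKE. lower() and f_unaccent() are both IMMUTABLE.
CREATE INDEX index_foo_lower_name_pattern_ops_de ON foo
(lower(f_unaccent(name)) text_pattern_ops)
WHERE locale = 'de';

-- The query must repeat the indexed expression and use LIKE, not ILIKE.
EXPLAIN (ANALYZE, BUFFERS)
SELECT foo.*
FROM   foo
WHERE  foo.locale = 'de'
AND    lower(f_unaccent(foo.name)) LIKE lower(f_unaccent('v%'))
LIMIT  100;
```

This trades ILIKE for an explicit lower() on both sides. The results should match for typical text, but it is worth verifying against your own data before switching.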

Details:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Asked: 9 years, 7 months ago
Viewed: 3857 times
Active: 9 years, 7 months ago
