Why does PostgreSQL's explain analyze misestimate row counts despite correct statistics?

mil*_*la1 7 sql database postgresql postgresql-13

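Summary

I am joining a large table (~600K rows) with a smaller table (~11K rows) on a PostgreSQL database, and I need to filter the result set by a descriptive `text` field.

When filtering by a `bigint` field of the smaller table, the optimizer correctly estimates the number of result rows. When filtering by a `text` field of the smaller table, the optimizer underestimates the number of result rows by a factor of thousands, even though there is a 1-1 relationship between the two fields.

I cannot make sense of this behavior.

The whole environment, including the data, can be set up by following the instructions in this Pastebin. It is too large for the usual online database fiddles.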

Environment

    select version();

    |version                                                                                            |
    |---------------------------------------------------------------------------------------------------|
    |PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit|

Hosted on an Azure Flexible Server.

Structure and data

Table `parent_tb` (11228 rows):

    create table parent_tb as
    select id, md5(random()::text) descr
    from generate_series(1::bigint,11228::bigint) as a(id);

    alter table parent_tb add primary key (id);
    create index idx_parent_tb_desc on parent_tb(descr);

Sample data:

    select *
    from parent_tb
    limit 3;

    |id |descr                           |
    |---|--------------------------------|
    |1  |a34ea09794a959c333116a83b3b77700|
    |2  |6c40f8248d28dc541724e86d17f96775|
    |3  |99f218398cd5125c803edf7eccc9c832|

Table `child_tb` (597057 rows):

    create table child_tb (
        id_tenant bigint,
        id bigint,
        descr text,
        id_ref bigint references parent_tb(id) null,
        primary key (id_tenant, id)
    ) partition by list(id_tenant);

The `child_tb` table is partitioned by `id_tenant`, with one partition for each value from `1` to `35`:

    create table child_tb_ten_1 partition of child_tb for values in ('1');
    -- ...
    create table child_tb_ten_35 partition of child_tb for values in ('35');
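The elided partition statements can also be generated in a loop. This is a minimal sketch, not the setup script from the Pastebin: it assumes the partitions are named `child_tb_ten_<n>` for tenants `1` through `35`, as the two statements shown suggest.

```sql
-- Sketch: create one list partition per tenant (assumed naming: child_tb_ten_<n>).
do $$
begin
    for t in 1..35 loop
        execute format(
            'create table if not exists %I partition of child_tb for values in (%L)',
            'child_tb_ten_' || t,  -- partition name, quoted as an identifier
            t                      -- tenant value, quoted as a literal
        );
    end loop;
end
$$;
```

`%I` quotes the partition name as an identifier and `%L` quotes the tenant value as a literal, which mirrors the quoted values used in the hand-written statements.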

Sample data:

    select *
    from child_tb
    limit 3;

    |id_tenant|id |descr                           |id_ref|
    |---------|---|--------------------------------|------|
    |1        |1  |638d2b2aa799d4871e0e2fa73ae607de|1     |
    |1        |2  |f06668f8df7eaee3c539d2a1ba613604|1     |
    |1        |3  |f588557239d37ec301bae79ab9a61742|1     |

Gathering test-case data

Collect statistics:

    set default_statistics_target=10000;
    vacuum analyze parent_tb;
    vacuum analyze child_tb;

Pick a large `id_tenant`:

    select count(1) n_keys, id_tenant, count(distinct id_ref) n_le, array_agg(distinct (id_ref,p.descr))
    from child_tb c join parent_tb p on (c.id_ref=p.id)
    group by id_tenant
    order by 1 desc
    limit 3;

    |n_keys|id_tenant|n_le|array_agg                                 |
    |------|---------|----|------------------------------------------|
    |475759|6        |1   |{"(53,6ea8c6d951f3c8371662509ff8a5e37e)"} |
    |18352 |14       |1   |{"(4,a0b360d0344c7018aa98d511392aa26f)"}  |
    |17102 |2        |1   |{"(17,43595b7e1b092d120bbc8a94afca9583)"} |

Tenant `6` is clearly the global outlier.

The statistics agree: the most common values and their frequencies show tenant `6` holding ~80% of the data:

    select
        (most_common_vals::text::numeric[])[1] mcv_1, (most_common_freqs::text::numeric[])[1] mcf_1,
        (most_common_vals::text::numeric[])[2] mcv_2, (most_common_freqs::text::numeric[])[2] mcf_2,
        *
    from pg_stats where tablename in ('child_tb','child_tb_ten_6');

    |mcv_1|mcf_1    |mcv_2|mcf_2      |schemaname|tablename     |attname  |
    |-----|---------|-----|-----------|----------|--------------|---------|
    |6    |0.7968402|14   |0.030737434|argodb    |child_tb      |id_tenant|
    |     |         |     |           |argodb    |child_tb      |id       |
    |     |         |     |           |argodb    |child_tb      |descr    |
    |53   |0.7968402|4    |0.030737434|argodb    |child_tb      |id_ref   |
    |6    |1        |     |           |argodb    |child_tb_ten_6|id_tenant|
    |     |         |     |           |argodb    |child_tb_ten_6|id       |
    |     |         |     |           |argodb    |child_tb_ten_6|descr    |
    |53   |1        |     |           |argodb    |child_tb_ten_6|id_ref   |
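Since the join is on `c.id_ref = p.id` and the filter is on `p.descr`, the distinct-value statistics for those columns may also be worth inspecting. The query below is only a sketch of how to list them; `n_distinct` and `null_frac` are standard `pg_stats` columns, and no output is shown because it was not captured as part of this test case.

```sql
-- Sketch: list distinct-value and null-fraction statistics for the
-- join and filter columns used in the queries of this question.
select tablename, attname, n_distinct, null_frac
from pg_stats
where (tablename = 'parent_tb' and attname in ('id', 'descr'))
   or (tablename in ('child_tb', 'child_tb_ten_6') and attname = 'id_ref')
order by tablename, attname;
```

A negative `n_distinct` is a fraction of the row count rather than an absolute number, so `-1` means every value is estimated to be distinct.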

Get the data for tenant `6` that will be used to filter the final queries:

    select distinct c.id_ref,p.descr
    from child_tb c join parent_tb p on (c.id_ref=p.id)
    where id_tenant=6;

    |id_ref|descr                           |
    |------|--------------------------------|
    |53    |6ea8c6d951f3c8371662509ff8a5e37e|

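Behavior

Note: every execution plan shown here is the output of `explain analyze`, so it reflects actual execution.

First attempt: access by the parent's ID.

Result OK: the optimizer correctly estimates the number of rows produced by the JOIN.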

    explain analyze
    select * from child_tb c join parent_tb p on (c.id_ref=p.id)
    where p.id=53 --fill with the appropriate value at point (1)
    and c.id_tenant=6;

    |QUERY PLAN                                                                                                                       |
    |---------------------------------------------------------------------------------------------------------------------------------|
    |Nested Loop  (cost=0.29..17305.28 rows=475759 width=98) (actual time=0.025..88.812 rows=475759 loops=1)                          |
    |  ->  Index Scan using parent_tb_pkey on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.012..0.014 rows=1 loops=1)|
    |        Index Cond: (id = 53)                                                                                                    |
    |  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..12543.38 rows=475759 width=57) (actual time=0.011..52.590 rows=475759 loops=1)   |
    |        Filter: ((id_ref = 53) AND (id_tenant = 6))                                                                              |
    |Planning Time: 0.183 ms                                                                                                          |
    |Execution Time: 103.176 ms                                                                                                       |

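Second attempt: access by the parent's description field.

Result KO: the optimizer underestimates the rows produced by the join by more than a factor of 1000 (42 estimated vs. 475759 actual)!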

    explain analyze
    select * from child_tb c join parent_tb p on (c.id_ref=p.id)
    where p.descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
    and c.id_tenant=6;

    |QUERY PLAN                                                                                                                                       |
    |-------------------------------------------------------------------------------------------------------------------------------------------------|
    |Gather  (cost=1004.32..9413.99 rows=42 width=98) (actual time=0.409..119.637 rows=475759 loops=1)                                                |
    |  Workers Planned: 2                                                                                                                             |
    |  Workers Launched: 2                                                                                                                            |
    |  ->  Hash Join  (cost=4.32..8409.79 rows=18 width=98) (actual time=0.074..44.309 rows=158586 loops=3)                                           |
    |        Hash Cond: (c.id_ref = p.id)                                                                                                             |
    |        ->  Parallel Seq Scan on child_tb_ten_6 c  (cost=0.00..7884.91 rows=198233 width=57) (actual time=0.007..16.501 rows=158586 loops=3)     |
    |              Filter: (id_tenant = 6)                                                                                                            |
    |        ->  Hash  (cost=4.30..4.30 rows=1 width=41) (actual time=0.020..0.021 rows=1 loops=3)                                                    |
    |              Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                       |
    |              ->  Index Scan using idx_parent_tb_desc on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.016..0.017 rows=1 loops=3)|
    |                    Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)                                                               |
    |Planning Time: 0.620 ms                                                                                                                          |
    |Execution Time: 134.484 ms                                                                                                                       |

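Other attempt: hide the parent's access value from the optimizer with a materialized CTE.

The optimizer assumes a flat data distribution and divides the total row count of the tenant `6` partition by the global number of distinct `id_ref` values. This behavior is expected.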

    select
    (select count(1) from child_tb where id_tenant=6) cnt_child_6,
    (select count(distinct id_ref) from child_tb) cnt_child_dist_refs,
    (select count(1) from child_tb where id_tenant=6)/(select count(distinct id_ref) from child_tb) cnt_flat_6_distr;

    |cnt_child_6|cnt_child_dist_refs|cnt_flat_6_distr|
    |-----------|-------------------|----------------|
    |475759     |61                 |7799            |

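Result OK: the optimizer's row estimate for the JOIN (7799) is exactly the flat-distribution figure, as expected, even though it is far from the actual 475759 rows.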

    explain analyze
    with par as materialized (
        select *
        from parent_tb
        where descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
    )
    select * from child_tb c join par p on (c.id_ref=p.id)
    where c.id_tenant=6;

    |QUERY PLAN                                                                                                                           |
    |-------------------------------------------------------------------------------------------------------------------------------------|
    |Hash Join  (cost=4.33..13220.41 rows=7799 width=97) (actual time=0.031..124.799 rows=475759 loops=1)                                 |
    |  Hash Cond: (c.id_ref = p.id)                                                                                                       |
    |  CTE par                                                                                                                            |
    |    ->  Index Scan using idx_parent_tb_desc on parent_tb  (cost=0.29..4.30 rows=1 width=41) (actual time=0.014..0.015 rows=1 loops=1)|
    |          Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)                                                             |
    |  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..11353.99 rows=475759 width=57) (actual time=0.009..45.301 rows=475759 loops=1)       |
    |        Filter: (id_tenant = 6)                                                                                                      |
    |  ->  Hash  (cost=0.02..0.02 rows=1 width=40) (actual time=0.017..0.018 rows=1 loops=1)                                              |
    |        Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                 |
    |        ->  CTE Scan on par p  (cost=0.00..0.02 rows=1 width=40) (actual time=0.015..0.015 rows=1 loops=1)                           |
    |Planning Time: 0.150 ms                                                                                                              |
    |Execution Time: 138.972 ms                                                                                                           |

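Conclusion

I cannot understand the row count of `42` that the optimizer estimates in the second attempt. I would expect it to recognize the 1-1 relationship between the parent's `id` and `descr` and use it, just as in the first attempt. Moreover, the value `42` looks arbitrary to me: I cannot figure out where it comes from, unlike the `7799` estimate of the other attempt.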