Why does PostgreSQL's explain analyze misestimate the row count even though the statistics are correct?

mil*_*la1 7 sql database postgresql postgresql-13

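Summary

I am joining a large table (~600K rows) with a smaller table (~11K rows) on a PostgreSQL database, and I need to filter the result set by a descriptive text field.

When I filter on a bigint field of the smaller table, the optimizer correctly estimates the number of resulting rows. When I filter on a text field of the smaller table, the optimizer underestimates the resulting rows by a factor of thousands, even though there is a 1-1 relationship between the two fields.

I cannot understand this behaviour.

The whole environment, including the data, can be set up using the instructions in this Pastebin. It is too large for the usual online database fiddles.

Environment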

    select version();

    |version                                                                                            |
    |---------------------------------------------------------------------------------------------------|
    |PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit|


Hosted on Azure Flexible Server.

Structure and data

Parent table (11228 rows):

    create table parent_tb as
    select id, md5(random()::text) descr
    from generate_series(1::bigint,11228::bigint) as a(id);

    alter table parent_tb add primary key (id);
    create index idx_parent_tb_desc on parent_tb(descr);


Sample data:

    select *
    from parent_tb
    limit 3;

    |id |descr                           |
    |---|--------------------------------|
    |1  |a34ea09794a959c333116a83b3b77700|
    |2  |6c40f8248d28dc541724e86d17f96775|
    |3  |99f218398cd5125c803edf7eccc9c832|
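The 1-1 relationship between id and descr mentioned in the summary is a property of the generated data: the index on descr is not declared unique. A minimal sketch of how that relationship could be checked, assuming parent_tb was built as shown above:

```sql
-- Compare the total row count with the number of distinct values
-- on each side of the id <-> descr mapping.
select count(*)              as n_rows,
       count(distinct id)    as n_ids,
       count(distinct descr) as n_descrs
from parent_tb;
```

If the mapping really is 1-1, all three columns should show the same number (11228 for the data set described here).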


Child table (597057 rows):

    create table child_tb (
        id_tenant bigint,
        id bigint,
        descr text,
        id_ref bigint references parent_tb(id) null,
        primary key (id_tenant, id)
    ) partition by list(id_tenant);


The child_tb table is partitioned by id_tenant, with one partition for each value from 1 to 35:

    create table child_tb_ten_1 partition of child_tb for values in ('1');
    -- ...
    create table child_tb_ten_35 partition of child_tb for values in ('35');
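The post does not show how the intermediate partitions were created. As a hedged sketch, a PL/pgSQL loop like the following could generate all 35 of them in one go. It assumes child_tb already exists, that none of the partitions exist yet, and that the naming pattern child_tb_ten_N holds for every tenant.

```sql
-- Sketch only: create one list partition per tenant, child_tb_ten_1 .. child_tb_ten_35.
do $$
begin
    for t in 1..35 loop
        -- %I quotes the partition name as an identifier, %L quotes the tenant id as a literal.
        execute format(
            'create table %I partition of child_tb for values in (%L)',
            'child_tb_ten_' || t,
            t
        );
    end loop;
end
$$;
```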


Sample data:

    select *
    from child_tb
    limit 3;

    |id_tenant|id |descr                           |id_ref|
    |---------|---|--------------------------------|------|
    |1        |1  |638d2b2aa799d4871e0e2fa73ae607de|1     |
    |1        |2  |f06668f8df7eaee3c539d2a1ba613604|1     |
    |1        |3  |f588557239d37ec301bae79ab9a61742|1     |


Gathering test-case data

Gather statistics:

    set default_statistics_target=10000;
    vacuum analyze parent_tb;
    vacuum analyze child_tb;


Pick a large id_tenant:

    select count(1) n_keys, id_tenant, count(distinct id_ref) n_le, array_agg(distinct (id_ref,p.descr))
    from child_tb c join parent_tb p on (c.id_ref=p.id)
    group by id_tenant
    order by 1 desc
    limit 3;

    |n_keys|id_tenant|n_le|array_agg                                 |
    |------|---------|----|------------------------------------------|
    |475759|6        |1   |{"(53,6ea8c6d951f3c8371662509ff8a5e37e)"} |
    |18352 |14       |1   |{"(4,a0b360d0344c7018aa98d511392aa26f)"}  |
    |17102 |2        |1   |{"(17,43595b7e1b092d120bbc8a94afca9583)"} |


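Tenant 6 is clearly an outlier across the whole data set.

The statistics show that the most common values and most common frequencies relate to tenant 6, which holds ~80% of the data: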

    select
        (most_common_vals::text::numeric[])[1] mcv_1, (most_common_freqs::text::numeric[])[1] mcf_1,
        (most_common_vals::text::numeric[])[2] mcv_2, (most_common_freqs::text::numeric[])[2] mcf_2,
        *
    from pg_stats where tablename in ('child_tb','child_tb_ten_6');

    |mcv_1|mcf_1    |mcv_2|mcf_2      |schemaname|tablename     |attname  |
    |-----|---------|-----|-----------|----------|--------------|---------|
    |6    |0.7968402|14   |0.030737434|argodb    |child_tb      |id_tenant|
    |     |         |     |           |argodb    |child_tb      |id       |
    |     |         |     |           |argodb    |child_tb      |descr    |
    |53   |0.7968402|4    |0.030737434|argodb    |child_tb      |id_ref   |
    |6    |1        |     |           |argodb    |child_tb_ten_6|id_tenant|
    |     |         |     |           |argodb    |child_tb_ten_6|id       |
    |     |         |     |           |argodb    |child_tb_ten_6|descr    |
    |53   |1        |     |           |argodb    |child_tb_ten_6|id_ref   |


Fetch the values for tenant 6 that will be used as filters in the final queries:

    select distinct c.id_ref,p.descr
    from child_tb c join parent_tb p on (c.id_ref=p.id)
    where id_tenant=6;

    |id_ref|descr                           |
    |------|--------------------------------|
    |53    |6ea8c6d951f3c8371662509ff8a5e37e|


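Behaviour

Please note: every execution plan shown here is the output of explain analyze, so it reflects what was actually executed.

First attempt: access via the parent id.

Result OK: the optimizer correctly estimates the number of rows produced by the JOIN.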

    explain analyze
    select * from child_tb c join parent_tb p on (c.id_ref=p.id)
    where p.id=53 --fill with the appropriate value at point (1)
    and c.id_tenant=6;

    |QUERY PLAN                                                                                                                       |
    |---------------------------------------------------------------------------------------------------------------------------------|
    |Nested Loop  (cost=0.29..17305.28 rows=475759 width=98) (actual time=0.025..88.812 rows=475759 loops=1)                          |
    |  ->  Index Scan using parent_tb_pkey on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.012..0.014 rows=1 loops=1)|
    |        Index Cond: (id = 53)                                                                                                    |
    |  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..12543.38 rows=475759 width=57) (actual time=0.011..52.590 rows=475759 loops=1)   |
    |        Filter: ((id_ref = 53) AND (id_tenant = 6))                                                                              |
    |Planning Time: 0.183 ms                                                                                                          |
    |Execution Time: 103.176 ms                                                                                                       |


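Second attempt: access via the parent's description field.

Result KO: the optimizer underestimates the rows produced by the join by a factor of more than 1000 (42 estimated vs. 475759 actual)!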

    explain analyze
    select * from child_tb c join parent_tb p on (c.id_ref=p.id)
    where p.descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
    and c.id_tenant=6;

    |QUERY PLAN                                                                                                                                       |
    |-------------------------------------------------------------------------------------------------------------------------------------------------|
    |Gather  (cost=1004.32..9413.99 rows=42 width=98) (actual time=0.409..119.637 rows=475759 loops=1)                                                |
    |  Workers Planned: 2                                                                                                                             |
    |  Workers Launched: 2                                                                                                                            |
    |  ->  Hash Join  (cost=4.32..8409.79 rows=18 width=98) (actual time=0.074..44.309 rows=158586 loops=3)                                           |
    |        Hash Cond: (c.id_ref = p.id)                                                                                                             |
    |        ->  Parallel Seq Scan on child_tb_ten_6 c  (cost=0.00..7884.91 rows=198233 width=57) (actual time=0.007..16.501 rows=158586 loops=3)     |
    |              Filter: (id_tenant = 6)                                                                                                            |
    |        ->  Hash  (cost=4.30..4.30 rows=1 width=41) (actual time=0.020..0.021 rows=1 loops=3)                                                    |
    |              Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                       |
    |              ->  Index Scan using idx_parent_tb_desc on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.016..0.017 rows=1 loops=3)|
    |                    Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)                                                               |
    |Planning Time: 0.620 ms                                                                                                                          |
    |Execution Time: 134.484 ms                                                                                                                       |
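To put a number on the gap in this plan, the ratio between the actual and the estimated row count at the top node can be computed directly. This is only arithmetic on the two figures reported by explain analyze (475759 actual, 42 estimated), not something read from the planner:

```sql
-- Ratio of actual rows to estimated rows for the Gather node of the second attempt.
select round(475759 / 42.0) as underestimate_factor;  -- 11328
```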


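Additional attempt: use a materialized CTE to hide the parent's filter value from the optimizer.

The optimizer assumes a flat data distribution and divides the total number of rows in tenant 6's partition by the global number of distinct id_ref values. This behaviour is expected.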

    select
    (select count(1) from child_tb where id_tenant=6) cnt_child_6,
    (select count(distinct id_ref) from child_tb) cnt_child_dist_refs,
    (select count(1) from child_tb where id_tenant=6)/(select count(distinct id_ref) from child_tb) cnt_flat_6_distr;

    |cnt_child_6|cnt_child_dist_refs|cnt_flat_6_distr|
    |-----------|-------------------|----------------|
    |475759     |61                 |7799            |
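The planner does not run count(distinct ...) itself; it works from the n_distinct figures that analyze stores in pg_stats. A sketch of how the stored values for the join columns could be inspected, assuming the tables live in a schema on the current search path:

```sql
-- n_distinct > 0 is an absolute number of distinct values;
-- n_distinct < 0 is the negated fraction of rows (-1 means every row is distinct).
select tablename, attname, inherited, n_distinct, null_frac
from pg_stats
where (tablename = 'parent_tb' and attname in ('id', 'descr'))
   or (tablename in ('child_tb', 'child_tb_ten_6') and attname = 'id_ref')
order by tablename, attname;
```

Comparing these values with the counts above shows whether the stored statistics agree with the real number of distinct id_ref values.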


Result as expected: the optimizer's estimate for the JOIN (7799 rows) matches the flat-distribution figure computed above.

    explain analyze
    with par as materialized (
        select * 
        from parent_tb
        where descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
    )
    select * from child_tb c join par p on (c.id_ref=p.id)
    where c.id_tenant=6;

    |QUERY PLAN                                                                                                                           |
    |-------------------------------------------------------------------------------------------------------------------------------------|
    |Hash Join  (cost=4.33..13220.41 rows=7799 width=97) (actual time=0.031..124.799 rows=475759 loops=1)                                 |
    |  Hash Cond: (c.id_ref = p.id)                                                                                                       |
    |  CTE par                                                                                                                            |
    |    ->  Index Scan using idx_parent_tb_desc on parent_tb  (cost=0.29..4.30 rows=1 width=41) (actual time=0.014..0.015 rows=1 loops=1)|
    |          Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)                                                             |
    |  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..11353.99 rows=475759 width=57) (actual time=0.009..45.301 rows=475759 loops=1)       |
    |        Filter: (id_tenant = 6)                                                                                                      |
    |  ->  Hash  (cost=0.02..0.02 rows=1 width=40) (actual time=0.017..0.018 rows=1 loops=1)                                              |
    |        Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                 |
    |        ->  CTE Scan on par p  (cost=0.00..0.02 rows=1 width=40) (actual time=0.015..0.015 rows=1 loops=1)                           |
    |Planning Time: 0.150 ms                                                                                                              |
    |Execution Time: 138.972 ms                                                                                                           |


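Conclusion

I cannot understand the row count of 42 that the optimizer estimates in the second attempt. I would expect it to recognize the 1-1 relationship between id and descr in parent_tb and use it to produce the same estimate as in the first attempt. Moreover, the value 42 looks arbitrary to me because I cannot work out where it comes from, unlike the estimate of 7799 in the additional attempt.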

Asked:	3 years, 6 months ago
Viewed:	729 times
Last active:	3 years, 6 months ago