mil*_*la1 7 sql database postgresql postgresql-13
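I am joining a large table (~600K rows) with a smaller table (~11K rows) in a PostgreSQL database, and I need to filter the result set by a descriptive text field.
When I filter on the bigint field of the smaller table, the optimizer estimates the number of result rows correctly. When I filter on the text field of the smaller table, it underestimates the number of result rows by a factor of thousands, even though there is a 1-1 relationship between the two fields.
I cannot make sense of this behavior.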
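
The whole environment, including the data, can be set up by following the instructions in this Pastebin. It is too large for the usual online database fiddles.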

select version();

|version                                                                                            |
|---------------------------------------------------------------------------------------------------|
|PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit|

The database runs on an Azure Flexible Server.

Parent table (11228 rows):

create table parent_tb as
select id, md5(random()::text) descr
from generate_series(1::bigint,11228::bigint) as a(id);

alter table parent_tb add primary key (id);
create index idx_parent_tb_desc on parent_tb(descr);

Sample data:

select *
from parent_tb
limit 3;

|id |descr                           |
|---|--------------------------------|
|1  |a34ea09794a959c333116a83b3b77700|
|2  |6c40f8248d28dc541724e86d17f96775|
|3  |99f218398cd5125c803edf7eccc9c832|

Child table (597057 rows):

create table child_tb (
    id_tenant bigint,
    id bigint,
    descr text,
    id_ref bigint references parent_tb(id) null,
    primary key (id_tenant, id)
) partition by list(id_tenant);

The child_tb table has one partition for each id_tenant value from 1 to 35:

create table child_tb_ten_1 partition of child_tb for values in ('1');
-- ...
create table child_tb_ten_35 partition of child_tb for values in ('35');

Sample data:

select *
from child_tb
limit 3;

|id_tenant|id |descr                           |id_ref|
|---------|---|--------------------------------|------|
|1        |1  |638d2b2aa799d4871e0e2fa73ae607de|1     |
|1        |2  |f06668f8df7eaee3c539d2a1ba613604|1     |
|1        |3  |f588557239d37ec301bae79ab9a61742|1     |

Gather statistics:

set default_statistics_target=10000;
vacuum analyze parent_tb;
vacuum analyze child_tb;

Pick a large id_tenant:

select count(1) n_keys, id_tenant, count(distinct id_ref) n_le, array_agg(distinct (id_ref,p.descr))
from child_tb c join parent_tb p on (c.id_ref=p.id)
group by id_tenant
order by 1 desc
limit 3;

|n_keys|id_tenant|n_le|array_agg                                 |
|------|---------|----|------------------------------------------|
|475759|6        |1   |{"(53,6ea8c6d951f3c8371662509ff8a5e37e)"} |
|18352 |14       |1   |{"(4,a0b360d0344c7018aa98d511392aa26f)"}  |
|17102 |2        |1   |{"(17,43595b7e1b092d120bbc8a94afca9583)"} |

Tenant 6 is clearly an outlier across the whole table.
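
A quick way to quantify how much of child_tb each tenant holds is sketched below. It is only an illustration and was not part of my original setup; the column aliases n_rows and pct_of_total are names I made up for it. With the row counts quoted in this question (475759 of 597057), tenant 6 should come out at roughly 80% of the table.

```sql
-- Share of child_tb rows held by each tenant, largest first.
-- sum(count(*)) over () is a window function applied on top of the
-- grouped counts, so it yields the total row count on every output row.
select id_tenant,
       count(*) as n_rows,
       round(100.0 * count(*) / sum(count(*)) over (), 2) as pct_of_total
from child_tb
group by id_tenant
order by n_rows desc
limit 3;
```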
The statistics show that the most common value and its frequency correspond to tenant 6, which holds about 80% of the data:

select
    (most_common_vals::text::numeric[])[1] mcv_1, (most_common_freqs::text::numeric[])[1] mcf_1,
    (most_common_vals::text::numeric[])[2] mcv_2, (most_common_freqs::text::numeric[])[2] mcf_2,
    *
from pg_stats where tablename in ('child_tb','child_tb_ten_6');

|mcv_1|mcf_1    |mcv_2|mcf_2      |schemaname|tablename     |attname  |
|-----|---------|-----|-----------|----------|--------------|---------|
|6    |0.7968402|14   |0.030737434|argodb    |child_tb      |id_tenant|
|     |         |     |           |argodb    |child_tb      |id       |
|     |         |     |           |argodb    |child_tb      |descr    |
|53   |0.7968402|4    |0.030737434|argodb    |child_tb      |id_ref   |
|6    |1        |     |           |argodb    |child_tb_ten_6|id_tenant|
|     |         |     |           |argodb    |child_tb_ten_6|id       |
|     |         |     |           |argodb    |child_tb_ten_6|descr    |
|53   |1        |     |           |argodb    |child_tb_ten_6|id_ref   |

Fetch the data for tenant 6 that will be used to filter the final queries:

select distinct c.id_ref,p.descr
from child_tb c join parent_tb p on (c.id_ref=p.id)
where id_tenant=6;

|id_ref|descr                           |
|------|--------------------------------|
|53    |6ea8c6d951f3c8371662509ff8a5e37e|

Please note: every execution plan shown here is the output of explain analyze, so it reflects what actually happened at run time.

First attempt: access by the parent ID.

Result OK: the optimizer correctly estimates the number of rows produced by the JOIN.

explain analyze
select * from child_tb c join parent_tb p on (c.id_ref=p.id)
where p.id=53 --fill with the appropriate value at point (1)
and c.id_tenant=6;

|QUERY PLAN|
|---------------------------------------------------------------------------------------------------------------------------------|
|Nested Loop  (cost=0.29..17305.28 rows=475759 width=98) (actual time=0.025..88.812 rows=475759 loops=1)|
|  ->  Index Scan using parent_tb_pkey on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.012..0.014 rows=1 loops=1)|
|        Index Cond: (id = 53)|
|  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..12543.38 rows=475759 width=57) (actual time=0.011..52.590 rows=475759 loops=1)|
|        Filter: ((id_ref = 53) AND (id_tenant = 6))|
|Planning Time: 0.183 ms|
|Execution Time: 103.176 ms|

Second attempt: access by the parent's description field.
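
Result KO: the optimizer underestimates the rows produced by the join by a factor of more than 10,000 (rows=42 estimated versus 475759 actual)!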

explain analyze
select * from child_tb c join parent_tb p on (c.id_ref=p.id)
where p.descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
and c.id_tenant=6;

|QUERY PLAN|
|-------------------------------------------------------------------------------------------------------------------------------------------------|
|Gather  (cost=1004.32..9413.99 rows=42 width=98) (actual time=0.409..119.637 rows=475759 loops=1)|
|  Workers Planned: 2|
|  Workers Launched: 2|
|  ->  Hash Join  (cost=4.32..8409.79 rows=18 width=98) (actual time=0.074..44.309 rows=158586 loops=3)|
|        Hash Cond: (c.id_ref = p.id)|
|        ->  Parallel Seq Scan on child_tb_ten_6 c  (cost=0.00..7884.91 rows=198233 width=57) (actual time=0.007..16.501 rows=158586 loops=3)|
|              Filter: (id_tenant = 6)|
|        ->  Hash  (cost=4.30..4.30 rows=1 width=41) (actual time=0.020..0.021 rows=1 loops=3)|
|              Buckets: 1024  Batches: 1  Memory Usage: 9kB|
|              ->  Index Scan using idx_parent_tb_desc on parent_tb p  (cost=0.29..4.30 rows=1 width=41) (actual time=0.016..0.017 rows=1 loops=3)|
|                    Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)|
|Planning Time: 0.620 ms|
|Execution Time: 134.484 ms|

Additional attempt: hide the parent access value from the optimizer behind a materialized CTE.
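
The optimizer assumes a flat data distribution and divides the total number of rows in the tenant 6 partition by the global number of distinct id_ref values. This behavior is expected.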

select
(select count(1) from child_tb where id_tenant=6) cnt_child_6,
(select count(distinct id_ref) from child_tb) cnt_child_dist_refs,
(select count(1) from child_tb where id_tenant=6)/(select count(distinct id_ref) from child_tb) cnt_flat_6_distr;

|cnt_child_6|cnt_child_dist_refs|cnt_flat_6_distr|
|-----------|-------------------|----------------|
|475759     |61                 |7799            |

Result OK: the optimizer's row estimate for the JOIN (rows=7799) matches this flat-distribution calculation.

explain analyze
with par as materialized (
    select *
    from parent_tb
    where descr='6ea8c6d951f3c8371662509ff8a5e37e' --fill with the appropriate value at point (1)
)
select * from child_tb c join par p on (c.id_ref=p.id)
where c.id_tenant=6;

|QUERY PLAN|
|-------------------------------------------------------------------------------------------------------------------------------------|
|Hash Join  (cost=4.33..13220.41 rows=7799 width=97) (actual time=0.031..124.799 rows=475759 loops=1)|
|  Hash Cond: (c.id_ref = p.id)|
|  CTE par|
|    ->  Index Scan using idx_parent_tb_desc on parent_tb  (cost=0.29..4.30 rows=1 width=41) (actual time=0.014..0.015 rows=1 loops=1)|
|          Index Cond: (descr = '6ea8c6d951f3c8371662509ff8a5e37e'::text)|
|  ->  Seq Scan on child_tb_ten_6 c  (cost=0.00..11353.99 rows=475759 width=57) (actual time=0.009..45.301 rows=475759 loops=1)|
|        Filter: (id_tenant = 6)|
|  ->  Hash  (cost=0.02..0.02 rows=1 width=40) (actual time=0.017..0.018 rows=1 loops=1)|
|        Buckets: 1024  Batches: 1  Memory Usage: 9kB|
|        ->  CTE Scan on par p  (cost=0.00..0.02 rows=1 width=40) (actual time=0.015..0.015 rows=1 loops=1)|
|Planning Time: 0.150 ms|
|Execution Time: 138.972 ms|

I cannot understand the row estimate of 42 that the optimizer produces in the second attempt. I would expect it to recognize the 1-1 relationship between id and descr in parent_tb and use it, as it does in the first attempt. Moreover, the value 42 looks arbitrary to me: I cannot work out where it comes from, unlike the estimate of 7799 in the additional attempt.
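
For anyone who wants to trace these estimates, the distinct-value statistics the planner holds for the join columns can be listed from pg_stats. This is only a diagnostic sketch: which columns matter here is my assumption, and the query does not by itself explain the estimate of 42.

```sql
-- List n_distinct and null_frac for the columns involved in the join.
-- In pg_stats, a negative n_distinct is a fraction of the row count
-- (-1 means every value is distinct); a positive value is an absolute count.
select tablename, attname, n_distinct, null_frac
from pg_stats
where (tablename = 'parent_tb' and attname in ('id', 'descr'))
   or (tablename in ('child_tb', 'child_tb_ten_6') and attname = 'id_ref')
order by tablename, attname;
```

Comparing these values with the row counts in the plans above may show which statistic each estimate is derived from.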