I noticed something strange. This query:
SELECT *
FROM progress as pp
ALL LEFT JOIN links as ll USING (viewId)
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8'
Result: 0 rows in set. Elapsed: 5.267 sec. Processed 8.62 million rows, 484.94 MB (1.64 million rows/s., 92.08 MB/s.)
Here is the modified query:
SELECT *
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') AS p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) AS l ON p.viewId = l.viewId;
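Result: 0 rows in set. Elapsed: 0.076 sec. Processed 4.48 million rows, 161.35 MB (58.69 million rows/s., 2.12 GB/s.)
But it looks dirty.
Shouldn't the optimizer take the WHERE condition into account when planning the query?
What is the right way to write this query, and where should the WHERE condition go?
Then I tried adding another join: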
SELECT *
FROM
(SELECT videoUuid AS contentUuid,
viewId
FROM
(SELECT *
FROM progress
WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8') p ALL
LEFT JOIN
(SELECT *
FROM links
WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')) USING `viewId`) ALL
LEFT JOIN `metaInfo` USING `viewId`,
`contentUuid`;
Considering that I only want to join three tables with a condition that selects a single row, the result is again very slow:
0 rows in set. Elapsed: 1.747 sec. Processed 9.13 million rows, 726.55 MB (5.22 million rows/s., 415.85 MB/s.)
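Currently, CH does not handle multi-join queries (DB star schema) well, and the query optimizer is not yet good enough to rely on completely.
So you need to state explicitly how to "execute" the query by using subqueries instead of joins.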
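Applied to the three-table query from the question, one way to do this is to filter every table, including `metaInfo`, in its own subquery before joining. The following is only a sketch: it assumes `metaInfo.viewId` is a UUID column like `links.viewId`, the aliases `pl` and `m` are made up for illustration, and it has not been timed against the real tables.

```sql
-- Sketch only: every table is filtered in its own subquery before any JOIN runs.
-- Assumption: metaInfo.viewId is a UUID column, like links.viewId in the question.
SELECT *
FROM
(
    SELECT
        videoUuid AS contentUuid,
        viewId
    FROM
    (
        SELECT *
        FROM progress
        WHERE viewId = 'a776a2f2-16ad-448a-858d-891e68bec9a8'
    ) AS p
    ALL LEFT JOIN
    (
        SELECT *
        FROM links
        WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')
    ) AS l USING (viewId)
) AS pl
ALL LEFT JOIN
(
    -- hypothetical addition: the same filter pushed into the third table
    SELECT *
    FROM metaInfo
    WHERE viewId = toUUID('a776a2f2-16ad-448a-858d-891e68bec9a8')
) AS m USING (viewId, contentUuid);
```

The only change from the question's third query is that `metaInfo` is no longer read in full as the right-hand side of the join.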
Consider the following test query:
SELECT table_01.number AS r
FROM numbers(87654321) AS table_01
    INNER JOIN numbers(7654321) AS table_02 ON (table_01.number = table_02.number)
    INNER JOIN numbers(654321) AS table_03 ON (table_02.number = table_03.number)
    INNER JOIN numbers(54321) AS table_04 ON (table_03.number = table_04.number)
WHERE r = 54320
/*
┌─────r─┐
│ 54320 │
└───────┘

1 rows in set. Elapsed: 6.261 sec. Processed 96.06 million rows, 768.52 MB (15.34 million rows/s., 122.74 MB/s.)
*/
Let's rewrite it using subqueries to speed it up significantly.
SELECT number AS r
FROM numbers(87654321)
WHERE r = 54320 AND number IN (
    SELECT number AS r
    FROM numbers(7654321)
    WHERE r = 54320 AND number IN (
        SELECT number AS r
        FROM numbers(654321)
        WHERE r = 54320 AND number IN (
            SELECT number AS r
            FROM numbers(54321)
            WHERE r = 54320
        )
    )
)
/*
┌─────r─┐
│ 54320 │
└───────┘

1 rows in set. Elapsed: 0.481 sec. Processed 96.06 million rows, 768.52 MB (199.69 million rows/s., 1.60 GB/s.)
*/
There are other ways to optimize JOINs:
use external dictionaries to get rid of joins on "small" tables
use the Join table engine
use ANY strictness
use specific settings such as join_algorithm, partial_merge_join_optimizations, etc.
Some useful references:
Altinity webinar: Tips and Tricks Every ClickHouse User Should Know
Altinity webinar: Secrets of ClickHouse Query Performance