Sam*_*Sam 7 relational-division redshift
I have a query that does a lot of repeated work:
SELECT visitor_id, '1'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_one')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
UNION ALL
SELECT visitor_id, '2'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_two')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
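As you can see, it filters the params table four separate times. I'm using Redshift, and although it scans this table very quickly, I have a lot of these statements chained with UNION. Is there a way to rewrite the SQL using CASE / IF statements?
The example uses key = 'utm_source' AND value = 'facebook' in both branches, but that will not necessarily hold for every SELECT.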
A CTE (available in Redshift) gives a minor simplification for repeated subqueries with identical predicates:
WITH p2 AS (
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
SELECT e.visitor_id, '1'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_one'
GROUP BY e.visitor_id
UNION ALL
SELECT e.visitor_id, '2'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_two'
GROUP BY e.visitor_id;
Plain joins may also be faster than multiple IN semi-joins.
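To check that the join-based rewrite returns the same rows as the original IN version, one option is a throwaway harness on toy data. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 rather than Redshift, the rows are invented, and the output column is named filter_id (hypothetical) instead of filter, with the '1'::text casts dropped because SQLite does not have that syntax.

```python
import sqlite3

# Hypothetical toy data: four events, each with a campaign and a source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, visitor_id INTEGER);
CREATE TABLE params (event_id INTEGER, key TEXT, value TEXT);
INSERT INTO events VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO params VALUES
  (1, 'utm_campaign', 'campaign_one'), (1, 'utm_source', 'facebook'),
  (2, 'utm_campaign', 'campaign_two'), (2, 'utm_source', 'facebook'),
  (3, 'utm_campaign', 'campaign_one'), (3, 'utm_source', 'google'),
  (4, 'utm_campaign', 'campaign_two'), (4, 'utm_source', 'facebook');
""")

# One branch of the original query: two IN semi-joins against params.
IN_BRANCH = """
SELECT visitor_id, '{label}' AS filter_id
FROM events
WHERE id IN (SELECT event_id FROM params
             WHERE key = 'utm_campaign' AND value = '{campaign}')
  AND id IN (SELECT event_id FROM params
             WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
"""

# One branch of the rewrite: the shared filter lives in the CTE p2.
JOIN_BRANCH = """
SELECT e.visitor_id, '{label}' AS filter_id
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = '{campaign}'
GROUP BY e.visitor_id
"""

branches = [("1", "campaign_one"), ("2", "campaign_two")]

in_query = " UNION ALL ".join(
    IN_BRANCH.format(label=l, campaign=c) for l, c in branches)
join_query = (
    "WITH p2 AS (SELECT event_id FROM params "
    "WHERE key = 'utm_source' AND value = 'facebook') "
    + " UNION ALL ".join(
        JOIN_BRANCH.format(label=l, campaign=c) for l, c in branches))

# Sort in Python because neither query has an ORDER BY.
rows_in = sorted(conn.execute(in_query).fetchall())
rows_join = sorted(conn.execute(join_query).fetchall())
print(rows_in == rows_join, rows_join)
```

Matching results on a small sample only show that the two forms agree for that data; relative speed has to be measured on the real tables.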
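In Postgres, this multicolumn index should allow index-only scans on params (Redshift does not support CREATE INDEX; sort keys are the closest equivalent there):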
CREATE INDEX foo_idx ON params (key, value, event_id)
Add another index on (event_id) if you don't have one yet.
An arsenal of query techniques for relational division is available under this related question.
As @Andriy commented, we can squeeze out a bit more:
WITH p2 AS ( -- repeated, immutable filter
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
, p3 (value, filter) AS ( -- values for variable filter
SELECT text 'campaign_one', text '1'
UNION ALL SELECT 'campaign_two', '2'
)
SELECT e.visitor_id, p3.filter
FROM p3
JOIN params p1 USING (value)
JOIN p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated for p1
GROUP BY 1, 2;
In Postgres, we could use a shorter, faster VALUES expression, but Redshift does not currently support that:
...
, p3 (value, filter) AS (
VALUES
(text 'campaign_one', text '1')
, ( 'campaign_two', '2')
)
...
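For just two SELECTs joined with UNION, this doesn't buy much. But it should be a substantial improvement when there are many, as you mentioned.
The second query doesn't need the CTEs. You can simplify to: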
SELECT e.visitor_id, p3.filter
FROM (
SELECT text 'campaign_one' AS value, text '1' AS filter
UNION ALL SELECT 'campaign_two', '2'
) p3 -- values for variable filter
JOIN params p1 USING (value)
JOIN params p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated, immutable filters
AND p2.key = 'utm_source'
AND p2.value = 'facebook'
GROUP BY 1, 2;
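The lookup-table form above scales by adding one row per campaign instead of one UNION ALL branch per campaign. As a rough illustration, the sketch below builds that derived table from a Python dict and runs the query against toy data. Everything outside the query shape is an assumption: sqlite3 stands in for Redshift, the data and the build_lookup_query helper are hypothetical, the output column is called filter_id instead of filter, and explicit ON conditions replace USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, visitor_id INTEGER);
CREATE TABLE params (event_id INTEGER, key TEXT, value TEXT);
INSERT INTO events VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO params VALUES
  (1, 'utm_campaign', 'campaign_one'), (1, 'utm_source', 'facebook'),
  (2, 'utm_campaign', 'campaign_two'), (2, 'utm_source', 'facebook'),
  (3, 'utm_campaign', 'campaign_one'), (3, 'utm_source', 'google'),
  (4, 'utm_campaign', 'campaign_two'), (4, 'utm_source', 'facebook');
""")


def build_lookup_query(campaigns):
    """Return (sql, bind_values) for a {campaign value: label} mapping.

    Hypothetical helper: each mapping entry becomes one row of the
    derived table p3, bound as parameters rather than spliced into SQL.
    """
    first = "SELECT ? AS value, ? AS filter_id"
    rest = " UNION ALL SELECT ?, ?" * (len(campaigns) - 1)
    sql = f"""
        SELECT e.visitor_id, p3.filter_id
        FROM ({first}{rest}) p3                       -- variable filter
        JOIN params p1 ON p1.value = p3.value
        JOIN params p2 ON p2.event_id = p1.event_id
        JOIN events e  ON e.id = p2.event_id
        WHERE p1.key = 'utm_campaign'                 -- fixed filters
          AND p2.key = 'utm_source'
          AND p2.value = 'facebook'
        GROUP BY e.visitor_id, p3.filter_id
    """
    binds = [x for pair in campaigns.items() for x in pair]
    return sql, binds


sql, binds = build_lookup_query({"campaign_one": "1", "campaign_two": "2"})
rows = sorted(conn.execute(sql, binds).fetchall())
print(rows)
```

Adding a third campaign is then one more dict entry, and params is still referenced only twice in the statement.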
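Resources on Common Table Expressions (as requested in the comments):
Data-modifying CTEs are especially useful. Example:
The basics, plus advanced examples, are covered in this related answer: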