array_agg contains another array_agg

Question

Mar*_*cin 3 sql arrays postgresql relational-division

t1
id|entity_type
9|3
9|4
9|5
2|3
2|5
           
t2  
id|entity_type
1|3
1|4
1|5

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) = 
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);

Result:

t1.id = 9|array_agg{3,4,5}

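I have two tables, t1 and t2. I want to get every t1.id whose array of t1.entity_type values equals the array of t2.entity_type values.

In this case everything works. For t2.id = 1 I receive t1.id = 9. Both have the same entity_type array: {3,4,5}

Now I want to get t1.id not only for equal sets, but also when the set in t2 is smaller. If I modify t2 like this: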

t2  
id|entity_type
1|3
1|4

and modify the query this way:

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) >= /*MODIFICATION*/
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);

I do not get the expected result:

t1.id = 9 has {3, 4, 5}
t2.id = 1 has {3, 4}

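Arrays in t1 that contain the array in t2 should qualify. I expect to receive the same result as in the first case, but I get no rows. Is there anything like: ARRAY_AGG contains another ARRAY_AGG?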

Answer 1

Erw*_*ter 5

Clean up

It is inefficient to use two different array_agg() calls. Use the same call, with ORDER BY, in both the SELECT list and the HAVING clause:

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) = (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   -- GROUP  BY id   -- not needed
   );

See the manual for syntax basics.

"Contains" operator @>

As Nick commented, your second query works with the "array contains" operator @>:

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) @> (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   );
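
For illustration, a minimal sketch with literal arrays (values borrowed from the sample data in the question) of why >= does not express containment. PostgreSQL orders arrays element by element, while @> tests whether every element of the right operand occurs in the left one. The column aliases are made up for this example.

```sql
SELECT ARRAY[3,5]   >= ARRAY[3,4] AS ge_passes,        -- true: 5 > 4 at the second position
       ARRAY[3,5]   @> ARRAY[3,4] AS contains_fails,   -- false: 4 is missing on the left
       ARRAY[3,4,5] @> ARRAY[3,4] AS contains_passes,  -- true: both 3 and 4 are present
       ARRAY[5,4,3] @> ARRAY[3,4] AS order_ignored;    -- true: element order does not matter
```

Since @> ignores element order, the ORDER BY inside array_agg() should not be needed for the containment test itself; it only affects how the array is displayed.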

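But aggregating arrays like this is very inefficient for big tables.

Faster query

The problem can be cast as a case of relational division. Depending on your table definition, there are more efficient techniques. We assembled a whole arsenal under this related question:

How to filter SQL results in a has-many-through relation

Assuming (id, entity_type) is unique in both tables, this should be substantially faster for big tables, especially because it can use an index on t1 (as opposed to the original query):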

SELECT t1.id
FROM   t2
JOIN   t1 USING (entity_type)
WHERE  t2.id = 1
GROUP  BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);
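
To try the query against the sample data from the question, a setup along these lines should do. The table definitions are an assumption (the question does not show them); the primary keys are there to enforce the uniqueness of (id, entity_type) that the count comparison relies on.

```sql
-- Hypothetical table definitions; temp tables keep the experiment
-- out of the way of any real t1 / t2.
CREATE TEMP TABLE t1 (id int, entity_type int, PRIMARY KEY (id, entity_type));
CREATE TEMP TABLE t2 (id int, entity_type int, PRIMARY KEY (id, entity_type));

INSERT INTO t1 VALUES (9,3), (9,4), (9,5), (2,3), (2,5);
INSERT INTO t2 VALUES (1,3), (1,4);

SELECT t1.id
FROM   t2
JOIN   t1 USING (entity_type)
WHERE  t2.id = 1
GROUP  BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);
-- Returns id 9: it matches both entity types (3 and 4) of t2.id = 1.
-- id 2 matches only entity type 3, so its count is 1 instead of 2.
```

Without the uniqueness guarantee, duplicate rows would inflate count(*) on either side and the comparison could give wrong answers.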

You need two indexes:

First, one on t2(id), typically covered by the primary key.
Second:

CREATE INDEX t1_foo_idx ON t1 (entity_type, id);

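The added id column is optional; it allows index-only scans. The order of columns is crucial:

Is a composite index also good for queries on the first field?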
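
To check whether the planner actually uses the index, the plan can be inspected with EXPLAIN. This is only a sketch, assuming tables t1 and t2 and the index on t1 (entity_type, id) exist. On very small tables the planner will probably prefer a sequential scan regardless, and index-only scans additionally depend on an up-to-date visibility map, hence the VACUUM.

```sql
-- Refresh planner statistics and the visibility map first.
VACUUM ANALYZE t1;

-- ANALYZE executes the query and reports actual timings;
-- BUFFERS adds information about page reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.id
FROM   t2
JOIN   t1 USING (entity_type)
WHERE  t2.id = 1
GROUP  BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);
```

Look for an "Index Only Scan" node on t1 in the output; if it is missing on a realistically sized table, the column order of the index is the first thing to verify.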
