Mig*_*ell 28 sql postgresql json distinct aggregate-functions
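For my problem, we have a schema in which one photo has many tags and many comments. So if I write a query that fetches all the comments and all the tags, the rows multiply. If a photo has 2 tags and 13 comments, I get 26 rows for that one photo: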
SELECT
tag.name,
comment.comment_id
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id

This is fine for most things, but it means that if I GROUP BY and then json_agg(tag.*), I get 13 copies of the first tag and 13 copies of the second tag.
SELECT json_agg(tag.name) as tags
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id

Instead, I want an array that contains only 'suburban' and 'city', like this:
[
{"tag_id":1,"name":"suburban"},
{"tag_id":2,"name":"city"}
]
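I could use json_agg(DISTINCT tag.name), but that only produces an array of tag names, when I want the entire row as JSON. I would like to write json_agg(DISTINCT ON(tag.name) tag.*), but apparently that is not valid SQL.
So how can I simulate DISTINCT ON inside an aggregate function in Postgres?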
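To reproduce the row multiplication described in the question, here is a minimal sketch of a scratch schema. Only the table and column names come from the queries above; the column types, the missing foreign keys, and the sample data (one photo, two tags, 13 generated comments) are assumptions made for illustration.

```sql
-- Hypothetical scratch schema; types and sample data are assumptions.
CREATE TEMP TABLE photo     (photo_id int PRIMARY KEY);
CREATE TEMP TABLE tag       (tag_id int PRIMARY KEY, name text);
CREATE TEMP TABLE photo_tag (photo_id int, tag_id int);
CREATE TEMP TABLE comment   (comment_id int PRIMARY KEY, photo_id int, comment text);

INSERT INTO photo VALUES (1);
INSERT INTO tag VALUES (1, 'suburban'), (2, 'city');
INSERT INTO photo_tag VALUES (1, 1), (1, 2);
INSERT INTO comment
SELECT g, 1, 'comment ' || g
FROM generate_series(1, 13) AS g;

-- Each of the 2 tag rows pairs with each of the 13 comment rows.
SELECT count(*) AS joined_rows   -- 26
FROM photo
LEFT OUTER JOIN comment   ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag       ON photo_tag.tag_id = tag.tag_id;
```

The temporary tables disappear at the end of the session, so the sketch can be run in any scratch database.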
Pau*_*rth 19
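Whenever you have a central table and left-join it to many rows in table A and also to many rows in table B, you get these problems with duplicated rows. It can especially throw off aggregate functions such as COUNT and SUM if you are not careful! So I think you need to build the tags for each photo and the comments for each photo separately, and then join them together: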
WITH tags AS (
SELECT photo.photo_id, json_agg(row_to_json(tag.*)) AS tags
FROM photo
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
),
comments AS (
SELECT photo.photo_id, json_agg(row_to_json(comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
GROUP BY photo.photo_id
)
SELECT COALESCE(tags.photo_id, comments.photo_id) AS photo_id,
tags.tags,
comments.comments
FROM tags
FULL OUTER JOIN comments
ON tags.photo_id = comments.photo_id
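The COUNT pitfall mentioned above can be sketched with inline sample data. The CTEs below are stand-ins for the real tables, and their contents (one photo, 2 tags, 13 comments) are assumptions chosen to match the numbers in the question.

```sql
-- Inline stand-ins for the real tables; the data is made up for illustration.
WITH photo(photo_id) AS (VALUES (1)),
     photo_tag(photo_id, tag_id) AS (VALUES (1, 1), (1, 2)),
     comment(photo_id, comment_id) AS (
         SELECT 1, g FROM generate_series(1, 13) AS g
     )
SELECT count(pt.tag_id)             AS inflated_tag_count,  -- 26
       count(DISTINCT pt.tag_id)    AS tag_count,           -- 2
       count(DISTINCT c.comment_id) AS comment_count        -- 13
FROM photo p
LEFT JOIN photo_tag pt ON pt.photo_id = p.photo_id
LEFT JOIN comment   c  ON c.photo_id  = p.photo_id
GROUP BY p.photo_id;
```

count(DISTINCT ...) repairs the counts here, but it does not help SUM, where legitimately equal values would be collapsed as well. That is one reason to aggregate each child table separately before joining.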
Edit: If you really want to join everything together without the CTEs, this looks like it gives the right result:
SELECT photo.photo_id,
to_json(array_agg(DISTINCT tag.*)) AS tags,
to_json(array_agg(DISTINCT comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
Eug*_*lev 18
The simplest thing I found is to use DISTINCT over jsonb (not json!). (jsonb_build_object creates jsonb objects.)
SELECT
JSON_AGG(
DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id,
'name', tag.name)) AS tags
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
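Why jsonb works here while json does not can be shown in isolation. This is a minimal sketch with made-up rows, where the repeated (1, 'suburban') imitates the duplicates produced by the joins. It assumes Postgres 9.5 or later for jsonb_agg.

```sql
-- jsonb has an equality operator, so DISTINCT can fold the duplicate object.
SELECT jsonb_array_length(
         jsonb_agg(DISTINCT jsonb_build_object('tag_id', tag_id, 'name', name))
       ) AS distinct_tags   -- 2
FROM (VALUES (1, 'suburban'), (2, 'city'), (1, 'suburban')) AS v(tag_id, name);

-- The json variant is rejected, because json has no equality operator:
-- SELECT json_agg(DISTINCT json_build_object('tag_id', tag_id, 'name', name))
-- FROM (VALUES (1, 'suburban')) AS v(tag_id, name);
```

Note that jsonb does not preserve the key order of the input, so the keys in the resulting objects may appear in a different order than they were written.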
Erw*_*ter 15
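The cheapest and simplest DISTINCT operation is ... not to multiply rows in a "proxy cross join" in the first place. Aggregate first, then join.
Assuming you do not actually want to retrieve the whole table, but only one or a few selected photos at a time, with aggregated details, the most elegant and probably fastest way is with LATERAL subqueries: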
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(c) AS comments
FROM comment c
WHERE photo_id = p.photo_id
) c1
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2; -- arbitrary selection
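This returns whole rows from comment and tag, aggregated into separate JSON arrays. Rows are not multiplied as in your attempt, but they are only as "distinct" as they are in the base tables.
To additionally fold duplicates in the base data, see below.
Notes:
LATERAL and json_agg() require Postgres 9.3 or later.
json_agg(c) is short for json_agg(c.*).
We do not need LEFT JOIN, because an aggregate function like json_agg() always returns a row.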
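The last note can be checked in isolation. A minimal sketch, using a deliberately empty derived table (a made-up stand-in for a photo without comments):

```sql
-- An aggregate without GROUP BY yields exactly one row, even over zero input rows.
SELECT json_agg(x) IS NULL                AS agg_is_null,    -- true
       COALESCE(json_agg(x), '[]')::text  AS with_fallback   -- []
FROM (SELECT 1 AS x WHERE false) AS empty_set;
```

Because the subquery always produces that one row, CROSS JOIN LATERAL never drops the photo. The aggregate is simply NULL, which COALESCE can turn into an empty array if that is preferred.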
Typically, you only want a subset of columns, at least excluding the redundant photo_id:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment
WHERE photo_id = p.photo_id
) c
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
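json_build_object() was introduced with Postgres 9.4. This used to be cumbersome in older versions, because a ROW constructor does not preserve column names. But there are generic workarounds.
It also lets you choose JSON key names freely; you do not have to stick to column names.
To return all rows, this is more efficient: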
SELECT p.*
, COALESCE(c1.comments, '[]') AS comments
, COALESCE(t.tags, '[]') AS tags
FROM photo p
LEFT JOIN (
SELECT photo_id
, json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment c
GROUP BY 1
) c1 USING (photo_id)
LEFT JOIN (
SELECT photo_id , json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
GROUP BY 1
) t USING (photo_id);
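Once we retrieve enough rows, this gets cheaper than LATERAL subqueries. Works for Postgres 9.3+.
Note the USING clause in the join conditions. This way we could conveniently use SELECT * in the outer query without getting duplicate photo_id columns. I did not use SELECT * here because your deleted answer indicates that you want empty JSON arrays instead of NULL for no tags / no comments.
You cannot simply use json_agg(DISTINCT json_build_object(...)), because there is no equality operator for the data type json.
There are various better ways: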
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(to_json(c1.comment)) AS comments1
, json_agg(json_build_object('comment', c1.comment)) AS comments2
, json_agg(to_json(c1)) AS comments3
FROM (
SELECT DISTINCT c.comment -- folding dupes here
FROM comment c
WHERE c.photo_id = p.photo_id
-- ORDER BY comment -- any particular order?
) c1
) c2
CROSS JOIN LATERAL (
SELECT jsonb_agg(DISTINCT t) AS tags -- demonstrating jsonb_agg
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
This demonstrates 4 different techniques in comments1, comments2, comments3 (redundant) and tags.
db <> fiddle here
Old SQL Fiddle backpatched to Postgres 9.3
Old SQL Fiddle for Postgres 9.6
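As mentioned in the comments, json_agg does not serialize a row as an object, but builds a JSON array of the values you pass to it. You need row_to_json to turn the row into a JSON object, and then json_agg to aggregate the objects into an array. Because json has no equality operator, the object has to be cast to jsonb before DISTINCT can be applied:
SELECT json_agg(DISTINCT row_to_json(tag)::jsonb) AS tags  -- cast: DISTINCT needs an equality operator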
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id