Str*_*667 13 postgresql join aggregate
Schema:
CREATE TABLE "items" (
"id" SERIAL NOT NULL PRIMARY KEY,
"country" VARCHAR(2) NOT NULL,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"price" NUMERIC(11, 2) NOT NULL
);
CREATE TABLE "payments" (
"id" SERIAL NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"amount" NUMERIC(11, 2) NOT NULL,
"item_id" INTEGER NULL
);
CREATE TABLE "extras" (
"id" SERIAL NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"amount" NUMERIC(11, 2) NOT NULL,
"item_id" INTEGER NULL
);
Data:
INSERT INTO items VALUES
(1, 'CZ', '2016-11-01', 100),
(2, 'CZ', '2016-11-02', 100),
(3, 'PL', '2016-11-03', 20),
(4, 'CZ', '2016-11-04', 150)
;
INSERT INTO payments VALUES
(1, '2016-11-01', 60, 1),
(2, '2016-11-01', 60, 1),
(3, '2016-11-02', 100, 2),
(4, '2016-11-03', 25, 3),
(5, '2016-11-04', 150, 4)
;
INSERT INTO extras VALUES
(1, '2016-11-01', 5, 1),
(2, '2016-11-02', 1, 2),
(3, '2016-11-03', 2, 3),
(4, '2016-11-03', 3, 3),
(5, '2016-11-04', 5, 4)
;
So, we have:
Now I want to get answers to the following questions:
Using the following query (SQLFiddle):
SELECT
country AS "group_by",
COUNT(DISTINCT items.id) AS "item_count",
SUM(items.price) AS "cost",
SUM(payments.amount) AS "earned",
SUM(extras.amount) AS "extra_earned"
FROM items
LEFT OUTER JOIN payments ON (items.id = payments.item_id)
LEFT OUTER JOIN extras ON (items.id = extras.item_id)
GROUP BY 1;
The results are wrong:
group_by | item_count | cost | earned | extra_earned
----------+------------+--------+--------+--------------
CZ | 3 | 450.00 | 370.00 | 16.00
PL | 1 | 40.00 | 50.00 | 5.00
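Cost and extra earned for CZ are wrong: 450 instead of 350 and 16 instead of 11. Cost and earned for PL are wrong too: they are doubled.
I understand that with LEFT OUTER JOIN there will be 2 rows for the item with items.id = 1 (and so on for the other matches), but I don't know how to build a correct query.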
Questions:
PostgreSQL version: 9.6.1
Erw*_*ter 12
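Since there can be multiple payments and multiple extras per item, you run into a "proxy cross join" between those two tables. Aggregate rows per item_id before joining to items, and it should all be correct: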
SELECT i.country AS group_by
, COUNT(*) AS item_count
, SUM(i.price) AS cost
, SUM(p.sum_amount) AS earned
, SUM(e.sum_amount) AS extra_earned
FROM items i
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM payments
GROUP BY 1
) p ON p.item_id = i.id
LEFT JOIN (
SELECT item_id, SUM(amount) AS sum_amount
FROM extras
GROUP BY 1
) e ON e.item_id = i.id
GROUP BY 1;
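To see why pre-aggregating helps, it can be useful to look at the intermediate result before the final GROUP BY. The following is only a sketch against the sample data from the question; the output aliases paid and extra are made up for illustration. Each item should appear exactly once, so every price is counted once in the outer SUM.

```sql
-- Sketch: the same pre-aggregated joins as above, without the outer GROUP BY.
-- The aliases "paid" and "extra" are assumptions chosen for readability.
SELECT i.id
     , i.country
     , i.price
     , p.sum_amount AS paid    -- total payments for this item, NULL if none
     , e.sum_amount AS extra   -- total extras for this item, NULL if none
FROM   items i
LEFT   JOIN (
   SELECT item_id, SUM(amount) AS sum_amount
   FROM   payments
   GROUP  BY 1
   ) p ON p.item_id = i.id
LEFT   JOIN (
   SELECT item_id, SUM(amount) AS sum_amount
   FROM   extras
   GROUP  BY 1
   ) e ON e.item_id = i.id
ORDER  BY i.id;
```

Because each subquery returns at most one row per item_id, neither join can multiply rows from items.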
Consider the "fish market" example:
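To be precise, SUM(i.price) would already be incorrect after joining to a single n-table, since the join repeats each price once per related row. Doing it twice only makes it worse, and it is also potentially expensive to compute.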
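The multiplication can be made visible by running the original joins without any aggregation. This is a minimal sketch against the question's sample data; the column aliases are assumptions for illustration.

```sql
-- Sketch: the question's joins with no aggregation, to expose repeated rows.
-- With the sample data, item 1 (two payments, one extra) and item 3
-- (one payment, two extras) each come back twice, so their price is
-- summed twice in the original query.
SELECT i.id     AS item_id
     , i.country
     , i.price
     , p.id     AS payment_id
     , p.amount AS payment
     , e.id     AS extra_id
     , e.amount AS extra
FROM   items i
LEFT   JOIN payments p ON p.item_id = i.id
LEFT   JOIN extras   e ON e.item_id = i.id
ORDER  BY i.id, p.id, e.id;
```

An item with m payments and n extras yields m × n rows here, which is why both the payment and the extra sums are inflated as well.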
Oh, and since we no longer multiply rows of items, we can use the cheaper count(*) instead of count(DISTINCT i.id). (id is NOT NULL PRIMARY KEY.)
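What about filtering on items.created? That depends. Can we apply the same filter to payments.created and extras.created?
If yes, then simply add the filter in the subqueries as well. (That seems unlikely in this case.)
If no, but we are still selecting most items, the above query would still be the most efficient. Some of the rows aggregated in the subqueries are discarded in the join, but that is still cheaper than a more complex query.
If no, and we are selecting only a small fraction of items, I suggest correlated subqueries or LATERAL joins. Example: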
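A minimal sketch of the LATERAL variant for the last case follows. The date range on i.created is a hypothetical filter invented for illustration, not something taken from the question. LATERAL is available in PostgreSQL 9.3 and later, so it should work on the 9.6.1 mentioned above.

```sql
-- Sketch: aggregate payments and extras only for the items that pass the filter.
-- The date bounds below are hypothetical; adjust them to the real requirement.
SELECT i.country         AS group_by
     , COUNT(*)          AS item_count
     , SUM(i.price)      AS cost
     , SUM(p.sum_amount) AS earned
     , SUM(e.sum_amount) AS extra_earned
FROM   items i
LEFT   JOIN LATERAL (
   SELECT SUM(amount) AS sum_amount
   FROM   payments
   WHERE  item_id = i.id       -- correlated: evaluated once per selected item
   ) p ON true
LEFT   JOIN LATERAL (
   SELECT SUM(amount) AS sum_amount
   FROM   extras
   WHERE  item_id = i.id
   ) e ON true
WHERE  i.created >= '2016-11-03'
AND    i.created <  '2016-11-05'
GROUP  BY 1;
```

Each lateral subquery returns exactly one row per item (an aggregate without GROUP BY always does), so the row count of items is preserved. This form tends to pay off when indexes on payments.item_id and extras.item_id exist; the schema in the question does not define any, so that part is an assumption.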