use*_*730 8 sql postgresql pivot crosstab
This is my table 'tab_test':
year animal price
2000 kittens 79
2000 kittens 93
2000 kittens 100
2000 puppies 15
2000 puppies 32
2001 kittens 31
2001 kittens 17
2001 puppies 65
2001 puppies 48
2002 kittens 84
2002 kittens 86
2002 puppies 15
2002 puppies 95
2003 kittens 62
2003 kittens 24
2003 puppies 36
2003 puppies 41
2004 kittens 65
2004 kittens 85
2004 puppies 58
2004 puppies 95
2005 kittens 45
2005 kittens 25
2005 puppies 15
2005 puppies 35
2006 kittens 50
2006 kittens 80
2006 puppies 95
2006 puppies 49
2007 kittens 40
2007 kittens 19
2007 puppies 81
2007 puppies 38
2008 kittens 37
2008 kittens 51
2008 puppies 29
2008 puppies 72
2009 kittens 84
2009 kittens 26
2009 puppies 49
2009 puppies 34
2010 kittens 75
2010 kittens 96
2010 puppies 18
2010 puppies 26
2011 kittens 35
2011 kittens 21
2011 puppies 90
2011 puppies 18
2012 kittens 12
2012 kittens 23
2012 puppies 74
2012 puppies 79
Here is some code that pivots rows into columns, so I get the averages for 'kittens' and 'puppies':
SELECT
year,
AVG(CASE WHEN animal = 'kittens' THEN price END) AS "kittens",
AVG(CASE WHEN animal = 'puppies' THEN price END) AS "puppies"
FROM tab_test
GROUP BY year
ORDER BY year;
The output of the code above is:
year kittens puppies
2000 90.6666666666667 23.5
2001 24.0 56.5
2002 85.0 55.0
2003 43.0 38.5
2004 75.0 76.5
2005 35.0 25.0
2006 65.0 72.0
2007 29.5 59.5
2008 44.0 50.5
2009 55.0 41.5
2010 85.5 22.0
2011 28.0 54.0
2012 17.5 76.5
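What I want is a table like the second one, but containing only the items that have a COUNT() of at least 3 in the first table. In other words, the goal is this output: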
year kittens
2000 90.6666666666667
Only 'kittens' (in 2000) has at least 3 entries in the first table.
Is this possible in PostgreSQL?
Erw*_*ter 12
CASE
If your case is as simple as demonstrated, a CASE expression will do:
SELECT year
, sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
, sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM (
SELECT year, animal, avg(price) AS price
FROM tab_test
GROUP BY year, animal
HAVING count(*) > 2
) t
GROUP BY year
ORDER BY year;
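It does not matter whether you use sum(), max() or min() as the aggregate function in the outer query. The subquery returns at most one row per (year, animal), so they all produce the same value in this case.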
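The claim about sum(), min() and max() can be checked with a small self-contained sketch. It assumes nothing beyond plain PostgreSQL: the CTE named tab_test stands in for the real table and holds only a few rows copied from the question, and the column aliases kittens_sum, kittens_min and kittens_max are made up for this illustration.

```sql
-- Inline sample standing in for the real tab_test (rows copied from the question).
WITH tab_test(year, animal, price) AS (
   VALUES (2000, 'kittens', 79), (2000, 'kittens', 93), (2000, 'kittens', 100)
        , (2000, 'puppies', 15), (2000, 'puppies', 32)
        , (2001, 'kittens', 31), (2001, 'kittens', 17)
)
, t AS (
   -- One row per (year, animal); groups with fewer than 3 rows are dropped.
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) > 2
)
SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens_sum  -- illustrative alias
     , min(CASE WHEN animal = 'kittens' THEN price END) AS kittens_min  -- illustrative alias
     , max(CASE WHEN animal = 'kittens' THEN price END) AS kittens_max  -- illustrative alias
FROM   t
GROUP  BY year
ORDER  BY year;
```

Only the 2000 kittens group survives the HAVING filter, so the result is a single row whose three aggregate columns are identical: each aggregate sees exactly one non-null value.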
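crosstab()
With more categories, a crosstab() query is simpler. It should also be faster for bigger tables.
You need to install the additional module tablefunc (once per database). Since Postgres 9.1 that is as simple as: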
CREATE EXTENSION tablefunc;
More details are in a related answer. The crosstab() query:
SELECT * FROM crosstab(
'SELECT year, animal, avg(price) AS price
FROM tab_test
GROUP BY animal, year
HAVING count(*) > 2
ORDER BY 1,2'
,$$VALUES ('kittens'::text), ('puppies')$$)
AS ct ("year" text, "kittens" numeric, "puppies" numeric);
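No sqlfiddle for this one, because the site does not allow additional modules.
To verify my claims I ran a quick benchmark with close-to-real data in my small test database. PostgreSQL 9.1.6. Tested with EXPLAIN ANALYZE, best of 10.
Test setup with 10020 rows: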
CREATE TABLE tab_test (year int, animal text, price numeric);
-- years with lots of rows
INSERT INTO tab_test
SELECT 2000 + ((g + random() * 300))::int/1000
, CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
, (random() * 200)::numeric
FROM generate_series(1,10000) g;
-- .. and some years with only few rows to include cases with count < 3
INSERT INTO tab_test
SELECT 2010 + ((g + random() * 10))::int/2
, CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
, (random() * 200)::numeric
FROM generate_series(1,20) g;
Results:
@bluefeet
Total runtime: 95.401 ms
@wildplasser (different results, includes rows with count <= 3)
Total runtime: 64.497 ms
@Andreiy (+ ORDER BY)
& @Erwin1 - CASE (both perform about the same)
Total runtime: 39.105 ms
@Erwin2 - crosstab()
Total runtime: 17.644 ms
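Results are largely proportional (but irrelevant) with just 20 rows. Only @wildplasser's CTE has more overhead and spikes a little.
With more than a handful of rows, crosstab() quickly takes the lead. @Andreiy's query performs about the same as my simplified version; the choice of aggregate function in the outer SELECT (min(), max(), sum()) makes no measurable difference (just two rows per group).
Everything as expected, no surprises. Take my setup and try it @home.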
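For anyone repeating the measurement, a timing run might look like the sketch below. It is only an illustration: tab_test_demo is a hypothetical scratch table (not part of the original post) filled with a few rows from the question so the snippet runs on its own. For meaningful numbers, run the same statement against the 10020-row tab_test from the test setup.

```sql
-- Hypothetical scratch table (tab_test_demo is not from the original post),
-- filled with a few rows from the question so the snippet is self-contained.
CREATE TEMP TABLE tab_test_demo (year int, animal text, price numeric);

INSERT INTO tab_test_demo VALUES
   (2000, 'kittens', 79), (2000, 'kittens', 93), (2000, 'kittens', 100)
 , (2000, 'puppies', 15), (2000, 'puppies', 32);

-- EXPLAIN ANALYZE actually executes the query and prints the measured
-- run time at the end of the plan output.
EXPLAIN ANALYZE
SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
     , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test_demo
   GROUP  BY year, animal
   HAVING count(*) > 2
   ) t
GROUP  BY year
ORDER  BY year;
```

Postgres 9.1 reports the figure as "Total runtime", which is the label quoted in the results; as far as I know, newer releases label the same figure slightly differently. Repeating the statement several times and keeping the best value reduces the influence of caching and background load.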
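Here is an alternative to @bluefeet's suggestion. It is somewhat similar but avoids the join (instead, the upper-level grouping is applied to the already grouped result set):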
SELECT
year,
MAX(CASE animal WHEN 'kittens' THEN avg_price END) AS "kittens",
MAX(CASE animal WHEN 'puppies' THEN avg_price END) AS "puppies"
FROM (
SELECT
animal,
year,
COUNT(*) AS cnt,
AVG(Price) AS avg_price
FROM tab_test
GROUP BY
animal,
year
) s
WHERE cnt >= 3
GROUP BY
year
;