I have a table like the one below.
make   | model | engine | cars_checked | avg_mileage
-------|-------|--------|--------------|------------
suzuki | sx4   | petrol | 11           | 12
suzuki | sx4   | diesel | 150          | 16
suzuki | swift | petrol | 140          | 15
suzuki | swift | diesel | 18           | 19
toyota | prius | petrol | 16           | 17
toyota | prius | hybrid | 250          | 24
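The desired output is the average mileage per group, for example per engine type.
A plain group by will not do, because the sample count of each record (cars_checked) has to be used as a weight in order to avoid the average-of-averages problem.
What is the right way to do this? Is there a way to take the sample counts into account so that group by produces a weighted average?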
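To make the average-of-averages problem concrete, here is a small sketch that uses only the two diesel rows from the table above. The variable names are mine, not part of the question.

```python
# Diesel rows from the sample table: (cars_checked, avg_mileage)
diesel_rows = [(150, 16), (18, 19)]

# Naive approach: average the per-row averages, ignoring sample sizes.
naive = sum(mileage for _, mileage in diesel_rows) / len(diesel_rows)

# Weighted approach: each row's average counts once per car checked.
total_cars = sum(count for count, _ in diesel_rows)
weighted = sum(count * mileage for count, mileage in diesel_rows) / total_cars

print(f"naive average of averages: {naive:.2f}")     # 17.50
print(f"weighted average:          {weighted:.2f}")  # 16.32
```

The naive figure is pulled toward the 18-car row even though the 150-car row represents most of the sample.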
Update: added the output format for #1 above as an example.
engine | mileage_by_engine
--------------------------
petrol | xx.z
diesel | yy.z
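-- Weighted average mileage per engine.
-- If both columns are integers, cast one operand to numeric so the division is not truncated.
SELECT engine, SUM(cars_checked * avg_mileage) / SUM(cars_checked) AS avgMilageByEngine
FROM [YOUR_TABLE]
GROUP BY engine;

-- Weighted average mileage per make.
SELECT make, SUM(cars_checked * avg_mileage) / SUM(cars_checked) AS avgMilageByMake
FROM [YOUR_TABLE]
GROUP BY make;

-- Weighted average mileage per model.
SELECT model, SUM(cars_checked * avg_mileage) / SUM(cars_checked) AS avgMilageByModel
FROM [YOUR_TABLE]
GROUP BY model;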
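If you want to check the per-engine query against the sample data, here is a rough sketch using Python's standard-library sqlite3 module. SQLite rather than your actual database is an assumption on my part, and the table name cars is a hypothetical stand-in for [YOUR_TABLE]. I also multiply by 1.0, since both columns are created as integers here.

```python
import sqlite3

# Sample rows copied from the question's table.
rows = [
    ("suzuki", "sx4", "petrol", 11, 12),
    ("suzuki", "sx4", "diesel", 150, 16),
    ("suzuki", "swift", "petrol", 140, 15),
    ("suzuki", "swift", "diesel", 18, 19),
    ("toyota", "prius", "petrol", 16, 17),
    ("toyota", "prius", "hybrid", 250, 24),
]

conn = sqlite3.connect(":memory:")
# "cars" is a hypothetical table name standing in for [YOUR_TABLE].
conn.execute(
    "CREATE TABLE cars (make TEXT, model TEXT, engine TEXT, "
    "cars_checked INTEGER, avg_mileage INTEGER)"
)
conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?)", rows)

# Multiplying by 1.0 forces floating-point division; with two integer
# columns the plain quotient would be truncated.
query = """
    SELECT engine,
           SUM(cars_checked * avg_mileage) * 1.0 / SUM(cars_checked)
               AS mileage_by_engine
    FROM cars
    GROUP BY engine
    ORDER BY engine
"""
by_engine = dict(conn.execute(query).fetchall())
for engine, mileage in by_engine.items():
    print(f"{engine:<7}| {mileage:.1f}")
```

Swapping engine for make or model in the SELECT list and the GROUP BY gives the other two groupings.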
One way to simplify the query is to use grouping sets:
select engine, make, model,
       sum(cars_checked * avg_mileage) / sum(cars_checked) as avgMilage
from t
group by grouping sets ((engine), (make), (model));
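In each output row, only the column that the row was grouped on has a non-NULL value; the other two grouping columns are NULL.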
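To see that output shape without a database that supports grouping sets, here is a sketch in Python's sqlite3. SQLite has no GROUPING SETS, so this swaps in a UNION ALL of three grouped queries, which I would expect to give the same rows for this case. Treat it as an illustration of the NULL pattern, not as the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (make TEXT, model TEXT, engine TEXT, "
    "cars_checked INTEGER, avg_mileage INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("suzuki", "sx4", "petrol", 11, 12),
        ("suzuki", "sx4", "diesel", 150, 16),
        ("suzuki", "swift", "petrol", 140, 15),
        ("suzuki", "swift", "diesel", 18, 19),
        ("toyota", "prius", "petrol", 16, 17),
        ("toyota", "prius", "hybrid", 250, 24),
    ],
)

# One SELECT per grouping column, padded with NULLs and stacked with
# UNION ALL (a stand-in for GROUPING SETS, which SQLite lacks).
weighted = "SUM(cars_checked * avg_mileage) * 1.0 / SUM(cars_checked)"
query = f"""
    SELECT engine, NULL AS make, NULL AS model, {weighted} AS avgMilage
    FROM t GROUP BY engine
    UNION ALL
    SELECT NULL, make, NULL, {weighted} FROM t GROUP BY make
    UNION ALL
    SELECT NULL, NULL, model, {weighted} FROM t GROUP BY model
"""
result = conn.execute(query).fetchall()
for engine, make, model, avg in result:
    # SQL NULL comes back as Python None.
    print(engine, make, model, round(avg, 2))
```

With the sample data this yields eight rows: three for engine, two for make, and three for model.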
There is a better way: create an aggregate function. Here is how.
CREATE OR REPLACE FUNCTION public.numeric_weighted_average_accum(
    "Previous" numeric[],
    "ThisDatum" numeric,
    "ThisWeight" numeric)
    RETURNS numeric[]
    LANGUAGE 'sql'
    COST 100
    VOLATILE STRICT PARALLEL UNSAFE
AS $BODY$
    SELECT ARRAY["Previous"[1] + ("ThisDatum" * "ThisWeight"), "Previous"[2] + "ThisWeight"];
$BODY$;

CREATE OR REPLACE FUNCTION numeric_weighted_average_final(
    "NWA" numeric[])
    RETURNS numeric
    LANGUAGE 'sql'
    COST 100
    VOLATILE STRICT PARALLEL UNSAFE
AS $BODY$
    SELECT "NWA"[1] / "NWA"[2];
$BODY$;

CREATE OR REPLACE AGGREGATE weighted_average(datum numeric, weight numeric) (
    SFUNC = numeric_weighted_average_accum,
    STYPE = numeric[],
    FINALFUNC = numeric_weighted_average_final,
    FINALFUNC_MODIFY = READ_ONLY,
    INITCOND = '{0,0}',
    MFINALFUNC_MODIFY = READ_ONLY
);
Then you can do:
SELECT name, weighted_average(avgcolumn, weightcolumn) AS "WeightedAverage"
FROM your_table  -- placeholder: the table holding name, avgcolumn, weightcolumn
GROUP BY name;
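For anyone who wants to try the accumulate-then-finalize idea without PostgreSQL, here is a rough analogue using sqlite3's create_aggregate from the Python standard library. The class name and the cars table are my own placeholders. One deliberate difference: this sketch returns NULL when the total weight is zero instead of dividing by zero.

```python
import sqlite3


class WeightedAverage:
    """Keeps the same two-part state as the SQL version above."""

    def __init__(self):
        # Mirrors INITCOND = '{0,0}': (sum of datum * weight, sum of weights).
        self.weighted_sum = 0.0
        self.weight_total = 0.0

    def step(self, datum, weight):
        # Mirrors STRICT: rows with a NULL input are skipped.
        if datum is None or weight is None:
            return
        self.weighted_sum += datum * weight
        self.weight_total += weight

    def finalize(self):
        # Assumption of this sketch: return NULL rather than divide by zero.
        if self.weight_total == 0:
            return None
        return self.weighted_sum / self.weight_total


conn = sqlite3.connect(":memory:")
conn.create_aggregate("weighted_average", 2, WeightedAverage)

# "cars" is a hypothetical table holding a subset of the question's columns.
conn.execute(
    "CREATE TABLE cars (engine TEXT, cars_checked INTEGER, avg_mileage INTEGER)"
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("petrol", 11, 12), ("diesel", 150, 16), ("petrol", 140, 15),
        ("diesel", 18, 19), ("petrol", 16, 17), ("hybrid", 250, 24),
    ],
)

result = dict(
    conn.execute(
        "SELECT engine, weighted_average(avg_mileage, cars_checked) "
        "FROM cars GROUP BY engine"
    ).fetchall()
)
for engine, mileage in sorted(result.items()):
    print(f"{engine:<7}| {mileage:.1f}")
```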
I am sure there is room to make this more efficient, and I would be glad to hear suggestions.
HTH,