I'm using a SQL query to compute z-scores, (x - μ) / σ, for several columns.
Specifically, I have a table like this:
my_table
id col_a col_b col_c
1 3 6 5
2 5 3 3
3 2 2 9
4 9 8 2
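...and I want to select the z-score of every number in every row, based on the mean and standard deviation of its column.
So the result would look like this: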
id col_d col_e col_f
1 -0.4343 1.0203 ...
2 0.1434 -0.8729
3 -0.8234 -1.2323
4 1.889 1.5343
Currently my code computes the scores for two columns, like this:
select id,
(my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
(my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e
from my_table,
(select col_a from my_table) mya,
(select col_b from my_table) myb
group by id;
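However, this is very slow. A three-column query keeps me waiting for minutes.
Is there a better way to do this? I'm using Postgres, but an answer in any general SQL dialect would help me. Thanks!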
Rom*_*kar 15
You can use window functions like this:
select
t.id,
(t.col_a - avg(t.col_a) over()) / stddev(t.col_a) over() as col_d,
(t.col_b - avg(t.col_b) over()) / stddev(t.col_b) over() as col_e
from my_table as t
Or cross join with the precomputed average and standard deviation:
select
t.id,
(t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
(t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
cross join (
select
avg(tt.col_a) as col_a_avg,
avg(tt.col_b) as col_b_avg,
stddev(tt.col_a) as col_a_stdev,
stddev(tt.col_b) as col_b_stdev
from my_table as tt
) as tt
Run Code Online (Sandbox Code Playgroud)
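Since the question targets Postgres, here is a minimal sketch of the window-function approach extended to all three columns. It assumes PostgreSQL, where the aggregate is spelled `stddev` (an alias for `stddev_samp`, the sample standard deviation; `stddev_pop` gives the population version). The `nullif` guard is my addition, not part of the answer above: it returns NULL instead of raising a division-by-zero error when a column has no variation.

```sql
-- Sketch for PostgreSQL; my_table and its columns come from the question.
-- over () is an empty window, so each aggregate sees every row of the table
-- and the table is scanned once instead of being joined to itself.
select
    t.id,
    (t.col_a - avg(t.col_a) over ()) / nullif(stddev_samp(t.col_a) over (), 0) as col_d,
    (t.col_b - avg(t.col_b) over ()) / nullif(stddev_samp(t.col_b) over (), 0) as col_e,
    (t.col_c - avg(t.col_c) over ()) / nullif(stddev_samp(t.col_c) over (), 0) as col_f
from my_table as t
order by t.id;
```

The original query is slow most likely because the comma-separated subqueries form an unconstrained cross join, so the row count before grouping grows with the cube of the table size; both approaches here avoid that.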