Calculating z-scores for several columns

dmc*_*c7z 5 sql postgresql

I'm using a SQL query to determine the z-scores ((x - μ) / σ) of several columns.

Specifically, I have a table like this:

my_table
id    col_a  col_b  col_c
1     3      6      5
2     5      3      3
3     2      2      9
4     9      8      2

...and I want to select the z-score of each number in each row, based on the mean and standard deviation of its column.

So the result would look like this:

id    col_d     col_e     col_f
1    -0.4343    1.0203    ...
2     0.1434   -0.8729
3    -0.8234   -1.2323
4     1.889     1.5343

Currently my code calculates the scores for two columns, like this:

select id,
   (my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
   (my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e
from my_table,
   (select col_a from my_table) mya,
   (select col_b from my_table) myb
group by id;

However, this is very slow. A three-column query has kept me waiting for minutes.

Is there a better way to achieve this? I'm using Postgres, but an answer in any general language would help me. Thanks!

Rom*_*kar 15

You can use window functions like this:

select
    t.id,
    (t.col_a - avg(t.col_a) over()) / stddev(t.col_a) over() as col_d,
    (t.col_b - avg(t.col_b) over()) / stddev(t.col_b) over() as col_e
from my_table as t

Or use a cross join with a precomputed avg and stddev:

select
    t.id,
    (t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
    (t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
    cross join (
        select 
            avg(tt.col_a) as col_a_avg,
            avg(tt.col_b) as col_b_avg,
            stddev(tt.col_a) as col_a_stdev,
            stddev(tt.col_b) as col_b_stdev
        from my_table as tt
    ) as tt
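
As a rough sketch, the window-function version could be extended to the third column from the question. It assumes PostgreSQL, where stddev is an alias for stddev_samp (the sample standard deviation). The named window w and the nullif guard are additions for illustration, not part of the original queries. The guard returns NULL instead of raising a division-by-zero error when every value in a column is identical.

```sql
-- Sketch only: z-scores for all three columns of my_table.
-- "w" is an illustrative window name; the empty definition spans the whole table.
select
    t.id,
    -- nullif(..., 0) turns a zero standard deviation into NULL,
    -- so the division yields NULL rather than an error
    (t.col_a - avg(t.col_a) over w) / nullif(stddev_samp(t.col_a) over w, 0) as col_d,
    (t.col_b - avg(t.col_b) over w) / nullif(stddev_samp(t.col_b) over w, 0) as col_e,
    (t.col_c - avg(t.col_c) over w) / nullif(stddev_samp(t.col_c) over w, 0) as col_f
from my_table as t
window w as ()
order by t.id;
```

If the population standard deviation is wanted instead, stddev_pop can be substituted for stddev_samp.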

  • Window functions. Exactly what I was looking for. Thanks! (2 upvotes)