Wil*_*iam 0 sql t-sql aggregate-functions percentile sql-server-2008-r2
I'm using SQL Server 2008 R2. I need to compute a percentile value per group, for example:
SELECT id,
PCTL(0.9, x) -- for the 90th percentile
FROM my_table
GROUP BY id
ORDER BY id
For example, given this DDL (fiddle):
CREATE TABLE my_table (id INT, x REAL);
INSERT INTO my_table
VALUES (7, 0.164595), (5, 0.671311), (7, 0.0118385), (6, 0.704592), (3, 0.633521), (3, 0.337268), (0, 0.54739), (6, 0.312282), (0, 0.220618), (7, 0.214973), (6, 0.410768), (7, 0.151572), (7, 0.0639506), (5, 0.339075), (1, 0.284094), (2, 0.126722), (2, 0.870079), (3, 0.369366), (1, 0.6687), (5, 0.199456), (5, 0.0296715), (1, 0.330339), (9, 0.0000459612), (5, 0.391947), (3, 0.753965), (8, 0.334207), (7, 0.583357), (3, 0.326951), (4, 0.207057), (2, 0.258463), (2, 0.0532811), (1, 0.751584), (7, 0.592624), (7, 0.673506), (5, 0.44764), (6, 0.733737), (5, 0.141215), (7, 0.222452), (3, 0.597019), (1, 0.293901), (4, 0.516213), (7, 0.498336), (6, 0.410461), (2, 0.32211), (1, 0.466735), (5, 0.720456), (8, 0.000428383), (3, 0.46085), (0, 0.402963), (7, 0.677002), (0, 0.400122), (1, 0.762357), (9, 0.158455), (7, 0.359723), (4, 0.225914), (7, 0.795345), (6, 0.902261), (2, 0.69533), (8, 0.593605), (6, 0.266233), (0, 0.917188), (9, 0.96353), (2, 0.577035), (8, 0.945236), (3, 0.257776), (4, 0.560569), (0, 0.838326), (2, 0.660338), (2, 0.537372), (8, 0.33806), (0, 0.545107), (1, 0.616673), (5, 0.30411), (0, 0.434737), (2, 0.588249), (9, 0.991362), (8, 0.772253), (6, 0.705396), (5, 0.323255), (8, 0.830319), (3, 0.679546), (4, 0.399748), (4, 0.440115), (6, 0.938154), (8, 0.333143), (9, 0.923541), (7, 0.19552), (4, 0.869822), (7, 0.620006), (4, 0.833529), (4, 0.297515), (4, 0.19906), (5, 0.540905), (9, 0.33313), (5, 0.200515), (5, 0.900481), (6, 0.02665), (3, 0.495421), (0, 0.96582), (9, 0.847218);
I'd like to get roughly the following (within the variation between common percentile methods):
id x
----------
0 0.9658
1 0.7624
2 0.6953
3 0.6795
4 0.8335
5 0.7205
6 0.9023
7 0.677
8 0.9452
9 0.9914
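The actual input has about two million rows, and each actual id group has anywhere from a few dozen to a few hundred (or possibly more) rows.
I have searched SO and other sites for solutions, but the dozens of pages I checked only seem to cover computing percentiles over an entire rowset, not over each group/partition of a rowset. (I'm relatively inexperienced with SQL, so I may have overlooked something.)
I also looked at the documentation for the ranking functions, but I couldn't piece together a working query.
I would like to use PERCENTILE_DISC or PERCENTILE_CONT, but I'm stuck on 2008 R2 for now.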
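I like to do these calculations directly with row_number()/rank() and window functions. The built-in functions are useful, but they don't actually save that much effort: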
SELECT id,
MIN(CASE WHEN seqnum >= 0.9 * cnt THEN x END) as percentile_90
FROM (select t.*,
row_number() over (partition by id order by x) as seqnum,
count(*) over (partition by id) as cnt
from my_table t
) t
GROUP BY id
ORDER BY id;
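This takes the first value at or above the 90th percentile. There are variations on this that do the continuous version: take the largest value at or below the target position and the smallest value above it, then interpolate between them.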
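A minimal sketch of that continuous variation, assuming the usual linear-interpolation definition in which the target position within each group is 1 + 0.9 * (cnt - 1). The names ranked, positioned, pos, and percentile_90_cont are illustrative and not from the original query. It relies only on CTEs, ROW_NUMBER(), and COUNT(*) OVER (PARTITION BY ...), so it should run on 2008 R2, but treat it as a starting point rather than a verified drop-in replacement for PERCENTILE_CONT:

```sql
-- Sketch: interpolated (continuous) 90th percentile per id.
-- Assumption: target position = 1 + p * (n - 1), with p = 0.9.
WITH ranked AS (
    SELECT id,
           CAST(x AS FLOAT) AS x,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY x) AS seqnum,
           COUNT(*)     OVER (PARTITION BY id)            AS cnt
    FROM my_table
),
positioned AS (
    SELECT id, x, seqnum,
           1 + 0.9 * (cnt - 1) AS pos   -- fractional row position (exact decimal)
    FROM ranked
)
SELECT id,
       SUM(CASE
             -- pos is a whole number: take that row's value as-is
             WHEN seqnum = FLOOR(pos) AND seqnum = CEILING(pos) THEN x
             -- row just below pos, weighted by its closeness to pos
             WHEN seqnum = FLOOR(pos)   THEN x * (CEILING(pos) - pos)
             -- row just above pos, weighted by its closeness to pos
             WHEN seqnum = CEILING(pos) THEN x * (pos - FLOOR(pos))
             ELSE 0
           END) AS percentile_90_cont
FROM positioned
GROUP BY id
ORDER BY id;
```

Writing 0.9 as a decimal literal keeps pos in exact numeric arithmetic, so the FLOOR/CEILING comparisons against seqnum are not affected by floating-point rounding. To use a different percentile, change the 0.9 in the positioned CTE.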