Select the nth percentile row

Question


I have two tables, table1 and table2. Both tables contain date, id, and latency columns.

I have a simple query that joins the two tables and returns a set of rows:

Select table1.date,(table2.latency - table1.latency) as ans from table1, table2
where table1.id = table2.id order by ans;


I need to find the nth percentile row in the returned row set. Specifically, I need the rows at the 90th, 99th, and 99.9th percentiles of the data.

I need to display the data in this form:

    date       |   percentile  | ans
    01-12-1995 |    90         | 0.001563
    02-12-1999 |    99         | 0.0015
    05-12-2000 |    99.9       | 0.012


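This is my first time working with PostgreSQL, and I am not sure how to proceed.

I have been looking at the PERCENT_RANK() function. Please point me in the right direction.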

Answer 1

Erw*_*ter 5

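Use the window function ntile() in a subquery (requires Postgres 8.4 or later).
Then select the segments you are interested in (corresponding to the percentiles) and pick the row with the lowest value from each: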

SELECT DISTINCT ON (segment)
       the_date, to_char((segment - 1)/ 10.0, '99.9') AS percentile, ans
FROM  (
    SELECT t1.the_date 
          ,ntile(1000) OVER (ORDER BY (t2.latency - t1.latency)) AS segment
          ,(t2.latency - t1.latency) AS ans
    FROM   table1 t1
    JOIN   table2 t2 ON t1.id = t2.id
   ) sub
WHERE  segment IN (601, 901, 991, 1000)
ORDER  BY segment, ans;
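
To see how ntile() assigns segments, here is a minimal sketch on made-up data: eight values from generate_series() split into four segments. The column names val and segment are illustrative only and not part of the original tables.

```sql
-- ntile(4) splits the ordered rows into 4 segments of (nearly) equal size.
-- With 8 rows, each segment receives exactly 2 rows.
SELECT val,
       ntile(4) OVER (ORDER BY val) AS segment
FROM   generate_series(1, 8) AS val
ORDER  BY val;
-- segments in row order: 1, 1, 2, 2, 3, 3, 4, 4
```

With ntile(1000), each segment covers one tenth of a percent of the ordered rows in the same way.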


The Postgres-specific DISTINCT ON comes in handy in the last step. It is explained in detail in this related answer on SO:
Select first row in each GROUP BY group?
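
As a rough illustration of that last step, the following sketch runs DISTINCT ON against a few hard-coded rows. The VALUES list is invented sample data; only the segment and ans column names are taken from the query above.

```sql
-- DISTINCT ON (segment) keeps the first row of each segment,
-- where "first" is decided by the ORDER BY clause (lowest ans here).
SELECT DISTINCT ON (segment)
       segment, ans
FROM  (VALUES (1, 0.30), (1, 0.10), (2, 0.50), (2, 0.40)) AS t(segment, ans)
ORDER  BY segment, ans;
-- returns (1, 0.10) and (2, 0.40)
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions, which is why segment comes first in the sort.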

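To get the 90th, 99th, and 99.9th percentiles, I picked the matching segments from ntile(1000). I also added the 60th percentile, as requested in a comment.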
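
The segment numbers follow from simple arithmetic: with 1000 segments, each one spans 0.1 percent, so the first segment at or above percentile p is p * 10 + 1. A small sketch of that mapping (the alias names v and p are illustrative):

```sql
-- Map each percentile to the ntile(1000) segment that starts at it.
SELECT p                 AS percentile,
       (p * 10)::int + 1 AS segment
FROM  (VALUES (60.0), (90.0), (99.0), (99.9)) AS v(p);
-- 60.0 -> 601, 90.0 -> 901, 99.0 -> 991, 99.9 -> 1000
```

This is the inverse of the (segment - 1) / 10.0 expression used in the outer SELECT.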

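This algorithm selects the row at or just above the exact percentile. You can also add a line to the subquery that calls percent_rank() to get the exact relative rank of each selected row: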

 percent_rank() OVER (ORDER BY (t2.latency - t1.latency)) AS pct_rank
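
Put together, the query with the extra column might look like the sketch below. To keep it runnable on its own, the two CTEs stand in for table1 and table2 with generated sample data (2000 rows, made-up dates and latencies); in practice you would drop the WITH clause and query the real tables.

```sql
-- Hypothetical stand-ins for table1 and table2, for demonstration only.
WITH table1(id, the_date, latency) AS (
    SELECT g, DATE '2000-01-01' + g, 0.0
    FROM   generate_series(1, 2000) AS g
), table2(id, latency) AS (
    SELECT g, g / 1000.0
    FROM   generate_series(1, 2000) AS g
)
SELECT DISTINCT ON (segment)
       the_date,
       to_char((segment - 1) / 10.0, '99.9') AS percentile,
       ans,
       pct_rank  -- exact relative rank (0 to 1) of the selected row
FROM  (
    SELECT t1.the_date,
           ntile(1000)    OVER (ORDER BY (t2.latency - t1.latency)) AS segment,
           percent_rank() OVER (ORDER BY (t2.latency - t1.latency)) AS pct_rank,
           (t2.latency - t1.latency) AS ans
    FROM   table1 t1
    JOIN   table2 t2 ON t1.id = t2.id
   ) sub
WHERE  segment IN (901, 991, 1000)
ORDER  BY segment, ans;
```

Comparing pct_rank with the nominal percentile shows how far the selected row sits from the exact cut-off.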


Aside: I replaced the column name date with the_date, because I make it a habit to avoid reserved SQL keywords as identifiers, even if Postgres allows them.
