How to select the top 3 values from each group in a SQL table that contains duplicates

Pix*_*ech 14 sql

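Suppose we have a table with two columns: one holds people's names, and the other holds values associated with each person. A person can have more than one value, and every value is numeric. We want to select the top 3 values for each person. If a person has fewer than 3 values, we select all of that person's values.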

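If the table has no duplicates, the problem can be solved with the query from the article "Select top 3 values from each group in a table with SQL". But what is the solution when there are duplicates?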

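For example, suppose the name John has 5 values associated with it: 20, 7, 7, 7, 4. I need to return the name/value pairs for each name, ordered by value descending, like this: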

+----------+-------+
| name     | value |
+----------+-------+
| John     |    20 |
| John     |     7 |
| John     |     7 |
+----------+-------+

Only 3 rows should be returned for John, even though John has three 7s.

ype*_*eᵀᴹ 28

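In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following works fine. It uses a CTE and the ranking function ROW_NUMBER(), which has been part of the SQL standard since SQL:2003: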

WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;

Without a CTE, using only ROW_NUMBER() in a derived table:

SELECT name, value, rn
FROM 
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp 
WHERE rn <= 3
ORDER BY name, rn ; 
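To see how ROW_NUMBER() handles the duplicates from the question, here is a minimal sketch that runs the derived-table query through Python's built-in sqlite3 module. It assumes a SQLite build of version 3.25 or later (the first with window functions). The table name t and the columns name and value follow the queries above; the rows for Mary are invented for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
# John's values come from the question; Mary's rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t (name, value) VALUES (?, ?)",
    [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
     ("Mary", 9), ("Mary", 3)],
)

# ROW_NUMBER() numbers the rows of each name from the highest value down.
# Tied values still get distinct numbers, so at most 3 rows per name survive.
top3 = conn.execute(
    """
    SELECT name, value, rn
    FROM ( SELECT name, value,
                  ROW_NUMBER() OVER (PARTITION BY name
                                     ORDER BY value DESC) AS rn
           FROM t
         ) tmp
    WHERE rn <= 3
    ORDER BY name, rn
    """
).fetchall()

for row in top3:
    print(row)
```

John keeps 20, 7, 7 and the third 7 is cut off at rn = 4, while Mary, who has fewer than 3 values, keeps both of hers.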


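In MySQL (before version 8.0, which added window functions) and other DBMS that lack ranking functions, one has to use derived tables, correlated subqueries, or a self-join with GROUP BY.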

The column (tid) is assumed to be the primary key of the table:

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY
       COUNT(*) AS rn
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
        OR  t2.value = t.value
        AND t2.tid <= t.tid
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;


SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)                -- inline, correlated subquery
             FROM t AS t2
             WHERE t2.name = t.name
              AND ( t2.value > t.value
                 OR  t2.value = t.value
                 AND t2.tid <= t.tid
                  )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;
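The tie-breaking role of tid in the queries above is easier to see on concrete rows. The following is a minimal sketch of the correlated-subquery variant, run through Python's sqlite3 module. The outer derived-table alias is renamed to ranked for readability, tid is filled in automatically by SQLite, and the rows for Mary are invented; all of that is assumption rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tid is an INTEGER PRIMARY KEY, so SQLite assigns 1, 2, 3, ... on insert.
conn.execute(
    "CREATE TABLE t (tid INTEGER PRIMARY KEY, name TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO t (name, value) VALUES (?, ?)",
    [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
     ("Mary", 9), ("Mary", 3)],
)

# For each row, count the rows of the same name that sort at or before it:
# a strictly larger value, or an equal value with a tid that is not larger.
# The tid comparison gives tied values distinct positions.
rows = conn.execute(
    """
    SELECT ranked.tid, ranked.name, ranked.value, ranked.rn
    FROM ( SELECT t.tid, t.name, t.value,
                  ( SELECT COUNT(*)
                    FROM t AS t2
                    WHERE t2.name = t.name
                      AND ( t2.value > t.value
                         OR  t2.value = t.value
                         AND t2.tid <= t.tid
                          )
                  ) AS rn
           FROM t
         ) AS ranked
    WHERE ranked.rn <= 3
    ORDER BY ranked.name, ranked.rn
    """
).fetchall()

for row in rows:
    print(row)
```

Of John's three 7s (tid 2, 3 and 4), the ones with tid 2 and 3 get positions 2 and 3, and the one with tid 4 gets position 4 and is filtered out. Without the tid comparison, all three 7s would tie and the row limit could not be enforced.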

Tested in MySQL.


Gor*_*off 0

I was going to downvote this question. However, I realized that it might really be asking for a cross-database solution.

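Assuming you are looking for a database-independent way to do this, the only way I can think of uses a correlated subquery (or a non-equijoin). Here is an example: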

select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)

However, each of the databases you mention (and, I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them is standard SQL.

Here is an example of it working in SQL Server:

with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);
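It may help to check what the COUNT(distinct ...) query returns when a value repeats. Below is a minimal sketch using Python's sqlite3 module and the values from the question (20, 7, 7, 7, 4), stored under a hypothetical personid of 1. The rank column is renamed to rnk because RANK is a reserved word in some systems, and an ORDER BY is added so the output order is fixed; both changes are assumptions, not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INTEGER, val INTEGER)")
# Hypothetical person 1 holding the values from the question.
conn.executemany(
    "INSERT INTO t (personid, val) VALUES (?, ?)",
    [(1, 20), (1, 7), (1, 7), (1, 7), (1, 4)],
)

# COUNT(DISTINCT val) over the values >= the current one is a dense rank:
# every copy of a tied value receives the same rank number.
ranked = conn.execute(
    """
    SELECT DISTINCT r.personid, r.val, r.rnk
    FROM ( SELECT t.*,
                  ( SELECT COUNT(DISTINCT val)
                    FROM t t2
                    WHERE t2.personid = t.personid
                      AND t2.val >= t.val
                  ) AS rnk
           FROM t
         ) r
    WHERE r.rnk IN (1, 2, 3)
    ORDER BY r.personid, r.rnk
    """
).fetchall()

for row in ranked:
    print(row)
```

On this data the query yields the three largest distinct values, 20, 7 and 4, because the tied 7s share rank 2 and SELECT DISTINCT collapses them into one row. That differs from the 20, 7, 7 output the question asks for, so this variant fits only when distinct values are wanted.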