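Suppose we have a table with two columns: one holds people's names and the other holds values associated with each person. A person can have more than one value, and every value is numeric. The task is to select the top 3 values for each person from the table. If a person has fewer than 3 values, we select all of that person's values.
The query given in the article "Select top 3 values from each group in a table with SQL" solves the problem as long as the table contains no duplicates. But what is the solution when there are duplicates?
For example, suppose the name John has 5 values associated with it: 20, 7, 7, 7, 4. I need to return the name/value pairs for each name in descending order of value, like this: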
+------+-------+
| name | value |
+------+-------+
| John |    20 |
| John |     7 |
| John |     7 |
+------+-------+
Only 3 rows should be returned for John, even though John has three 7s.
ype*_*eᵀᴹ 28
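In many modern DBMSs (e.g. Postgres, Oracle, SQL Server, DB2 and many others), the following works fine. It uses a CTE and the ranking function ROW_NUMBER(), which is part of the SQL standard (window functions have been included since SQL:2003):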
WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
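To see how ROW_NUMBER() handles the duplicates from the question, here is a minimal sketch that runs the CTE query against an in-memory SQLite database through Python's `sqlite3` module. The helper name `top3_row_number`, the sample rows and the extra person "Mary" are assumptions made for illustration, and the sketch assumes a SQLite build of version 3.25 or later, where window functions are available.

```python
import sqlite3


def top3_row_number(rows):
    """Return up to three highest-value (name, value) rows per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t (name, value) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT name, value,
                   ROW_NUMBER() OVER (PARTITION BY name
                                      ORDER BY value DESC) AS rn
            FROM t
        )
        SELECT name, value
        FROM cte
        WHERE rn <= 3
        ORDER BY name, rn
        """
    ).fetchall()
    conn.close()
    return result


# John's values come from the question; Mary is a hypothetical person
# with fewer than three values.
sample = [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
          ("Mary", 9), ("Mary", 1)]

print(top3_row_number(sample))
```

Because ROW_NUMBER() gives every row its own number, the three tied 7s receive numbers 2, 3 and 4, so only two of them pass the `rn <= 3` filter, while Mary keeps both of her rows.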
Without a CTE, using only ROW_NUMBER():
SELECT name, value, rn
FROM
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
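In MySQL before version 8.0 and in other DBMSs that lack ranking functions, one has to use derived tables, correlated subqueries, or a self-join with GROUP BY.
The column (tid) is assumed to be the primary key of the table: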
-- self join and GROUP BY
-- rn counts the rows of the same name that sort at or before this row;
-- ties on value are broken by tid
SELECT t.tid, t.name, t.value,
       COUNT(*) AS rn
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
       OR ( t2.value = t.value
        AND t2.tid <= t.tid )
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;

-- same ranking computed with an inline, correlated subquery
SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)
             FROM t AS t2
             WHERE t2.name = t.name
               AND ( t2.value > t.value
                  OR ( t2.value = t.value
                   AND t2.tid <= t.tid )
                   )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;
Tested in MySQL.
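The self-join emulation above can be checked on any engine, since it needs no window functions. The following sketch runs it against an in-memory SQLite database from Python; the helper name `top3_self_join`, the tid values and the extra person "Mary" are assumptions for illustration, and SQLite stands in here for MySQL.

```python
import sqlite3


def top3_self_join(rows):
    """Emulate ROW_NUMBER() with a self-join and GROUP BY; tid breaks ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (tid INTEGER PRIMARY KEY, name TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO t (tid, name, value) VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.tid, t.name, t.value, COUNT(*) AS rn
        FROM t
        JOIN t AS t2
          ON t2.name = t.name
         AND (t2.value > t.value
              OR (t2.value = t.value AND t2.tid <= t.tid))
        GROUP BY t.tid, t.name, t.value
        HAVING COUNT(*) <= 3
        ORDER BY t.name, rn
        """
    ).fetchall()
    conn.close()
    return result


# John's values come from the question; the tid values and Mary are hypothetical.
sample = [(1, "John", 20), (2, "John", 7), (3, "John", 7), (4, "John", 7),
          (5, "John", 4), (6, "Mary", 9), (7, "Mary", 1)]

print(top3_self_join(sample))
```

Each row is joined to every row of the same name that sorts at or before it, so COUNT(*) equals the row's position within its group. Among the tied 7s the rows with the smallest tid win, which is why the tie-breaker has to be a unique column.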
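I was going to downvote this question. However, I realized that it might really be asking for a cross-database solution.
Assuming that you are looking for a database-independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example: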
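-- rank = number of distinct values of the same person that are >= this value
-- (RANK is a reserved word in some DBMSs, e.g. MySQL 8.0; rename the alias there)
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val)
              from t t2
              where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)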
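However, each of the databases you mention (and, I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them is standard SQL.
Here is an example of it working in SQL Server: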
with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val)
              from t t2
              where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);
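Note that the COUNT(distinct val) query ranks distinct values rather than rows, so it behaves differently from ROW_NUMBER() when a person has duplicates. The sketch below runs the same logic in an in-memory SQLite database from Python. The helper name `top3_distinct_values` and the alias `rnk` (used in place of `rank` to avoid reserved-word clashes) are assumptions; person 1 reuses the sample rows above, and person 2 is a hypothetical addition carrying John's values 20, 7, 7, 7, 4 from the question.

```python
import sqlite3


def top3_distinct_values(rows):
    """Return each person's three highest distinct values with their rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (personid INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO t (personid, val) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT DISTINCT ranked.personid, ranked.val, ranked.rnk
        FROM (SELECT t.personid, t.val,
                     (SELECT COUNT(DISTINCT t2.val)
                      FROM t AS t2
                      WHERE t2.personid = t.personid
                        AND t2.val >= t.val) AS rnk
              FROM t) AS ranked
        WHERE ranked.rnk IN (1, 2, 3)
        ORDER BY ranked.personid, ranked.rnk
        """
    ).fetchall()
    conn.close()
    return result


sample = [(1, 5), (1, 6), (1, 6), (1, 7), (1, 8),
          (2, 20), (2, 7), (2, 7), (2, 7), (2, 4)]

print(top3_distinct_values(sample))
```

For person 2 this yields the values 20, 7 and 4, one row each, whereas the ROW_NUMBER() approach would yield 20, 7, 7. Which of the two is wanted depends on whether "top 3 values" means three rows or three distinct values.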