How can I SELECT rows with MAX(column value), DISTINCT by another column in SQL?

Kap*_*tah 727 mysql sql max distinct greatest-n-per-group

My table is:

id  home  datetime     player   resource
---|-----|------------|--------|---------
1  | 10  | 04/03/2009 | john   | 399 
2  | 11  | 04/03/2009 | juliet | 244
5  | 12  | 04/03/2009 | borat  | 555
3  | 10  | 03/03/2009 | john   | 300
4  | 11  | 03/03/2009 | juliet | 200
6  | 12  | 03/03/2009 | borat  | 500
7  | 13  | 24/12/2008 | borat  | 600
8  | 13  | 01/01/2009 | borat  | 700

I need to select, for each distinct home, the row holding the maximum value of datetime.

The result would be:

id  home  datetime     player   resource 
---|-----|------------|--------|---------
1  | 10  | 04/03/2009 | john   | 399
2  | 11  | 04/03/2009 | juliet | 244
5  | 12  | 04/03/2009 | borat  | 555
8  | 13  | 01/01/2009 | borat  | 700

I have tried:

-- 1 ..by the MySQL manual: 

SELECT DISTINCT
  home,
  id,
  datetime AS dt,
  player,
  resource
FROM topten t1
WHERE datetime = (SELECT
  MAX(t2.datetime)
FROM topten t2
GROUP BY home)
GROUP BY datetime
ORDER BY datetime DESC

Doesn't work. The result set has 130 rows although the database holds 187. The result includes some duplicates of home.

-- 2 ..join

SELECT
  s1.id,
  s1.home,
  s1.datetime,
  s1.player,
  s1.resource
FROM topten s1
JOIN (SELECT
  id,
  MAX(datetime) AS dt
FROM topten
GROUP BY id) AS s2
  ON s1.id = s2.id
ORDER BY datetime 

Nope. It returns all the records.

-- 3 ..something exotic: 

With various results.

Mic*_*oie 890

You are so close! All you need to do is select both the home and its max datetime, then join back to the topten table on both fields:

SELECT tt.*
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

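  • What if there are two rows with the same 'home' and 'datetime' field values? (29 upvotes)
  • Test it with DISTINCT for the case where two equal max datetimes are in the same home (with different players). (5 upvotes)
  • I think the classic way to do this is with a natural join: "SELECT tt.* FROM topten tt NATURAL JOIN (SELECT home, MAX(datetime) AS datetime FROM topten GROUP BY home) mostrecent;" Exactly the same query, but arguably more readable. (5 upvotes)
  • @Young The problem with your query is that it may return the `id`, `player` and `resource` of a non-max row for a given home, i.e. for home = 10 you may get: `3 | 10 | 04/03/2009 | john | 300`. In other words, it doesn't guarantee that all columns of a row in the result set belong to the max(datetime) of the given home. (3 upvotes)
  • @me1111 The problem with your query is that it may or may not return the row with max(datetime) for a given home. The reason is that GROUP BY fetches an arbitrary row for each home, and ORDER BY only sorts the overall result produced by GROUP BY. (2 upvotes)
  • Regarding @KemalDuran's comment above: if there are two rows with the same home and datetime fields, take Michael La Voie's solution, add `MAX(id) AS MaxID` to the inner `SELECT` statement, and then add another line at the end: `AND tt.id = groupedtt.MaxID`. (2 upvotes)
  • @IstiaqueAhmed When I used this solution and wrote the comment, it took me a short while to remember what I was doing. @KemalDuran asked _what if there are two rows with the same 'home' and 'datetime' field values?_ My comment above adds a third grouping to solve this, on the `id` field, which is **absolutely** unique, unlike, say, datetime or player. @MichaelLaVoie's solution has 2 groupings, and some of us need this solution for the cases where that is not enough. (2 upvotes)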
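The tie case raised in these comments can be tried out quickly. Below is a minimal sketch, assuming Python's built-in sqlite3 module stands in for MySQL and that dates are stored as ISO strings; the extra rows with ids 9 and 10 are hypothetical and added only for illustration. It runs the join from the answer, then a variant that takes MAX(id) only among the rows tied at the maximum datetime, so it does not rely on id growing together with datetime.

```python
import sqlite3

# Question data (dates as ISO text so they compare correctly) plus two
# hypothetical rows: id 9 ties with id 1 on (home, datetime), and id 10
# has a high id but an old datetime.
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
    (10, 11, "2009-03-01", "borat", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INT, "
    "datetime TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", rows)

# The join from the answer: every row that carries its home's max datetime.
accepted = """
SELECT tt.id
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
ORDER BY tt.id
"""

# Tie-break variant: among the rows found above, keep the highest id per home.
tie_broken = """
SELECT tt.id
FROM topten tt
INNER JOIN
    (SELECT t.home, MAX(t.id) AS MaxID
    FROM topten t
    INNER JOIN
        (SELECT home, MAX(datetime) AS MaxDateTime
        FROM topten
        GROUP BY home) g
    ON t.home = g.home AND t.datetime = g.MaxDateTime
    GROUP BY t.home) latest
ON tt.id = latest.MaxID
ORDER BY tt.id
"""

accepted_ids = [r[0] for r in conn.execute(accepted)]
tie_broken_ids = [r[0] for r in conn.execute(tie_broken)]
print(accepted_ids)    # home 10 shows up twice (ids 1 and 9)
print(tie_broken_ids)  # exactly one id per home
```

Which of the tied rows should win is a business decision; the highest id is used here only because the comments suggest it.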

Mak*_*tar 71

Here is the T-SQL version:

-- Test data
DECLARE @TestTable TABLE (id INT, home INT, date DATETIME, 
  player VARCHAR(20), resource INT)
INSERT INTO @TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700

-- Answer
SELECT id, home, date, player, resource 
FROM (SELECT id, home, date, player, resource, 
    RANK() OVER (PARTITION BY home ORDER BY date DESC) N
    FROM @TestTable
)M WHERE N = 1

-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource 
    FROM @TestTable T
INNER JOIN 
(   SELECT TI.id, TI.home, TI.date, 
        RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
    FROM @TestTable TI
    WHERE TI.date IN (SELECT MAX(TM.date) FROM @TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id

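EDIT
Unfortunately, there is no RANK() OVER function in MySQL (before version 8.0, which added window functions).
But it can be emulated; see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is the MySQL version: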

SELECT id, home, date, player, resource 
FROM TestTable AS t1 
WHERE 
    (SELECT COUNT(*) 
            FROM TestTable AS t2 
            WHERE t2.home = t1.home AND t2.date > t1.date
    ) = 0

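  • Ah, so you're using MySQL. That's what you should have started with! I will update the answer soon. (2 upvotes)
  • BUG: Replace "RANK()" with "ROW_NUMBER()". If you have a tie (caused by a duplicate date value) you will get two records with "1" for N. (2 upvotes)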
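The difference pointed out in the last comment shows up as soon as two rows of one home share the latest date. A minimal sketch, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (needed for window functions) in place of SQL Server; the extra row with id 9 is hypothetical, and `id DESC` is an arbitrary tie-breaker chosen for illustration:

```python
import sqlite3

# Test data from the answer plus a hypothetical row (id 9) that ties
# with id 1 on (home, date).
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (id INTEGER PRIMARY KEY, home INT, "
    "date TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO TestTable VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the answer's first query; only the ranking function and
# the ORDER BY inside the window are swapped in.
template = """
SELECT id
FROM (SELECT id,
        {fn}() OVER (PARTITION BY home ORDER BY {order}) AS N
    FROM TestTable
) M WHERE N = 1
ORDER BY id
"""

# RANK() gives every tied row N = 1, so home 10 yields two rows.
rank_ids = [r[0] for r in conn.execute(
    template.format(fn="RANK", order="date DESC"))]

# ROW_NUMBER() numbers rows 1, 2, 3, ... so exactly one row per home gets
# N = 1; the extra sort key makes the choice among tied rows deterministic.
row_number_ids = [r[0] for r in conn.execute(
    template.format(fn="ROW_NUMBER", order="date DESC, id DESC"))]

print(rank_ids)
print(row_number_ids)
```

If all tied rows are wanted, RANK() is the right choice; if exactly one row per home is wanted, ROW_NUMBER() with an explicit tie-breaker is the safer one.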

axi*_*iac 70

The fastest MySQL solution, without inner queries and without GROUP BY:

SELECT m.*                    -- get the row that contains the max value
FROM topten m                 -- "m" from "max"
    LEFT JOIN topten b        -- "b" from "bigger"
        ON m.home = b.home    -- match "max" row with "bigger" row by `home`
        AND m.datetime < b.datetime           -- want "bigger" than "max"
WHERE b.datetime IS NULL      -- keep only if there is no bigger than max

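Explanation:

Join the table with itself using the home column. The use of LEFT JOIN ensures that all rows from table m appear in the result set. Those that have no match in table b will have NULLs in the columns of b.

The other condition on the JOIN asks to match only the rows from b that have a bigger value in the datetime column than the row from m.

Using the data posted in the question, the LEFT JOIN will produce these pairs: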

+------------------------------------------+--------------------------------+
|              the row from `m`            |    the matching row from `b`   |
|------------------------------------------|--------------------------------|
| id  home  datetime     player   resource | id    home   datetime      ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1  | 10  | 04/03/2009 | john   | 399     | NULL | NULL | NULL       | ... | *
| 2  | 11  | 04/03/2009 | juliet | 244     | NULL | NULL | NULL       | ... | *
| 5  | 12  | 04/03/2009 | borat  | 555     | NULL | NULL | NULL       | ... | *
| 3  | 10  | 03/03/2009 | john   | 300     | 1    | 10   | 04/03/2009 | ... |
| 4  | 11  | 03/03/2009 | juliet | 200     | 2    | 11   | 04/03/2009 | ... |
| 6  | 12  | 03/03/2009 | borat  | 500     | 5    | 12   | 04/03/2009 | ... |
| 7  | 13  | 24/12/2008 | borat  | 600     | 8    | 13   | 01/01/2009 | ... |
| 8  | 13  | 01/01/2009 | borat  | 700     | NULL | NULL | NULL       | ... | *
+------------------------------------------+--------------------------------+

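Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means that, because of the second condition of the JOIN clause, the row selected from m has the biggest value in the datetime column.

Read the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming for other SQL tips.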

  • This is the best answer; if you display the execution plan, you will see that this query has one step less. (7 upvotes)
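The pairing described in the explanation can be reproduced step by step. A minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL and dates stored as ISO strings (both are assumptions made for illustration): it first lists the (m, b) pairs produced by the LEFT JOIN and then applies the IS NULL filter.

```python
import sqlite3

# Data from the question, with dates rewritten as ISO text so that
# string comparison matches chronological order.
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INT, "
    "datetime TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: the raw LEFT JOIN. b.id is None when no row of the same home
# has a bigger datetime than the row from m.
pairs = conn.execute("""
    SELECT m.id, b.id
    FROM topten m
        LEFT JOIN topten b
            ON m.home = b.home
            AND m.datetime < b.datetime
    ORDER BY m.id
""").fetchall()

# Step 2: keep only the rows of m that found no bigger partner.
winners = [r[0] for r in conn.execute("""
    SELECT m.id
    FROM topten m
        LEFT JOIN topten b
            ON m.home = b.home
            AND m.datetime < b.datetime
    WHERE b.datetime IS NULL
    ORDER BY m.id
""")]

print(pairs)
print(winners)
```

Because the comparison is a strict `<`, two rows of one home that share the maximum datetime would both survive the filter, just as with the join on MAX(datetime).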

Qua*_*noi 27

This will work even if you have two or more rows for each home with equal DATETIMEs:

SELECT id, home, datetime, player, resource
FROM   (
       SELECT (
              SELECT  id
              FROM    topten ti
              WHERE   ti.home = t1.home
              ORDER BY
                      ti.datetime DESC
              LIMIT 1
              ) lid
       FROM   (
              SELECT  DISTINCT home
              FROM    topten
              ) t1
       ) ro, topten t2
WHERE  t2.id = ro.lid
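To see the one-row-per-home behaviour on tied data, here is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL. The row with id 9 is hypothetical, and the extra `ti.id DESC` sort key is an addition not present in the query above: without it, which of the tied rows LIMIT 1 picks is not specified.

```python
import sqlite3

# Question data (ISO dates) plus a hypothetical row (id 9) that ties
# with id 1 on (home, datetime).
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INT, "
    "datetime TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", rows)

# For each distinct home, a correlated subquery picks the id of its latest
# row; the outer query then fetches that row by id.
latest = conn.execute("""
    SELECT t2.id, t2.home
    FROM   (
           SELECT (
                  SELECT  ti.id
                  FROM    topten ti
                  WHERE   ti.home = t1.home
                  ORDER BY
                          ti.datetime DESC, ti.id DESC
                  LIMIT 1
                  ) lid
           FROM   (
                  SELECT  DISTINCT home
                  FROM    topten
                  ) t1
           ) ro, topten t2
    WHERE  t2.id = ro.lid
    ORDER BY t2.home
""").fetchall()

print(latest)  # one (id, home) pair per home, even though home 10 has a tie
```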


Ric*_*ras 25

I think this will give you the desired result:

SELECT   home, MAX(datetime)
FROM     my_table
GROUP BY home

If you need the other columns as well, just join with the original table (see Michael La Voie's answer).

Best regards.

  • He needs the other columns too. (8 upvotes)
  • id, home, datetime, player, resource (4 upvotes)

小智 16

Since people seem to keep running into this thread (the comment dates span 1.5 years), isn't this much simpler:

SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home

No aggregate functions needed...

Cheers.

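  • This doesn't seem to work. Error message: Column 'x' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (5 upvotes)
  • This straightforward approach does not work if you have non-aggregated columns (in MySQL). (2 upvotes)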

Shi*_*iva 10

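You can also try this one; for large tables the query performance will be better. It works when there are no more than two records for each home and their dates are different. A better general MySQL query is the one from Michael La Voie above.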

SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM   t_scores_1 t1 
INNER JOIN t_scores_1 t2
   ON t1.home = t2.home
WHERE t1.date > t2.date

Or, in the case of Postgres or other databases that provide analytic functions, try:

SELECT t.* FROM 
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
  , row_number() over (partition by t1.home order by t1.date desc) rw
 FROM   topten t1 
 INNER JOIN topten t2
   ON t1.home = t2.home
 WHERE t1.date > t2.date 
) t
WHERE t.rw = 1


Fer*_*anB 8

This works on Oracle:

with table_max as(
  select id
       , home
       , datetime
       , player
       , resource
       , max(home) over (partition by home) maxhome
    from table  
)
select id
     , home
     , datetime
     , player
     , resource
  from table_max
 where home = maxhome

  • How does this pick the max datetime? He asked to group by home and select the max datetime. I don't see how this does that. (3 upvotes)
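One possible reading of the comment is that the window function would need to take MAX(datetime) rather than MAX(home), with the outer filter comparing datetime instead of home. A minimal sketch of that variant, assuming Python's built-in sqlite3 module with SQLite 3.25 or newer in place of Oracle, the table name topten from the question, and a hypothetical column alias maxdatetime:

```python
import sqlite3

# Data from the question, with dates rewritten as ISO text.
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INT, "
    "datetime TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", rows)

# Each row is annotated with the max datetime of its own home; the outer
# query keeps the rows whose datetime equals that maximum.
result = conn.execute("""
    with table_max as(
      select id
           , home
           , datetime
           , player
           , resource
           , max(datetime) over (partition by home) maxdatetime
        from topten
    )
    select id
         , home
         , datetime
         , player
         , resource
      from table_max
     where datetime = maxdatetime
     order by id
""").fetchall()

ids = [r[0] for r in result]
print(result)
```

Like the join on MAX(datetime), this keeps every row tied at the maximum.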

Kap*_*tah 7

SELECT  tt.*
FROM    TestTable tt 
INNER JOIN 
        (
        SELECT  coord, MAX(datetime) AS MaxDateTime 
        FROM    rapsa 
        GROUP BY
                krd 
        ) groupedtt
ON      tt.coord = groupedtt.coord
        AND tt.datetime = groupedtt.MaxDateTime


Sys*_*gon 7

Try this for SQL Server:

WITH cte AS (
   SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year


Jr.*_*Jr. 5

SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table)

SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)


Jas*_*Heo 5

Here is a MySQL version that prints only one entry when a group contains duplicate MAX(datetime) values.

You can test it here: http://www.sqlfiddle.com/#!2/0a4ae/1

Sample data

mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    3 |   10 | 2009-03-03 00:00:00 | john   |      300 |
|    4 |   11 | 2009-03-03 00:00:00 | juliet |      200 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    6 |   12 | 2009-03-03 00:00:00 | borat  |      500 |
|    7 |   13 | 2008-12-24 00:00:00 | borat  |      600 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+

MySQL version with user variables

SELECT *
FROM (
    SELECT ord.*,
        IF (@prev_home = ord.home, 0, 1) AS is_first_appear,
        @prev_home := ord.home
    FROM (
        SELECT t1.id, t1.home, t1.player, t1.resource
        FROM topten t1
        INNER JOIN (
            SELECT home, MAX(datetime) AS mx_dt
            FROM topten
            GROUP BY home
          ) x ON t1.home = x.home AND t1.datetime = x.mx_dt
        ORDER BY home
    ) ord, (SELECT @prev_home := 0, @seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id   | home | player | resource | is_first_appear | @prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
|    9 |   10 | borat  |      700 |               1 |                     10 |
|   10 |   11 | borat  |      700 |               1 |                     11 |
|   12 |   12 | borat  |      700 |               1 |                     12 |
|    8 |   13 | borat  |      700 |               1 |                     13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)

Output of the accepted answer

SELECT tt.*
FROM topten tt
INNER JOIN
    (
    SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)


M K*_*aid 5

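Another way to get the most recent row per group is a subquery that basically calculates a rank for each row within its group and then keeps only the most recent rows, i.e. those with rank = 1: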

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and a.`datetime` < b.`datetime`
) +1 = 1

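Demo

Here is a visual demo of the rank number for each row, for better understanding.

Some comments ask: what if there are two rows with the same 'home' and 'datetime' field values?

The above query will fail in that situation and return more than one row. To cover this case, another criterion/parameter/column is needed to decide which of the tied rows should be taken. Looking at the sample data set, I assume there is a primary key column id that is set to auto-increment. So we can use this column to pick the most recent row by tweaking the same query with the help of a CASE statement: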

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and  case 
       when a.`datetime` = b.`datetime`
       then a.id < b.id
       else a.`datetime` < b.`datetime`
       end
) + 1 = 1

Demo

The above query will pick the row with the highest id among rows with the same datetime value.

Visual demo of the rank number for each row
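The per-row rank can also be printed directly. A minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL, dates stored as ISO strings, a hypothetical tied row with id 9, and a hypothetical column alias rnk; the counting subquery is the CASE-based one from the second query above:

```python
import sqlite3

# Question data (ISO dates) plus a hypothetical row (id 9) that ties
# with id 1 on (home, datetime).
rows = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INT, "
    "datetime TEXT, player TEXT, resource INT)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", rows)

# rnk = 1 + number of rows in the same home that are "more recent",
# where ties on datetime are broken in favour of the higher id.
ranks = conn.execute("""
    select a.id,
      (
        select count(*)
        from topten b
        where a.home = b.home
        and  case
             when a.datetime = b.datetime
             then a.id < b.id
             else a.datetime < b.datetime
             end
      ) + 1 as rnk
    from topten a
    order by a.id
""").fetchall()

# Keeping rnk = 1 is equivalent to the "+ 1 = 1" filter in the query above.
latest_ids = [row_id for row_id, rnk in ranks if rnk == 1]

print(ranks)
print(latest_ids)
```

With the tie-breaker in place, id 1 drops to rank 2 behind id 9, so home 10 contributes a single row.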