Select the row with the most recent date per user

Kei*_*ith 115 mysql sql greatest-n-per-group

I have a table ("lms_attendance") of users' check-in and check-out times that looks like this:

id  user    time    io (enum)
1   9   1370931202  out
2   9   1370931664  out
3   6   1370932128  out
4   12  1370932128  out
5   12  1370933037  in

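I'm trying to create a view of this table that outputs only the most recent record per user id, while still giving me the "in" or "out" value, like this: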

id  user    time    io
2   9   1370931664  out
3   6   1370932128  out
5   12  1370933037  in

I'm pretty close so far, but I realized that views won't accept subqueries, which has made this a lot harder. The closest query I got was:

select 
    `lms_attendance`.`id` AS `id`,
    `lms_attendance`.`user` AS `user`,
    max(`lms_attendance`.`time`) AS `time`,
    `lms_attendance`.`io` AS `io` 
from `lms_attendance` 
group by 
    `lms_attendance`.`user`, 
    `lms_attendance`.`io`

But what I get is:

id  user    time    io
3   6   1370932128  out
1   9   1370931664  out
5   12  1370933037  in
4   12  1370932128  out

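That is close, but not perfect. I know the last column in the GROUP BY (io) shouldn't be there, but without it the query returns the most recent time without the io value that belongs to that row.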
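To see why the extra grouping column produces the duplicate row for user 12, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it leaves the `id` column out of the SELECT because how a non-aggregated column is resolved differs between database engines.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
    ],
)

# Grouping by (user, io) yields one row per distinct pair, not one row per user.
groups = conn.execute("""
    SELECT user, io, MAX(time) AS time
    FROM lms_attendance
    GROUP BY user, io
    ORDER BY user, io
""").fetchall()

for row in groups:
    print(row)
```

User 12 has both an "in" and an "out" record, so it forms two groups and appears twice, which matches the four rows shown above.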

Any ideas? Thanks!

Jus*_*tin 181

Query:

SQLFIDDLEExample

SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user)

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

A solution that works every time:

SQLFIDDLEExample

SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
               FROM lms_attendance t2
               WHERE t2.user = t1.user
               ORDER BY t2.id DESC
               LIMIT 1)

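  • No need for subqueries! Moreover, this solution [doesn't work if there are two records with exactly the same time](http://sqlfiddle.com/#!2/dbb2d/2). There is no need to reinvent the wheel every time, as this is a common problem - instead, go for already tested and optimized solutions - @Prodikl see my answer. (4 upvotes)
  • @TMS This solution does work if records have exactly the same time, because the query looks for the record with the largest id. That implies the time in the table is the insertion time, which may not be a good assumption. Your solution compares timestamps instead, and when two timestamps are identical you also return the row with the largest id. So your solution also assumes that the timestamps in this table correlate with insertion order, which is the biggest flaw in both queries. (3 upvotes)
  • Wow! Not only does this work, I was allowed to create a view with this query even though it contains a subquery. Before, when I tried to create a view containing subqueries, it wouldn't let me. Are there rules about why this one is allowed but another one isn't? (2 upvotes)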
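The tie case debated in the comments above can be checked with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL, plus one hypothetical extra row (id 6) that gives user 12 a second record at the same latest time. That row is not part of the question's data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 12, 1370933037, "out"),  # hypothetical tie, not in the question's data
    ],
)

# First query from the answer: match on the maximum time per user.
by_max_time = conn.execute("""
    SELECT t1.id
    FROM lms_attendance t1
    WHERE t1.time = (SELECT MAX(t2.time)
                     FROM lms_attendance t2
                     WHERE t2.user = t1.user)
    ORDER BY t1.id
""").fetchall()

# Second query from the answer: match on the largest id per user.
by_max_id = conn.execute("""
    SELECT t1.id
    FROM lms_attendance t1
    WHERE t1.id = (SELECT t2.id
                   FROM lms_attendance t2
                   WHERE t2.user = t1.user
                   ORDER BY t2.id DESC
                   LIMIT 1)
    ORDER BY t1.id
""").fetchall()

tied_ids = [row[0] for row in by_max_time]
latest_ids = [row[0] for row in by_max_id]
print(tied_ids)    # the time-based query keeps both tied rows for user 12
print(latest_ids)  # the id-based query keeps exactly one row per user
```

With the tie in place, the time-based query returns two rows for user 12, while the id-based query returns one row per user but ignores the time column entirely. It is only correct if ids increase with time, which is the assumption the second comment points out.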

TMS*_*TMS 68

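There's no need to reinvent the wheel, as this is the common greatest-n-per-group problem, and very nice solutions have already been presented.

I prefer the simplest solution (see SQLFiddle, updated from Justin's), which avoids subqueries and is therefore easy to use in views: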

SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND (t1.time < t2.time 
         OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL

This also works when two different records in the same group share the same greatest value, thanks to the tie-break (t1.time = t2.time AND t1.Id < t2.Id). All it does is ensure that when two records of the same user have the same time, only one is chosen. It doesn't matter whether the criterion is Id or something else; any criterion that is guaranteed to be unique would do the job here.

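  • `WHERE t2.user IS NULL` is a bit strange. What role does this line play? (4 upvotes)
  • The accepted answer, posted by Justin, may be more optimal. It uses a backward index scan on the table's primary key, followed by a limit, followed by a sequential scan of the table, so it could be optimized considerably with an additional index. This query could also be optimized with an index, since it performs two sequential scans, but it additionally hashes the result of one sequential scan and performs a "hash anti-join" against a hash of the other. I would be interested in an explanation of which approach is really more optimal. (2 upvotes)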
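On the question about `WHERE t2.user IS NULL`: the LEFT OUTER JOIN pairs each row with every "later" row of the same user, and rows that have no later row get NULLs in the t2 columns. Keeping only those rows is what selects the latest record (an anti-join). The sketch below makes the intermediate pairs visible. It assumes Python's built-in sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
    ],
)

# The join from the answer, selecting ids only so the pairing is easy to read.
join_sql = """
    SELECT t1.id, t2.id
    FROM lms_attendance AS t1
    LEFT OUTER JOIN lms_attendance AS t2
      ON t1.user = t2.user
         AND (t1.time < t2.time
          OR (t1.time = t2.time AND t1.id < t2.id))
"""

# Without the WHERE clause: each row next to the id of a later row, or None.
pairs = conn.execute(join_sql + " ORDER BY t1.id, t2.id").fetchall()
print(pairs)

# With the WHERE clause: only rows for which no later row exists survive.
latest = conn.execute(join_sql + " WHERE t2.user IS NULL ORDER BY t1.id").fetchall()
latest_ids = [row[0] for row in latest]
print(latest_ids)
```

Rows 1 and 4 each find a later row for their user (2 and 5 respectively), so they are dropped. Rows 2, 3 and 5 find none, so their t2 side is NULL and they are the ones returned.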

whm*_*hme 6

If you are on MySQL 8.0 or higher, you can use window functions:

Query:

DBFiddle Example

SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

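The advantage I see over the solution proposed by Justin is that it lets you select the row with the most recent data per user (or per id, or per anything else) even from subqueries, without needing an intermediate view or table.

And if you are running HANA, it is also 7 times faster :D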
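A related window-function pattern is to number the rows per user with ROW_NUMBER() in a derived table and keep the first one. This is not the query from the answer above, just a sketch of a common alternative. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or later, which added window functions) as a stand-in for MySQL 8.0. The `id DESC` tie-break and the names `ranked` and `rn` are my own additions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8.0 (assumption).
# Window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
    ],
)

# Number each user's rows from newest to oldest, then keep row number 1.
# Ordering by id as well makes the choice deterministic when times tie.
rows = conn.execute("""
    SELECT id, user, time, io
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY a.user
                                  ORDER BY a.time DESC, a.id DESC) AS rn
        FROM lms_attendance AS a
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Because the ranking happens inside a derived table, the inner SELECT could just as well read from another subquery instead of the base table, which is the flexibility the paragraph above describes.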


dav*_*mos 5

Already solved, but just for the record, another approach would be to create two views...

CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));

CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la 
GROUP BY la.user;

CREATE VIEW latest_io AS
SELECT la.* 
FROM lms_attendance la
JOIN latest_all lall 
    ON lall.user = la.user
    AND lall.time = la.time;

INSERT INTO lms_attendance 
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');

SELECT * FROM latest_io;

Click here to see it in action at SQL Fiddle


use*_*210 5

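This is based on @TMS's answer, which I like because it needs no subqueries, but I think omitting the 'OR' part is sufficient and makes the query much simpler to read and understand.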

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL

If you are not interested in rows with a NULL time, you can filter them out in the WHERE clause:

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
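To check how the simplified join behaves with NULL times and with ties, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it adds two hypothetical rows that are not in the question's data: id 6 with a NULL time and id 7 tying with id 5. The helper `latest_ids` is also hypothetical and exists only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 6, None, "in"),          # hypothetical row with a NULL time
        (7, 12, 1370933037, "out"),  # hypothetical tie with id 5
    ],
)


def latest_ids(on_clause, where_clause):
    """Run the anti-join with the given ON and WHERE fragments; return the ids."""
    sql = f"""
        SELECT t1.id
        FROM lms_attendance AS t1
        LEFT JOIN lms_attendance AS t2
          ON t1.user = t2.user AND {on_clause}
        WHERE {where_clause}
        ORDER BY t1.id
    """
    return [row[0] for row in conn.execute(sql)]


simple_on = "t1.time < t2.time"
tiebreak_on = "(t1.time < t2.time OR (t1.time = t2.time AND t1.id < t2.id))"

# Simplified join, no NULL filter: the NULL-time row never matches anything.
unfiltered = latest_ids(simple_on, "t2.user IS NULL")
# Simplified join with the NULL filter: the tie for user 12 still yields two rows.
filtered = latest_ids(simple_on, "t2.user IS NULL AND t1.time IS NOT NULL")
# Same filter plus the tie-break from the earlier answer: one row per user.
with_tiebreak = latest_ids(tiebreak_on, "t2.user IS NULL AND t1.time IS NOT NULL")

print(unfiltered)
print(filtered)
print(with_tiebreak)
```

A comparison against NULL is never true, so the NULL-time row is treated as having no later row and is returned unless it is filtered out. The simplified ON clause also returns both tied rows for user 12, so the 'OR' part is only safe to drop when a user can never have two records with the same time.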