Kei*_*ith 115 mysql sql greatest-n-per-group
I have a table ("lms_attendance") of users' check-in and check-out times that looks like this:
id  user  time        io (enum)
1   9     1370931202  out
2   9     1370931664  out
3   6     1370932128  out
4   12    1370932128  out
5   12    1370933037  in
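I'm trying to create a view of this table that outputs only the most recent record per user id, while still giving me the "in" or "out" value, like this: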
id  user  time        io
2   9     1370931664  out
3   6     1370932128  out
5   12    1370933037  in
I'm pretty close so far, but I realized that views won't accept subqueries, which makes this a lot harder. The closest query I got was:
select
    `lms_attendance`.`id` AS `id`,
    `lms_attendance`.`user` AS `user`,
    max(`lms_attendance`.`time`) AS `time`,
    `lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
    `lms_attendance`.`user`,
    `lms_attendance`.`io`
But what I get is:
id  user  time        io
3   6     1370932128  out
1   9     1370931664  out
5   12    1370933037  in
4   12    1370932128  out
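Which is close, but not perfect. I know the last GROUP BY column (io) shouldn't be there, but without it the query returns the most recent time without the io value that belongs to that time.
Any ideas? Thanks!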
Jus*_*tin 181
Query:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
A solution that works every time:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
               FROM lms_attendance t2
               WHERE t2.user = t1.user
               ORDER BY t2.id DESC
               LIMIT 1)
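The difference between the two queries above shows up when a user has two rows with the same latest time. The following is a minimal sketch, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere) and adds one hypothetical row, id 6, that ties with id 5. Note that the id-based query also assumes that a higher id means a later record.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
    ],
)

# First query: match on the maximum time per user.
by_time_sql = """
SELECT t1.id
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user)
ORDER BY t1.id
"""

# Second query: match on the highest id per user.
by_id_sql = """
SELECT t1.id
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
               FROM lms_attendance t2
               WHERE t2.user = t1.user
               ORDER BY t2.id DESC
               LIMIT 1)
ORDER BY t1.id
"""

by_time = [row[0] for row in conn.execute(by_time_sql)]
by_id = [row[0] for row in conn.execute(by_id_sql)]

print(by_time)  # [2, 3, 5, 6] -> user 12 appears twice because of the tie
print(by_id)    # [2, 3, 6]    -> exactly one row per user
```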
TMS*_*TMS 68
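There is no need to reinvent the wheel, as this is the common greatest-n-per-group problem, and very nice solutions to it have already been presented.
I prefer the simplest solution (see the SQLFiddle, updated from Justin's) without subqueries (and thus easy to use in views):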
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND (t1.time < t2.time
         OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works when there are two different records with the same greatest value within the same group, thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All I am doing here is ensuring that when two records of the same user have the same time, only one is chosen. It doesn't actually matter whether the criterion is Id or something else; any criterion that is guaranteed to be unique would do the job here.
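To see the tie-break at work, here is a small sketch that runs the self-join against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL, and the row with id 6 is hypothetical, added to create a tie with id 5.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
    ],
)

# A row survives only if no other row of the same user is "later",
# where later means a greater time or, on equal times, a greater id.
anti_join_sql = """
SELECT t1.id, t1.user, t1.time, t1.io
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND (t1.time < t2.time
         OR (t1.time = t2.time AND t1.id < t2.id))
WHERE t2.user IS NULL
ORDER BY t1.id
"""

latest = conn.execute(anti_join_sql).fetchall()
for row in latest:
    print(row)
# (2, 9, 1370931664, 'out')
# (3, 6, 1370932128, 'out')
# (6, 12, 1370933037, 'out')
```

Of the two tied rows for user 12, only the one with the greater id is kept.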
If you are on MySQL 8.0 or later, you can use window functions:
Query:
SELECT DISTINCT
    FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
    FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
    FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
    FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
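The advantage I see over the solution proposed by Justin is that this lets you select the row with the most recent data per user (or per id, or per anything else) even from a subquery, without needing an intermediate view or table.
And if you are running HANA, it is also 7 times faster :D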
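A related window-function formulation, which is not the one used in this answer, ranks each user's rows with ROW_NUMBER() inside a derived table and keeps rank 1. The sketch below runs it through Python's sqlite3 module as a stand-in for MySQL 8.0; it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The alias names `ranked` and `rn` are made up for the illustration, and the secondary ordering on id is an added tie-break.

```python
import sqlite3

# SQLite (3.25+) stands in for MySQL 8.0 here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
    ],
)

# Number each user's rows from newest to oldest, then keep number 1.
# Ordering by id as well makes the choice deterministic on equal times.
row_number_sql = """
SELECT id, user, time, io
FROM (
    SELECT la.*,
           ROW_NUMBER() OVER (
               PARTITION BY la.user
               ORDER BY la.time DESC, la.id DESC
           ) AS rn
    FROM lms_attendance AS la
) AS ranked
WHERE rn = 1
ORDER BY id
"""

latest = conn.execute(row_number_sql).fetchall()
for row in latest:
    print(row)
# (2, 9, 1370931664, 'out')
# (3, 6, 1370932128, 'out')
# (5, 12, 1370933037, 'in')
```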
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
    (id int, user int, time int, io varchar(3));

-- First view: the latest time per user.
CREATE VIEW latest_all AS
SELECT la.user, MAX(la.time) AS time
FROM lms_attendance la
GROUP BY la.user;

-- Second view: the full rows that match that latest time.
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
    ON lall.user = la.user
    AND lall.time = la.time;

INSERT INTO lms_attendance
VALUES
    (1, 9, 1370931202, 'out'),
    (2, 9, 1370931664, 'out'),
    (3, 6, 1370932128, 'out'),
    (4, 12, 1370932128, 'out'),
    (5, 12, 1370933037, 'in');

SELECT * FROM latest_io;
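This is based on @TMS's answer, which I like because it needs no subqueries, but I think omitting the 'OR' part is sufficient and makes the query much simpler to read and understand.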
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND t1.time < t2.time
WHERE t2.user IS NULL
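One consequence of dropping the 'OR' part is worth checking: with only `t1.time < t2.time`, two rows of the same user that share the latest time are both returned. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL and a hypothetical tied row (id 6):

```python
import sqlite3
from collections import Counter

# SQLite stands in for MySQL here (assumption, for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
    ],
)

# The simplified join, without the tie-break on id.
simplified_sql = """
SELECT t1.user
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND t1.time < t2.time
WHERE t2.user IS NULL
"""

# Count how many "latest" rows each user ends up with.
rows_per_user = Counter(user for (user,) in conn.execute(simplified_sql))
print(dict(sorted(rows_per_user.items())))  # {6: 1, 9: 1, 12: 2}
```

If equal times cannot occur for the same user, the simplified form is enough; otherwise the tie-break from @TMS's answer is what keeps the result to one row per user.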
If you are not interested in rows with a NULL time, you can filter them out in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND t1.time < t2.time
WHERE t2.user IS NULL AND t1.time IS NOT NULL
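The reason the extra filter matters is that a comparison with NULL is never true, so a row whose time is NULL never finds a "later" row and always passes the `t2.user IS NULL` test. The sketch below illustrates this with Python's sqlite3 module as a stand-in for MySQL; the two rows with a NULL time (ids 6 and 7) are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [
        (1, 9, 1370931202, "out"),
        (2, 9, 1370931664, "out"),
        (3, 6, 1370932128, "out"),
        (4, 12, 1370932128, "out"),
        (5, 12, 1370933037, "in"),
        (6, 7, None, "in"),   # hypothetical: user 7 only has a NULL time
        (7, 9, None, "out"),  # hypothetical: user 9 also has a NULL time
    ],
)

base_sql = """
SELECT t1.id
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
    ON t1.user = t2.user
    AND t1.time < t2.time
WHERE t2.user IS NULL
"""

# Same query, once without and once with the NULL filter.
without_filter = [r[0] for r in conn.execute(base_sql + " ORDER BY t1.id")]
with_filter = [
    r[0]
    for r in conn.execute(base_sql + " AND t1.time IS NOT NULL ORDER BY t1.id")
]

print(without_filter)  # [2, 3, 5, 6, 7] -> NULL-time rows slip through
print(with_filter)     # [2, 3, 5]
```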