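While preparing for an interview, I came across an SQL question, and I'm hoping to get some insight into how to answer it better.
Given timestamps and user IDs, how do you determine the number of users who are active every day in a week?
There's very little to it, but that's the question in front of me.
I'll present my approach based on what makes the most sense to me, the way I would answer if the question were posed exactly as it is here:
First, let's assume a data set; we'll name the table logins: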
+---------+---------------------+
| user_id | login_timestamp |
+---------+---------------------+
| 1 | 2015-09-29 14:05:05 |
| 2 | 2015-09-29 14:05:08 |
| 1 | 2015-09-29 14:05:12 |
| 4 | 2015-09-22 14:05:18 |
| ... | ... |
+---------+---------------------+
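There may be other columns, but we don't care about those.
First, we should determine the boundaries of the week, for which we can use ADDDATE(). The idea is that today's date minus today's weekday offset gives Sunday's date. MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so the offset is DAYOFWEEK() - 1.
For example: if today is Wednesday the 10th, DAYOFWEEK() returns 4, so Wed - 3 days = Sun; thus 10 - 3 = 7, and we can expect Sunday to be the 7th.
We can get the WeekStart and WeekEnd timestamps like this: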
SELECT
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") WeekStart,
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59") WeekEnd;
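Note: PostgreSQL has a DATE_TRUNC() function that, given a date, returns the start of a specified time unit, such as week, month, or hour (its weeks start on Monday). It is not available in MySQL.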
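To sanity-check the week arithmetic outside the database, here is a minimal Python sketch of the same calculation. The helper names `mysql_dayofweek` and `week_bounds` are made up for illustration; the only assumption carried over from MySQL is that DAYOFWEEK() numbers Sunday as 1 and Saturday as 7.

```python
from datetime import date, datetime, time, timedelta

def mysql_dayofweek(d):
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # date.isoweekday() returns 1 = Monday ... 7 = Sunday, so shift it by one.
    return d.isoweekday() % 7 + 1

def week_bounds(today):
    """Return (WeekStart, WeekEnd) for the Sunday-to-Saturday week containing `today`."""
    dow = mysql_dayofweek(today)
    week_start = today + timedelta(days=1 - dow)  # ADDDATE(today, INTERVAL 1-DAYOFWEEK DAY)
    week_end = today + timedelta(days=7 - dow)    # ADDDATE(today, INTERVAL 7-DAYOFWEEK DAY)
    return (datetime.combine(week_start, time(0, 0, 0)),
            datetime.combine(week_end, time(23, 59, 59)))

start, end = week_bounds(date(2015, 9, 30))  # a Wednesday
print(start, end)  # 2015-09-27 00:00:00 2015-10-03 23:59:59
```

Wednesday has DAYOFWEEK 4, so the start moves back 3 days to Sunday and the end moves forward 3 days to Saturday.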
Next, let's use WeekStart and WeekEnd to slice our data set. In this example, I'll just show how to filter using hard-coded dates:
SELECT *
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'
This should return a slice of our data set containing only the relevant results:
+---------+---------------------+
| user_id | login_timestamp |
+---------+---------------------+
| 2 | 2015-09-29 14:05:08 |
| 1 | 2015-09-29 14:05:12 |
+---------+---------------------+
We can then reduce the result set to just the user_ids and filter out duplicates. Then count, like this:
SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'
DISTINCT filters out the duplicates, and COUNT returns just the number.
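The difference between counting logins and counting users can be checked with a small runnable sketch. It assumes an in-memory SQLite database (via Python's `sqlite3`) as a stand-in for MySQL, loads the four sample rows from the table above, and widens the filter to the whole of 2015-09-29 so that user 1's two logins both fall inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2015-09-29 14:05:05"),
        (2, "2015-09-29 14:05:08"),
        (1, "2015-09-29 14:05:12"),
        (4, "2015-09-22 14:05:18"),
    ],
)

# ISO-formatted timestamps compare correctly as plain strings.
where = "WHERE login_timestamp BETWEEN '2015-09-29 00:00:00' AND '2015-09-29 23:59:59'"
total_logins = conn.execute(f"SELECT COUNT(*) FROM logins {where}").fetchone()[0]
distinct_users = conn.execute(
    f"SELECT COUNT(DISTINCT user_id) FROM logins {where}"
).fetchone()[0]

print(total_logins, distinct_users)  # 3 2
```

User 1 logged in twice that day, so there are three logins but only two active users.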
Combined, this becomes:
SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp
BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00")
AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")
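Replace CURDATE() with any timestamp to get the user login count for that week.
"But I need to break it down by day," I hear you cry. Of course! Here's how:
First, let's convert our over-informative timestamps to just the date. We add DISTINCT because we don't mind the same user logging in twice on the same day. We're counting users, not logins, right? (Also note that we step back here and leave out the week filter for now):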
SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d")
FROM `logins`
This yields:
+---------+-----------------+
| user_id | login_timestamp |
+---------+-----------------+
| 1 | 2015-09-29 |
| 2 | 2015-09-29 |
| 4 | 2015-09-22 |
| ... | ... |
+---------+-----------------+
We'll wrap this query in a second one to count the occurrences of each date:
SELECT `login_timestamp`, count(*) AS 'count'
FROM (SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp` FROM `logins`) `loginsMod`
GROUP BY `login_timestamp`
We use COUNT and GROUP BY to get the list by date, which returns (for the sample rows shown above):
+-----------------+-------+
| login_timestamp | count |
+-----------------+-------+
| 2015-09-29 | 2 |
| 2015-09-22 | 1 |
+-----------------+-------+
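The per-day count can be reproduced with the sketch below, again assuming SQLite as a stand-in for MySQL. SQLite has no DATE_FORMAT(), so its `date()` function is used to truncate the timestamp instead. The second query is an equivalent, flatter form that uses COUNT(DISTINCT ...) directly and needs no subquery; it is offered as an alternative, not as part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2015-09-29 14:05:05"),
        (2, "2015-09-29 14:05:08"),
        (1, "2015-09-29 14:05:12"),
        (4, "2015-09-22 14:05:18"),
    ],
)

# Nested form: de-duplicate (user, day) pairs first, then count rows per day.
nested = conn.execute("""
    SELECT login_day, COUNT(*) AS user_count
    FROM (
        SELECT DISTINCT user_id, date(login_timestamp) AS login_day
        FROM logins
    ) AS logins_mod
    GROUP BY login_day
    ORDER BY login_day
""").fetchall()

# Flat form: count distinct users per day in a single pass.
flat = conn.execute("""
    SELECT date(login_timestamp) AS login_day, COUNT(DISTINCT user_id) AS user_count
    FROM logins
    GROUP BY login_day
    ORDER BY login_day
""").fetchall()

print(nested)  # [('2015-09-22', 1), ('2015-09-29', 2)]
print(flat == nested)  # True
```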
And after some hard work, both combined:
SELECT `login_timestamp`, COUNT(*)
FROM (
SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp`
FROM `logins`
WHERE login_timestamp
BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00")
AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")) `loginsMod`
GROUP BY `login_timestamp`;
This gives you a daily breakdown of active users for the current week. Again, replace CURDATE() to get a different week.
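As a rough end-to-end check of the weekly breakdown, the sketch below computes the week boundaries in Python and passes them to the query as parameters. It assumes SQLite in place of MySQL; the function name `daily_active_users` and the two 2015-10-01 rows are hypothetical additions for illustration and are not part of the original data set.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2015-09-29 14:05:05"),
        (2, "2015-09-29 14:05:08"),
        (1, "2015-09-29 14:05:12"),
        (4, "2015-09-22 14:05:18"),
        (4, "2015-10-01 09:00:00"),  # hypothetical row
        (1, "2015-10-01 10:00:00"),  # hypothetical row
    ],
)

def daily_active_users(conn, ref):
    """Distinct users per day for the Sunday-to-Saturday week containing `ref`."""
    dow = ref.isoweekday() % 7 + 1           # same numbering as MySQL DAYOFWEEK()
    start = ref + timedelta(days=1 - dow)    # Sunday
    end = ref + timedelta(days=7 - dow)      # Saturday
    return conn.execute(
        """
        SELECT login_day, COUNT(*)
        FROM (
            SELECT DISTINCT user_id, date(login_timestamp) AS login_day
            FROM logins
            WHERE login_timestamp BETWEEN ? AND ?
        ) AS logins_mod
        GROUP BY login_day
        ORDER BY login_day
        """,
        (f"{start} 00:00:00", f"{end} 23:59:59"),
    ).fetchall()

print(daily_active_users(conn, date(2015, 9, 29)))
# [('2015-09-29', 2), ('2015-10-01', 2)]
```

Passing the reference date as an argument plays the role of replacing CURDATE() in the MySQL version.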
As for the users themselves who logged in, let's combine the same pieces in a different order:
SELECT `user_id`
FROM (
SELECT `user_id`, COUNT(*) AS `login_count`
FROM (
SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`) `logins`
GROUP BY `user_id`) `logincounts`
WHERE `login_count` > 6
I have two inner queries; the first is logins:
SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`
This provides a list of users and the days on which they logged in, without duplicates.
Then we have logincounts:
SELECT `user_id`, COUNT(*) AS `login_count`
FROM `logins` -- See previous subquery.
GROUP BY `user_id`
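This returns the same list of users, with a count of how many distinct days each user logged in.
Finally:
SELECT `user_id`
FROM `logincounts` -- See previous subquery.
WHERE `login_count` > 6
This filters out those who didn't log in on 7 days and drops the count column. Note that, as written, the innermost query has no week filter, so it counts distinct login days over the whole table; add the WHERE clause from the weekly query above to the innermost query to restrict it to a single week.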
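The effect of the missing week filter can be seen in a small sketch, again assuming SQLite in place of MySQL. All rows here are hypothetical: user 1 logs in on every day of the week 2015-09-27 to 2015-10-03, while user 2 logs in on 7 distinct days spread over two weeks.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")

def day(offset):
    """Noon timestamp `offset` days after Sunday 2015-09-20 (hypothetical data)."""
    return f"{date(2015, 9, 20) + timedelta(days=offset)} 12:00:00"

rows = [(1, day(d)) for d in range(7, 14)]            # user 1: 2015-09-27 .. 2015-10-03
rows += [(2, day(d)) for d in (0, 1, 2, 3, 7, 8, 9)]  # user 2: 7 days over two weeks
conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)

QUERY = """
SELECT user_id
FROM (
    SELECT user_id, COUNT(*) AS login_count
    FROM (
        SELECT DISTINCT user_id, date(login_timestamp) AS login_day
        FROM logins
        {where}
    ) AS daily_logins
    GROUP BY user_id
) AS logincounts
WHERE login_count > 6
ORDER BY user_id
"""

week = "WHERE login_timestamp BETWEEN '2015-09-27 00:00:00' AND '2015-10-03 23:59:59'"
unfiltered = [r[0] for r in conn.execute(QUERY.format(where=""))]
week_only = [r[0] for r in conn.execute(QUERY.format(where=week))]

print(unfiltered)  # [1, 2]
print(week_only)   # [1]
```

Without the filter, user 2 is reported even though they were active on only 3 days of the target week.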
It came out rather long, but I think it's full of ideas, and it could certainly help in answering in an interesting way in a job interview. :)
Another answer (score: 5):
create table fbuser(id integer, date date);
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-02');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-02');
insert into fbuser(id,date)values(1,'2012-01-03');
insert into fbuser(id,date)values(1,'2012-01-04');
insert into fbuser(id,date)values(1,'2012-01-05');
insert into fbuser(id,date)values(1,'2012-01-06');
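insert into fbuser(id,date)values(1,'2012-01-07');
insert into fbuser(id,date)values(2,'2012-01-07');
insert into fbuser(id,date)values(3,'2012-01-07');
insert into fbuser(id,date)values(4,'2012-01-07');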
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-09');
select * from fbuser;
id | date
----+------------
1 | 2012-01-01
1 | 2012-01-02
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-02
1 | 2012-01-03
1 | 2012-01-04
1 | 2012-01-05
1 | 2012-01-06
1 | 2012-01-07
2 | 2012-01-07
3 | 2012-01-07
4 | 2012-01-07
4 | 2012-01-08
4 | 2012-01-08
1 | 2012-01-08
1 | 2012-01-09
select id,count(DISTINCT date) from fbuser
where date BETWEEN '2012-01-01' and '2012-01-07'
group by id having count(DISTINCT date)=7
id | count
----+-------
1 | 7
(1 row)
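The query counts the unique dates on which each user logged in during the given period and returns the ids that appear on all 7 days. If your dates also include a time, you can use date_format to reduce them to the day first.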
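To illustrate why the time part has to be stripped first, here is a sketch that reuses the id/date pairs from the listing above but attaches a made-up hour to each row, so repeated logins on the same day get different timestamps. It assumes SQLite in place of the original database, with `date()` playing the role of date_format; the column name `login_ts` is likewise an assumption.

```python
import sqlite3

# (id, day-of-January-2012) pairs in the order shown in the listing above.
pairs = [(1, d) for d in
         ["01", "02", "01", "01", "01", "01", "02", "03", "04", "05", "06", "07"]]
pairs += [(2, "07"), (3, "07"), (4, "07"), (4, "08"), (4, "08"), (1, "08"), (1, "09")]

conn = sqlite3.connect(":memory:")  # SQLite stands in for the original database
conn.execute("CREATE TABLE fbuser (id INTEGER, login_ts TEXT)")
conn.executemany(
    "INSERT INTO fbuser VALUES (?, ?)",
    # The hour is invented so that same-day logins differ in their time part.
    [(uid, f"2012-01-{d} {8 + i % 10:02d}:00:00") for i, (uid, d) in enumerate(pairs)],
)

QUERY = """
SELECT id, COUNT(DISTINCT {expr})
FROM fbuser
WHERE login_ts BETWEEN '2012-01-01 00:00:00' AND '2012-01-07 23:59:59'
GROUP BY id
HAVING COUNT(DISTINCT {expr}) = 7
"""

by_day = conn.execute(QUERY.format(expr="date(login_ts)")).fetchall()
raw = conn.execute(QUERY.format(expr="login_ts")).fetchall()

print(by_day)  # [(1, 7)]
print(raw)     # []
```

Counting distinct raw timestamps gives user 1 a count of 12 rather than 7, so the HAVING clause no longer matches; truncating to the day restores the intended result.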