Imagine a simple table with the following columns:
item_id, date
and these values:
CREATE TABLE foo (item_id int, date date);
INSERT INTO foo(item_id, date)
VALUES
( 1, '2017-02-10' ),
( 2, '2017-02-10' ),
( 1, '2017-02-11' ),
( 1, '2017-02-12' ),
( 1, '2017-02-13' ),
( 2, '2017-02-13' ),
( 1, '2017-02-14' );
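How can I select the item_ids that have records on 7 consecutive days in the table?
The start and end dates are not known in advance. It should work for any start date followed by the 7 consecutive days after it.
MySQL 8 provides window functions, which make this possible: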
SELECT item_id
FROM (
  SELECT
    item_id,
    date,
    count(coalesce(diff, 1)=1 OR null) OVER (PARTITION BY item_id ORDER BY date) seq
  FROM (
    SELECT
      item_id,
      date,
      date - lag(date) OVER (PARTITION BY item_id ORDER BY date) AS diff
    FROM foo
  ) AS t
) AS t2
GROUP BY item_id
-- >= 7 rather than > 7: a run of exactly seven days reaches seq = 7
HAVING max(seq) >= 7;
Here is what the query does internally, step by step.
SELECT
  item_id,
  date,
  date - lag(date) OVER (PARTITION BY item_id ORDER BY date) AS diff
FROM foo;

 item_id |    date    | diff
---------+------------+------
       1 | 2017-02-10 |
       1 | 2017-02-11 |    1
       1 | 2017-02-12 |    1
       1 | 2017-02-13 |    1
       1 | 2017-02-14 |    1
       2 | 2017-02-10 |
       2 | 2017-02-13 |    3
(7 rows)
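Here we return the difference between each date and the previous one. What we need to do now is isolate the rows where that difference is 1. We assume that if the difference is null, it is because there was no previous date to subtract, so we set it to 1 with coalesce(). Then, if the value is not 1, we turn it into null so that count() skips it. That is what the OR null does: the comparison yields true when the difference is 1, and false OR null evaluates to null otherwise.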
SELECT
  item_id,
  date,
  count(coalesce(diff, 1)=1 OR null) OVER (PARTITION BY item_id ORDER BY date) seq
FROM (
  SELECT
    item_id,
    date,
    date - lag(date) OVER (PARTITION BY item_id ORDER BY date) AS diff
  FROM foo
) AS t;

 item_id |    date    | seq
---------+------------+-----
       1 | 2017-02-10 |   1
       1 | 2017-02-11 |   2
       1 | 2017-02-12 |   3
       1 | 2017-02-13 |   4
       1 | 2017-02-14 |   5
       2 | 2017-02-10 |   1
       2 | 2017-02-13 |   1
(7 rows)
From this point on, it is just a GROUP BY and a HAVING on the largest seq per item_id.
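One caveat with the running count: it never resets after a gap, so an item with two shorter runs (say, four days, a break, then four more) could still reach the threshold. Below is a minimal sketch of a variant that counts each unbroken run separately. It assumes PostgreSQL (where date minus integer yields a date) and at most one row per (item_id, date). The name run_anchor is introduced here purely for illustration.

```sql
-- Sketch: item_ids with at least 7 consecutive days, counting each run on its own.
-- Assumption: no duplicate (item_id, date) rows; deduplicate first otherwise.
SELECT DISTINCT item_id
FROM (
  SELECT
    item_id,
    date,
    -- Within an unbroken run, date and row_number() both advance by 1 per row,
    -- so their difference stays constant; after a gap it changes.
    date - CAST(row_number() OVER (PARTITION BY item_id ORDER BY date) AS int) AS run_anchor
  FROM foo
) AS t
GROUP BY item_id, run_anchor   -- one group per unbroken run
HAVING count(*) >= 7;          -- keep runs of 7 days or more
```

With the sample rows in foo the longest run is five days (item_id 1), so this returns no rows at a threshold of 7. Lowering the threshold to 5 returns item_id 1.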
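This was tested in PostgreSQL because MySQL 8 had not yet been released at the time of writing. If you haven't used PostgreSQL yet, download it for free and check it out. It's like MySQL, but better in every way.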
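If you do run this on MySQL 8, one thing probably needs to change. As far as I know, subtracting two DATE values with - in MySQL treats them as numbers in YYYYMMDD form instead of returning a day count, so the difference breaks across month boundaries. Here is an untested sketch that swaps in DATEDIFF() for the subtraction and otherwise keeps the same logic. It uses >= 7 so that a run of exactly seven days qualifies.

```sql
-- Untested MySQL 8 sketch of the same query; only the day difference changes.
SELECT item_id
FROM (
  SELECT
    item_id,
    date,
    -- Running count of rows whose gap to the previous row is 1 day
    -- (the first row per item has no previous date and counts as 1).
    COUNT(COALESCE(diff, 1) = 1 OR NULL)
      OVER (PARTITION BY item_id ORDER BY date) AS seq
  FROM (
    SELECT
      item_id,
      date,
      -- DATEDIFF returns whole days; it is NULL when LAG() has no previous row.
      DATEDIFF(date, LAG(date) OVER (PARTITION BY item_id ORDER BY date)) AS diff
    FROM foo
  ) AS t
) AS t2
GROUP BY item_id
HAVING MAX(seq) >= 7;
```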