SQL to determine minimum sequential days of access?

Jef*_*ood 124 sql sql-server date gaps-and-islands

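The following User History table contains one record for every day a given user has accessed a website (within a 24-hour UTC period). It has many thousands of records, but only one record per day per user. If the user did not access the website on a given day, no record is generated.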

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

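What I'm looking for is a SQL query on this table, with good performance, that tells me which users have accessed the website for (n) continuous days without missing a day.

In other words, how many users have (n) records in this table with sequential (day-before or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart at 1; we're looking for users who have reached a continuous number of days here with no gaps.

Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course.. :)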

Rob*_*ley 147

How about (and please make sure the previous statement ended with a semicolon):

WITH numberedrows
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 
                                       ORDER BY CreationDate)
                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
                CreationDate,
                UserID
         FROM   tablename)
SELECT MIN(CreationDate),
       MAX(CreationDate),
       COUNT(*) AS NumConsecutiveDays,
       UserID
FROM   numberedrows
GROUP  BY UserID,
          TheOffset  

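The idea is that if we have the list of days (as numbers) and a row_number, then each missed day makes the offset between those two lists a little bigger. So we're looking for ranges that share a consistent offset: every run of consecutive days for a user has the same TheOffset value, and grouping by it yields one row per run.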
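To make the offset idea concrete, here is a small sketch in Python rather than T-SQL, purely as an illustration. The function name `streaks` and the sample dates are made up for this example; it assumes one visit per day for a single user, as the question states.

```python
from datetime import date
from itertools import groupby


def streaks(visit_days):
    """Return (first_day, last_day, length) for each run of consecutive days."""
    ordered = sorted(set(visit_days))
    # Row number minus day number stays constant within a run of consecutive
    # days and changes whenever a day is skipped (mirrors TheOffset).
    keyed = [(i - d.toordinal(), d) for i, d in enumerate(ordered, start=1)]
    result = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        result.append((days[0], days[-1], len(days)))
    return result


# Hypothetical visits: three days in a row, a gap, then two days in a row.
visits = [date(2009, 7, 1), date(2009, 7, 2), date(2009, 7, 3),
          date(2009, 7, 5), date(2009, 7, 6)]
runs = streaks(visits)
```

Each tuple in `runs` corresponds to one output row of the query above (MIN, MAX and COUNT per offset group) for a single user.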

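You could add "ORDER BY NumConsecutiveDays DESC" to the end of the query, or say "HAVING count(*) > 14" for a threshold...

I haven't tested this though - I'm just writing it off the top of my head. Hopefully it works in SQL2005 and later.

...and it would be helped a great deal by an index on tablename(UserID, CreationDate).

Edited: It turns out Offset is a reserved word, so I used TheOffset instead.

Edited: The suggestion to use COUNT(*) is very valid - I should have done that in the first place but wasn't really thinking. Previously it used datediff(day, min(CreationDate), max(CreationDate)) instead.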

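  • Oh, and you should also add a ; before WITH -> ;WITH (2 upvotes)
  • Mladen - no, you should end the previous statement with a semicolon. ;) Jeff - OK, changed it to [Offset] instead. I guess Offset is a reserved word. Like I said, I hadn't tested it. (2 upvotes)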

Spe*_*ort 69

The answer is obviously:

SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
       SELECT COUNT(*) 
       FROM UserHistory uh2 
       WHERE uh2.CreationDate 
       BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
      ) = @days OR UserId = 52551

EDIT:

Okay, here's my serious answer:

DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
    SELECT uh1.UserId, Count(uh1.Id) as Conseq
    FROM UserHistory uh1
    INNER JOIN UserHistory uh2 ON uh2.CreationDate 
        BETWEEN uh1.CreationDate AND 
            DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
        AND uh1.UserId = uh2.UserId
    GROUP BY uh1.Id, uh1.UserId
    ) as Tbl
WHERE Conseq >= @days

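EDIT:

[Jeff Atwood] This is a great, fast solution and deserves to be accepted, but Rob Farley's solution is also excellent and arguably even faster (!). Please check it out too!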

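  • Use DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) to truncate CreationDate down to whole days in all of these tests (on the right-hand side only, or you kill SARGability). This works by subtracting the zero date, which Microsoft SQL Server interprets as 1900-01-01 00:00:00, from the supplied date, giving a number of days. That value is then added back to the zero date, yielding the same date with the time truncated. (4 upvotes)
  • This query has the potential to miss a visit at 23:59:59.5 - how about changing it to: `ON uh2.CreationDate >= uh1.CreationDate AND uh2.CreationDate < DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate) + @days, 0)`, to mean "not yet on the 31st day after". It also means you can skip the @seconds calculation. (3 upvotes)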
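The two comments above can be sketched in Python (again only as an illustration, not T-SQL). `ZERO_DATE`, `truncate_to_day` and `window_end` are names invented for this example; the arithmetic mirrors the DATEDIFF/DATEADD pair and the half-open upper bound the second comment proposes.

```python
from datetime import datetime, timedelta

# SQL Server interprets the date literal 0 as 1900-01-01 00:00:00.
ZERO_DATE = datetime(1900, 1, 1)


def truncate_to_day(ts):
    """Drop the time part, like DATEADD(dd, DATEDIFF(dd, 0, ts), 0)."""
    whole_days = (ts - ZERO_DATE).days              # DATEDIFF(dd, 0, ts)
    return ZERO_DATE + timedelta(days=whole_days)   # DATEADD(dd, whole_days, 0)


def window_end(ts, days):
    """Exclusive upper bound: midnight, `days` days after the day of ts."""
    return truncate_to_day(ts) + timedelta(days=days)


first = datetime(2009, 7, 7, 18, 42, 20)
late = datetime(2009, 8, 5, 23, 59, 59, 500000)   # 23:59:59.5 on day 30

# Inclusive bound from the answer: midnight plus (30 days - 1 second).
old_end = truncate_to_day(first) + timedelta(seconds=30 * 24 * 60 * 60 - 1)
missed_by_old_bound = late > old_end
caught_by_half_open = first <= late < window_end(first, 30)
```

With these sample timestamps the late visit falls after the inclusive bound but inside the half-open window, which is the case the comment is worried about.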

Meh*_*ari 18

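If you can change the table schema, I'd suggest adding a LongestStreak column to the table, set to the number of sequential days ending on CreationDate. It's easy to maintain the table at login time (similar to what you're already doing): if no row exists for the current day, check whether a row exists for the previous day. If it does, set LongestStreak in the new row to the previous row's value plus one; otherwise, set it to 1.

The query becomes obvious once this column has been added: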

if exists(select * from table
          where LongestStreak >= 30 and UserId = @UserId)
   -- award the Woot badge.
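The login-time update described in this answer could look roughly like the following Python sketch. It uses a plain dictionary as a stand-in for the table, and `record_visit` and `history` are hypothetical names chosen for the example rather than anything from the original schema.

```python
from datetime import date, timedelta


def record_visit(history, user_id, today):
    """Store and return the streak length ending on `today` for `user_id`.

    `history` maps (user_id, day) -> streak length ending on that day,
    standing in for the proposed LongestStreak column.
    """
    if (user_id, today) in history:
        # A row for today already exists: nothing to insert.
        return history[(user_id, today)]
    previous = history.get((user_id, today - timedelta(days=1)))
    # Extend yesterday's streak if there was one, otherwise restart at 1.
    streak = previous + 1 if previous is not None else 1
    history[(user_id, today)] = streak
    return streak


history = {}
lengths = [record_visit(history, 12, d) for d in (
    date(2009, 7, 1), date(2009, 7, 2), date(2009, 7, 3), date(2009, 7, 5))]
```

The skipped day (July 4 in this made-up data) resets the counter, so a badge check only needs to look for any stored value at or above the threshold.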

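  • We will not be changing the schema for this (7 upvotes)
  • It's definitely a valid solution, but it's not what I asked for. So I give it a "thumbs sideways".. (3 upvotes)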

小智 6

Some nicely expressive SQL along the lines of:

select
    userId,
    dbo.MaxConsecutiveDates(CreationDate) as blah
from
    dbo.Logins
group by
    userId

This assumes you have a user-defined aggregate function along these lines (beware, this is buggy):

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;

namespace SqlServerProject1
{
    [StructLayout(LayoutKind.Sequential)]
    [Serializable]
    internal struct MaxConsecutiveState
    {
        public int CurrentSequentialDays;
        public int MaxSequentialDays;
        public SqlDateTime LastDate;
    }

    [Serializable]
    [SqlUserDefinedAggregate(
        Format.Native,
        IsInvariantToNulls = true, //optimizer property
        IsInvariantToDuplicates = false, //optimizer property
        IsInvariantToOrder = false) //optimizer property
    ]
    [StructLayout(LayoutKind.Sequential)]
    public class MaxConsecutiveDates
    {
        /// <summary>
        /// The variable that holds the intermediate streak state of the aggregation
        /// </summary>
        private MaxConsecutiveState _intermediateResult;

        /// <summary>
        /// Initialize the internal data structures
        /// </summary>
        public void Init()
        {
            _intermediateResult = new MaxConsecutiveState { LastDate = SqlDateTime.MinValue, CurrentSequentialDays = 0, MaxSequentialDays = 0 };
        }

        /// <summary>
        /// Accumulate the next value; null values are skipped
        /// </summary>
        /// <param name="value"></param>
        public void Accumulate(SqlDateTime value)
        {
            if (value.IsNull)
            {
                return;
            }
            int sequentialDays = _intermediateResult.CurrentSequentialDays;
            int maxSequentialDays = _intermediateResult.MaxSequentialDays;
            DateTime currentDate = value.Value.Date;
            // Compare against the date part of the previous value (TimeTicks only holds the time of day).
            if (currentDate.AddDays(-1).Equals(_intermediateResult.LastDate.Value.Date))
                sequentialDays++;
            else
            {
                maxSequentialDays = Math.Max(sequentialDays, maxSequentialDays);
                sequentialDays = 1;
            }
            _intermediateResult = new MaxConsecutiveState
                                      {
                                          CurrentSequentialDays = sequentialDays,
                                          LastDate = currentDate,
                                          MaxSequentialDays = maxSequentialDays
                                      };
        }

        /// <summary>
        /// Merge the partially computed aggregate with this aggregate.
        /// </summary>
        /// <param name="other"></param>
        public void Merge(MaxConsecutiveDates other)
        {
            // add stuff for two separate calculations
        }

        /// <summary>
        /// Called at the end of aggregation, to return the results of the aggregation.
        /// </summary>
        /// <returns></returns>
        public SqlInt32 Terminate()
        {
            // Compare as int; casting to sbyte would truncate streaks longer than 127 days.
            int max = Math.Max(_intermediateResult.CurrentSequentialDays, _intermediateResult.MaxSequentialDays);
            return new SqlInt32(max);
        }
    }
}


Bil*_*ill 5

It seems you could take advantage of the fact that being continuous over n days requires there to be n rows.

So something like:

SELECT users.UserId, count(1) as cnt
FROM users
WHERE users.CreationDate > now() - INTERVAL 30 DAY
GROUP BY UserId
HAVING cnt = 30
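The counting argument can be checked with a short Python sketch (an illustration only; `visited_every_day` and the sample dates are invented here). It relies on the question's guarantee of at most one row per user per day: n rows inside an n-day window then leave no room for a gap. Note that, like the query above, it only looks at the single most recent window, not at streaks that ended earlier.

```python
from datetime import date, timedelta


def visited_every_day(visit_days, window_end, n):
    """True if every one of the n days ending on window_end has a visit."""
    window = {window_end - timedelta(days=k) for k in range(n)}
    # With at most one row per day, n matching rows means no day is missing.
    return len(set(visit_days) & window) == n


end = date(2009, 7, 30)
full_month = [end - timedelta(days=k) for k in range(30)]
with_gap = [d for d in full_month if d != date(2009, 7, 15)]

full_ok = visited_every_day(full_month, end, 30)
gap_ok = visited_every_day(with_gap, end, 30)
```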