Count rows that are not within 10 seconds of each other

Abs*_*Abs 12 mysql sql

I track web visitors. I store the IP address along with the timestamp of each visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.79.3  1301654354
180.2.79.3  1301654356
180.2.79.3  1301654358
180.2.79.3  1301654366
180.2.79.3  1301654368
180.2.79.3  1301654422

I have a query to get the total number of tracks:

SELECT COUNT(*) AS tracks FROM tracking

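However, I now want to ignore visits from users who visit again within 10 seconds of a previous visit. I don't consider that a separate visit; it is still part of the first visit.

When the ip_address is the same, check the timestamps and count only those rows that are more than 10 seconds apart from each other.

I'm having trouble putting this into the form of a SQL query, and I would appreciate any help!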

Mik*_*ll' 15

Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

180.2.79.3   2011-01-01 08:00:00
180.2.79.3   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:20
180.2.79.3   2011-01-01 08:00:23
180.2.79.3   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:40
180.2.79.4   2011-01-01 08:00:00
180.2.79.4   2011-01-01 08:00:13
180.2.79.4   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:25
180.2.79.4   2011-01-01 08:00:27
180.2.79.4   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:50

If I understand you correctly, you want to count these like this.

180.2.79.3   3
180.2.79.4   3

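You can do this for each ip_address by selecting the maximum timestamp that is both

  • greater than the current row's timestamp, and
  • less than or equal to the current row's timestamp plus 10 seconds.

Putting those two criteria together introduces some nulls, which turn out to be really useful.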

select ip_address, 
       t_s.time_stamp, 
       (select max(t.time_stamp) 
        from t_s t 
        where t.ip_address = t_s.ip_address 
          and t.time_stamp > t_s.time_stamp
          and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
from t_s 
group by ip_address, t_s.time_stamp
order by ip_address, t_s.time_stamp;

ip_address   time_stamp            next_page
180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:09   <null>
180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:25   <null>
180.2.79.3   2011-01-01 08:00:40   <null>
180.2.79.4   2011-01-01 08:00:00   <null>
180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:29   <null>
180.2.79.4   2011-01-01 08:00:50   <null>

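The timestamp that marks the end of a visit has a null for its own next_page. That is because no later timestamp for that ip_address is less than or equal to that row's time_stamp + 10 seconds.

To get a count, I'd probably create a view and count the nulls.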

select ip_address, count(*)
from t_s_visits 
where next_page is null
group by ip_address

180.2.79.3   3
180.2.79.4   3
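The view itself is not spelled out in this answer, so here is a minimal sketch of how the two steps might fit together, run against an in-memory SQLite database through Python's sqlite3 module. It assumes the view t_s_visits simply wraps the next_page query, and it stores time_stamp as integer seconds after 08:00:00 (an assumption made for SQLite, replacing the interval arithmetic). The function name count_visits is hypothetical.

```python
import sqlite3

# Sample rows from this answer; time_stamp is seconds after 08:00:00
# (an assumption so that plain integer arithmetic works in SQLite).
ROWS = [
    ("180.2.79.3", 0), ("180.2.79.3", 9), ("180.2.79.3", 20),
    ("180.2.79.3", 23), ("180.2.79.3", 25), ("180.2.79.3", 40),
    ("180.2.79.4", 0), ("180.2.79.4", 13), ("180.2.79.4", 23),
    ("180.2.79.4", 25), ("180.2.79.4", 27), ("180.2.79.4", 29),
    ("180.2.79.4", 50),
]


def count_visits(rows):
    """Return {ip_address: number of visits} by counting null next_page rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t_s (ip_address text, time_stamp integer)")
    conn.executemany("insert into t_s values (?, ?)", rows)

    # Assumed definition of the view: the next_page query from above,
    # with the 10-second interval written as the integer 10.
    conn.execute("""
        create view t_s_visits as
        select ip_address,
               t_s.time_stamp,
               (select max(t.time_stamp)
                from t_s t
                where t.ip_address = t_s.ip_address
                  and t.time_stamp > t_s.time_stamp
                  and t.time_stamp - t_s.time_stamp <= 10) as next_page
        from t_s
        group by ip_address, t_s.time_stamp
    """)

    # A null next_page marks the last hit of a visit, so count those.
    result = conn.execute("""
        select ip_address, count(*)
        from t_s_visits
        where next_page is null
        group by ip_address
        order by ip_address
    """).fetchall()
    conn.close()
    return dict(result)


if __name__ == "__main__":
    print(count_visits(ROWS))
```

Note that this chains hits together: a visit continues as long as each hit is followed by another one within 10 seconds, however long the whole chain lasts.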


Lie*_*ers 6

You can JOIN the tracking table to itself and filter out the records you don't need by adding a WHERE clause.

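SELECT  t1.ip_address
        , COUNT(*) AS tracks
FROM    tracking t1
        LEFT OUTER JOIN tracking t2 ON t2.ip_address = t1.ip_address
                                       AND t2.time_stamp > t1.time_stamp  -- later rows only, so a row never matches itself
                                       AND t2.time_stamp < t1.time_stamp + 10
WHERE   t2.ip_address IS NULL  -- keep rows with no follow-up hit within 10 seconds
GROUP BY
        t1.ip_address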
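To see what the self-join idea produces on the question's sample rows, here is a minimal sketch using Python's sqlite3 module with an in-memory database. It assumes the join is restricted to strictly later rows (t2.time_stamp > t1.time_stamp), since otherwise every row would match itself and nothing would survive the IS NULL filter. The function name count_tracks and the gap parameter are hypothetical.

```python
import sqlite3

# The question's sample rows: Unix timestamps for a single IP address.
ROWS = [("180.2.79.3", ts) for ts in (
    1301654105, 1301654106, 1301654354, 1301654356,
    1301654358, 1301654366, 1301654368, 1301654422,
)]


def count_tracks(rows, gap=10):
    """Count, per IP, the rows that have no later row less than `gap` seconds away."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracking (ip_address TEXT, time_stamp INTEGER)")
    conn.executemany("INSERT INTO tracking VALUES (?, ?)", rows)

    # Anti-join: t2 is any later hit from the same IP inside the gap.
    # Rows where no such t2 exists are the last hit of a visit.
    result = conn.execute("""
        SELECT  t1.ip_address, COUNT(*) AS tracks
        FROM    tracking t1
                LEFT OUTER JOIN tracking t2
                    ON  t2.ip_address = t1.ip_address
                    AND t2.time_stamp > t1.time_stamp
                    AND t2.time_stamp < t1.time_stamp + ?
        WHERE   t2.ip_address IS NULL
        GROUP BY t1.ip_address
    """, (gap,)).fetchall()
    conn.close()
    return dict(result)


if __name__ == "__main__":
    print(count_tracks(ROWS))
```

On the sample data the surviving rows are 1301654106, 1301654368 and 1301654422, so the IP is counted as three visits.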

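Edit

The following script works in SQL Server, but I was unable to express it in a single SQL statement, let alone convert it to MySQL. It might give you some pointers on what is needed.

Note: I assume that for the given input, the numbers 1 and 11 should be selected.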

;WITH q (number) AS (
  SELECT 1
  UNION ALL SELECT 2
  UNION ALL SELECT 10
  UNION ALL SELECT 11  
  UNION ALL SELECT 12
)
SELECT  q1.Number as n1
        , q2.Number as n2
        , 0 as Done
INTO    #Temp
FROM    q q1
        LEFT OUTER JOIN q q2 ON q2.number < q1.number + 10
                                AND q2.number > q1.number

DECLARE @n1 INTEGER
DECLARE @n2 INTEGER

WHILE EXISTS (SELECT * FROM #Temp WHERE Done = 0)
BEGIN

  SELECT  TOP 1 @n1 = n1
          , @n2= n2
  FROM    #Temp
  WHERE   Done = 0

  DELETE  FROM #Temp
  WHERE   n1 = @n2

  UPDATE  #Temp 
  SET     Done = 1
  WHERE   n1 = @n1 
          AND n2 = @n2         
END        

SELECT  DISTINCT n1 
FROM    #Temp

DROP TABLE #Temp
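One way to read the note about selecting 1 and 11 is as a greedy pass over the sorted values: keep a value as the start of a new visit only if it is at least 10 past the last value kept. The sketch below illustrates that reading in Python; it is not a translation of the T-SQL script, and the function name first_visits and the window parameter are hypothetical.

```python
def first_visits(numbers, window=10):
    """Greedily pick the values that start a new visit.

    A value starts a new visit when it is at least `window` past the
    previously kept value; anything closer is folded into that visit.
    """
    kept = []
    for n in sorted(numbers):
        if not kept or n >= kept[-1] + window:
            kept.append(n)
    return kept


if __name__ == "__main__":
    print(first_visits([1, 2, 10, 11, 12]))  # [1, 11]
```

This fixed-window reading can give a different count from the chained approach in the other answer. On the question's timestamps (last three digits 105, 106, 354, 356, 358, 366, 368, 422) it keeps 105, 354, 366 and 422, which is four visits, because 366 is more than 10 seconds after 354 even though every gap in between is short.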