I track web visitors. I store the IP address as well as the timestamp of each visit.
ip_address time_stamp
180.2.79.3 1301654105
180.2.79.3 1301654106
180.2.79.3 1301654354
180.2.79.3 1301654356
180.2.79.3 1301654358
180.2.79.3 1301654366
180.2.79.3 1301654368
180.2.79.3 1301654422
I have a query to get the total number of tracks:
SELECT COUNT(*) AS tracks FROM tracking
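However, I now want to ignore hits from users who come back within 10 seconds of a previous visit. I don't count such a hit as a new visit; it is still part of the first visit.
When the ip_address is the same, check the timestamps and count only those rows that are 10 seconds apart from each other.
I'm having a hard time putting this into a SQL query, and I'd appreciate any help!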
Mik*_*ll' 15
Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.
180.2.79.3 2011-01-01 08:00:00
180.2.79.3 2011-01-01 08:00:09
180.2.79.3 2011-01-01 08:00:20
180.2.79.3 2011-01-01 08:00:23
180.2.79.3 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:40
180.2.79.4 2011-01-01 08:00:00
180.2.79.4 2011-01-01 08:00:13
180.2.79.4 2011-01-01 08:00:23
180.2.79.4 2011-01-01 08:00:25
180.2.79.4 2011-01-01 08:00:27
180.2.79.4 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:50
If I understand you correctly, you want to count them like this.
180.2.79.3 3
180.2.79.4 3
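You can do this for each ip_address by selecting the maximum timestamp that is both greater than the current row's time_stamp and no more than 10 seconds later than the current row's time_stamp.
Putting these two criteria together introduces some nulls, which turn out to be really useful.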
select ip_address,
       t_s.time_stamp,
       (select max(t.time_stamp)
          from t_s t
         where t.ip_address = t_s.ip_address
           and t.time_stamp > t_s.time_stamp
           and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
  from t_s
 group by ip_address, t_s.time_stamp
 order by ip_address, t_s.time_stamp;
ip_address time_stamp next_page
180.2.79.3 2011-01-01 08:00:00 2011-01-01 08:00:09
180.2.79.3 2011-01-01 08:00:09 <null>
180.2.79.3 2011-01-01 08:00:20 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:23 2011-01-01 08:00:25
180.2.79.3 2011-01-01 08:00:25 <null>
180.2.79.3 2011-01-01 08:00:40 <null>
180.2.79.4 2011-01-01 08:00:00 <null>
180.2.79.4 2011-01-01 08:00:13 2011-01-01 08:00:23
180.2.79.4 2011-01-01 08:00:23 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:25 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:27 2011-01-01 08:00:29
180.2.79.4 2011-01-01 08:00:29 <null>
180.2.79.4 2011-01-01 08:00:50 <null>
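The timestamp that marks the end of a visit has a null for its own next_page. That's because no later timestamp for that ip_address falls within 10 seconds of that row's time_stamp.
To get the counts, I'd probably create a view over that query (called t_s_visits here) and count the nulls.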
select ip_address, count(*)
from t_s_visits
where next_page is null
group by ip_address
180.2.79.3 3
180.2.79.4 3
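To try the view-and-count-nulls idea end to end, here is a minimal sketch that runs it on an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite has no INTERVAL type, so time_stamp is stored as whole seconds after 08:00:00; the outer table gets the alias cur to keep the correlated subquery unambiguous; and the function name count_visits is made up for illustration.

```python
import sqlite3

# Sample rows from the answer. time_stamp is stored as seconds after 08:00:00,
# an assumption made here because SQLite has no INTERVAL type.
ROWS = [
    ("180.2.79.3", 0), ("180.2.79.3", 9), ("180.2.79.3", 20),
    ("180.2.79.3", 23), ("180.2.79.3", 25), ("180.2.79.3", 40),
    ("180.2.79.4", 0), ("180.2.79.4", 13), ("180.2.79.4", 23),
    ("180.2.79.4", 25), ("180.2.79.4", 27), ("180.2.79.4", 29),
    ("180.2.79.4", 50),
]


def count_visits(rows):
    """Count visits per ip_address; a visit ends at a row with no later row within 10 s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_s (ip_address TEXT, time_stamp INTEGER)")
    conn.executemany("INSERT INTO t_s VALUES (?, ?)", rows)
    # next_page is the latest later hit within 10 seconds, or NULL if there is none.
    conn.execute("""
        CREATE VIEW t_s_visits AS
        SELECT cur.ip_address,
               cur.time_stamp,
               (SELECT MAX(t.time_stamp)
                  FROM t_s AS t
                 WHERE t.ip_address = cur.ip_address
                   AND t.time_stamp > cur.time_stamp
                   AND t.time_stamp - cur.time_stamp <= 10) AS next_page
          FROM t_s AS cur
    """)
    # Each NULL next_page marks the last hit of one visit.
    counts = conn.execute("""
        SELECT ip_address, COUNT(*)
          FROM t_s_visits
         WHERE next_page IS NULL
         GROUP BY ip_address
         ORDER BY ip_address
    """).fetchall()
    conn.close()
    return counts


print(count_visits(ROWS))  # [('180.2.79.3', 3), ('180.2.79.4', 3)]
```

Note that this treats a chain of hits as one visit as long as each hit follows the previous one by at most 10 seconds, however long the chain gets.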
You could JOIN the tracking table with itself and filter out the records you don't need by adding a WHERE clause.
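SELECT t1.ip_address
     , COUNT(*) AS tracks
FROM tracking t1
-- t2 is any later hit from the same address less than 10 seconds after t1.
-- The lower bound keeps a row from matching itself.
LEFT OUTER JOIN tracking t2 ON t2.ip_address = t1.ip_address
    AND t2.time_stamp > t1.time_stamp
    AND t2.time_stamp < t1.time_stamp + 10
-- Keep only hits with no such follower: one per visit.
WHERE t2.ip_address IS NULL
GROUP BY
    t1.ip_address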
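The self-join can be checked against the rows from the question with a small Python script using an in-memory SQLite database. This is only a sketch under a few assumptions: the join also requires t2.time_stamp > t1.time_stamp so that a row never matches itself, time_stamp is an integer Unix timestamp as in the question, and the names count_tracks and gap are made up for illustration.

```python
import sqlite3

# The eight rows from the question (integer Unix timestamps).
TRACKING = [
    ("180.2.79.3", ts)
    for ts in (1301654105, 1301654106, 1301654354, 1301654356,
               1301654358, 1301654366, 1301654368, 1301654422)
]


def count_tracks(rows, gap=10):
    """Count hits per ip_address that have no later hit less than `gap` seconds after them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracking (ip_address TEXT, time_stamp INTEGER)")
    conn.executemany("INSERT INTO tracking VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT t1.ip_address, COUNT(*) AS tracks
          FROM tracking t1
          LEFT OUTER JOIN tracking t2
            ON t2.ip_address = t1.ip_address
           AND t2.time_stamp > t1.time_stamp      -- strictly later, so no self-match
           AND t2.time_stamp < t1.time_stamp + ?  -- and less than `gap` seconds later
         WHERE t2.ip_address IS NULL              -- anti-join: no such follower exists
         GROUP BY t1.ip_address
         ORDER BY t1.ip_address
    """, (gap,)).fetchall()
    conn.close()
    return result


print(count_tracks(TRACKING))  # [('180.2.79.3', 3)]
```

The three surviving rows are the last hits of the bursts ending at 1301654106, 1301654368 and 1301654422.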
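Edit
The following script works in SQL Server, but I have not been able to express it in a single SQL statement, let alone convert it to MySQL. It might give you some pointers on what is needed.
Note: I assume that for the given input, the numbers 1 and 11 should be selected.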
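;WITH q (number) AS (
    SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 10
    UNION ALL SELECT 11
    UNION ALL SELECT 12
)
-- Pair each number with every later number less than 10 away.
SELECT q1.number AS n1
     , q2.number AS n2
     , 0 AS Done
INTO #Temp
FROM q q1
LEFT OUTER JOIN q q2 ON q2.number < q1.number + 10
    AND q2.number > q1.number

DECLARE @n1 INTEGER
DECLARE @n2 INTEGER

WHILE EXISTS (SELECT * FROM #Temp WHERE Done = 0)
BEGIN
    -- Take the smallest unprocessed pair so the loop is deterministic.
    SELECT TOP 1 @n1 = n1
         , @n2 = n2
    FROM #Temp
    WHERE Done = 0
    ORDER BY n1, n2

    -- @n2 falls inside @n1's window, so it cannot start a window of its own.
    DELETE FROM #Temp
    WHERE n1 = @n2

    -- NULL-safe match: a row with no follower has n2 = NULL and must still be marked done.
    UPDATE #Temp
    SET Done = 1
    WHERE n1 = @n1
      AND (n2 = @n2 OR (n2 IS NULL AND @n2 IS NULL))
END

SELECT DISTINCT n1
FROM #Temp

DROP TABLE #Temp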
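Outside of SQL, the rule the script appears to implement (keep a value, drop everything less than 10 after it, then keep the next remaining value) is a single pass over sorted input. The Python below is a sketch of that reading, not a translation of the script; the function names and the window parameter are made up for illustration.

```python
from collections import defaultdict


def first_of_each_window(values, window=10):
    """Keep a value only if it is at least `window` after the last kept value."""
    kept = []
    for value in sorted(values):
        if not kept or value >= kept[-1] + window:
            kept.append(value)
    return kept


def count_anchored_tracks(rows, window=10):
    """Apply first_of_each_window per ip_address and count the kept hits."""
    by_ip = defaultdict(list)
    for ip_address, time_stamp in rows:
        by_ip[ip_address].append(time_stamp)
    return {ip: len(first_of_each_window(stamps, window))
            for ip, stamps in by_ip.items()}


print(first_of_each_window([1, 2, 10, 11, 12]))  # [1, 11]

question_rows = [
    ("180.2.79.3", ts)
    for ts in (1301654105, 1301654106, 1301654354, 1301654356,
               1301654358, 1301654366, 1301654368, 1301654422)
]
print(count_anchored_tracks(question_rows))  # {'180.2.79.3': 4}
```

On the question's rows this gives 4 rather than 3, because 1301654366 is 12 seconds after the kept hit at 1301654354 and so starts a new window, even though it is only 8 seconds after the previous hit. Which of the two readings is wanted depends on whether the 10 seconds are measured from the first hit of a visit or from the most recent hit.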