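This question is a follow-up to an earlier one: How to get data from two different tables?
Now I need to join the Sessions table with two other tables: Input and Downloads.
When I run this query against the real data: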
SELECT sessions.ip
, COUNT(sessions.id)
, COUNT(input.input) as TotalInputs
, COUNT(DISTINCT input.input) as UniqInputs
, COUNT(downloads.shasum) as files
, COUNT(DISTINCT downloads.shasum) as Uniqfiles
FROM sessions, input, downloads
WHERE sessions.id = input.session
AND sessions.id = downloads.session
AND date_format(sessions.starttime, '%Y-%m-%d') > "2015-01-01"
GROUP BY sessions.ip
ORDER BY COUNT(sessions.id) DESC LIMIT 5;
I get this output:
ip | COUNT(sessions.id) | TotalInputs | UniqInputs | files | Uniqfiles
IP1 | 11145 | 11145 | 15 | 11145 | 8
IP2 | 9125 | 9125 | 71 | 0 | 0
IP3 | 7882 | 7882 | 56 | 7882 | 19
But the numbers for COUNT(sessions.id), TotalInputs and files are not accurate. For example, if I run this query:
SELECT downloads.shasum
FROM sessions, downloads
WHERE sessions.id = downloads.session
AND date_format(sessions.starttime, '%Y-%m-%d') > "2015-01-01"
AND sessions.ip = "IP3";
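I find that the correct files count for IP3 is 752 (not 7882). The actual value of TotalInputs is less than COUNT(sessions.id).
How can I fix my query?
Sample data is available on this SQL Fiddle.
With the query above and the sample data below, I get this output: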
ip | COUNT(sessions.id) | TotalInputs | UniqInputs | files | uniq_files
IP2 | 3 | 3 | 2 | 3 | 1
IP3 | 8 | 8 | 4 | 8 | 2
I need this output:
ip | COUNT(sessions.id) | TotalInputs | UniqInputs | files | uniq_files
IP2 | 1 | 3 | 2 | 1 | 1
IP3 | 3 | 5 | 4 | 3 | 2
How should I update my query?
Sample sessions data:
id | starttime | endtime | sensor | ip | termsize | client
id1 | 2015-05-07 11:01:20 | 2015-05-07 18:01:32 | 10 | IP3 | 80x50 | 3
id2 | 2015-05-07 18:03:20 | 2015-03-07 18:11:32 | 2 | IP2 | 80x50 | 1
id3 | 2015-05-07 23:05:20 | 2015-06-07 18:10:32 | 10 | IP3 | 80x70 | 3
id4 | 2015-05-07 13:05:20 | 2015-05-09 20:05:32 | 7 | IP3 | 60x30 | 5
Sample input data:
id | session | timestamp | realm | success | input
1 | id1 | 2015-07-13 10:29:18 | NULL | 1 | date
2 | id3 | 2015-08-13 10:11:18 | NULL | 0 | aaa
3 | id1 | 2015-03-13 10:11:18 | NULL | 0 | aaa
4 | id1 | 2015-07-14 10:33:15 | NULL | 1 | uname
5 | id3 | 2015-05-19 20:33:11 | NULL | 1 | netstat
6 | id2 | 2015-09-22 10:53:21 | NULL | 1 | pwd
7 | id2 | 2015-09-22 10:58:11 | NULL | 1 | pwd
8 | id2 | 2015-11-03 09:53:07 | NULL | 0 | bbb
Sample downloads data:
id | session | timestamp | url | outfile | shasum
1 | id1 | 2014-07-13 12:15:47 | http://xxx | xxx | SHA1
2 | id2 | 2014-09-13 12:18:50 | http://xxx2 | xxx2 | SHA2
3 | id1 | 2015-09-11 13:20:50 | http://xxx3 | xxx3 | SHA1
4 | id3 | 2016-01-19 18:21:30 | http://xxx4 | xxx4 | SHA3
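This query uses LEFT JOIN for each table, so that sessions for IPs with no Input or Downloads rows are still counted, and DISTINCT in each COUNT, to remove the duplicates introduced by the JOIN between the tables.
Query: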
SELECT s.ip
, COUNT(DISTINCT s.id)
, COUNT(DISTINCT i.id) as TotalInputs
, COUNT(DISTINCT i.input) as UniqInputs
, COUNT(DISTINCT d.id) as files
, COUNT(DISTINCT d.shasum) as Uniqfiles
FROM sessions s
LEFT JOIN input i
ON s.id = i.session
LEFT JOIN downloads d
ON s.id = d.session
GROUP BY s.ip;
Output:
ip | COUNT(DISTINCT s.id) | TotalInputs | UniqInputs | files | Uniqfiles
IP2 | 1 | 3 | 2 | 1 | 1
IP3 | 3 | 5 | 4 | 3 | 2
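To see why DISTINCT is needed, the sample data can be loaded into a scratch database. The sketch below is only an illustration and not part of the original answer: it assumes Python's sqlite3 module instead of MySQL, keeps only the columns the queries touch, and leaves out the starttime filter, as the query above does. The names build_sample_db, FANOUT_SQL and DISTINCT_SQL are made up for this example. Joining both child tables to sessions yields one row per (input, download) pair, so session id1 with 3 inputs and 2 downloads turns into 6 joined rows, which is where the inflated counts come from.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory SQLite database with the sample rows.

    Assumption: only the columns used by the queries are kept.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sessions  (id TEXT PRIMARY KEY, ip TEXT);
        CREATE TABLE input     (id INTEGER PRIMARY KEY, session TEXT, input TEXT);
        CREATE TABLE downloads (id INTEGER PRIMARY KEY, session TEXT, shasum TEXT);

        INSERT INTO sessions VALUES
            ('id1', 'IP3'), ('id2', 'IP2'), ('id3', 'IP3'), ('id4', 'IP3');
        INSERT INTO input VALUES
            (1, 'id1', 'date'), (2, 'id3', 'aaa'),     (3, 'id1', 'aaa'),
            (4, 'id1', 'uname'), (5, 'id3', 'netstat'), (6, 'id2', 'pwd'),
            (7, 'id2', 'pwd'),  (8, 'id2', 'bbb');
        INSERT INTO downloads VALUES
            (1, 'id1', 'SHA1'), (2, 'id2', 'SHA2'),
            (3, 'id1', 'SHA1'), (4, 'id3', 'SHA3');
    """)
    return conn


# Inner join of both child tables: rows per session = inputs x downloads.
# Sessions with no input or no download (id4) drop out entirely.
FANOUT_SQL = """
    SELECT s.id, COUNT(*)
    FROM sessions s
    JOIN input i     ON s.id = i.session
    JOIN downloads d ON s.id = d.session
    GROUP BY s.id
    ORDER BY s.id
"""

# The LEFT JOIN + COUNT(DISTINCT ...) query from above, with an ORDER BY
# added only to make the row order deterministic.
DISTINCT_SQL = """
    SELECT s.ip
         , COUNT(DISTINCT s.id)
         , COUNT(DISTINCT i.id)     AS TotalInputs
         , COUNT(DISTINCT i.input)  AS UniqInputs
         , COUNT(DISTINCT d.id)     AS files
         , COUNT(DISTINCT d.shasum) AS Uniqfiles
    FROM sessions s
    LEFT JOIN input i     ON s.id = i.session
    LEFT JOIN downloads d ON s.id = d.session
    GROUP BY s.ip
    ORDER BY s.ip
"""


if __name__ == "__main__":
    conn = build_sample_db()
    print("joined rows per session:")
    for row in conn.execute(FANOUT_SQL):
        print(row)
    print("per-IP counts with DISTINCT:")
    for row in conn.execute(DISTINCT_SQL):
        print(row)
```

On the sample data this should print 6, 3 and 2 joined rows for id1, id2 and id3, followed by the same two per-IP rows as in the output above. Counting the primary keys i.id and d.id with DISTINCT is what collapses the repeated pairs back to one count per input or download row.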