Dar*_*nke 5 php mysql group-by greatest-n-per-group
Simplified table structure:
CREATE TABLE IF NOT EXISTS `hpa` (
`id` bigint(15) NOT NULL auto_increment,
`core` varchar(50) NOT NULL,
`hostname` varchar(50) NOT NULL,
`status` varchar(255) NOT NULL,
`entered_date` int(11) NOT NULL,
`active_date` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `hostname` (`hostname`),
KEY `status` (`status`),
KEY `entered_date` (`entered_date`),
KEY `core` (`core`),
KEY `active_date` (`active_date`)
)
Against this table I have the following SQL query, which simply totals all records that have the defined statuses.
SELECT core,COUNT(hostname) AS hostname_count, MAX(active_date) AS last_active
FROM `hpa`
WHERE
status != 'OK' AND status != 'Repaired'
GROUP BY core
ORDER BY core
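This query has been simplified: I removed the INNER JOINs to unrelated data and the extra columns that shouldn't affect the question.
MAX(active_date) is the same for all records of a particular day, and the query should always select the most recent day, or allow an offset from NOW(). (It is a UNIXTIME field.)
I want two counts: (status != 'OK' AND status != 'Repaired')
and the inverse count: (status = 'OK' OR status = 'Repaired')
and the first answer divided by the second, as 'percentage_dead' (probably just as fast to do in post-processing),
for the most recent day or for an offset (- 86400 for yesterday, etc.).
The table contains about 500k records and grows by about 5000 a day, so a single SQL query rather than a loop would be really nice.
I imagine some creative IF could do this. Your expertise is appreciated.
EDIT: I am open to using a different SQL query for today's data than for data from an offset.
EDIT: The query works and is fast enough, but I currently can't let users sort on the percentage column (the one derived from the bad and good counts). This is not a show stopper, but I allow them to sort on everything else. This ORDER BY: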
SELECT h1.core, MAX(h1.entered_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
FROM `hpa` h1
LEFT OUTER JOIN `hpa` h2 ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY ( bad_host_count / ( bad_host_count + good_host_count ) ) DESC,h1.core
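gives me: #1247 - Reference 'bad_host_count' not supported (reference to group function)
EDIT: Solved for a different section. The following works and allows me to ORDER BY percentage_dead: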
SELECT c.core, c.last_active,
SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,
SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
( SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100/
( (SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) )+(SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) ) ) ) AS percentage_dead
FROM `agent_cores` c
LEFT JOIN `dead_agents` d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead
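If I understand correctly, you want a count of OK versus not-OK hostnames, based on each hostname's status on its last activity date. Right? And that should then be grouped by core.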
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
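This is a variation of the "greatest-n-per-group" problem that I see a lot in SQL questions on StackOverflow.
First you want to choose only the rows that have the latest activity date per hostname, which we can do with an outer join to rows that have the same hostname and a greater active_date. Where no such match is found, we already have the latest row for that hostname.
Then group by core and count the rows by status.
That's the solution for today's date (assuming no row has an active_date in the future). To restrict the results to rows as of N days ago, you have to restrict both tables.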
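To make the two steps concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run without a server, and the sample rows and the names `LATEST_PER_HOST` and `latest_counts` are made up for illustration. The SQL itself is the same anti-join pattern.

```python
import sqlite3

# SQLite stands in for MySQL here; the table is a trimmed-down `hpa`
# and the rows are illustrative sample data, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hpa (
        id INTEGER PRIMARY KEY,
        core TEXT NOT NULL,
        hostname TEXT NOT NULL,
        status TEXT NOT NULL,
        active_date INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO hpa (core, hostname, status, active_date) VALUES (?, ?, ?, ?)",
    [
        ("core-a", "host1", "Failed", 1000),
        ("core-a", "host1", "OK", 2000),        # newest row for host1
        ("core-a", "host2", "OK", 1000),
        ("core-a", "host2", "Failed", 2000),    # newest row for host2
        ("core-b", "host3", "Repaired", 2000),  # only row for host3
    ],
)

# Step 1: the LEFT OUTER JOIN looks for a newer row (h2) for the same
# hostname; "h2.hostname IS NULL" keeps only rows with no newer sibling.
# Step 2: GROUP BY core and count those latest rows by status.
LATEST_PER_HOST = """
    SELECT h1.core,
           MAX(h1.active_date) AS last_active,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS ok_host_count,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
    FROM hpa h1
    LEFT OUTER JOIN hpa h2
      ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
    WHERE h2.hostname IS NULL
    GROUP BY h1.core
    ORDER BY h1.core
"""
latest_counts = conn.execute(LATEST_PER_HOST).fetchall()
print(latest_counts)
```

Only the newest row per hostname is counted, so host1's older 'Failed' row and host2's older 'OK' row do not affect the totals.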
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= CURDATE() - INTERVAL 1 DAY)
WHERE h1.active_date <= CURDATE() - INTERVAL 1 DAY AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
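As for the ratio between OK and broken hostnames, I'd recommend just calculating that in your PHP code. SQL doesn't let you reference column aliases in other select-list expressions, so you'd have to wrap the above as a subquery, and that is more complex than it's worth in this case.
I forgot that you said you're using a UNIX timestamp. Do something like this: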
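If sorting by the ratio in SQL is still wanted, the subquery wrapping mentioned above might look roughly like the sketch below. It runs against SQLite through Python's sqlite3 module as a stand-in for MySQL. The sample rows, the derived-table alias `t`, and the names `SORT_BY_PERCENTAGE` and `ranked` are illustrative assumptions. The `100.0` literal forces floating-point division in SQLite; MySQL's `/` operator already returns a decimal result.

```python
import sqlite3

# SQLite stands in for MySQL; rows are illustrative sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?, ?, ?)",
    [
        ("core-a", "host1", "OK", 2000),
        ("core-a", "host2", "Failed", 2000),
        ("core-b", "host3", "Failed", 2000),
        ("core-c", "host4", "Repaired", 2000),
    ],
)

# The inner query produces the per-core counts; the outer query may then
# use those aliases in an expression and sort by the result.
SORT_BY_PERCENTAGE = """
    SELECT t.core, t.good_host_count, t.bad_host_count,
           t.bad_host_count * 100.0
               / (t.bad_host_count + t.good_host_count) AS percentage_dead
    FROM (
        SELECT h1.core,
               SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
               SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
        FROM hpa h1
        LEFT OUTER JOIN hpa h2
          ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
        WHERE h2.hostname IS NULL
        GROUP BY h1.core
    ) AS t
    ORDER BY percentage_dead DESC, t.core
"""
ranked = conn.execute(SORT_BY_PERCENTAGE).fetchall()
for core, good, bad, pct in ranked:
    print(core, good, bad, pct)
```

Every core returned by the inner query has at least one counted row, so the denominator cannot be zero here.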
SELECT h1.core, MAX(h1.active_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
AND h2.active_date <= UNIX_TIMESTAMP() - 86400)
WHERE h1.active_date <= UNIX_TIMESTAMP() - 86400 AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
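The sketch below illustrates why the cutoff has to appear on both sides of the join. It again uses Python's sqlite3 module as a stand-in for MySQL. Because SQLite has no UNIX_TIMESTAMP(), the cutoff is passed as a bound parameter; in MySQL it would be `UNIX_TIMESTAMP() - 86400`. The sample rows and the names `AS_OF`, `H1_ONLY`, `both_sides` and `h1_only` are illustrative assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; timestamps are small made-up integers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?, ?, ?)",
    [
        ("core-a", "host1", "Failed", 1000),
        ("core-a", "host1", "OK", 5000),   # newer than the cutoff used below
        ("core-a", "host2", "OK", 2000),
    ],
)

COUNTS = """
    SELECT h1.core, MAX(h1.active_date) AS last_active,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS ok_host_count,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
    FROM hpa h1
    LEFT OUTER JOIN hpa h2
      ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
"""

# Cutoff applied to both h1 and h2: rows newer than the cutoff can
# neither be counted nor hide an older row.
AS_OF = COUNTS + """
     AND h2.active_date <= :cutoff
    WHERE h1.active_date <= :cutoff AND h2.hostname IS NULL
    GROUP BY h1.core
"""

# Cutoff applied to h1 only: host1's row at 5000 still matches as h2,
# so host1's row at 1000 is filtered out and host1 vanishes entirely.
H1_ONLY = COUNTS + """
    WHERE h1.active_date <= :cutoff AND h2.hostname IS NULL
    GROUP BY h1.core
"""

both_sides = conn.execute(AS_OF, {"cutoff": 3000}).fetchall()
h1_only = conn.execute(H1_ONLY, {"cutoff": 3000}).fetchall()
print(both_sides)
print(h1_only)
```

With the cutoff on both tables, host1 is counted as broken as of timestamp 3000. With the cutoff on h1 alone, host1 is silently dropped from the counts.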