Performance implications of allowing an alias to be used in the HAVING clause

Question

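I made a bit of a fool of myself earlier today on this question. The question was using SQL Server, and the correct answer involved adding a HAVING clause. My initial mistake was to think that an alias defined in the SELECT statement could be used in the HAVING clause, which is not allowed in SQL Server. I made this error because I assumed that SQL Server had the same rules as MySQL, which does allow an alias to be used in the HAVING clause.

This got me curious, and I poked around on Stack Overflow and elsewhere, finding a bunch of material explaining why these rules are enforced on the two respective RDBMSs. But nowhere did I find an explanation of the performance implications of allowing or disallowing an alias in the HAVING clause.

To give a concrete example, I will duplicate the query that appeared in the above-mentioned question: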

SELECT students.camID, campus.camName, COUNT(students.stuID) as studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING COUNT(students.stuID) > 3
ORDER BY studentCount

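What would be the performance implications of using an alias in the HAVING clause instead of re-specifying the COUNT? This question can be answered directly in MySQL, and hopefully someone can give insight into what would happen in SQL Server if it were to support an alias in the HAVING clause.

This is a rare instance where it might be OK to tag a SQL question with both MySQL and SQL Server, so enjoy this moment in the sun.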
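
For reference on the SQL Server side: the aggregate can still be filtered by its alias there if it is computed in a derived table and the filter moves to the outer WHERE. The following is only a sketch built from the query above; the derived-table alias `t` is an illustrative name, and whether this form performs any differently from repeating the COUNT would have to be checked against the actual execution plan.

```sql
-- Sketch: expose the aggregate under an alias in a derived table,
-- then filter on that alias in the outer query.
-- `t` is an illustrative alias, not part of the original query.
SELECT t.camID, t.camName, t.studentCount
FROM (
    SELECT students.camID, campus.camName,
           COUNT(students.stuID) AS studentCount
    FROM students
    JOIN campus
        ON campus.camID = students.camID
    GROUP BY students.camID, campus.camName
) AS t
WHERE t.studentCount > 3
ORDER BY t.studentCount;
```

The same statement is also valid in MySQL, so it can serve as a common baseline when comparing the two systems.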

Answer 1

Dre*_*rew 4

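Narrowly focused on just that particular query, with sample data loaded below. This also addresses some other queries, such as the count(distinct ...) mentioned by others.

The alias in the HAVING appears to either slightly outperform or considerably outperform its alternative (depending on the query).

This uses a pre-existing table with about 5 million rows in it, created quickly via this answer of mine, which takes 3 to 5 minutes.

Resulting structure: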

CREATE TABLE `ratings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `thing` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

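But using INNODB instead. This creates the expected INNODB gap anomaly in the ids due to the range reservation inserts. Just saying, but it makes no difference. 4.7 million rows.

Modify the table to get near Tim's assumed schema.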

rename table ratings to students; -- not exactly instantaneous (a COPY)
alter table students add column camId int; -- get it near Tim's schema
-- don't add the `camId` index yet

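The following will take a while. Run it again and again in chunks, or else your connection may time out. The timeout would come from updating 5 million rows in a single update statement with no LIMIT clause. Note that we do have a LIMIT clause here.

So we are doing it in half-million-row iterations. Each one sets the column to a random number between 1 and 20.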

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

Keep running the above until no camId is null.
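
A quick way to tell whether another pass is needed is to count the rows that are still unassigned. This is a small sketch against the `students` table above; the column alias `remaining` is just an illustrative name.

```sql
-- Count the rows that the chunked UPDATE has not reached yet.
SELECT COUNT(*) AS remaining
FROM students
WHERE camId IS NULL;
```

Once `remaining` comes back as 0, the update loop is done.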

I ran it about 10 times (the whole thing takes 7 to 10 minutes).

select camId,count(*) from students
group by camId order by 1 ;

1   235641
2   236060
3   236249
4   235736
5   236333
6   235540
7   235870
8   236815
9   235950
10  235594
11  236504
12  236483
13  235656
14  236264
15  236050
16  236176
17  236097
18  235239
19  235556
20  234779

select count(*) from students;
-- 4.7 Million rows

Create a useful index (after the inserts, of course).

create index `ix_stu_cam` on students(camId); -- takes 45 seconds

ANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
-- the above is fine, takes 1 second

Create the campus table.

create table campus
(   camID int auto_increment primary key,
    camName varchar(100) not null
);
insert campus(camName) values
('one'),('2'),('3'),('4'),('5'),
('6'),('7'),('8'),('9'),('ten'),
('etc'),('etc'),('etc'),('etc'),('etc'),
('etc'),('etc'),('etc'),('etc'),('twenty');
-- ok 20 of them

Run the two queries:

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING COUNT(students.id) > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

and

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentCount > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

So the times are identical. I ran each query a dozen times.
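
One way to collect such timings inside the mysql client, rather than reading them off run by run, is session profiling. This is a minimal sketch, assuming a MySQL 5.x server where `SHOW PROFILES` is still available (it is deprecated in later versions); it is not necessarily how the numbers above were measured.

```sql
SET profiling = 1;   -- enable statement profiling for this session

-- Variant 1: aggregate repeated in HAVING
SELECT students.camID, campus.camName, COUNT(students.id) AS studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING COUNT(students.id) > 3
ORDER BY studentCount;

-- Variant 2: alias used in HAVING
SELECT students.camID, campus.camName, COUNT(students.id) AS studentCount
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING studentCount > 3
ORDER BY studentCount;

SHOW PROFILES;       -- one row per recent statement, with its Duration
SET profiling = 0;   -- turn profiling back off
```

Running each variant several times before `SHOW PROFILES` gives a small sample of durations per form, which is easier to compare than a single run.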

The EXPLAIN output is the same for both:

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key        | key_len | ref                  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
|  1 | SIMPLE      | campus   | ALL  | PRIMARY       | NULL       | NULL    | NULL                 |     20 | Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref  | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                     |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

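Using the AVG() function, I get about a 12% increase in performance with the alias in the HAVING (with identical EXPLAIN output) from the following two queries.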

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING avg(students.id) > 2200000 
ORDER BY students.camID; 
-- avg time 7.5

explain 

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentAvg > 2200000
ORDER BY students.camID;
-- avg time 6.5

Lastly, the DISTINCT:

SELECT students.camID, count(distinct students.id) as studentDistinct 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID 
HAVING count(distinct students.id) > 1000000 
ORDER BY students.camID; -- 10.6   10.84   12.1   11.49   10.1   9.97   10.27   11.53   9.84 9.98
-- 9.9

 SELECT students.camID, count(distinct students.id) as studentDistinct 
 FROM students 
 JOIN campus 
    ON campus.camID = students.camID 
 GROUP BY students.camID 
 HAVING studentDistinct > 1000000 
 ORDER BY students.camID; -- 6.81    6.55   6.75   6.31   7.11 6.36   6.55
-- 6.45

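The alias in the HAVING consistently runs about 35% faster, with the same EXPLAIN output for both queries, shown below. So twice now, identical EXPLAIN output has not meant identical performance; treat EXPLAIN as a general clue rather than a guarantee.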

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows   | Extra                                        |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | campus   | index | PRIMARY       | PRIMARY    | 4       | NULL                 |     20 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref   | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                                  |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

The optimizer appears to favor the alias in the HAVING at the moment, especially for the DISTINCT.
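
Since the plain EXPLAIN rows are identical for both forms, it can help to look at the statement as the optimizer rewrote it. A sketch, assuming MySQL 5.7, where EXPLAIN produces extended information by default and `SHOW WARNINGS` returns the rewritten query as a Note. No output is shown here because it depends on the server version.

```sql
EXPLAIN
SELECT students.camID, COUNT(DISTINCT students.id) AS studentDistinct
FROM students
JOIN campus
    ON campus.camID = students.camID
GROUP BY students.camID
HAVING studentDistinct > 1000000
ORDER BY students.camID;

SHOW WARNINGS;  -- the Note row carries the rewritten query text
```

Repeating this for the variant that spells out `count(distinct students.id)` in the HAVING and comparing the two Notes may show whether the two forms are resolved differently.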
