Owe*_*ker 6 mysql performance join subquery query-performance
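I have a Drupal installation accessing a large user database (~200k rows), and my "people finder" feature needs to access all of those rows (in random order). I don't seem to be able to use LIMIT and OFFSET from within Drupal's UI (I have a subtly different, Drupal-specific question on Drupal.SE, "Slow query with large dataset in Drupal views - better to process in SQL or PHP?", which addresses that, and part of this question), but my mySQL-specific question follows:
I need to exclude some rows based on data in another table ("include all users in role A who are not in roles B, C or D"). The query Drupal generates is effectively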
SELECT
users.uid AS uid,
/* some columns */,
RAND() AS random_field
FROM
users users
INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
LEFT JOIN users_roles users_roles2
ON users.uid = users_roles2.uid
AND (users_roles2.rid = :views_join_condition_0
OR users_roles2.rid = :views_join_condition_1
OR users_roles2.rid = :views_join_condition_2)
WHERE
(( (users.status <> :db_condition_placeholder_3) -- Active users only
AND (users_roles.rid = :db_condition_placeholder_4) -- Must be in rôle A
AND (users_roles2.rid IS NULL) -- Must not be in rôles B, C, D
AND (users.uid != :users_uid OR users.uid IS NULL) )) -- Must not be current user
ORDER BY random_field ASC
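(The reference to users.uid IS NULL is a red herring; that should never be the case, and it is not germane to this query.)
It now strikes me that hand-rolling Drupal's filter criteria (within the constraints of Drupal's UI) might help, since I can hard-code almost all of the :db_condition_placeholders, but I'm not sure whether there is a significant performance difference between the following two options:
Change the FROM clause to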
FROM users INNER JOIN users_roles
ON users.uid = users_roles.uid AND users_roles.rid NOT IN (6, 8, 9)
(and then WHERE users_roles.rid = 5 as before, just dropping the users_roles2 references); or
Remove the JOIN entirely and change the WHERE clause to:
WHERE users.status = 1 -- Active users only
AND users.uid IN
(SELECT DISTINCT uid FROM users_roles WHERE rid = 5) -- Must be in rôle A
AND users.uid NOT IN
(SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D
AND users.uid != :users_uid -- Not current user
If it helps, the mySQL version is 5.1.50-enterprise-gpl-pro, all tables use the InnoDB storage engine, and the users_roles table already has a clustered primary key across both columns:
mysql> describe users_roles;
+-------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| uid | int(10) unsigned | NO | PRI | 0 | |
| rid | int(10) unsigned | NO | PRI | 0 | |
+-------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
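The actual performance problems I'm seeing are twofold: the server is already maxed out on RAM, and the query I'm discussing here takes 2+ seconds to execute. I'm guessing I can't address the RAM issue without looking at LIMIT and OFFSET, but speeding up this query would definitely be a good start.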
mysql> describe users;
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| uid | int(10) unsigned | NO | PRI | 0 | |
| name | varchar(60) | NO | UNI | | |
| pass | varchar(128) | NO | | | |
| mail | varchar(254) | YES | MUL | | |
| theme | varchar(255) | NO | | | |
| signature | varchar(255) | NO | | | |
| signature_format | varchar(255) | YES | | NULL | |
| created | int(11) | NO | MUL | 0 | |
| access | int(11) | NO | MUL | 0 | |
| login | int(11) | NO | | 0 | |
| status | tinyint(4) | NO | | 0 | |
| timezone | varchar(32) | YES | | NULL | |
| language | varchar(12) | NO | | | |
| picture | int(11) | NO | | 0 | |
| init | varchar(254) | YES | | | |
| data | longblob | YES | | NULL | |
+------------------+------------------+------+-----+---------+-------+
16 rows in set (0.00 sec)
mysql> EXPLAIN EXTENDED SELECT
users.uid AS uid,
/* some columns */,
RAND() AS random_field
FROM
users users
INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
LEFT JOIN users_roles users_roles2
ON users.uid = users_roles2.uid
AND (users_roles2.rid = 6 OR users_roles2.rid = 8 OR users_roles2.rid = 9)
WHERE
(( (users.status <> 0) -- Active users only
AND (users_roles.rid = 5) -- Must be in rôle A
AND (users_roles2.rid IS NULL) -- Not in rôles B, C, D
AND (users.uid != 35635 OR users.uid IS NULL) )) -- Not (random valid UID)
ORDER BY random_field ASC
+----+-------------+--------------+--------+---------------+---------+---------+------------------------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------+---------+---------+------------------------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | users_roles | ref | PRIMARY,rid | rid | 4 | const | 69985 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | dbname.users_roles.uid | 1 | Using where |
| 1 | SIMPLE | users_roles2 | ref | PRIMARY,rid | PRIMARY | 4 | dbname.users.uid | 1 | Using where; Using index; Not exists |
+----+-------------+--------------+--------+---------------+---------+---------+-----------------------+-------+-----------------------------------------------------------+
3 rows in set, 1 warning (0.01 sec)
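I would add an index on users_roles (rid, uid). In a many-to-many table with two columns (a, b), you almost always need both indexes, (a, b) and (b, a), for one query or another. I think this index will help with this query.
Try various rewrites of the query and compare the EXPLAIN EXTENDED output that each one produces.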
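As a rough illustration of the "add the reverse index, then compare plans" workflow, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogy: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN EXTENDED, and the index name users_roles_rid_uid and the sample data are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (an assumption:
# SQLite's planner is not MySQL's, so this only illustrates the workflow).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users_roles (
        uid INTEGER NOT NULL,
        rid INTEGER NOT NULL,
        PRIMARY KEY (uid, rid)
    )
""")

# Made-up data: every user is in role 5, every third user is also in role 6.
rows = [(uid, 5) for uid in range(1, 201)] + [(uid, 6) for uid in range(3, 201, 3)]
conn.executemany("INSERT INTO users_roles (uid, rid) VALUES (?, ?)", rows)

QUERY = "SELECT uid FROM users_roles WHERE rid = 5"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in plan_rows)


plan_before = plan(QUERY)

# The reverse index suggested above; the name is hypothetical.
conn.execute("CREATE INDEX users_roles_rid_uid ON users_roles (rid, uid)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent steps would be adding the index with ALTER TABLE and re-running EXPLAIN EXTENDED on each candidate query; the point is only to check which index each rewrite ends up using rather than guessing.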
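Regarding your suggestions, the first one is incorrect (it will not return the same results): a user who is in both role 5 and role 6 still has a (uid, 5) row, and that row passes both rid NOT IN (6, 8, 9) and rid = 5, so the user is not excluded. For the second suggestion: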
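WHERE users.status = 1 -- Active users only
Yes, this is better than users.status <> 0. The change may have a bigger effect if there is an index on users (status), and an even bigger one if there are not many active users. It is not easy to optimize queries on boolean columns (or columns that act as booleans) with B-tree indexes.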
AND users.uid IN
  (SELECT DISTINCT uid FROM users_roles WHERE rid = 5) -- Must be in rôle A
No. MySQL is known to have problems with column IN (SELECT ...), especially when the outer table is large (and yours has 200K rows, so no, not good).
AND users.uid NOT IN
  (SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D
Yes, this is one way to rewrite it. The DISTINCT is redundant, though.
AND users.uid <> :users_uid -- Not current user
Yes, removing the OR users.uid IS NULL branch may help and will not change the results, because uid is the primary key and can never be NULL.
Move the rid = 5 condition into the ON clause:
INNER JOIN users_roles users_roles
  ON users.uid = users_roles.uid
  AND users_roles.rid = 5
The (rewritten) NOT IN can also be written as NOT EXISTS:
AND NOT EXISTS
  ( SELECT *
    FROM users_roles ur
    WHERE ur.uid = users.uid
      AND ur.rid IN (6,8,9)
  )
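To sanity-check which rewrites return the same rows, the sketch below runs four variants against a tiny made-up data set using Python's built-in sqlite3 module. The sample users, the choice of user 5 as the "current user", and the use of SQLite instead of MySQL are all assumptions for illustration; the test says nothing about performance, only about whether the result sets match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, status INTEGER NOT NULL);
    CREATE TABLE users_roles (
        uid INTEGER NOT NULL,
        rid INTEGER NOT NULL,
        PRIMARY KEY (uid, rid)
    );
    INSERT INTO users (uid, status) VALUES (1, 1), (2, 1), (3, 1), (4, 0), (5, 1);
    INSERT INTO users_roles (uid, rid) VALUES
        (1, 5),          -- role A only: should be returned
        (2, 5), (2, 6),  -- role A and role B: must be excluded
        (3, 8),          -- role C only: not in role A
        (4, 5),          -- role A but inactive
        (5, 5);          -- role A, but treated as the current user
""")

QUERIES = {
    # Shape of the query Drupal generates (LEFT JOIN ... IS NULL).
    "left_join": """
        SELECT users.uid FROM users
        INNER JOIN users_roles ON users.uid = users_roles.uid
        LEFT JOIN users_roles users_roles2
               ON users.uid = users_roles2.uid
              AND users_roles2.rid IN (6, 8, 9)
        WHERE users.status <> 0 AND users_roles.rid = 5
          AND users_roles2.rid IS NULL AND users.uid <> :current
        ORDER BY users.uid""",
    # The asker's second option (IN / NOT IN subqueries).
    "not_in": """
        SELECT users.uid FROM users
        WHERE users.status = 1
          AND users.uid IN (SELECT uid FROM users_roles WHERE rid = 5)
          AND users.uid NOT IN (SELECT uid FROM users_roles WHERE rid IN (6, 8, 9))
          AND users.uid <> :current
        ORDER BY users.uid""",
    # The answer's rewrite (rid = 5 in the ON clause, NOT EXISTS).
    "not_exists": """
        SELECT users.uid FROM users
        INNER JOIN users_roles ON users.uid = users_roles.uid AND users_roles.rid = 5
        WHERE users.status = 1 AND users.uid <> :current
          AND NOT EXISTS (SELECT * FROM users_roles ur
                          WHERE ur.uid = users.uid AND ur.rid IN (6, 8, 9))
        ORDER BY users.uid""",
    # The asker's first option, which the answer calls incorrect.
    "first_option": """
        SELECT users.uid FROM users
        INNER JOIN users_roles
                ON users.uid = users_roles.uid AND users_roles.rid NOT IN (6, 8, 9)
        WHERE users.status = 1 AND users_roles.rid = 5 AND users.uid <> :current
        ORDER BY users.uid""",
}

# Run every variant with user 5 as the (hypothetical) current user.
results = {
    name: [row[0] for row in conn.execute(sql, {"current": 5})]
    for name, sql in QUERIES.items()
}
for name, uids in results.items():
    print(name, uids)
```

With this data the three anti-join forms agree, while the first option also returns user 2, who is in both role 5 and role 6. Once correctness is settled, which of the equivalent forms is fastest on MySQL 5.1 is a question for EXPLAIN and timing on the real tables.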