MySQL query optimisation: multiple joins or select ... where not in (select distinct ...)?

Owe*_*ker 6 mysql performance join subquery query-performance

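Background

I have a Drupal installation accessing a large user database (~200k rows), and my "people finder" feature needs access to all of those rows (in random order). I don't seem to be able to use LIMIT and OFFSET from within Drupal's UI (I have a slightly different, Drupal-specific question on Drupal.SE, "Slow query with large dataset in Drupal views - better to process in SQL or PHP?", which addresses that and part of this problem), but my MySQL-specific question is as follows.

I need to exclude some rows based on data in another table ("include all users in role A who are not in role B, C or D"). The query Drupal generates is effectively: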

SELECT
    users.uid AS uid,
    /* some columns */,
    RAND() AS random_field
FROM
    users users
    INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
    LEFT JOIN users_roles users_roles2
        ON users.uid = users_roles2.uid
        AND (users_roles2.rid = :views_join_condition_0
          OR users_roles2.rid = :views_join_condition_1
          OR users_roles2.rid = :views_join_condition_2)
WHERE
    (( (users.status <> :db_condition_placeholder_3)      -- Active users only
    AND (users_roles.rid = :db_condition_placeholder_4)   -- Must be in rôle A
    AND (users_roles2.rid IS NULL)                        -- Must not be in rôles B, C, D
    AND (users.uid != :users_uid OR users.uid IS NULL) )) -- Must not be current user
ORDER BY random_field ASC

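(The reference to users.uid IS NULL is a red herring; it should never be the case, and it isn't germane to this query.)

It now strikes me that hand-rolling Drupal's filter conditions (within the constraints of Drupal's UI) might help, since I can hard-code almost all of the :db_condition_placeholders, but I'm not sure whether there is a significant performance difference between the following two options:

  1. Change the FROM clause to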

    FROM users INNER JOIN users_roles
        ON users.uid = users_roles.uid AND users_roles.rid NOT IN (6, 8, 9)
    

    (and then WHERE users_roles.rid = 5 as before, just dropping the users_roles2 references); or

  2. Remove the JOINs entirely and change the WHERE clause to:

    WHERE users.status = 1                                             -- Active users only
        AND users.uid IN
           (SELECT DISTINCT uid FROM users_roles WHERE rid = 5)        -- Must be in rôle A
        AND users.uid NOT IN
           (SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D
        AND users.uid != :users_uid                                    -- Not current user
    

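Additional information

In case it helps, the MySQL version is 5.1.50-enterprise-gpl-pro, all tables use the InnoDB storage engine, and the users_roles table already has a clustered primary key across both columns: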

mysql> describe users_roles;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| uid   | int(10) unsigned | NO   | PRI | 0       |       |
| rid   | int(10) unsigned | NO   | PRI | 0       |       |
+-------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

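The actual performance problems I'm seeing are twofold: the server is already at its limit on RAM, and the query I'm discussing here takes over 2 seconds to execute. I'm guessing I can't address the RAM without looking at LIMIT and OFFSET, but speeding up this query would definitely be a good start.

More additional information, as requested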

mysql> describe users;
+------------------+------------------+------+-----+---------+-------+
| Field            | Type             | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| uid              | int(10) unsigned | NO   | PRI | 0       |       |
| name             | varchar(60)      | NO   | UNI |         |       |
| pass             | varchar(128)     | NO   |     |         |       |
| mail             | varchar(254)     | YES  | MUL |         |       |
| theme            | varchar(255)     | NO   |     |         |       |
| signature        | varchar(255)     | NO   |     |         |       |
| signature_format | varchar(255)     | YES  |     | NULL    |       |
| created          | int(11)          | NO   | MUL | 0       |       |
| access           | int(11)          | NO   | MUL | 0       |       |
| login            | int(11)          | NO   |     | 0       |       |
| status           | tinyint(4)       | NO   |     | 0       |       |
| timezone         | varchar(32)      | YES  |     | NULL    |       |
| language         | varchar(12)      | NO   |     |         |       |
| picture          | int(11)          | NO   |     | 0       |       |
| init             | varchar(254)     | YES  |     |         |       |
| data             | longblob         | YES  |     | NULL    |       |
+------------------+------------------+------+-----+---------+-------+
16 rows in set (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT
    users.uid AS uid,
    /* some columns */,
    RAND() AS random_field
FROM
    users users
    INNER JOIN users_roles users_roles ON users.uid = users_roles.uid
    LEFT JOIN users_roles users_roles2
        ON users.uid = users_roles2.uid
        AND (users_roles2.rid = 6 OR users_roles2.rid = 8 OR users_roles2.rid = 9)
WHERE
    (( (users.status <> 0)                           -- Active users only
    AND (users_roles.rid = 5)                        -- Must be in rôle A
    AND (users_roles2.rid IS NULL)                   -- Not in rôles B, C, D
    AND (users.uid != 35635 OR users.uid IS NULL) )) -- Not (random valid UID)
ORDER BY random_field ASC
+----+-------------+--------------+--------+---------------+---------+---------+------------------------+-------+-----------------------------------------------------------+
| id | select_type | table        | type   | possible_keys | key     | key_len | ref                    | rows  | Extra                                                     |
+----+-------------+--------------+--------+---------------+---------+---------+------------------------+-------+-----------------------------------------------------------+
|  1 | SIMPLE      | users_roles  | ref    | PRIMARY,rid   | rid     | 4       | const                  | 69985 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | users        | eq_ref | PRIMARY       | PRIMARY | 4       | dbname.users_roles.uid |     1 | Using where                                               |
|  1 | SIMPLE      | users_roles2 | ref    | PRIMARY,rid   | PRIMARY | 4       | dbname.users.uid       |     1 | Using where; Using index; Not exists                      |
+----+-------------+--------------+--------+---------------+---------+---------+------------------------+-------+-----------------------------------------------------------+
3 rows in set, 1 warning (0.01 sec)

ype*_*eᵀᴹ 3

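  • I would add an index on users_roles (rid, uid). In a many-to-many table with two columns (a,b), you almost always need both indexes, (a,b) and (b,a), for one query or another. I think this index would help with this query.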

  • Try various rewrites of the query and examine the EXPLAIN EXTENDED output that each one produces.

  • About your suggestions: the first one is not correct (it will not show the same results). For the second suggestion:

WHERE users.status = 1                                           -- Active users only

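Yes, this is better than users.status <> 0. The change may pay off more if there is an index on users (status), and more still if there are not many active users. Queries on boolean columns (or columns that act as booleans) are not easy to optimize with B-tree indexes.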
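
As a rough sketch of that index, assuming nothing already covers status (the DESCRIBE output in the question shows no key on that column). The index name status_idx is made up for illustration, and whether the optimizer actually uses the index depends on how selective status = 1 is on your data:

```sql
-- Hypothetical index name; check SHOW INDEX FROM users before adding it.
ALTER TABLE users ADD INDEX status_idx (status);

-- Compare the plans for the equality and inequality forms of the filter.
-- The equality form can be resolved as a single index lookup, while the
-- inequality form is treated as a range condition at best.
EXPLAIN SELECT uid FROM users WHERE status = 1;
EXPLAIN SELECT uid FROM users WHERE status <> 0;
```

If both plans still show a full scan, the index is not earning its keep for this filter and can be dropped again.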

  AND users.uid IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid = 5)        -- Must be in rôle A

No. MySQL is known to have problems with column IN (SELECT ...), especially when the outer table is big (and yours has 200K rows, so no, not good).

  AND users.uid NOT IN
     (SELECT DISTINCT uid FROM users_roles WHERE rid IN (6,8,9)) -- Not rôles B, C, D

Yes, this is one way to rewrite it. The DISTINCT is redundant, though.

  AND users.uid <> :users_uid                                    -- Not current user

Yes, removing the users.uid IS NULL test may help, and it will not change the results.
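
Going back to the first suggestion, a small made-up example shows why it returns different rows. The tables demo_users and demo_users_roles and their contents are invented for illustration (they are not part of the Drupal schema); the role ids are the ones from the question. The point is that rid NOT IN (6, 8, 9) is checked one row at a time, so a user who has role 5 and role 6 still has a role-5 row that passes the check:

```sql
-- Scratch tables for the demonstration only (hypothetical names).
CREATE TABLE demo_users (uid INT UNSIGNED NOT NULL PRIMARY KEY);
CREATE TABLE demo_users_roles (
    uid INT UNSIGNED NOT NULL,
    rid INT UNSIGNED NOT NULL,
    PRIMARY KEY (uid, rid)
);

INSERT INTO demo_users (uid) VALUES (1), (2);
INSERT INTO demo_users_roles (uid, rid) VALUES
    (1, 5),          -- user 1: role 5 only, should be listed
    (2, 5), (2, 6);  -- user 2: roles 5 and 6, should be excluded

-- Suggestion 1: returns uid 1 AND uid 2, because the row (2, 5)
-- satisfies both rid NOT IN (6, 8, 9) and rid = 5.
SELECT u.uid
FROM demo_users u
INNER JOIN demo_users_roles ur
    ON u.uid = ur.uid AND ur.rid NOT IN (6, 8, 9)
WHERE ur.rid = 5;

-- Original LEFT JOIN ... IS NULL form: returns uid 1 only, because
-- user 2 finds a matching row (2, 6) in the second join.
SELECT u.uid
FROM demo_users u
INNER JOIN demo_users_roles ur
    ON u.uid = ur.uid AND ur.rid = 5
LEFT JOIN demo_users_roles ur2
    ON u.uid = ur2.uid AND ur2.rid IN (6, 8, 9)
WHERE ur2.rid IS NULL;

-- Clean up the scratch tables.
DROP TABLE demo_users_roles;
DROP TABLE demo_users;
```

Regular tables are used rather than TEMPORARY ones because MySQL cannot refer to the same temporary table twice in one query, which the second SELECT needs to do.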

  • Other things you can try:

Move the rid = 5 condition into the ON clause:

INNER JOIN users_roles users_roles
  ON  users.uid = users_roles.uid
  AND users_roles.rid = 5

The (rewrite to) NOT IN can also be written as NOT EXISTS:

  AND NOT EXISTS
      ( SELECT *
        FROM users_roles ur
        WHERE ur.uid = users.uid
          AND ur.rid IN (6,8,9)
      )
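
Put together, the pieces above might look like the sketch below. It reuses the hard-coded values from the EXPLAIN in the question (role ids 5, 6, 8, 9 and uid 35635) and leaves out the other selected columns. The index name rid_uid is a placeholder. The EXPLAIN output in the question already lists a key named rid, and on InnoDB a secondary index carries the primary-key columns as well, so it is worth checking what exists before adding anything. This is one candidate to compare against the original plan, not a guaranteed improvement:

```sql
-- See which indexes users_roles already has before adding another.
SHOW INDEX FROM users_roles;

-- Only if (rid, uid) is not effectively covered already; hypothetical name.
ALTER TABLE users_roles ADD INDEX rid_uid (rid, uid);

-- Candidate rewrite: rid = 5 moved into the ON clause, status = 1
-- instead of <> 0, and NOT EXISTS in place of LEFT JOIN ... IS NULL.
EXPLAIN EXTENDED
SELECT
    users.uid AS uid,
    RAND() AS random_field
FROM users
INNER JOIN users_roles
    ON  users.uid = users_roles.uid
    AND users_roles.rid = 5               -- Must be in role A
WHERE users.status = 1                    -- Active users only
  AND users.uid <> 35635                  -- Not current user
  AND NOT EXISTS                          -- Not in roles B, C, D
      ( SELECT *
        FROM users_roles ur
        WHERE ur.uid = users.uid
          AND ur.rid IN (6,8,9)
      )
ORDER BY random_field ASC;
```

Note that ORDER BY on RAND() still forces a temporary table and a filesort over every qualifying row, so these rewrites can only reduce the cost of finding the rows, not the cost of shuffling them.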