Optimising MySQL queries across hierarchical data

egg*_*yal 13 mysql sql database-design query-optimization data-structures

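I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000), and all edges are strictly increasing in x.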

There is furthermore a dictionary of ~1k (key, val) pairs associated with each vertex.

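I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted both it and the foreign key constraints that refer to it from my extracts below); a table which holds the dictionaries; and a "closure table" of connected vertices, as described so eloquently by Bill Karwin.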

The table of vertex dictionaries is defined as follows:

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

and the closure table of connected vertices:

CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);

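There is also a dictionary of (x, key) pairs such that, for each such pair, all vertices identified with that x have within their dictionaries a value for that key. This dictionary is stored in a fourth table: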

CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);

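I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys connected to the left: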

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;

for which the EXPLAIN output is:

id   select_type   table   type     possible_keys           key       key_len   ref                                rows   Extra
 1   SIMPLE        k       index    PRIMARY,xkey            xkey          154   NULL                                 40   Using index; Using temporary
 1   SIMPLE        c       ref      PRIMARY,reverse,fx,rx   PRIMARY         2   db.k.x                                1   Using where
 1   SIMPLE        v       ref      PRIMARY,dict            PRIMARY         4   const,db.c.head_y                   136   Using index
 1   SIMPLE        u       eq_ref   PRIMARY,dict            PRIMARY       156   db.c.tail_x,db.c.tail_y,db.k.key      1   Using where

But this query takes ~10s to complete. I have been banging my head against a brick wall trying to improve matters, but to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!


UPDATE

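I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v had increased from 1 to 136!); the query is still taking ~10s to execute.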
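InnoDB derives the rows estimates shown by EXPLAIN from a small sample of index pages, so they can shift after a rebuild without the data having changed. A minimal sketch for re-sampling the statistics and inspecting the resulting index cardinalities, using only the table names defined above (whether this changes the chosen plan here is untested):

```sql
-- Re-sample index statistics for the three tables used by the query.
ANALYZE TABLE `VertexDictionary`, `ConnectedVertices`, `SpecialKeys`;

-- The Cardinality column shows the estimate the optimizer works from.
SHOW INDEX FROM `ConnectedVertices`;
SHOW INDEX FROM `VertexDictionary`;
```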

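I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... How can I improve the time taken to perform that join?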
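One way to express "run the fast part first, then join" in a single statement is to put the special-value lookup in a derived table and join the keys onto its (much smaller) result. The following is only a sketch under assumptions: the literal 42 is a placeholder for X, the alias `s` is made up for illustration, and no timings were measured for it. It should return the same (key, val) set as the original query, because the inner DISTINCT only removes rows that the outer DISTINCT would collapse anyway.

```sql
-- 42 is a placeholder for the x value of interest.
SELECT DISTINCT
  v.`key`,
  s.`val`
FROM
  (
    -- Special values reachable from each head vertex with head_x = 42.
    SELECT DISTINCT
      c.head_y,
      u.`val`
    FROM
           ConnectedVertices AS c
      JOIN SpecialKeys       AS k ON k.x = c.tail_x
      JOIN VertexDictionary  AS u ON (u.x, u.y, u.`key`) = (c.tail_x, c.tail_y, k.`key`)
    WHERE
      c.head_x = 42
  ) AS s
  -- Attach every dictionary key of the matching head vertices.
  JOIN VertexDictionary AS v ON v.x = 42 AND v.y = s.head_y
;
```

The derived table is materialized before the outer join runs, which forces MySQL to evaluate the closure-table part once instead of interleaving it with the ~1k dictionary rows per vertex.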

The output of SHOW VARIABLES LIKE '%innodb%'; is as follows:

Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4

DRa*_*app 0

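Others may disagree, but I have used, and regularly offer, STRAIGHT_JOIN for queries... once you know the data and the relationships. Since your WHERE clause is against the "v" table alias and its "x" value, you are in good shape with the index. Move that table to the front position, then join from there.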

SELECT STRAIGHT_JOIN DISTINCT
      v.`key`,
      u.`val`
   FROM
      VertexDictionary AS v 

         JOIN ConnectedVertices AS c
            ON v.x = c.head_x
            AND v.y = c.head_y

            JOIN VertexDictionary AS u 
               ON c.tail_x = u.x 
               AND c.tail_y = u.y

               JOIN SpecialKeys AS k
                  ON u.x = k.x
                  AND u.key = k.key
   WHERE
      v.x = {some value}      

Would be interested to know how this realignment works for you.
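To see whether the reordering actually changes the plan, the statement can be prefixed with EXPLAIN. The sketch below also adds a FORCE INDEX hint on the `reverse` key of ConnectedVertices, so that the lookup from v into c can use (head_x, head_y). The hint is an addition for illustration and not part of the suggestion above, and 42 is a placeholder for the x value.

```sql
-- 42 is a placeholder for the x value of interest.
EXPLAIN
SELECT STRAIGHT_JOIN DISTINCT
  v.`key`,
  u.`val`
FROM
       VertexDictionary  AS v
  -- Assumption: hint the (head_x, head_y, tail_x) index for the v -> c lookup.
  JOIN ConnectedVertices AS c FORCE INDEX (`reverse`)
       ON c.head_x = v.x AND c.head_y = v.y
  JOIN VertexDictionary  AS u
       ON u.x = c.tail_x AND u.y = c.tail_y
  JOIN SpecialKeys       AS k
       ON k.x = u.x AND k.`key` = u.`key`
WHERE
  v.x = 42;
```

In the output, the tables should appear in the order v, c, u, k, and the key column for c should read reverse; if it does not, the hint or the join order is not being applied as intended.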