What are the proper scopes/indexes to help make scoped finds through Rails more performant?

con*_*are 6 mysql database optimization scope ruby-on-rails

I have a relatively large, 4-level-deep relational data setup like this:

ClientApplication         has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs                 has_many => Logs

client_applications: (potentially 1,000 records)
   - ...
   - account_id
   - public_key
   - deleted_at

client_application_versions: (potentially 10,000 records)
   - ...
   - client_application_id
   - public_key
   - deleted_at

cloud_logs: (potentially 1,000,000 records)
   - ...
   - client_application_version_id
   - public_key
   - deleted_at

logs: (potentially 1,000,000,000 records)
   - ...
   - cloud_log_id
   - public_key
   - timestamp
   - deleted_at


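I'm still in development, so the structure and setup aren't set in stone, but I'd like to get it set up right. I'm using Rails 3.2.11 and InnoDB MySQL. The database is currently populated with a small data set compared to its eventual size (logs has only 500,000 rows). I have 4 scoped queries for retrieving logs, and 3 of them are problematic.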

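  1. Grab the first page of logs, ordered by timestamp, limited by account_id, client_application.public_key, client_application_version.public_key (over 100 seconds)
  2. Grab the first page of logs, ordered by timestamp, limited by account_id, client_application.public_key (over 100 seconds)
  3. Grab the first page of logs, ordered by timestamp, limited by account_id (over 100 seconds)
  4. Grab the first page of logs, ordered by timestamp (~2 seconds)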

I am using Rails scopes to help make these calls:

  scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
  scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
  scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }

  default_scope order('logs.timestamp DESC')

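I have an index on public_key in every table. I have several indexes on the logs table, including the one the optimizer prefers to use (index_logs_on_cloud_log_id), but the queries still take a very long time to run.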


Here is how I call the method in the rails console:

Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)

...and this is what Rails turns it into:

SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0

...and here is the EXPLAIN for that query:

+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table                            | type   | possible_keys                                                                                                                                         | key                                               | key_len | ref                                                                    | rows | Extra                                        |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | client_application_versions      | ref    | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key                                    | index_client_application_versions_on_public_key   | 768     | const                                                                  |    1 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | client_applications              | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key                                                               | PRIMARY                                           | 4       | cloudlog_production.client_application_versions.client_application_id  |    1 | Using where                                  |
|  1 | SIMPLE      | cloud_logs                       | ref    | PRIMARY,index_cloud_logs_on_client_application_version_id                                                                                             | index_cloud_logs_on_client_application_version_id | 5       | cloudlog_production.client_application_versions.id                     |  481 | Using where; Using index                     |
|  1 | SIMPLE      | cloud_logs_logs_join             | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id                                                                                             | PRIMARY                                           | 4       | cloudlog_production.cloud_logs.id                                      |    1 |                                              |
|  1 | SIMPLE      | client_application_versions_logs | eq_ref | PRIMARY                                                                                                                                               | PRIMARY                                           | 4       | cloudlog_production.cloud_logs_logs_join.client_application_version_id |    1 | Using index                                  |
|  1 | SIMPLE      | logs                             | ref    | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id                        | 5       | cloudlog_production.cloud_logs.id                                      |    4 | Using where                                  |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
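When reading a plan like the one above, the Extra column is often the quickest place to look. "Using temporary; Using filesort" on the first table in the join order usually means MySQL builds the joined result in a temporary table and sorts it before LIMIT can apply. As a minimal sketch, here is a plain-Ruby helper that flags such rows. The helper name and the hash layout are invented for illustration; the table names and Extra values are copied by hand from the EXPLAIN output above.

```ruby
# Extra-column markers that usually signal an expensive sort step.
SORT_FLAGS = ["Using temporary", "Using filesort"].freeze

# rows: array of hashes with :table and :extra keys (hypothetical layout,
# filled in by hand from EXPLAIN output).
def tables_with_sort_flags(rows)
  rows.each_with_object({}) do |row, found|
    flags = SORT_FLAGS.select { |flag| row[:extra].include?(flag) }
    found[row[:table]] = flags unless flags.empty?
  end
end

EXPLAIN_ROWS = [
  { table: "client_application_versions", extra: "Using where; Using temporary; Using filesort" },
  { table: "client_applications", extra: "Using where" },
  { table: "cloud_logs", extra: "Using where; Using index" },
  { table: "cloud_logs_logs_join", extra: "" },
  { table: "client_application_versions_logs", extra: "Using index" },
  { table: "logs", extra: "Using where" }
].freeze

# Maps each flagged table name to the sort markers found in its Extra column.
FLAGGED = tables_with_sort_flags(EXPLAIN_ROWS)
```

Only the first row is flagged here, which is consistent with the sort running over the joined rows rather than being served from an index on logs.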

This question has three parts:

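  1. Can I optimize my database with additional indexes to help these join-dependent, sorted queries become more performant?
  2. Can I optimize the Rails code to help this type of find run more efficiently?
  3. Am I simply approaching scoped finds on large data sets the wrong way?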




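Update 1/24/12
As Geoff and J_MCCaffrey suggested in the answers, I split the query into 3 separate parts to try to isolate the problem. As expected, the problem is in dealing with the largest table. The MySQL optimizer handles this case by using different indexes, but the delay is still there. Here is the EXPLAIN for this approach.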

ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key == '0HgoJRyE'}.first.logs.page(2)
  ClientApplication Load (165.9ms)  SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
  ClientApplicationVersion Load (105.1ms)  SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
  Log Load (57295.0ms)  SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
  EXPLAIN (214.5ms)  EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT  `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table      | type        | possible_keys                                                                                                                                         | key                                                                              | key_len | ref                               | rows | Extra                                                                                                                                           |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at                                                              | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9     | NULL                              | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | logs       | ref         | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id                                                       | 5       | cloudlog_production.cloud_logs.id |    4 | Using where                                                                                                                                     |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
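In this plan the logs table is read through index_logs_on_cloud_log_id, once per matching cloud_logs row (an estimated 1874). Even the three-column index on (cloud_log_id, deleted_at, timestamp) would only keep timestamps ordered within a single cloud_log_id, so rows gathered from many cloud logs still have to be sorted. The following is a simplified plain-Ruby model of that property, not of MySQL itself: the sample rows are invented, and deleted_at is ignored.

```ruby
# Simplified model of a composite B-tree index; sample rows are invented.
Row = Struct.new(:cloud_log_id, :timestamp)

ROWS = [
  Row.new(1, 50), Row.new(1, 10),
  Row.new(2, 40), Row.new(2, 30),
  Row.new(3, 60), Row.new(3, 20)
].freeze

# An index on (cloud_log_id, timestamp) keeps entries sorted by the first
# column, then by the second.
INDEX_ORDER = ROWS.sort_by { |row| [row.cloud_log_id, row.timestamp] }.freeze

# Timestamps in index order for the given cloud_log_ids.
def timestamps_in_index_order(cloud_log_ids)
  INDEX_ORDER.select { |row| cloud_log_ids.include?(row.cloud_log_id) }
             .map(&:timestamp)
end

# One cloud log: a contiguous slice of the index, already in timestamp order.
ONE_CLOUD_LOG = timestamps_in_index_order([2])

# Several cloud logs: ordered only within each cloud_log_id, not overall.
MANY_CLOUD_LOGS = timestamps_in_index_order([1, 2, 3])

# The newest two rows across all three cloud logs still need an explicit sort.
FIRST_PAGE = MANY_CLOUD_LOGS.sort.reverse.first(2)
```

An index whose leading column is the value being filtered on (for example a version or application identifier stored directly on logs), followed by timestamp, would not have this problem in the model, since all matching rows would form one contiguous, ordered slice.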




Update 1/25/12
Here are the indexes on all of the relevant tables:

CLIENT_APPLICATIONS:
  PRIMARY KEY  (`id`),
  UNIQUE KEY `index_client_applications_on_key` (`key`),
  KEY `index_client_applications_on_account_id` (`account_id`),
  KEY `index_client_applications_on_deleted_at` (`deleted_at`),
  KEY `index_client_applications_on_public_key` (`public_key`)

CLIENT_APPLICATION_VERSIONS:
  PRIMARY KEY  (`id`),
  KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
  KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
  KEY `index_client_application_versions_on_public_key` (`public_key`)

CLOUD_LOGS:
  PRIMARY KEY  (`id`),
  KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
  KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
  KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
  KEY `index_cloud_logs_on_device_id` (`device_id`),
  KEY `index_cloud_logs_on_public_key` (`public_key`),
  KEY `index_cloud_logs_on_received_at` (`received_at`)

LOGS:
  PRIMARY KEY  (`id`),
  KEY `index_logs_on_class_name` (`class_name`),
  KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
  KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
  KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
  KEY `index_logs_on_deleted_at` (`deleted_at`),
  KEY `index_logs_on_file_name` (`file_name`),
  KEY `index_logs_on_method_name` (`method_name`),
  KEY `index_logs_on_public_key` (`public_key`),
  KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)

con*_*are 0

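I'm writing this as a possible solution to my own question, in the hope of getting a better answer. Currently, the database is fully set up, with relationships as specified: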

ClientApplication         has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs                 has_many => Logs

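This means that when I need to find the logs belonging to a client application, I have to perform 3 extra joins to get them. By introducing some foreign_key denormalization into the logs table, I can skip all of the joins: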

ClientApplication         has_many => ClientApplicationVersions
ClientApplication         has_many => Logs
ClientApplicationVersions has_many => CloudLogs
ClientApplicationVersions has_many => Logs
CloudLogs                 has_many => Logs

The end result is that my logs table will have a few extra columns: client_application_key, client_application_version_key, cloud_log_key
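To make the idea concrete, here is a rough plain-Ruby sketch of how those extra columns could be filled in when a log row is written. It uses in-memory hashes as stand-ins for the parent tables; the ids, key values, and the helper name denormalized_keys are all invented for illustration and are not taken from the real application.

```ruby
# Hypothetical stand-ins for the parent tables, keyed by id (invented data).
APPLICATIONS = { 1 => { public_key: "app-key" } }.freeze
VERSIONS     = { 10 => { client_application_id: 1, public_key: "version-key" } }.freeze
CLOUD_LOGS   = { 100 => { client_application_version_id: 10, public_key: "cloud-key" } }.freeze

# Walks the parent chain once, at write time, and returns the values for the
# denormalized columns on a log row.
def denormalized_keys(cloud_log_id, cloud_logs:, versions:, applications:)
  cloud_log   = cloud_logs.fetch(cloud_log_id)
  version     = versions.fetch(cloud_log[:client_application_version_id])
  application = applications.fetch(version[:client_application_id])
  {
    client_application_key: application[:public_key],
    client_application_version_key: version[:public_key],
    cloud_log_key: cloud_log[:public_key]
  }
end

# A new log row carries copies of its ancestors' keys, so reads can filter on
# logs alone without joining the three parent tables.
NEW_LOG = { cloud_log_id: 100, timestamp: 1_358_985_600 }.merge(
  denormalized_keys(100, cloud_logs: CLOUD_LOGS, versions: VERSIONS, applications: APPLICATIONS)
).freeze
```

The trade-off is visible in the sketch: the copies are only correct as of write time, so if a parent's public_key ever changed, every affected log row would need to be updated as well.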

Although I run the risk of data inconsistency, I can avoid the 3 joins here that are dragging down query performance. Someone please talk me out of this.