Tags: mysql, database, optimization, scope, ruby-on-rails
I have a relatively large, 4-deep relational data setup like this:
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logs
client_applications: (potentially 1,000 records)
- ...
- account_id
- public_key
- deleted_at
client_application_versions: (potentially 10,000 records)
- ...
- client_application_id
- public_key
- deleted_at
cloud_logs: (potentially 1,000,000 records)
- ...
- client_application_version_id
- public_key
- deleted_at
logs: (potentially 1,000,000,000 records)
- ...
- cloud_log_id
- public_key
- timestamp
- deleted_at
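I'm still in development, so the structure and setup are not set in stone, but I'd like to get them right. I'm using Rails 3.2.11 and MySQL with InnoDB. The database is currently populated with a small data set compared to its eventual size (logs has only 500,000 rows). I have 4 scoped queries for retrieving logs, 3 of which are problematic:
- account_id, client_application.public_key, client_application_version.public_key (over 100 seconds)
- account_id, client_application.public_key (over 100 seconds)
- account_id (over 100 seconds)
I'm using Rails scopes to help make these calls: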
scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }
default_scope order('logs.timestamp DESC')
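I have an index on public_key in every table. I have several indexes on the logs table, including the one the optimizer prefers to use (index_logs_on_cloud_log_id), but the queries still take a very long time to run.
Here is how I call the scopes in the rails console: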
Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)
...and this is what Rails turns it into:
SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0
...and this is the EXPLAIN output for that query:
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | client_application_versions | ref | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key | index_client_application_versions_on_public_key | 768 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | client_applications | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key | PRIMARY | 4 | cloudlog_production.client_application_versions.client_application_id | 1 | Using where |
| 1 | SIMPLE | cloud_logs | ref | PRIMARY,index_cloud_logs_on_client_application_version_id | index_cloud_logs_on_client_application_version_id | 5 | cloudlog_production.client_application_versions.id | 481 | Using where; Using index |
| 1 | SIMPLE | cloud_logs_logs_join | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id | PRIMARY | 4 | cloudlog_production.cloud_logs.id | 1 | |
| 1 | SIMPLE | client_application_versions_logs | eq_ref | PRIMARY | PRIMARY | 4 | cloudlog_production.cloud_logs_logs_join.client_application_version_id | 1 | Using index |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
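One thing the EXPLAIN above makes visible: it lists six rows for four tables, because the generated SQL joins cloud_logs and client_application_versions a second time under the aliases cloud_logs_logs_join and client_application_versions_logs. This most likely comes from chaining scopes that call joins(:client_application) and joins(:client_application_version) separately. It may not be the main cost, but it is avoidable. As a sketch of the query shape one might aim for, the plain-Ruby helper below builds the same filter with each table joined exactly once. The method name logs_lookup_sql and its [sql, binds] return value are hypothetical; only the table and column names come from the question.

```ruby
# Plain-Ruby sketch (no Rails needed): build the log lookup with each table
# joined exactly once. Table and column names come from the question;
# the method name and the [sql, binds] return shape are hypothetical.
def logs_lookup_sql(account_id, application_key = nil, version_key = nil, limit = 100, offset = 0)
  joins = [
    "INNER JOIN cloud_logs ON cloud_logs.id = logs.cloud_log_id",
    "INNER JOIN client_application_versions ON client_application_versions.id = cloud_logs.client_application_version_id",
    "INNER JOIN client_applications ON client_applications.id = client_application_versions.client_application_id"
  ]

  # account_id is always required; the two public keys are optional filters,
  # matching the three problematic query variants listed in the question.
  conditions = ["logs.deleted_at IS NULL", "client_applications.account_id = ?"]
  binds = [account_id]

  if application_key
    conditions << "client_applications.public_key = ?"
    binds << application_key
  end
  if version_key
    conditions << "client_application_versions.public_key = ?"
    binds << version_key
  end

  # Integer() rejects non-numeric paging values before they reach the SQL text.
  sql = "SELECT logs.* FROM logs " + joins.join(" ") +
        " WHERE " + conditions.join(" AND ") +
        " ORDER BY logs.timestamp DESC LIMIT #{Integer(limit)} OFFSET #{Integer(offset)}"
  [sql, binds]
end

sql, binds = logs_lookup_sql(1, "p0kZudG0", "0HgoJRyE")
puts sql
puts binds.inspect
```

In Rails itself, a similar shape might be reached with a single nested join such as joins(:cloud_log => {:client_application_version => :client_application}) shared by all three scopes. That assumes the models define belongs_to associations with those names, which the question does not show.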
This question has three parts:
Would find be a more efficient way to run this type of query?
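UPDATE 1/24/12
As Geoff and J_MCCaffrey suggested in their answers, I split the query into 3 separate parts to try to isolate the problem. As expected, the problem lies in dealing with the largest table. The MySQL optimizer handles this case by using a different index, but the delay is still there. Here is the EXPLAIN for this approach.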
ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key == '0HgoJRyE'}.first.logs.page(2)
ClientApplication Load (165.9ms) SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
ClientApplicationVersion Load (105.1ms) SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
Log Load (57295.0ms) SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN (214.5ms) EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9 | NULL | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
UPDATE 1/25/12
Here are the indexes for all of the related tables:
CLIENT_APPLICATIONS:
PRIMARY KEY (`id`),
UNIQUE KEY `index_client_applications_on_key` (`key`),
KEY `index_client_applications_on_account_id` (`account_id`),
KEY `index_client_applications_on_deleted_at` (`deleted_at`),
KEY `index_client_applications_on_public_key` (`public_key`)
CLIENT_APPLICATION_VERSIONS:
PRIMARY KEY (`id`),
KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
KEY `index_client_application_versions_on_public_key` (`public_key`)
CLOUD_LOGS:
PRIMARY KEY (`id`),
KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
KEY `index_cloud_logs_on_device_id` (`device_id`),
KEY `index_cloud_logs_on_public_key` (`public_key`),
KEY `index_cloud_logs_on_received_at` (`received_at`)
LOGS:
PRIMARY KEY (`id`),
KEY `index_logs_on_class_name` (`class_name`),
KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
KEY `index_logs_on_deleted_at` (`deleted_at`),
KEY `index_logs_on_file_name` (`file_name`),
KEY `index_logs_on_method_name` (`method_name`),
KEY `index_logs_on_public_key` (`public_key`),
KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)
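I'm writing this as a possible solution to my own question, in the hope that it draws out a better answer. Currently, the database is fully set up with the relationships as specified: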
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logs
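This means that when I need to find the logs belonging to a client application, I have to perform 3 additional joins to get them. By introducing some foreign_key denormalization into the Logs table, I can skip all of the joins: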
ClientApplication has_many => ClientApplicationVersions
ClientApplication has_many => Logs
ClientApplicationVersions has_many => CloudLogs
ClientApplicationVersions has_many => Logs
CloudLogs has_many => Logs
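The end result is that my logs table will have a few extra columns: client_application_key, client_application_version_key, and cloud_log_key.
Although I run the risk of inconsistent data, I can avoid the 3 joins here, which are killing the performance of the query. Someone please talk me out of this.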
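If I do go this route, one way to keep the inconsistency risk contained might be to derive the extra columns from the parent chain in exactly one place, at the moment a log row is written. Below is a minimal plain-Ruby sketch of that idea. The Structs only mirror the models named above, denormalized_keys is a hypothetical helper (not a Rails API), and the cloud log key is a made-up value for illustration.

```ruby
# Plain-Ruby sketch of deriving the denormalized columns at write time.
# The Structs stand in for the models in the question; `denormalized_keys`
# is a hypothetical helper, not part of Rails.
ClientApplication        = Struct.new(:public_key)
ClientApplicationVersion = Struct.new(:public_key, :client_application)
CloudLog                 = Struct.new(:public_key, :client_application_version)

# Walk the parent chain once and return the extra columns for a new logs row.
def denormalized_keys(cloud_log)
  version     = cloud_log.client_application_version
  application = version.client_application
  {
    :cloud_log_key                  => cloud_log.public_key,
    :client_application_version_key => version.public_key,
    :client_application_key         => application.public_key
  }
end

app     = ClientApplication.new("p0kZudG0")
version = ClientApplicationVersion.new("0HgoJRyE", app)
cloud   = CloudLog.new("example-cloud-log-key", version) # made-up key

p denormalized_keys(cloud)
```

For the filter and the sort to be served from an index, the new column would presumably need its own composite index on logs, for example (client_application_version_key, deleted_at, timestamp). That is an assumption to verify with EXPLAIN, not something I have measured. It may also be worth noting that the EXPLAIN output earlier shows key_len 768 for the public_key index versus 4 for a primary key, so copying the parent ids instead of the keys would give narrower indexes.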