Help optimizing a query

Noa*_*ich 1 mysql mysql-5 optimization mysql-5.1

Given the following table structure in MySQL 5.1.49:

CREATE TABLE `leads` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `institution_id` int(10) unsigned NOT NULL,
  `lender_id` int(10) unsigned NOT NULL,
  `product_id` int(10) unsigned NOT NULL,
  `client_id` int(10) unsigned NOT NULL,
  `contract_id` int(10) unsigned DEFAULT NULL,
  `employee_id` int(11) unsigned DEFAULT NULL,
  `create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `inquiry_date` timestamp NULL DEFAULT NULL,
  `claimed_date` timestamp NULL DEFAULT NULL,
  `refunded` timestamp NULL DEFAULT NULL,
  `price` decimal(10,2) unsigned NOT NULL,
  `downloaded` int(11) NOT NULL DEFAULT '0',
  `status` enum('in_review','declined','pre-approved') DEFAULT NULL,
  `amount` bigint(20) DEFAULT NULL,
  `response` longtext,
  `pushed` tinyint(1) NOT NULL DEFAULT '0',
  `priority` tinyint(2) NOT NULL,
  `disabled` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_lead` (`institution_id`,`client_id`,`product_id`),
  KEY `status` (`status`),
  KEY `fk-product-leads` (`product_id`),
  KEY `fk-contract-leads` (`contract_id`),
  KEY `fk-lender-leads` (`lender_id`),
  KEY `fk-client-leads` (`client_id`),
  KEY `fk-employee-leads` (`employee_id`),
  CONSTRAINT `fk-client-leads` FOREIGN KEY (`client_id`) REFERENCES `clients` (`id`) ON DELETE CASCADE,
  CONSTRAINT `fk-contract-leads` FOREIGN KEY (`contract_id`) REFERENCES `contracts` (`id`),
  CONSTRAINT `fk-employee-leads` FOREIGN KEY (`employee_id`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
  CONSTRAINT `fk-institution-leads` FOREIGN KEY (`institution_id`) REFERENCES `institutions` (`id`) ON DELETE CASCADE,
  CONSTRAINT `fk-lender-leads` FOREIGN KEY (`lender_id`) REFERENCES `lenders` (`id`),
  CONSTRAINT `fk-product-leads` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5930472 DEFAULT CHARSET=utf8

Is there any way to further optimize this query:

SELECT * 
FROM leads
INNER JOIN (
    SELECT MIN(id) AS 'min_id', `leads`.`institution_id`, `leads`.`client_id`
    FROM `leads`
    WHERE (claimed_date IS NOT NULL)
    AND institution_id = 224
    GROUP BY `institution_id`,`client_id`
    HAVING (COUNT(*) > 0)
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id != cl.min_id
WHERE (leads.disabled = 0);

These are the results I currently get from EXPLAIN:

id  select_type  table       type  possible_keys                                       key       key_len  ref                             rows  Extra        
1   PRIMARY      <derived2>  ALL                                                                                                          5832               
1   PRIMARY      leads       ref   unq_lead,fk-client-leads,fk-institution-leads,test  unq_lead  8        cl.institution_id,cl.client_id  1     Using where  
2   DERIVED      leads       ref   unq_lead,fk-institution-leads                       unq_lead  4                                        6013  Using where 

Update

So I rewrote the query to look like this:

SELECT *
FROM leads
INNER JOIN (
SELECT MIN(id) AS `min_id`, `leads`.`institution_id`, `leads`.`client_id`
FROM `leads`
WHERE (claimed_date IS NOT NULL)
AND (institution_id = 224)
GROUP BY `client_id`
) AS `cl` ON leads.institution_id = cl.institution_id AND leads.client_id = cl.client_id AND leads.id <> cl.min_id
WHERE (leads.disabled = 0)
AND (leads.institution_id = 224);

I also tried adding the following additional indexes:

ALTER TABLE leads
    ADD INDEX test (disabled, institution_id, client_id),
    ADD INDEX test2 (claimed_date, institution_id, client_id);

But I still get the following from EXPLAIN:

id  select_type  table       type  possible_keys                  key       key_len  ref                 rows   Extra        
1   PRIMARY      <derived2>  ALL                                                                         5718   Using where  
1   PRIMARY      leads       ref   unq_lead,fk-client-leads,test  unq_lead  8        const,cl.client_id  1      Using where  
2   DERIVED      leads       ref   unq_lead,test2                 unq_lead  4                            12304  Using where  

ype*_*eᵀᴹ 5

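  • Remove HAVING COUNT(*) > 0. It does nothing: after GROUP BY, no group can have a count of 0.

  • Change the GROUP BY to: GROUP BY client_id. Grouping by institution_id is not needed, because the WHERE condition already narrows it to a single value.

  • As @HLGEM suggests, remove SELECT * and list only the fields you need. Right now you are repeating the data in the client_id field, which wastes server and network resources.

So the query becomes: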

SELECT le.*                  -- only the fields you need here
                             -- for example `institution_id` is 224, so
                             -- there is no need to include that
FROM leads AS le
  INNER JOIN (
    SELECT MIN(id) AS min_id, institution_id, client_id
    FROM leads 
    WHERE claimed_date IS NOT NULL
      AND institution_id = 224
    GROUP BY client_id
  ) AS cl 
      ON  le.institution_id = cl.institution_id 
      AND le.client_id = cl.client_id 
      AND le.id <> cl.min_id
WHERE le.disabled = 0 ;
  • Then add an index on (claimed_date, institution_id, client_id) to speed up the nested subquery.

  • If that does not noticeably improve speed, I think an index on (disabled, institution_id, client_id) would help the join.
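
The two index suggestions above could be applied and checked roughly as follows. This is only a sketch: the index names idx_claimed_inst_client and idx_disabled_inst_client are made up for illustration, and whether the optimizer actually picks either index depends on the data distribution.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.
ALTER TABLE leads
    ADD INDEX idx_claimed_inst_client  (claimed_date, institution_id, client_id),
    ADD INDEX idx_disabled_inst_client (disabled, institution_id, client_id);

-- Run EXPLAIN on the nested subquery by itself and look at the `key` column
-- to see which index the optimizer chose for it.
EXPLAIN
SELECT MIN(id) AS min_id, institution_id, client_id
FROM leads
WHERE claimed_date IS NOT NULL
  AND institution_id = 224
GROUP BY client_id;

-- Assumption, not part of the suggestions above: an index that leads with the
-- equality column may be worth comparing, since `claimed_date IS NOT NULL`
-- is a range condition on the first column of the index created above.
-- ALTER TABLE leads
--     ADD INDEX idx_inst_client_claimed (institution_id, client_id, claimed_date);
```

If EXPLAIN still reports unq_lead for the subquery, one possible explanation is that a range condition on the leading index column prevents the columns after it from narrowing the lookup, so the optimizer may prefer an index that starts with institution_id. Treat that as a hypothesis to test against the real data rather than a guaranteed improvement.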

You can also rewrite the query as:

SELECT le.* 
FROM leads AS le
  INNER JOIN (
    SELECT MIN(id) AS min_id, client_id
    FROM leads 
    WHERE claimed_date IS NOT NULL
      AND institution_id = 224
    GROUP BY client_id
  ) AS cl 
      ON  le.client_id = cl.client_id 
      AND le.id <> cl.min_id
WHERE le.disabled = 0 
  AND le.institution_id = 224;
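
To see what this form of the query selects, here is a small self-contained sketch. The table leads_demo, its reduced column list, and the sample rows are invented for illustration and do not come from the question; only the shape of the query is taken from above.

```sql
-- Hypothetical cut-down copy of `leads` with only the columns the query touches.
CREATE TABLE leads_demo (
  id             INT UNSIGNED NOT NULL,
  institution_id INT UNSIGNED NOT NULL,
  client_id      INT UNSIGNED NOT NULL,
  product_id     INT UNSIGNED NOT NULL,
  claimed_date   TIMESTAMP NULL DEFAULT NULL,
  disabled       TINYINT(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY unq_lead (institution_id, client_id, product_id)
) ENGINE=InnoDB;

INSERT INTO leads_demo
  (id, institution_id, client_id, product_id, claimed_date, disabled)
VALUES
  (1, 224, 1, 10, '2011-01-01 00:00:00', 0),  -- lowest-id claimed lead for client 1
  (2, 224, 1, 11, NULL,                  0),  -- same client, unclaimed, enabled
  (3, 224, 1, 12, '2011-01-03 00:00:00', 1),  -- same client, claimed, but disabled
  (4, 224, 2, 10, NULL,                  0),  -- client 2 has no claimed lead at all
  (5, 225, 1, 10, '2011-01-02 00:00:00', 0);  -- different institution

-- Same shape as the rewritten query, with an explicit column list.
SELECT le.id, le.client_id
FROM leads_demo AS le
  INNER JOIN (
    SELECT MIN(id) AS min_id, client_id
    FROM leads_demo
    WHERE claimed_date IS NOT NULL
      AND institution_id = 224
    GROUP BY client_id
  ) AS cl
      ON  le.client_id = cl.client_id
      AND le.id <> cl.min_id
WHERE le.disabled = 0
  AND le.institution_id = 224;
```

With these rows the query returns only the lead with id 2. Lead 1 is the minimum claimed id for its client and is excluded by the <> condition, lead 3 is disabled, lead 4 belongs to a client with no claimed lead (so it has no row in cl), and lead 5 is filtered out by the institution_id condition in the outer WHERE.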