You*_*nse 1 mysql sql join group-by query-optimization
I have a very simple query that has to group the results by a field from the joined table:
SELECT SQL_NO_CACHE p.name, COUNT(1) FROM ycs_sales s
INNER JOIN ycs_products p ON s.id = p.sales_id
WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59'
GROUP BY p.name
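The ycs_products table is actually sales_products: it lists the products in each sale. I want to see each product's share of sales over a period of time.
The query currently takes 2 seconds, which is too slow for user interaction. I need to make it run fast. Is there a way to get rid of Using temporary without denormalization?
The join order is critical: there is a lot of data in both tables, and limiting the number of records by date is an unquestionable prerequisite.
Here is the EXPLAIN result: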
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
type: range
possible_keys: PRIMARY,dtm
key: dtm
key_len: 6
ref: NULL
rows: 1164728
Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: p
type: ref
possible_keys: sales_id
key: sales_id
key_len: 5
ref: test.s.id
rows: 1
Extra:
2 rows in set (0.00 sec)
The same as JSON:
EXPLAIN: {
"query_block": {
"select_id": 1,
"filesort": {
"sort_key": "p.`name`",
"temporary_table": {
"table": {
"table_name": "s",
"access_type": "range",
"possible_keys": ["PRIMARY", "dtm"],
"key": "dtm",
"key_length": "6",
"used_key_parts": ["dtm"],
"rows": 1164728,
"filtered": 100,
"attached_condition": "s.dtm between '2018-02-16 00:00:00' and '2018-02-22 23:59:59'",
"using_index": true
},
"table": {
"table_name": "p",
"access_type": "ref",
"possible_keys": ["sales_id"],
"key": "sales_id",
"key_length": "5",
"used_key_parts": ["sales_id"],
"ref": ["test.s.id"],
"rows": 1,
"filtered": 100
}
}
}
}
}
And the CREATE TABLE statements, though I don't think they are necessary:
CREATE TABLE `ycs_sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dtm` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dtm` (`dtm`)
) ENGINE=InnoDB AUTO_INCREMENT=2332802 DEFAULT CHARSET=latin1;
CREATE TABLE `ycs_products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sales_id` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `sales_id` (`sales_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2332802 DEFAULT CHARSET=latin1;
And PHP code to replicate the test environment:
#$pdo->query("set global innodb_flush_log_at_trx_commit = 2");
$pdo->query("create table ycs_sales (id int auto_increment primary key, dtm datetime)");
$stmt = $pdo->prepare("insert into ycs_sales values (null, ?)");
foreach (range(mktime(0,0,0,2,1,2018), mktime(0,0,0,2,28,2018)) as $stamp){
$stmt->execute([date("Y-m-d", $stamp)]);
}
$max_id = $pdo->lastInsertId();
$pdo->query("alter table ycs_sales add key(dtm)");
$pdo->query("create table ycs_products (id int auto_increment primary key, sales_id int, name varchar(255))");
$stmt = $pdo->prepare("insert into ycs_products values (null, ?, ?)");
$products = ['food', 'drink', 'vape'];
foreach (range(1, $max_id) as $id){
$stmt->execute([$id, $products[rand(0,2)]]);
}
$pdo->query("alter table ycs_products add key(sales_id)");
小智 5
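The problem is that grouping by name makes you lose the sales_id information, so MySQL is forced to use a temporary table.
Although it is not the cleanest of solutions, and one of my least favorite approaches, you could add a new index on both the name and sales_id columns, like: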
ALTER TABLE `yourdb`.`ycs_products`
ADD INDEX `name_sales_id_idx` (`name` ASC, `sales_id` ASC);
and force the query to use this index, with either force index or use index:
SELECT SQL_NO_CACHE p.name, COUNT(1) FROM ycs_sales s
INNER JOIN ycs_products p use index(name_sales_id_idx) ON s.id = p.sales_id
WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59'
GROUP BY p.name;
My execution reported only "Using where; Using index" on table p and "Using where" on table s.
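To check whether the hint really removes the temporary table on your own server, you can run the hinted query through EXPLAIN. The sketch below uses FORCE INDEX instead of USE INDEX: USE INDEX only restricts the optimizer to the listed indexes and it may still pick a table scan, while FORCE INDEX tells it to treat a table scan as very expensive. It assumes the name_sales_id_idx index from above already exists; the plan you get depends on your server version and data, so treat it as something to verify rather than a guaranteed result.

```sql
-- Sketch: inspect the plan of the hinted query.
-- Assumes name_sales_id_idx (name, sales_id) has been created on ycs_products.
EXPLAIN
SELECT p.name, COUNT(1)
FROM ycs_sales s
INNER JOIN ycs_products p FORCE INDEX (name_sales_id_idx) ON s.id = p.sales_id
WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59'
GROUP BY p.name;
```

If "Using temporary; Using filesort" no longer appears in the Extra column, the rows are being read in name order from the index and grouped on the fly.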
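Anyway, I strongly suggest you rethink your schema, because you can probably find a better design for these two tables. On the other hand, if this is not a critical part of your application, you can live with the "forced" index.
Since it is quite clear that the problem is in the design, I suggest modeling the relationship as many-to-many. If you have a chance to verify it in your test environment, here is what I would do:
1) Create a temporary table to store the product names and ids: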
create temporary table tmp_prods
select min(id) id, name
from ycs_products
group by name;
2) From the temporary table, create a table to replace ycs_products:
create table ycs_products_new
select * from tmp_prods;
ALTER TABLE `poc`.`ycs_products_new`
CHANGE COLUMN `id` `id` INT(11) NOT NULL ,
ADD PRIMARY KEY (`id`);
3) Create the join table:
CREATE TABLE `prod_sale` (
`prod_id` INT(11) NOT NULL,
`sale_id` INT(11) NOT NULL,
PRIMARY KEY (`prod_id`, `sale_id`),
INDEX `sale_fk_idx` (`sale_id` ASC),
CONSTRAINT `prod_fk`
FOREIGN KEY (`prod_id`)
REFERENCES ycs_products_new (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `sale_fk`
FOREIGN KEY (`sale_id`)
REFERENCES ycs_sales (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION);
and populate it with the existing values:
insert into prod_sale (prod_id, sale_id)
select tmp_prods.id, sales_id from ycs_sales s
inner join ycs_products p
on p.sales_id=s.id
inner join tmp_prods on tmp_prods.name=p.name;
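One thing that may be worth checking before running the INSERT above: prod_sale has a composite primary key on (prod_id, sale_id), so if the same product name appears more than once within a single sale in ycs_products, the INSERT will fail with a duplicate-key error. In the generated test data each sale has exactly one product, so this should not happen there, but with real data a check along these lines (a sketch, not part of the original migration) would show the offending rows:

```sql
-- Sketch: find (sale, product name) pairs that occur more than once.
-- Any row returned here would collide on prod_sale's primary key.
SELECT sales_id, name, COUNT(*) AS occurrences
FROM ycs_products
GROUP BY sales_id, name
HAVING COUNT(*) > 1;
```

If this returns rows, you would need to decide whether to add a quantity column to prod_sale or to keep one row per item with a different key; that choice is outside what is shown here.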
Finally, the join query:
select name, count(name) from ycs_products_new p
inner join prod_sale ps on ps.prod_id=p.id
inner join ycs_sales s on s.id=ps.sale_id
WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59'
group by p.id;
Note that the group by is on the primary key, not the name.
Explain output:
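Selecting name while grouping only by p.id relies on the server accepting a non-grouped column that is determined by the primary key. MySQL 5.7.5 and later allow this even with ONLY_FULL_GROUP_BY enabled, because name is functionally dependent on p.id; other servers or older versions with that mode enabled may reject the query. If yours does, a variant like the following sketch should be accepted. Whether it keeps the same execution plan is an assumption you would need to confirm with EXPLAIN.

```sql
-- Sketch: same result set, with name listed in GROUP BY for strict SQL modes.
-- p.id is the primary key, so adding p.name does not change the groups.
SELECT p.name, COUNT(*) AS sales_count
FROM ycs_products_new p
INNER JOIN prod_sale ps ON ps.prod_id = p.id
INNER JOIN ycs_sales s ON s.id = ps.sale_id
WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59'
GROUP BY p.id, p.name;
```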
explain select name, count(name) from ycs_products_new p inner join prod_sale ps on ps.prod_id=p.id inner join ycs_sales s on s.id=ps.sale_id WHERE s.dtm BETWEEN '2018-02-16 00:00:00' AND '2018-02-22 23:59:59' group by p.id;
+------+-------------+-------+--------+---------------------+---------+---------+-----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------------+---------+---------+-----------------+------+-------------+
| 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 4 | NULL | 3 | |
| 1 | SIMPLE | ps | ref | PRIMARY,sale_fk_idx | PRIMARY | 4 | test.p.id | 1 | Using index |
| 1 | SIMPLE | s | eq_ref | PRIMARY,dtm | PRIMARY | 4 | test.ps.sale_id | 1 | Using where |
+------+-------------+-------+--------+---------------------+---------+---------+-----------------+------+-------------+