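I asked a similar question on SO about Postgres; now I have the same problem with MySQL.
I have two tables:
Table A: 1MM rows; columns AsOfDate, Id, BId (foreign key to table B)
Table B: 50k rows; columns Id, Flag, ValidFrom, ValidTo
Table A contains multiple records per day between 2011/01/01 and 2011/12/31, across 100 BIds. Table B contains multiple non-overlapping (in terms of ValidFrom and ValidTo) records for the 100 BIds.
The task of the join is to return the Flag that was active for the BId on the given AsOfDate.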
select
a.AsOfDate, b.Flag
from
A a inner Join B b on
a.BId = b.Id and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
On a very high-end server (3+ GHz) with 64 GB of RAM, this query takes more than 3 minutes.
+-------+-------------------------+
| Table | Create Table |
+-------+-------------------------+
| a | CREATE TABLE `a` (
`asofdate` int(4) NOT NULL,
`bid` int(4) NOT NULL,
KEY `asofdate_bid` (`asofdate`,`bid`),
KEY `bid` (`bid`),
KEY `bid_asofdate` (`bid`,`asofdate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------+
+-------+-------------------------+
| Table | Create Table |
+-------+-------------------------+
| b | CREATE TABLE `b` (
`key` int(4) NOT NULL,
`id` int(4) NOT NULL,
`flag` char(1) NOT NULL,
`validfrom` int(4) NOT NULL,
`validto` int(4) NOT NULL,
KEY `id` (`id`),
KEY `validfrom` (`validfrom`),
KEY `validfrom_id` (`validfrom`,`id`),
KEY `id_validfrom` (`id`,`validfrom`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------+
Here is the EXPLAIN output:
mysql> explain select count(1) from a a inner join b b on a.bid = b.id and b.validfrom <= a.asofdate and b.validto >= a.asofdate where a.asofdate >= 20120101 and a.asofdate <= 20121231;
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+-----------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+-----------+
| 1 | SIMPLE | b | ALL | id,validfrom,validfrom_id,id_validfrom | NULL | NULL | NULL | 50510 | |
| 1 | SIMPLE | a | ref | asofdate_bid,bid,bid_asofdate | bid_asofdate | 4 | foo.b.id | 1433 | Using where; Using index |
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+-----------+
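SQL Server Express and Postgres take about 300 ms to execute the query above. I am deciding on a multi-terabyte installation, and right now this does not look good for MySQL (my preferred database)!
Execution plans for the suggested queries
With the range conditions removed from the join condition (3 minutes):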
mysql> EXPLAIN SELECT count(1) FROM a a
-> INNER JOIN b b ON a.bid = b.id
-> WHERE (a.asofdate >= 20120101 and a.asofdate <= 20121231)
-> AND (b.validfrom <= a.asofdate AND b.validto >= a.asofdate);
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+--------------------------+
| 1 | SIMPLE | b | ALL | id,validfrom,validfrom_id,id_validfrom | NULL | NULL | NULL | 50510 | |
| 1 | SIMPLE | a | ref | asofdate_bid,bid,bid_asofdate | bid_asofdate | 4 | foo.b.id | 1433 | Using where; Using index |
+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+--------------------------+
2 rows in set (0.02 sec)
Using STRAIGHT_JOIN does change the query plan, but it brings the time up to 6 minutes:
mysql> EXPLAIN SELECT count(1) FROM a a STRAIGHT_JOIN b b ON a.bid = b.id WHERE (a.asofdate >= 20120101 and a.asofdate <= 20121231) AND (b.validfrom <= a.asofdate AND b.validto >= a.asofdate);
+----+-------------+-------+-------+----------------------------------------+--------------+---------+-----------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------------+--------------+---------+-----------+--------+--------------------------+
| 1 | SIMPLE | a | range | asofdate_bid,bid,bid_asofdate | asofdate_bid | 4 | NULL | 500296 | Using where; Using index |
| 1 | SIMPLE | b | ref | id,validfrom,validfrom_id,id_validfrom | id | 4 | foo.a.bid | 255 | Using where |
+----+-------------+-------+-------+----------------------------------------+--------------+---------+-----------+--------+--------------------------+
Here is your original query:
select
a.AsOfDate, b.Flag
from
A a inner Join B b on
a.BId = b.Id and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
In this case I would suggest refactoring your query:
select
a.AsOfDate, b.Flag
from
(
select * from A
WHERE AsOfDate >= 20110101
AND AsOfDate <= 20111231
) a INNER JOIN B b ON a.bid=b.id
AND b.validfrom <= a.asofdate
AND b.validto >= a.asofdate
;
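This way, the date range on the A side (20110101 - 20111231) is processed first, before the JOIN. Another benefit of the refactored query is that the JOIN of A and B involves a smaller subset of A.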
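If you are not comfortable with the refactored query, here is another suggestion: swap the range-based conditions between the WHERE and JOIN clauses: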
select
a.AsOfDate, b.Flag
from
A a inner Join B b on
a.BId = b.Id and a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
where
b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
Give it a try!
My guess is that your join condition is confusing the MySQL optimizer; as the EXPLAIN shows, it is reading the entire b table. What does this give you:
EXPLAIN SELECT count(1) FROM a a
INNER JOIN b b ON a.bid = b.id
WHERE (a.asofdate >= 20120101 and a.asofdate <= 20121231)
AND (b.validfrom <= a.asofdate AND b.validto >= a.asofdate);
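Side note: you shouldn't need KEY (bid) on table A, because KEY bid_asofdate (bid, asofdate) already covers lookups on bid. Given the way InnoDB handles indexes, the extra key just takes up more space than necessary.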
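As a minimal sketch of that side note, assuming table a still has exactly the three indexes shown in the question's SHOW CREATE TABLE output, the redundant single-column index could be dropped as follows. It is worth re-running EXPLAIN afterwards to confirm the plans still use bid_asofdate.

```sql
-- Assumption: table `a` has the indexes asofdate_bid, bid and bid_asofdate,
-- as listed in the question.
-- `bid` is the leftmost prefix of bid_asofdate (bid, asofdate), so lookups
-- on bid alone can be served by the composite index.
ALTER TABLE a DROP INDEX bid;

-- List the remaining indexes to confirm the change.
SHOW INDEX FROM a;
```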
Some further ramblings on indexes: why is no primary key defined on either table? I would change your b table like so:
CREATE TABLE `b` (
`key` int(4) NOT NULL PRIMARY KEY,
`id` int(4) NOT NULL,
`flag` char(1) NOT NULL,
`validfrom` int(4) NOT NULL,
`validto` int(4) NOT NULL,
KEY `validfrom_id` (`validfrom`,`id`),
KEY `id_validfrom_validto` (`id`,`validfrom`, `validto`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This assumes that id is not actually the primary key and that key is actually useful :)
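If recreating the table is inconvenient, roughly the same layout could be reached in place. This is only a sketch: it assumes the index names from the question's SHOW CREATE TABLE output, and that the key column holds unique values (otherwise adding the primary key will fail).

```sql
-- Assumption: table `b` currently has the indexes id, validfrom,
-- validfrom_id and id_validfrom, as shown in the question.
ALTER TABLE b
  ADD PRIMARY KEY (`key`),          -- fails if `key` contains duplicates
  DROP INDEX `id`,                  -- prefix of the new composite index
  DROP INDEX `validfrom`,           -- prefix of validfrom_id, which is kept
  DROP INDEX `id_validfrom`,        -- superseded by the three-column index
  ADD INDEX `id_validfrom_validto` (`id`, `validfrom`, `validto`);
```

Including validto in the composite index means the range check on both ends of the validity interval can be evaluated from the index entries alone; whether the optimizer actually picks this index is something to verify with EXPLAIN.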