JBe*_*rer 8 mysql sql query-optimization
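I have a database with 60 million entries.
Each entry contains:
I need to select the entries from a given month. Each month contains approximately 2 million entries.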
select *
from Entries
where time between "2010-04-01 00:00:00" and "2010-05-01 00:00:00"
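(The query takes about 1.5 minutes.)
I would also like to select a given month's data for a given DataSourceID. (That takes about 20 seconds.)
There are about 50-100 different DataSourceIDs.
Is there a way to make this faster? What are my options? How can I optimize this database/query?
EDIT: There are approx. 60-100 inserts per second!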
To get the entries for a particular month of a particular year faster, you need to index the time column:
CREATE INDEX idx_time ON ENTRIES(time) USING BTREE;
Also, use:
SELECT e.*
FROM ENTRIES e
WHERE e.time BETWEEN '2010-04-01' AND DATE_SUB('2010-05-01', INTERVAL 1 SECOND)
...because BETWEEN is inclusive, so the query you posted also returns anything dated exactly "2010-05-01 00:00:00".
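To see the boundary effect in isolation, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere) and a hypothetical three-row `entries` table. BETWEEN is inclusive on both ends, so the row stamped exactly at midnight on May 1 is counted, while a half-open range (`>=` the start, `<` the start of the next month) leaves it out.

```python
import sqlite3

# SQLite is only a stand-in for MySQL here (assumption, for a runnable demo).
# The table and its three rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (time TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO entries (time) VALUES (?)",
    [
        ("2010-04-01 00:00:00",),  # first instant of April
        ("2010-04-30 23:59:59",),  # last second of April
        ("2010-05-01 00:00:00",),  # first instant of May
    ],
)

# Inclusive upper bound: the May 1 midnight row is matched as well.
between_count = conn.execute(
    "SELECT COUNT(*) FROM entries "
    "WHERE time BETWEEN '2010-04-01 00:00:00' AND '2010-05-01 00:00:00'"
).fetchone()[0]

# Half-open range: everything from April 1 up to, but excluding, May 1.
half_open_count = conn.execute(
    "SELECT COUNT(*) FROM entries "
    "WHERE time >= '2010-04-01 00:00:00' AND time < '2010-05-01 00:00:00'"
).fetchone()[0]

print(between_count, half_open_count)
```

The half-open form is an alternative to subtracting one second from the upper bound; either way the May 1 row stays out of the April result.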
You could add a separate index for the datasourceid column:
CREATE INDEX idx_datasourceid ON ENTRIES(datasourceid) USING BTREE;
...or set up a composite index that includes both columns:
CREATE INDEX idx_time_datasourceid ON ENTRIES(time, datasourceid) USING BTREE;
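A composite (multi-column) index can only be used when the query filters on its leftmost column(s). In this example, having time first works for both situations you mentioned, because datasourceid does not have to appear in the query for the index to be used. But you have to test your queries by looking at the EXPLAIN output to really know what works best for your data and the queries run against it.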
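As a rough mental model of the leftmost-column rule (a toy illustration, not how MySQL stores a B-tree internally), the sketch below treats an index on (time, datasourceid) as a sorted list of tuples. The sample rows and the function names `range_on_time` and `filter_on_datasource` are made up for illustration.

```python
from bisect import bisect_left

# Toy stand-in for an index on (time, datasourceid): entries are sorted
# by time first, then by datasourceid. The rows are hypothetical.
index = sorted([
    ("2010-03-31", 7),
    ("2010-04-02", 3),
    ("2010-04-02", 7),
    ("2010-04-15", 3),
    ("2010-05-01", 7),
])

def range_on_time(lo, hi):
    """Half-open range [lo, hi) on the leading column: two binary searches."""
    start = bisect_left(index, (lo,))
    end = bisect_left(index, (hi,))
    return index[start:end]

def filter_on_datasource(ds):
    """No condition on the leading column: every entry has to be inspected."""
    return [entry for entry in index if entry[1] == ds]

print(range_on_time("2010-04-01", "2010-05-01"))
print(filter_on_datasource(7))
```

A condition on time can jump straight to the matching slice, whereas a condition on datasourceid alone cannot use the sort order at all, which is the situation the leftmost-column rule describes.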
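That said, indexes slow down INSERT, UPDATE and DELETE statements. An index also provides little value if the column has few distinct values, e.g. a boolean column is a bad choice for an index because its cardinality is low.
Take advantage of InnoDB clustered primary key indexes.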
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
This will be very performant:
create table datasources
(
year_id smallint unsigned not null,
month_id tinyint unsigned not null,
datasource_id tinyint unsigned not null,
id int unsigned not null, -- needed for uniqueness
data int unsigned not null default 0,
primary key (year_id, month_id, datasource_id, id)
)
engine=innodb;
select * from datasources where year_id = 2011 and month_id between 1 and 3;
select * from datasources where year_id = 2011 and month_id = 4 and datasource_id = 100;
-- etc..
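If the existing table stores a single timestamp, the year_id and month_id columns have to be derived from it when a row is written. Below is a minimal sketch of that mapping in Python, assuming a hypothetical helper named `clustered_key` and made-up sample rows. It also shows that sorting by the key places all rows of one month and datasource next to each other, which is the ordering the clustered primary key above imposes.

```python
from datetime import datetime

def clustered_key(ts, datasource_id, row_id):
    """Build the (year_id, month_id, datasource_id, id) tuple for one row."""
    return (ts.year, ts.month, datasource_id, row_id)

# Hypothetical rows, deliberately out of order.
rows = [
    clustered_key(datetime(2010, 4, 15, 12, 0, 0), 100, 3),
    clustered_key(datetime(2010, 3, 1, 8, 30, 0), 116, 1),
    clustered_key(datetime(2010, 4, 2, 9, 0, 0), 100, 2),
    clustered_key(datetime(2011, 1, 5, 0, 0, 0), 100, 4),
]

# Sorting by the key mimics the row order of a clustered index.
ordered = sorted(rows)

# Rows for April 2010, datasource 100 sit next to each other in that order.
april_100 = [r for r in ordered if r[:3] == (2010, 4, 100)]

print(ordered)
print(april_100)
```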
EDIT 2
Forgot that I ran the first test script with 3 months of data. Here are the results for a single month: 0.34 and 0.69 seconds.
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 and datasource_id = 100 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id | data |
+---------+----------+---------------+---------+-------+
| 2010 | 3 | 100 | 3290330 | 38434 |
| 2010 | 3 | 100 | 3290329 | 9988 |
| 2010 | 3 | 100 | 3290328 | 25680 |
| 2010 | 3 | 100 | 3290327 | 17627 |
| 2010 | 3 | 100 | 3290326 | 64508 |
| 2010 | 3 | 100 | 3290325 | 14257 |
| 2010 | 3 | 100 | 3290324 | 45950 |
| 2010 | 3 | 100 | 3290323 | 49986 |
| 2010 | 3 | 100 | 3290322 | 2459 |
| 2010 | 3 | 100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.34 sec)
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id | data |
+---------+----------+---------------+---------+-------+
| 2010 | 3 | 116 | 3450346 | 42455 |
| 2010 | 3 | 116 | 3450345 | 64039 |
| 2010 | 3 | 116 | 3450344 | 27046 |
| 2010 | 3 | 116 | 3450343 | 23730 |
| 2010 | 3 | 116 | 3450342 | 52380 |
| 2010 | 3 | 116 | 3450341 | 35700 |
| 2010 | 3 | 116 | 3450340 | 20195 |
| 2010 | 3 | 116 | 3450339 | 21758 |
| 2010 | 3 | 116 | 3450338 | 51378 |
| 2010 | 3 | 116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.69 sec)
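EDIT 1
Decided to test the above schema with approx. 60 million rows spread over 3 years. Each query is run cold, i.e. each one is run separately, and MySQL is restarted afterwards to clear any buffers, with no query caching.
The complete test script can be found here: http://pastie.org/1723506 or below...
As you can see, it's a very performant schema, even on my humble desktop :)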
select count(*) from datasources;
+----------+
| count(*) |
+----------+
| 60306030 |
+----------+
select count(*) from datasources where year_id = 2010;
+----------+
| count(*) |
+----------+
| 16691669 |
+----------+
select
year_id, month_id, count(*) as counter
from
datasources
where
year_id = 2010
group by
year_id, month_id;
+---------+----------+---------+
| year_id | month_id | counter |
+---------+----------+---------+
| 2010 | 1 | 1080108 |
| 2010 | 2 | 1210121 |
| 2010 | 3 | 1160116 |
| 2010 | 4 | 1300130 |
| 2010 | 5 | 1860186 |
| 2010 | 6 | 1220122 |
| 2010 | 7 | 1250125 |
| 2010 | 8 | 1460146 |
| 2010 | 9 | 1730173 |
| 2010 | 10 | 1490149 |
| 2010 | 11 | 1570157 |
| 2010 | 12 | 1360136 |
+---------+----------+---------+
12 rows in set (5.92 sec)
select
count(*) as counter
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100;
+---------+
| counter |
+---------+
| 30003 |
+---------+
1 row in set (1.04 sec)
explain
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | d | range | PRIMARY | PRIMARY | 4 | NULL |4451372 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id | data |
+---------+----------+---------------+---------+-------+
| 2010 | 3 | 100 | 3290330 | 38434 |
| 2010 | 3 | 100 | 3290329 | 9988 |
| 2010 | 3 | 100 | 3290328 | 25680 |
| 2010 | 3 | 100 | 3290327 | 17627 |
| 2010 | 3 | 100 | 3290326 | 64508 |
| 2010 | 3 | 100 | 3290325 | 14257 |
| 2010 | 3 | 100 | 3290324 | 45950 |
| 2010 | 3 | 100 | 3290323 | 49986 |
| 2010 | 3 | 100 | 3290322 | 2459 |
| 2010 | 3 | 100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.98 sec)
select
count(*) as counter
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3;
+---------+
| counter |
+---------+
| 3450345 |
+---------+
1 row in set (1.64 sec)
explain
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3
order by
d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | d | range | PRIMARY | PRIMARY | 3 | NULL |6566916 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3
order by
d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id | data |
+---------+----------+---------------+---------+-------+
| 2010 | 3 | 116 | 3450346 | 42455 |
| 2010 | 3 | 116 | 3450345 | 64039 |
| 2010 | 3 | 116 | 3450344 | 27046 |
| 2010 | 3 | 116 | 3450343 | 23730 |
| 2010 | 3 | 116 | 3450342 | 52380 |
| 2010 | 3 | 116 | 3450341 | 35700 |
| 2010 | 3 | 116 | 3450340 | 20195 |
| 2010 | 3 | 116 | 3450339 | 21758 |
| 2010 | 3 | 116 | 3450338 | 51378 |
| 2010 | 3 | 116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (1.98 sec)
Hope this helps :)