Category with lots of pages (huge offsets) (how does Stack Overflow work?)

ced*_*vad 7 mysql database

I think my problem can be solved by understanding how Stack Overflow works.

For example, this page loads in a few milliseconds (<300 ms): https://stackoverflow.com/questions?page=61440&sort=newest

The only query I can think of for that page is something like SELECT * FROM stuff ORDER BY date DESC LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}
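MySQL's LIMIT takes an offset followed by a row count, so the page arithmetic behind a query like this can be sketched roughly as follows. The function names and the page size of 50 are assumptions made for illustration; only the table name, the column name, and the page number come from the question.

```python
def page_limit(page_number, per_page):
    """Return (offset, row_count) for MySQL's `LIMIT offset, row_count`.

    Pages are treated as zero-based here, matching the
    {pageNumber}*{stuffPerPage} offset used in the question.
    """
    if not isinstance(page_number, int) or not isinstance(per_page, int):
        raise TypeError("page_number and per_page must be integers")
    if page_number < 0 or per_page < 1:
        raise ValueError("page_number must be >= 0 and per_page >= 1")
    return page_number * per_page, per_page


def build_page_query(page_number, per_page):
    """Build the offset-based query text for one page of results."""
    offset, row_count = page_limit(page_number, per_page)
    # Both values are validated integers, so formatting them in is safe.
    return f"SELECT * FROM stuff ORDER BY date DESC LIMIT {offset}, {row_count}"


# Hypothetical page size of 50 for the page number in the URL above.
query = build_page_query(61440, 50)
print(query)
```

With these assumed numbers the offset is already above three million rows, which is the range where the timings in this question start to hurt.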

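A query like this can take several seconds to run, but the Stack Overflow page loads almost instantly. It can't be a cached query, because new questions are posted all the time and rebuilding the cache every time a question is posted would be crazy.

So, how do you think this works?

(To make the question easier, let's forget about the ORDER BY.) Example (the table is fully cached in RAM and stored on an SSD drive):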

mysql> select * from thread limit 1000000, 1;
1 row in set (1.61 sec)

mysql> select * from thread limit 10000000, 1;
1 row in set (16.75 sec)

mysql> describe select * from thread limit 1000000, 1;
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
|  1 | SIMPLE      | thread | ALL  | NULL          | NULL | NULL    | NULL | 64801163 |       |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+

mysql> select * from thread ORDER BY thread_date DESC limit 1000000, 1;
1 row in set (1 min 37.56 sec)


mysql> SHOW INDEXES FROM thread;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| thread |          0 | PRIMARY  |            1 | newsgroup_id | A         |      102924 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            2 | thread_id    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            3 | postcount    | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          0 | PRIMARY  |            4 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
| thread |          1 | date     |            1 | thread_date  | A         |    47036298 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)

nob*_*ody 2

Create a BTREE index on the date column and the query will run quickly.

CREATE INDEX date ON stuff(date) USING BTREE

Update: here is a test I just ran:

CREATE TABLE test( d DATE, i INT, INDEX(d) );

I populated the table with 2,000,000 rows with different unique i and d values.
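The answer does not say how the rows were generated. As a rough sketch only, a table of the same shape could be filled like this. Everything here is an assumption made to keep the example self-contained: an in-memory SQLite database stands in for MySQL, the row count is much smaller, the index is named test_d, and i and d simply count upward from arbitrary starting values.

```python
import sqlite3
from datetime import date, timedelta


def build_test_table(rows=20_000):
    """Create and fill a table shaped like `test` (d DATE, i INT, index on d).

    Assumptions: SQLite instead of MySQL, a reduced row count, and
    generated values that increase by one per row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (d DATE, i INT)")
    conn.execute("CREATE INDEX test_d ON test (d)")

    start = date(1900, 1, 1)  # arbitrary starting date
    data = (
        ((start + timedelta(days=k)).isoformat(), k + 1)
        for k in range(rows)
    )
    with conn:  # commit once after the bulk insert
        conn.executemany("INSERT INTO test (d, i) VALUES (?, ?)", data)
    return conn


conn = build_test_table()

# Same query shape as the tests in this answer, in SQLite's LIMIT/OFFSET form.
row = conn.execute(
    "SELECT d, i FROM test ORDER BY d LIMIT 1 OFFSET ?", (10_000,)
).fetchone()
print(row)
```

Timings and query plans from SQLite will not match the MySQL numbers in this answer; the sketch only shows the table setup and the query shape.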

mysql> SELECT * FROM test LIMIT 1000000, 1;
+------------+---------+
| d          | i       |
+------------+---------+
| 1897-07-22 | 1000000 |
+------------+---------+
1 row in set (0.66 sec)

mysql> SELECT * FROM test ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (1.68 sec)

Here is an interesting observation:

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 1000, 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | test  | index | NULL          | d    | 4       | NULL | 1001 |       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+

mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 10000, 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
|  1 | SIMPLE      | test  | ALL  | NULL          | NULL | NULL    | NULL | 2000343 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+

MySQL does use the index for an offset of 1000, but not for an offset of 10000.

Even more interesting, if I use FORCE INDEX, the query takes longer:

mysql> SELECT * FROM test FORCE INDEX(d) ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d          | i      |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (2.21 sec)