排名有数百万条目

Naa*_*tan 26 mysql database ranking

我正在开发一款能够处理数百万玩家的在线游戏服务器.现在游戏需要排行榜并且希望能够向玩家显示当前玩家当前位置以及可能在当前玩家位置附近的其他玩家以及玩家朋友的位置.

现在我已经在MySQL之前完成了这些工作并且我知道它在技术上是如何可能的,但我想,因为这是许多在线游戏的常见做法,必须有现有的库或数据库,特别是为此目的？

任何人都可以告诉我什么数据库最适合这些类型的查询,可能还有任何预先存在的库已经做了很多这方面的工作？具有API访问权限的第三方服务也可以.

希望得到一些好的建议,谢谢!

编辑:

为了澄清,我需要一个可以容纳数百万个条目的数据库(到目前为止MySQL是有用的),我可以轻松获得排名结果.例如,如果我从"排行榜"表中获取特定行,我需要知道该行具有哪个排名.无论db的大小如何,此查询都必须低于500毫秒.

或者,使用当前排名信息更新表的方法可能会很长,因为此更新查询不会锁定整个表,并且更新查询在30秒内运行.

关于使用什么数据库/机制或第三方服务的任何想法将非常感谢!

对于服务器级磁盘,单个磁盘搜索大约为15毫秒,可能稍微少一点.小于500毫秒的响应时间限制了大约30个随机磁盘访问.那不是很多.

在我的小笔记本电脑上,我有一个开发数据库

root@localhost [kris]> select @@innodb_buffer_pool_size/1024/1024 as pool_mb;
+--------------+
| pool_mb      |
+--------------+
| 128.00000000 |
+--------------+
1 row in set (0.00 sec)

Run Code Online (Sandbox Code Playgroud)

和一个慢的笔记本电脑磁盘我创建了一个得分表

root@localhost [kris]> show create table score\G
*************************** 1. row ***************************
       Table: score
Create Table: CREATE TABLE `score` (
  `player_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `score` int(11) NOT NULL,
  PRIMARY KEY (`player_id`),
  KEY `score` (`score`)
) ENGINE=InnoDB AUTO_INCREMENT=2490316 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Run Code Online (Sandbox Code Playgroud)

随机整数分数和连续的player_id值.我们有

root@localhost [kris]> select count(*)/1000/1000 as mrows from score\G
*************************** 1. row ***************************
mrows: 2.09715200
1 row in set (0.39 sec)

Run Code Online (Sandbox Code Playgroud)

数据库在索引(score, player_id)中score按顺序维护对score,因为InnoDB索引中的数据存储在BTREE中,行指针(数据指针)是主键值,因此定义KEY (score)最终在KEY(score, player_id)内部.我们可以通过查看分数检索的查询计划来证明:

root@localhost [kris]> explain select * from score where score = 17\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: score
         type: ref
possible_keys: score
          key: score
      key_len: 4
          ref: const
         rows: 29
        Extra: Using index
1 row in set (0.00 sec)

Run Code Online (Sandbox Code Playgroud)

如您所见,key: score正在使用它Using index,这意味着无需数据访问.

给定常量的排名查询player_id在我的笔记本电脑上只需要500毫秒:

root@localhost [kris]>  select p.*, count(*) as rank 
    from score as p join score as s on p.score < s.score 
   where p.player_id = 479269\G
*************************** 1. row ***************************
player_id: 479269
    score: 99901
     rank: 2074
1 row in set (0.50 sec)

Run Code Online (Sandbox Code Playgroud)

有了更多的内存和更快的盒子,它可以更快,但它仍然是一个相对昂贵的操作,因为该计划很糟糕:

root@localhost [kris]> explain select p.*, count(*) as rank from score as p join score as s on p.score < s.score where p.player_id = 479269;
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows    | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | p     | const | PRIMARY,score | PRIMARY | 4       | const |       1 |                          |
|  1 | SIMPLE      | s     | index | score         | score   | 4       | NULL  | 2097979 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
2 rows in set (0.00 sec)

Run Code Online (Sandbox Code Playgroud)

如您所见,计划中的第二个表是索引扫描,因此查询会随着玩家数量的增加而线性减慢.

如果你想要一个完整的排行榜,你需要省略where子句,然后你得到两次扫描和二次执行时间.所以这个计划完全崩溃了.

是时候进行程序化了:

root@localhost [kris]> set @count = 0; 
    select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ;
...
|   2353218 | 99901 | 2075 |
|   2279992 | 99901 | 2076 |
|   2264334 | 99901 | 2077 |
|   2239927 | 99901 | 2078 |
|   2158161 | 99901 | 2079 |
|   2076159 | 99901 | 2080 |
|   2027538 | 99901 | 2081 |
|   1908971 | 99901 | 2082 |
|   1887127 | 99901 | 2083 |
|   1848119 | 99901 | 2084 |
|   1692727 | 99901 | 2085 |
|   1658223 | 99901 | 2086 |
|   1581427 | 99901 | 2087 |
|   1469315 | 99901 | 2088 |
|   1466122 | 99901 | 2089 |
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
...

Run Code Online (Sandbox Code Playgroud)

因为这是一个程序计划,所以它不稳定:

你不能使用LIMIT,因为这会抵消计数器.相反,你必须下载所有这些数据.
你无法真正排序.此ORDER BY子句有效,因为它不排序,但使用索引.一看到using filesort,计数器值就会大幅下降.

但是,这个解决方案最接近于NoSQL(读取:程序)数据库将作为执行计划执行的操作.

我们可以在子查询中稳定NoSQL,然后切出我们感兴趣的部分,但是:

root@localhost [kris]> set @count = 0; 
    select * from ( 
        select *, @count := @count + 1 as rank 
          from score 
         where score >= 99901 
      order by score desc 
    ) as t 
    where player_id = 479269;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|    479269 | 99901 | 2094 |
+-----------+-------+------+
1 row in set (0.00 sec)

root@localhost [kris]> set @count = 0; 
    select * from ( 
        select *, @count := @count + 1 as rank 
          from score 
         where score >= 99901 
      order by score desc 
    ) as t 
    where rank between 2090 and 2100;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
+-----------+-------+------+
8 rows in set (0.01 sec)

Run Code Online (Sandbox Code Playgroud)

子查询将前一个结果集实现为一个名为t的临时表,然后我们可以在外部查询中访问该表.因为它是一个临时表,在MySQL中它没有索引.这限制了外部查询中有效的可能性.

但请注意两个查询如何满足您的时序约束.这是计划:

root@localhost [kris]> set @count = 0; explain select * from ( select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ) as t where rank between 2090 and 2100\G
Query OK, 0 rows affected (0.00 sec)

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2097
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: score
         type: range
possible_keys: score
          key: score
      key_len: 4
          ref: NULL
         rows: 3750
        Extra: Using where; Using index
2 rows in set (0.00 sec)

Run Code Online (Sandbox Code Playgroud)

但是,对于排名不佳的玩家来说,两个查询组件(内部,DERIVED查询和外部BETWEEN约束)都会变慢,然后严重违反时间限制.

root@localhost [kris]> set @count = 0; select * from ( select *, @count := @count + 1 as rank from score where score >= 0 order by score desc ) as t;
...
2097152 rows in set (3.56 sec)

Run Code Online (Sandbox Code Playgroud)

描述性方法的执行时间是稳定的(仅取决于表大小):

root@localhost [kris]> select p.*, count(*) as rank 
   from score as p join score as s on p.score < s.score 
   where p.player_id = 1134026;
+-----------+-------+---------+
| player_id | score | rank    |
+-----------+-------+---------+
|   1134026 |     0 | 2097135 |
+-----------+-------+---------+
1 row in set (0.53 sec)

Run Code Online (Sandbox Code Playgroud)

你的来电.

我知道这是一个老问题,但我确实喜欢盯着这些问题.鉴于数据的比例 - >所需的查询速度,可以使用一些非传统的技巧,这些技巧需要更多的编码工作,但实际上可以提高查询性能.

得分桶

首先,我们应该用桶来跟踪分数.我们希望桶列表(这个名字很棒!)足够小,可以轻松保存在内存中,并且足够大,以至于桶不会经常(相对而言)受到影响.这为我们提供了更大的并发性以避免锁定问题.

你必须根据你的负载判断如何拆分这些存储桶,但我认为你要专注于拥有尽可能多的存储桶,这些存储桶很容易适应内存并快速添加.

为了适应这种情况,我的score_buckets表将具有以下结构:

minscore, maxscore, usercount; PK(minscore, maxscore)

Run Code Online (Sandbox Code Playgroud)

用户表

我们必须跟踪我们的用户,并且可能会完成以下操作:

userid, score, timestamp
#(etc., etc. that we don't care about for this part of the problem)

Run Code Online (Sandbox Code Playgroud)

为了有效地迭代这个以获得分数计数,我们需要得分的索引.时间戳只是我在我的例子中用来打破平局的东西,所以我有一个明确的排序.如果您不需要它,请抛弃它 - 它正在使用空间,这将影响查询时间.目前:索引(得分,时间戳).

插入/更新/删除用户及其分数

将触发器添加到用户表.插入时:

update score_buckets sb
    set sb.usercount = sb.usercount + 1
    where sb.minscore <= NEW.score
    and sb.maxscore >= NEW.score

Run Code Online (Sandbox Code Playgroud)

在更新

update score_buckets sb
    set sb.usercount = sb.usercount - 1
    where sb.minscore <= OLD.score
    and sb.maxscore >= OLD.score
update score_buckets sb
    set sb.usercount = sb.usercount + 1
    where sb.minscore <= NEW.score
    and sb.maxscore >= NEW.score

Run Code Online (Sandbox Code Playgroud)

删除时

update score_buckets sb
    set sb.usercount = sb.usercount - 1
    where sb.minscore <= OLD.score
    and sb.maxscore >= OLD.score

Run Code Online (Sandbox Code Playgroud)

确定等级

$usersBefore = select sum(usercount)
    from score_buckets
    where maxscore < $userscore;
$countFrom = select max(maxscore)
    from score_buckets
    where maxscore < $userscore;
$rank = select count(*) from user
    where score > $countFrom
    and score <= $userscore
    and timestamp <= $userTimestamp

Run Code Online (Sandbox Code Playgroud)

结束笔记

基准与各种数量的桶,每次加倍或减半.您可以快速编写一个桶加倍/减半脚本,以便加载测试.更新桶可以减少用户得分指数的扫描,并在更新分数时减少锁定/事务争用.更多的桶消耗更多的内存.要选择一个数字,请使用10,000个桶.理想情况下,您的存储桶将覆盖整个分数范围,每个存储桶的计数用户数大致相同.如果您对分布图得分遵循某种曲线,请使您的铲斗分布遵循该曲线.

这个理论与两层跳过列表有关.

我最近读过一篇关于用Redis解决此类问题的文章。您仍然可以使用 MySQL 作为基本存储，但您可以将未排序的结果缓存在 Redis 中并实时更新排名。可以在此处找到链接。本文的最后三分之一是关于键控排序的，就像排名列表一样。

归档时间：	14 年，10 月前
查看次数：	10661 次
最近记录：	11 年，9 月前

将列添加为外键使得外键约束中引用的ERROR列不存在 62

难以在Ubuntu上安装mysql gem 50

未知的表引擎'InnoDB' 24

“无法读取未定义的属性‘tablePath’”（TYPE ORM） 16

如何在删除大量数据后减少MongoDB存储空间 12

在PHP中从SQL格式化时间戳的最简单方法是什么？ 10

用于SQL数据定义语言的Java API 10

使用SQL Server的排行榜设计 10

我们如何为SQL中的列提供多个别名？ 9

sql select*在确切的行数之间 6

如何从当前的Git工作树中删除本地(未跟踪)文件？ 6561

如何查看远程Git分支？ 6408

如何使用CSS为文本或图像提供透明背景？ 2211

将字节转换为字符串？ 1968

npm install的--save选项是什么？ 1779

Make .gitignore会忽略除少数文件之外的所有内容 1531

在Python中创建一个包含列表推导的字典 1216

在对象名称之前单个和双下划线的含义是什么？ 1205

LF将被git中的CRLF取代 - 这是什么,它是否重要？ 1146

Ukkonen的简明英语后缀树算法 1065