Redis 慢查询与流水线 hgetall

Question

Redis 慢查询与流水线 hgetall

Pau*_*aul 7 python performance database-performance query-performance redis

所以我有一个小而简单的Redis数据库。它包含 136689 个键，其值是包含 27 个字段的哈希映射。我正在通过服务器节点上的 Python 接口访问该表，并且每次调用需要加载大约 1000-1500 个值（最终我将看到每秒大约 10 个请求）。一个简单的调用看起来像这样：

# below keys is a list of approximately 1000 integers, 
# not all of which are in the table

import redis
db = redis.StrictRedis(
  host='127.0.0.1',
  port=6379,
  db=0,
  socket_timeout=1,
  socket_connection_timeout=1,
  decode_responses=True
)

with db.pipeline() as pipe:
  for key in keys: 
    pipe.hgetall(key)
  results = zip(keys,pipe.execute())

Run Code Online (Sandbox Code Playgroud)

总时间约为 328 毫秒，每个请求的平均时间约为 0.25 毫秒。

问题：这对于小型数据库来说非常慢，每秒查询次数相对较少。我的配置或我调用服务器的方式有问题吗？可以做些什么来加快速度吗？我不希望桌子变得更大，所以我很高兴为了速度而牺牲磁盘空间。

附加信息

调用hget每个键（没有管道）较慢（如预期）并显示时间分布是双峰的。较小的峰值对应于不在表中的键，而较大的峰值对应于表中的键。

我的conf文件如下：

port 6379
daemonize yes 
save ""
bind 127.0.0.1
tcp-keepalive 300 
dbfilename mytable.rdb
dir .
rdbcompression yes 

appendfsync no
no-appendfsync-on-rewrite yes 
loglevel notice

Run Code Online (Sandbox Code Playgroud)

我使用以下命令启动服务器：

> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> redis-server myconf.conf

Run Code Online (Sandbox Code Playgroud)

我还测量了固有延迟，redis-cli --intrinsic-latency 100它给出了：

Max latency so far: 1 microseconds.
Max latency so far: 10 microseconds.
Max latency so far: 11 microseconds.
Max latency so far: 12 microseconds.
Max latency so far: 18 microseconds.
Max latency so far: 32 microseconds.
Max latency so far: 34 microseconds.
Max latency so far: 38 microseconds.
Max latency so far: 48 microseconds.
Max latency so far: 52 microseconds.
Max latency so far: 60 microseconds.
Max latency so far: 75 microseconds.
Max latency so far: 94 microseconds.
Max latency so far: 120 microseconds.
Max latency so far: 281 microseconds.
Max latency so far: 413 microseconds.
Max latency so far: 618 microseconds.

1719069639 total runs (avg latency: 0.0582 microseconds / 58.17 nanoseconds per run).
Worst run took 10624x longer than the average latency.

Run Code Online (Sandbox Code Playgroud)

这表明我应该能够获得更好的延迟。但是，当我检查服务器延迟时：> redis-cli --latency -h 127.0.0.1 -p 6379我得到min: 0, max: 2, avg: 0.26 (2475 samples)

这似乎表明 ~0.25ms 是我的服务器的延迟，但这似乎表明我从 Python 看到的每个请求的延迟与 CLI 相同，但它似乎非常非常慢。

与每个键关联的哈希图（解码后）的大小约为 1200 字节。所以我运行了以下基准测试

redis-benchmark -h 127.0.0.1 -p 6379 -d 1500 hmset hgetall myhash rand_int rand_string
====== hmset hgetall myhash rand_int rand_string ======
  100000 requests completed in 1.45 seconds
  50 parallel clients
  1500 bytes payload
  keep alive: 1

100.00% <= 1 milliseconds
100.00% <= 1 milliseconds
69060.77 requests per second

Run Code Online (Sandbox Code Playgroud)

这似乎支持我的延迟非常高，但并没有真正告诉我原因。

Answer 1

Pel*_*can 2

我从使用 Redis 的方式中得到的结论之一是，我们不应该将每笔交易存储在一个哈希中。就像一笔交易一个哈希一样。

对于每个 hget 请求，我们都有一个网络连接，这会减慢查询速度。

我认为 Redis 的设计方式将所有内容存储在一个哈希中会更快，就像所有事务都在同一哈希下一样。

此外，粒度数据可以以 JSON 形式存储在每个键：值中。

对于 140mb 的数据，检索所有哈希值所需的时间与检索一个哈希值中存储的所有值所需的时间：

3 秒迭代每个哈希并获取其键：值与
0,008 秒获取一个哈希值，然后在该哈希值中搜索键：值，vs
0,008 秒即可获取存储在一个哈希下的所有数据。

在 for 迭代中，您不再有 1 000 000 000 次迭代（如果您有 1 000 000 000 个哈希值），这里使用建议的解决方案，您只有 1 次迭代（如果您可以根据内在值隔离数据，则次数会更多），因此显着减少查询时间。

归档时间：	6 年前
查看次数：	2089 次
最近记录：	5 年，6 月前