Fastest way to store data in memory with Redis in Python

Question

Rob*_*bin 3 python database performance redis

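I need to save large arrays once and load them many times in a Flask application using Python 3. The arrays were originally stored on disk with the json library. To speed things up, I used Redis on the same machine and stored each array serialized as a JSON string. I wonder why I get no improvement (in fact, it takes longer on the server I use) even though Redis keeps the data in RAM. I guess the JSON serialization is not optimized, but I have no idea how to speed up this step: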

import json
import redis
import os 
import time

current_folder = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(current_folder, "my_file")

my_array = [1]*10000000

with open(file_path, 'w') as outfile:
    json.dump(my_array, outfile)

start_time = time.time()
with open(file_path, 'r') as infile:
    my_array = json.load(infile)
print("JSON from disk  : ", time.time() - start_time)

r = redis.Redis()
my_array_as_string = json.dumps(my_array)
r.set("my_array_as_string", my_array_as_string)

start_time = time.time()
my_array_as_string = r.get("my_array_as_string")
print("Fetch from Redis:", time.time() - start_time)

start_time = time.time()
my_array = json.loads(my_array_as_string)
print("Parse JSON      :", time.time() - start_time)

Results:

JSON from disk  : 1.075700044631958
Fetch from Redis: 0.078125
Parse JSON      : 1.0247752666473389

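Edit: It seems that fetching from Redis is actually fast, but parsing the JSON is slow. Is there a way to fetch an array directly from Redis without the JSON serialization step? This is what we do with pyMySQL, and it is fast.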

Answer 1

Roo*_*iat 5

Update: Nov 8, 2019 - Ran the same tests on Python 3.6

Results:

Dump time: JSON > msgpack > pickle > marshal
Load time: JSON > pickle > msgpack > marshal
Space: marshal > JSON > pickle > msgpack

+---------+---------------+---------------+--------------+
| package | dump time (s) | load time (s) | size (bytes) |
+---------+---------------+---------------+--------------+
| json    | 0.00134       | 0.00079       | 30049        |
| pickle  | 0.00023       | 0.00019       | 20059        |
| msgpack | 0.00031       | 0.00012       | 10036        |
| marshal | 0.00022       | 0.00010       | 50038        |
+---------+---------------+---------------+--------------+

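I tried pickle vs json vs msgpack vs marshal.

Pickle is much slower than JSON, and msgpack is at least 4x faster than JSON. MsgPack looks like the best option you have.

Edit: Tried marshal as well. Marshal is faster than JSON, but slower than msgpack.

Time taken: Pickle > JSON > Marshal > MsgPack
Space taken: Marshal > Pickle > JSON > MsgPack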

import time
import json
import pickle
import msgpack
import marshal
import sys

array = [1]*10000

start_time = time.time()
json_array = json.dumps(array)
print "JSON dumps: ", time.time() - start_time
print "JSON size: ", sys.getsizeof(json_array)
start_time = time.time()
_ = json.loads(json_array)
print "JSON loads: ", time.time() - start_time

# --------------

start_time = time.time()
pickled_object = pickle.dumps(array)
print "Pickle dumps: ", time.time() - start_time
print "Pickle size: ", sys.getsizeof(pickled_object)
start_time = time.time()
_ = pickle.loads(pickled_object)
print "Pickle loads: ", time.time() - start_time


# --------------

start_time = time.time()
package = msgpack.dumps(array)
print "Msg Pack dumps: ", time.time() - start_time
print "MsgPack size: ", sys.getsizeof(package)
start_time = time.time()
_ = msgpack.loads(package)
print "Msg Pack loads: ", time.time() - start_time

# --------------

start_time = time.time()
m_package = marshal.dumps(array)
print "Marshal dumps: ", time.time() - start_time
print "Marshal size: ", sys.getsizeof(m_package)
start_time = time.time()
_ = marshal.loads(m_package)
print "Marshal loads: ", time.time() - start_time

Results:

JSON dumps:  0.000760078430176
JSON size:  30037
JSON loads:  0.000488042831421
Pickle dumps:  0.0108790397644
Pickle size:  40043
Pickle loads:  0.0100247859955
Msg Pack dumps:  0.000202894210815
MsgPack size:  10040
Msg Pack loads:  7.58171081543e-05
Marshal dumps:  0.000118017196655
Marshal size:  50042
Marshal loads:  0.000118970870972
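
For readers on Python 3, here is a minimal sketch of the same comparison. It is not the code that produced the numbers above, and it differs in two ways. It uses `time.perf_counter()` with a best-of-several measurement, and it reports the payload length with `len()`. `sys.getsizeof` measures the Python object including its header, so the sizes printed above are slightly larger than the serialized data itself. The helper names `bench` and `run_benchmarks` are made up for this illustration, and msgpack is only included if it happens to be installed.

```python
import json
import marshal
import pickle
import time

try:
    import msgpack  # third-party and optional in this sketch
except ImportError:
    msgpack = None


def bench(name, dumps, loads, data, repeat=5):
    """Return best-of-`repeat` dump/load times (seconds) and payload size (bytes)."""
    dump_times, load_times = [], []
    for _ in range(repeat):
        t0 = time.perf_counter()
        payload = dumps(data)
        dump_times.append(time.perf_counter() - t0)

        t0 = time.perf_counter()
        restored = loads(payload)
        load_times.append(time.perf_counter() - t0)

    assert restored == data  # sanity check: the round trip must be lossless
    # Measure the serialized payload itself, not the Python object wrapping it.
    size = len(payload.encode("utf-8")) if isinstance(payload, str) else len(payload)
    return {"name": name, "dump": min(dump_times), "load": min(load_times), "size": size}


def run_benchmarks(data):
    """Benchmark every available serializer on the same data."""
    candidates = [
        ("json", json.dumps, json.loads),
        ("pickle", lambda obj: pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL), pickle.loads),
        ("marshal", marshal.dumps, marshal.loads),
    ]
    if msgpack is not None:
        candidates.append(("msgpack", msgpack.packb, msgpack.unpackb))
    return [bench(name, dumps, loads, data) for name, dumps, loads in candidates]


if __name__ == "__main__":
    for row in run_benchmarks([1] * 10000):
        print("{name:8s} dump {dump:.5f}s  load {load:.5f}s  size {size} bytes".format(**row))
```

Absolute timings depend on the machine and the Python version, so only the relative ordering is worth comparing.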

Judging from your print statements, you are using Python 2, where pickle is slow. I suggest using the C version with "import cPickle as pickle". On Python 3.7, I get the following save and load times:
- json: 0.739 + 0.584 ms, 30049 bytes.
- ujson: 0.265 + 0.136 ms, 20050 bytes.
- pickle: 0.188 + 0.132 ms, 20059 bytes.
- msgpack: 0.317 + 0.059 ms, 10036 bytes.
- marshal: 0.154 + 0.081 ms, 50038 bytes.
Of course, if you are storing large homogeneous arrays, use numpy with pickle:
- Numpy array with pickle: 0.016 + 0.000 ms, 40192 bytes.
(2 upvotes)
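
The idea behind the last suggestion is that a homogeneous array can be stored as one packed block of bytes, so loading it involves no per-element parsing. Below is a rough sketch of that idea using only the standard library: the `array` module stands in for numpy here, which is a swap made for illustration and not what the comment measured. `InMemoryClient`, `save_int_array` and `load_int_array` are hypothetical names. The stand-in client only mimics the `get`/`set` calls of `redis.Redis` so the sketch runs without a server.

```python
from array import array


class InMemoryClient:
    """Hypothetical stand-in exposing only get/set, so the sketch runs without a server."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = bytes(value)
        return True

    def get(self, key):
        return self._store.get(key)


def save_int_array(client, key, values):
    """Pack integers as signed 64-bit values and store the raw bytes."""
    # Native byte order: assumes the reader runs on the same kind of machine.
    client.set(key, array("q", values).tobytes())


def load_int_array(client, key):
    """Fetch the raw bytes and rebuild the array without any text parsing."""
    raw = client.get(key)
    if raw is None:
        return None
    result = array("q")
    result.frombytes(raw)
    return result


client = InMemoryClient()  # assumption: swap in redis.Redis() to talk to a real server
save_int_array(client, "my_array", [1] * 10000)
restored = load_int_array(client, "my_array")
print(len(restored), restored[0])
```

This only works when every element has the same fixed-size type. Mixed or nested data still needs a general serializer such as the ones compared above.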

Archived:	7 years, 5 months ago
Views:	1443
Last activity:	6 years, 3 months ago