Rob*_*bin 3 python database performance redis
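I need to save a large array once and load it many times in a Flask application using Python 3. The arrays were originally stored on disk with the json library. To speed things up, I used Redis on the same machine to store the array, serializing it to a JSON string. I'm wondering why I don't get any improvement (in fact, it takes more time on the server I use), even though Redis keeps the data in RAM. I guess JSON serialization isn't very optimized, but I don't know how to speed up this step: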
import json
import redis
import os
import time
current_folder = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(current_folder, "my_file")
my_array = [1]*10000000
with open(file_path, 'w') as outfile:
    json.dump(my_array, outfile)
start_time = time.time()
with open(file_path, 'r') as infile:
    my_array = json.load(infile)
print("JSON from disk : ", time.time() - start_time)
r = redis.Redis()
my_array_as_string = json.dumps(my_array)
r.set("my_array_as_string", my_array_as_string)
start_time = time.time()
my_array_as_string = r.get("my_array_as_string")
print("Fetch from Redis:", time.time() - start_time)
start_time = time.time()
my_array = json.loads(my_array_as_string)
print("Parse JSON :", time.time() - start_time)
Results:
JSON from disk : 1.075700044631958
Fetch from Redis: 0.078125
Parse JSON : 1.0247752666473389
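Edit: It seems that fetching from Redis is actually fast, but parsing the JSON is slow. Is there a way to get the array directly from Redis without the JSON serialization step? That is what we do with pyMySQL, and it is fast.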
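One way to skip JSON entirely, assuming every element is an integer that fits in a signed 64-bit value, is to store the list as raw machine bytes with the standard-library `array` module and rebuild it with `frombytes`. This is a swapped-in technique, not what pyMySQL does internally; the helper names `save_int_array` and `load_int_array` are hypothetical, and `client` is assumed to be anything with redis-py style `set(key, value)` and `get(key)` methods, such as `redis.Redis()`.

```python
from array import array


def save_int_array(client, key, values):
    """Hypothetical helper: store a list of ints as raw 64-bit machine bytes."""
    # 'q' = signed long long; raises OverflowError if a value does not fit.
    # Bytes use native byte order, so reader and writer must run on the same
    # kind of machine (true when Redis is local, as in the question).
    client.set(key, array("q", values).tobytes())


def load_int_array(client, key):
    """Hypothetical helper: rebuild the list stored by save_int_array."""
    raw = client.get(key)  # assumed to return bytes, or None if the key is missing
    if raw is None:
        return None
    buf = array("q")
    buf.frombytes(raw)  # copies the bytes in directly; no text parsing
    return buf.tolist()
```

With redis-py this would be called as `save_int_array(redis.Redis(), "my_array", my_array)`. It relies on `get` returning `bytes`, which is redis-py's default unless the client is created with `decode_responses=True`. If the caller can work with the `array` object itself, returning `buf` instead of `buf.tolist()` avoids building a list of 10 million Python ints.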
Update (8 November 2019): ran the same test on Python 3.6.
Results:
Dump time: JSON > msgpack > pickle > marshal
Load time: JSON > pickle > msgpack > marshal
Space: marshal > JSON > pickle > msgpack
+---------+---------------+---------------+--------------+
| package | dump time (s) | load time (s) | size (bytes) |
+---------+---------------+---------------+--------------+
| json    | 0.00134       | 0.00079       | 30049        |
| pickle  | 0.00023       | 0.00019       | 20059        |
| msgpack | 0.00031       | 0.00012       | 10036        |
| marshal | 0.00022       | 0.00010       | 50038        |
+---------+---------------+---------------+--------------+
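I tried pickle vs json vs msgpack vs marshal.
In the original run below (Python 2), pickle was much slower than JSON, while msgpack was roughly 4x faster than JSON to dump and over 6x faster to load. MsgPack looks like your best option.
Edit: I also tried marshal. Marshal is faster than JSON; it dumps faster than msgpack but loads more slowly.
Load time: Pickle > JSON > Marshal > MsgPack
Space taken: Marshal > Pickle > JSON > MsgPack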
import time
import json
import pickle
import msgpack  # third-party package: pip install msgpack
import marshal
import sys
array = [1] * 10000
# sys.getsizeof reports the in-memory object size (payload plus object header)
start_time = time.time()
json_array = json.dumps(array)
print("JSON dumps: ", time.time() - start_time)
print("JSON size: ", sys.getsizeof(json_array))
start_time = time.time()
_ = json.loads(json_array)
print("JSON loads: ", time.time() - start_time)
# --------------
start_time = time.time()
pickled_object = pickle.dumps(array)
print("Pickle dumps: ", time.time() - start_time)
print("Pickle size: ", sys.getsizeof(pickled_object))
start_time = time.time()
_ = pickle.loads(pickled_object)
print("Pickle loads: ", time.time() - start_time)
# --------------
start_time = time.time()
package = msgpack.dumps(array)
print("Msg Pack dumps: ", time.time() - start_time)
print("MsgPack size: ", sys.getsizeof(package))
start_time = time.time()
_ = msgpack.loads(package)
print("Msg Pack loads: ", time.time() - start_time)
# --------------
start_time = time.time()
m_package = marshal.dumps(array)
print("Marshal dumps: ", time.time() - start_time)
print("Marshal size: ", sys.getsizeof(m_package))
start_time = time.time()
_ = marshal.loads(m_package)
print("Marshal loads: ", time.time() - start_time)
Results:
JSON dumps: 0.000760078430176
JSON size: 30037
JSON loads: 0.000488042831421
Pickle dumps: 0.0108790397644
Pickle size: 40043
Pickle loads: 0.0100247859955
Msg Pack dumps: 0.000202894210815
MsgPack size: 10040
Msg Pack loads: 7.58171081543e-05
Marshal dumps: 0.000118017196655
Marshal size: 50042
Marshal loads: 0.000118970870972
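The Python 2 numbers above and the Python 3.6 table disagree about pickle. A likely reason, not stated in the original answer: in Python 2, `pickle` is the pure-Python implementation and `pickle.dumps` defaults to the text protocol 0, whereas Python 3 uses the C accelerator and a binary protocol by default. Below is a minimal Python 3 sketch that re-runs the comparison with `timeit` (best of several repetitions instead of a single `time.time()` delta) and reports `len()` of the payload rather than `sys.getsizeof`, which adds object-header overhead. The function name `compare_serializers` is hypothetical, and msgpack is included only if it happens to be installed.

```python
import json
import marshal
import pickle
import timeit


def compare_serializers(data, number=20):
    """Hypothetical helper: return {name: (dump_s, load_s, payload_bytes)}."""
    codecs = {
        "json": (json.dumps, json.loads),
        "pickle": (lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL),
                   pickle.loads),
        "marshal": (marshal.dumps, marshal.loads),
    }
    try:
        import msgpack  # third-party, optional
        codecs["msgpack"] = (msgpack.packb, msgpack.unpackb)
    except ImportError:
        pass

    results = {}
    for name, (dump, load) in codecs.items():
        payload = dump(data)
        assert load(payload) == data  # sanity check: lossless round trip
        # Best of 3 repeats, divided by `number` to get seconds per call.
        dump_s = min(timeit.repeat(lambda: dump(data), number=number, repeat=3)) / number
        load_s = min(timeit.repeat(lambda: load(payload), number=number, repeat=3)) / number
        # json.dumps returns str; measure its UTF-8 encoded length in bytes.
        size = len(payload.encode() if isinstance(payload, str) else payload)
        results[name] = (dump_s, load_s, size)
    return results


if __name__ == "__main__":
    for name, (d, l, s) in compare_serializers([1] * 10000).items():
        print(f"{name:8} dump {d:.6f}s  load {l:.6f}s  {s} bytes")
```

Absolute timings will vary by machine and Python version, so only the relative ordering on your own server is worth comparing.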