MPa*_*Paz 2 performance multithreading pymongo
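I am trying to get a performance improvement out of pymongo, but I am not observing any.
My sample database has 400,000 records. Essentially, I see the same performance with threads as with a single thread, and the only gain comes from running multiple processes.
Does pymongo not release the GIL during a query?
Single-threaded: real 0m0.618s
Multiprocess: real 0m0.144s
Multithreaded: real 0m0.656s
Regular code: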
from pymongo import MongoClient

choices = ['foo', 'bar', 'baz']

def regular_read(db, sample_choice):
    rows = db.test_samples.find({'choice': sample_choice})
    return 42  # done to remove calculations from the picture

def main():
    client = MongoClient('localhost', 27017)
    db = client['test-async']
    # Run the three queries one after another on a single thread.
    for sample_choice in choices:
        regular_read(db, sample_choice)

if __name__ == '__main__':
    main()
$ time python3 mongotest_read.py
real 0m0.618s
user 0m0.085s
sys 0m0.018s
Now, when I use multiprocessing, I can see some improvement.
from random import randint, choice
import functools
from pymongo import MongoClient
from concurrent import futures

choices = ['foo', 'bar', 'baz']
MAX_WORKERS = 4

def regular_read(sample_choice):
    # Each worker process creates its own client.
    client = MongoClient('localhost', 27017, connect=False)
    db = client['test-async']
    rows = db.test_samples.find({'choice': sample_choice})
    #return sum(r['data'] for r in rows)
    return 42

def main():
    f = functools.partial(regular_read)
    with futures.ProcessPoolExecutor(MAX_WORKERS) as executor:
        # executor.map returns a one-shot iterator, so consume it only once.
        results = list(executor.map(f, choices))
    print(results)
    return len(results)

if __name__ == '__main__':
    main()
$ time python3 mongotest_proc_read.py
[42, 42, 42]
real 0m0.144s
user 0m0.106s
sys 0m0.041s
But when I switch from ProcessPoolExecutor to ThreadPoolExecutor, the speed drops back to the single-threaded level.
...
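def main():
    # One client is shared by all worker threads.
    client = MongoClient('localhost', 27017, connect=False)
    f = functools.partial(regular_read, client)
    with futures.ThreadPoolExecutor(MAX_WORKERS) as executor:
        # executor.map returns a one-shot iterator, so consume it only once.
        results = list(executor.map(f, choices))
    print(results)
    return len(results)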
$ time python3 mongotest_thread_read.py
[42, 42, 42]
real 0m0.656s
user 0m0.111s
sys 0m0.024s
...
A. *_*vis 16
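PyMongo uses the standard Python socket module, which does drop the GIL while sending and receiving data over the network. However, it is not MongoDB or the network that is your bottleneck: it is Python.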
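CPU-intensive Python processes do not scale by adding threads; in fact, they slow down slightly because of context switching and other inefficiencies. To use more than one CPU in Python, start subprocesses.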
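As a rough illustration of that point, here is a minimal sketch that times the same CPU-bound work in a thread pool and in a process pool. The function names `cpu_bound` and `timed_map` and the workload size are made up for this example; no MongoDB is involved, and the actual numbers will depend on the machine and interpreter build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop (hypothetical workload) that does no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_cls, work, workers=4):
    """Map cpu_bound over `work` in the given pool; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(workers) as executor:
        results = list(executor.map(cpu_bound, work))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    work = [1_000_000] * 4  # assumed size, chosen only to make timing visible
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed_map(cls, work)
        print(f"{cls.__name__}: {seconds:.3f}s")
```

On a standard CPython build with a GIL, the thread pool would be expected to take about as long as running the four tasks in sequence, while the process pool can spread them over several cores.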
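I know it does not seem intuitive that a "find" should be CPU-intensive, but the Python interpreter is slow enough to contradict our intuition. If the query is fast and there is no latency to MongoDB on localhost, MongoDB can easily outperform the Python client. The experiment you just ran, substituting subprocesses for threads, confirms that Python performance is the bottleneck.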
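To ensure maximum throughput, make sure the C extensions are enabled: pymongo.has_c() == True. With that in place, PyMongo runs as fast as a Python client library can; to get more throughput beyond that, use multiple processes.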
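A quick way to run that check is sketched below. The helper name `c_extension_status` is invented for this example; `pymongo.has_c()` and `bson.has_c()` are the actual checks, and the helper returns None when PyMongo is not installed.

```python
def c_extension_status():
    """Report whether PyMongo's C extensions are loaded.

    Returns None if PyMongo (and its bundled bson package) cannot be imported.
    """
    try:
        import bson
        import pymongo
    except ImportError:
        return None
    return {"pymongo": pymongo.has_c(), "bson": bson.has_c()}


if __name__ == "__main__":
    print(c_extension_status())
```

If either value comes back False, the pure-Python fallback is in use and BSON encoding and decoding will cost more CPU time.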
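If your expected real-world scenario involves more time-consuming queries, or a remote MongoDB with some network latency, multithreading may give you some performance increase.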
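To see why latency changes the picture, here is a small simulation, not a real MongoDB benchmark. The `slow_query` function is a hypothetical stand-in that uses `time.sleep` to imitate waiting on a remote server; like a blocking socket read, sleeping releases the GIL, so the waits can overlap across threads. The 50 ms latency is an assumed value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(choice, latency=0.05):
    """Hypothetical stand-in for a query against a remote server."""
    time.sleep(latency)  # simulated network wait; the GIL is released here
    return choice


def run_sequential(choices):
    """Issue the simulated queries one after another."""
    start = time.perf_counter()
    results = [slow_query(c) for c in choices]
    return results, time.perf_counter() - start


def run_threaded(choices, workers=4):
    """Issue the simulated queries from a thread pool so the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(workers) as executor:
        results = list(executor.map(slow_query, choices))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    choices = ["foo", "bar", "baz"]
    print("sequential: %.3fs" % run_sequential(choices)[1])
    print("threaded:   %.3fs" % run_threaded(choices)[1])
```

When most of each call is spent waiting rather than computing, the threaded run should finish in roughly the time of one wait instead of the sum of all of them.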