I am trying to understand why running several parsers in parallel threads does not speed up HTML parsing. One thread doing 100 jobs turns out to be twice as fast as two threads doing 50 jobs each.
Here is my code:
from lxml.html import fromstring
import time
from threading import Thread
try:
    # Python 2 / Python 3 compatible import
    from urllib import urlopen
except ImportError:
    from urllib.request import urlopen

# Fetch the page once; every job re-parses the same bytes.
DATA = urlopen('http://lxml.de/FAQ.html').read()

def func(number):
    # Parse the document `number` times.
    for x in range(number):
        fromstring(DATA)
print('Testing one thread (100 jobs per thread)')
start = time.time()
t1 = Thread(target=func, args=[100])
t1.start()
t1.join()
elapsed = time.time() - start
print('Time: %.5f' % elapsed)
print('Testing two threads (50 jobs per thread)')
start = time.time()
t1 = Thread(target=func, args=[50])
t2 = Thread(target=func, args=[50])
t1.start()
t2.start()
t1.join()
t2.join()
elapsed = time.time() - start
print('Time: %.5f' % elapsed)
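For comparison (this is not part of the original question): CPython's GIL generally keeps CPU-bound work such as parsing from running in parallel inside a single process, so one way to check whether the workload itself parallelizes is to run it in separate processes instead of threads. The sketch below assumes Python 3 and concurrent.futures; it mirrors the two-worker, 50-jobs-each setup above.

from concurrent.futures import ProcessPoolExecutor
from lxml.html import fromstring
from urllib.request import urlopen
import time

def parse_many(args):
    # Unpack (document bytes, repetition count) and parse repeatedly.
    data, number = args
    for _ in range(number):
        fromstring(data)
    return number

if __name__ == '__main__':
    # Same document as in the question.
    DATA = urlopen('http://lxml.de/FAQ.html').read()
    start = time.time()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Two worker processes, 50 parses each -- the process-based
        # counterpart of the two-thread test.
        list(pool.map(parse_many, [(DATA, 50), (DATA, 50)]))
    print('Time: %.5f' % (time.time() - start))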
What is the best way to pull all of the _id values out of a MongoDB collection? I am working with MongoDB through pymongo. The following code:

for item in db.some_collection.find({}, {'_id': 1}):
    # do something
takes a noticeable amount of time to iterate over the whole collection. I only need the _id values, and they should all easily fit in memory. Why does this code not finish almost immediately?
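If the goal is simply to fetch every _id as quickly as possible, one alternative worth measuring (a sketch, not an explanation of why the loop above is slow) is pymongo's Collection.distinct(), which returns all distinct values of a field in a single command; note that the whole result has to fit in one 16 MB BSON response. The connection details below are hypothetical.

from pymongo import MongoClient

# Hypothetical connection details; adjust to the actual deployment.
client = MongoClient('mongodb://localhost:27017')
db = client['some_database']

# One server command instead of iterating a cursor batch by batch.
# Caveat: the full result must fit in a single 16 MB BSON document.
ids = db.some_collection.distinct('_id')
print(len(ids))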