What is the best way to send multiple HTTP requests in Python 3?

Question

Nik*_*ach 7 python concurrency multithreading http python-3.x

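The idea is simple: I need to send multiple HTTP requests in parallel.

I decided to use the requests-futures library for that, which basically spawns multiple threads.

Right now I have about 200 requests and it is still pretty slow (about 12 seconds on my laptop). I am also using a callback to parse the response JSON, as suggested in the library documentation. Also, is there a rule of thumb for choosing the optimal number of threads as a function of the number of requests?

Basically, I would like to know whether I can speed these requests up any further.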

Answer 1

roi*_*ppi 8

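Since you are using Python 3.3, I'll recommend a stdlib solution that is not in the thread linked by @njzk2: concurrent.futures.

It is a higher-level interface than dealing directly with threading or multiprocessing primitives. You get an Executor interface that handles pooling and asynchronous reporting.

The docs have an example that applies almost directly to your situation, so I'll just drop it here: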

import concurrent.futures
import urllib.request

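# Placeholder: fill in with your own list of URLs
URLS = []

# Retrieve a single page and return its contents as bytes
def load_url(url, timeout):
    # The with statement closes the connection once the body has been read
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()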

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result() 
            # do json processing here
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

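You could replace the urllib.request calls with requests calls if you like. I do tend to like requests more, for obvious reasons.

The API goes a little like this: create a bunch of Future objects, each representing the asynchronous execution of your function. Then use concurrent.futures.as_completed to get an iterator over your Future instances. It yields each one as it completes, not in the order they were submitted.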
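
To make the completion-order behaviour concrete, here is a minimal sketch that needs no network access. The names fake_fetch, JOBS and run_jobs are hypothetical, and time.sleep only stands in for a real HTTP call, so treat it as an illustration under those assumptions rather than production code:

```python
import concurrent.futures
import time


def fake_fetch(name, delay):
    """Hypothetical stand-in for an HTTP call: wait, then return a result."""
    time.sleep(delay)
    return name.upper()


# Hypothetical jobs: (name, simulated latency in seconds)
JOBS = [("slow", 0.6), ("fast", 0.1), ("medium", 0.35)]


def run_jobs(jobs):
    """Submit every job and collect (name, result) pairs in completion order."""
    finished = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(jobs)) as executor:
        # Map each Future back to the job name it belongs to
        future_to_name = {
            executor.submit(fake_fetch, name, delay): name for name, delay in jobs
        }
        # as_completed yields futures as they finish, not as they were submitted
        for future in concurrent.futures.as_completed(future_to_name):
            finished.append((future_to_name[future], future.result()))
    return finished


if __name__ == "__main__":
    for name, result in run_jobs(JOBS):
        print(name, result)
```

Even though "slow" is submitted first, it should come back last, because each future is handed to you as soon as its own work is done.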

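As for your question:

Also, is there a rule of thumb for choosing the optimal number of threads as a function of the number of requests?

A rule of thumb? No. It depends on too many things, including the speed of your internet connection. I will say that it does not really depend on the number of requests you have, and depends more on the hardware you are running on.

Fortunately, it is easy to tweak the max_workers kwarg and test for yourself. Start with 5 or 10 threads and go up in increments of 5. You will probably notice performance plateauing at some point, and then starting to decrease as the overhead of adding more threads outweighs the marginal gain from increased parallelization (which is a word).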
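
One way to run that experiment is sketched below. It is an assumption-laden illustration: simulated_request, time_batch and sweep are hypothetical names, and a time.sleep call stands in for the real request, so the timings only show the shape of the measurement. For real numbers you would replace simulated_request with your actual request function:

```python
import concurrent.futures
import time


def simulated_request(index, latency=0.02):
    """Hypothetical stand-in for one HTTP request: waits like network I/O."""
    time.sleep(latency)
    return index


def time_batch(max_workers, n_requests=40):
    """Return the wall-clock seconds needed to run n_requests with this pool size."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces every result to be collected before the timer stops
        results = list(executor.map(simulated_request, range(n_requests)))
    elapsed = time.perf_counter() - start
    assert len(results) == n_requests
    return elapsed


def sweep(worker_counts=(5, 10, 15, 20), n_requests=40):
    """Time the same batch once for each candidate max_workers value."""
    return {workers: time_batch(workers, n_requests) for workers in worker_counts}


if __name__ == "__main__":
    for workers, seconds in sweep().items():
        print(f"max_workers={workers:>2}: {seconds:.3f}s")
```

With real requests, repeat each measurement a few times, since network latency varies from run to run and a single timing can be misleading.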
