How do I enable asynchronous mode for requests?

asked by use*_*901 (score 17) · tags: python, asynchronous, urllib2, gevent, python-requests

For this code:

import sys

import gevent
from gevent import monkey

monkey.patch_all()

import requests
import urllib2

def worker(url, use_urllib2=False):
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
    else:
        content = requests.get(url, prefetch=True).content.lower()
    title = content.split('<title>')[1].split('</title>')[0].strip()

urls = ['http://www.mail.ru']*5

def by_requests():
    jobs = [gevent.spawn(worker, url) for url in urls]
    gevent.joinall(jobs)

def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__=='__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")  
    print 'by requests: %s seconds'%t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")  
    print 'by urllib2: %s seconds'%t.timeit(number=3)
    sys.exit(0)

The result:

by requests: 18.3397213892 seconds
by urllib2: 2.48605842363 seconds

In a packet sniffer it looks like this (screenshot omitted):

Explanation: the first 5 requests were made by the requests library, the next 5 by urllib2. Red marks the time the worker was frozen; dark marks the time data was being received... wtf?!

If the socket library is monkey-patched, both libraries should behave the same way — how is this possible? And how can I do asynchronous work with requests without requests.async?
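Aside: the `content.split('<title>')` extraction used in the code above is fragile — it breaks on uppercase tags or a `<title>` with attributes. A more robust stdlib-only sketch (Python 3; `extract_title` is a hypothetical helper name, not part of any library here):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> is matched too.
        if tag == "title" and not self.title:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

print(extract_title("<html><head><TITLE> Mail.ru </TITLE></head></html>"))
# → Mail.ru
```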

answered by use*_*901 (score 15)

My apologies to Kenneth Reitz. His library is wonderful.

I was being dumb. I needed to opt in to the monkey patch for httplib, like this:

gevent.monkey.patch_all(httplib=True)

because patching httplib is disabled by default.

  • This no longer works: ValueError: gevent.httplib is no longer provided, httplib must be False (12 upvotes)
  • Use grequests (@KennethReitz) instead. It overrides the main verbs and inherits the rest. (2 upvotes)

answered by Pha*_*ani (score 7)

As Kenneth pointed out, another thing we can do is let the requests module handle the asynchronous part. I have changed your code accordingly. Again, for me, the results consistently show that the requests module performs better than urllib2.

Doing this means we cannot "thread" the callback part. But that should be fine, because the main gains are expected on the HTTP requests themselves, thanks to overlapping the request/response latency.
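That claim — the speedup comes purely from overlapping request/response latency — can be illustrated with a stdlib-only sketch (modern Python 3; `fetch` is a hypothetical stand-in that simulates latency instead of making a real HTTP call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Simulated network round-trip; real code would call requests.get(url).
    time.sleep(0.2)
    return "<title>%s</title>" % url

urls = ["http://www.mail.ru"] * 5

start = time.time()
serial = [fetch(u) for u in urls]             # ~5 x 0.2s, one after another
serial_time = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    concurrent = list(pool.map(fetch, urls))  # ~0.2s, latencies overlap
concurrent_time = time.time() - start

print("serial: %.2fs, concurrent: %.2fs" % (serial_time, concurrent_time))
```

The results are identical either way; only the wall-clock time changes, because the sleeps (standing in for network waits) overlap.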

import sys

import gevent
from gevent import monkey

monkey.patch_all()

import requests
from requests import async
import urllib2

def call_back(resp):
    content = resp.content
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title

def worker(url, use_urllib2=False):
    if use_urllib2:
        # urllib2 branch: called once per URL via gevent.spawn
        content = urllib2.urlopen(url).read().lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()

    else:
        # requests branch: called once with the whole list;
        # requests.async fans the requests out internally
        rs = [async.get(u) for u in url]
        resps = async.map(rs)
        for resp in resps:
            call_back(resp) 

urls = ['http://www.mail.ru']*5

def by_requests():
    worker(urls)
def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__=='__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds'%t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds'%t.timeit(number=3)
    sys.exit(0)

Here is one of my results:

by requests: 2.44117593765 seconds
by urllib2: 4.41298294067 seconds
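Note that requests.async was later removed from requests (it lives on as the separate grequests package). In modern Python the same fan-out pattern is usually written with asyncio; a minimal sketch with simulated I/O (the `fetch` coroutine is a hypothetical stand-in — real code would await an HTTP client such as aiohttp):

```python
import asyncio

async def fetch(url):
    # Simulated network round-trip; a real implementation would
    # await an async HTTP client here instead of sleeping.
    await asyncio.sleep(0.1)
    return "<title>%s</title>" % url

async def main(urls):
    # gather() runs all the coroutines concurrently,
    # much like async.map() did in the answer above.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = ["http://www.mail.ru"] * 5
results = asyncio.run(main(urls))
print(results)
```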