Asynchronous Requests with Python requests

trb*_*bck 128 python asynchronous httprequest python-requests

I tried the sample provided within the documentation of the Python requests library:

http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests

Using async.map(rs) I get the response codes, but I want to get the content of each page requested.

out = async.map(rs)
print out[0].content

for example, just does not work.

Jef*_*eff 143

Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as-is to reflect the original question, which was about using requests < v0.13.0.


To do multiple tasks with async.map asynchronously you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests / actions

Example:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

  • `from grequests import async` does not work.. This definition of do_something worked for me: `def do_something(response, **kwargs):`. I found it at http://stackoverflow.com/questions/15594015/problems-with-hooks-using-requests-python-package (10 upvotes)
  • If the async.map call still blocks, then how is this asynchronous? Apart from the requests themselves being sent asynchronously, is the retrieval still synchronous? (3 upvotes)
  • Good idea to leave your note: due to compatibility issues between the latest requests and grequests (the max_retries option missing in requests 1.1.0) I had to downgrade requests to get async back, and I found that the asynchronous functionality was moved out as of version 0.13+ (https://pypi.python.org/pypi/requests) (2 upvotes)
  • Using `import grequests as async`, instead of `from requests import async`, worked for me. (2 upvotes)
  • `grequests` now recommends `requests-threads` or `requests-futures` instead (2 upvotes)

out*_*ile 75

async is now an independent module: grequests.

See: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

$ pip install grequests

Usage:

Build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

Send the stack:

grequests.map(rs)

The result looks like:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests doesn't seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server.
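
(As a comment below points out, concurrency can in fact be capped via the `size` argument to map()/imap(). A minimal sketch, assuming httpbin.org as a stand-in endpoint:)

import grequests

# 50 requests against one server, but at most 20 in flight at any time
urls = ['http://httpbin.org/delay/1'] * 50
rs = (grequests.get(u) for u in urls)
responses = grequests.map(rs, size=20)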

  • Regarding the limit on concurrent requests: you can specify a pool size when running map()/imap(), i.e. grequests.map(rs, size=20) to have 20 concurrent grabs. (11 upvotes)
  • On the github repo, the author of grequests recommends using requests-threads or requests-futures instead. (3 upvotes)
  • As of now this doesn't support python3 (gevent fails to build v2.6 on py3.4). (2 upvotes)

Ant*_*lin 44

I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own simple wrapper of requests around a ThreadPoolExecutor; it is almost as fast as grequests, but without the external dependencies.

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err += 1  # request failed or timed out
        else:
            resp_ok += 1   # request succeeded

  • I don't understand why this answer got so many upvotes. The OP's question was about async requests. A ThreadPoolExecutor runs threads. Yes, you can make requests in multiple threads, but that will never be an asynchronous program, so how can it be an answer to the original question? (5 upvotes)
  • Sorry, I don't understand your question. Use only a single URL in multiple threads? That's just the one case of a DDoS attack )) (2 upvotes)
  • Actually, the question was about how to load URLs in parallel. Yes, a thread pool executor is not the best option, it's better to use async io, but it works well in Python. And I don't see why threads couldn't be used for async? What if you need to run a CPU-bound task asynchronously? (2 upvotes)

Dre*_*puf 28

Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It's also suggested in the official documentation. If you don't want to involve gevent, it's a good one.
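
For more than a couple of URLs, the futures returned by FuturesSession compose with the standard concurrent.futures tooling. A minimal sketch, assuming httpbin.org as a placeholder endpoint:

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=10)
urls = ['http://httpbin.org/get?page={0}'.format(i) for i in range(10)]

# fire off all requests up front, then handle each response as it completes
futures = [session.get(u) for u in urls]
for future in as_completed(futures):
    response = future.result()
    print(response.url, response.status_code)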


Har*_*oli 14

I have a lot of issues with most of the answers posted: they either use deprecated libraries with limited features, or provide solutions with too much magic in the execution of the request, which makes error handling difficult. If they do not fall into one of those categories, they are 3rd-party libraries or deprecated.

Some of the solutions work alright for pure http requests, but they fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any kind, and it provides enough fluidity for complex, use-case-specific error handling.

import asyncio
import requests  # used by the HTTP branch below

loop = asyncio.get_event_loop()

def do_thing(params):
    async def get_rpc_info_and_do_chores(id):
        # do things
        response = perform_grpc_call(id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

The way it works is simple. You create a series of tasks you would like to run asynchronously, and then ask a loop to execute those tasks and exit upon completion. No extra libraries suffering from lack of maintenance, and no lack of functionality.
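
(Note that requests.get itself blocks whatever thread it runs on. If you want calls in a pattern like this to actually overlap, one option is to push each blocking call onto a thread pool with loop.run_in_executor. A minimal sketch, with httpbin.org as a placeholder endpoint:)

import asyncio
import requests

urls = ['http://httpbin.org/get'] * 5

async def main():
    loop = asyncio.get_event_loop()
    # hand each blocking requests.get to the default thread pool;
    # the event loop stays free while the threads wait on network I/O
    futures = [loop.run_in_executor(None, requests.get, url) for url in urls]
    responses = await asyncio.gather(*futures)
    for response in responses:
        print(response.status_code)

asyncio.get_event_loop().run_until_complete(main())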

  • If I understand this correctly, this blocks the event loop while doing the GRPC and HTTP calls? So if these calls take seconds to complete, your entire event loop is blocked for seconds? To avoid this you need to use GRPC or HTTP libraries that are `async`. Then you can for example do `await response = requests.get(URL)`. No? (7 upvotes)

Dra*_*obZ 13

Unfortunately, as far as I'm aware, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want true async requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). It works well in my experience using it with the Python 3.7 async/await syntax. Below I provide three implementations of performing n web requests:

  1. Purely synchronous requests (sync_requests_get_all) using the Python requests library
  2. Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
  3. A truly asynchronous implementation (async_aiohttp_get_all) using the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio

import time
import asyncio
import requests
import aiohttp

from types import SimpleNamespace

durations = []


def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f'{func.__name__:<30} started')
        result = func(*args, **kwargs)
        duration = f'{func.__name__:<30} finished in {time.time() - start:.2f} seconds'
        print(duration)
        durations.append(duration)
        return result
    return wrapper


async def fetch(url, session):
    """
    asynchronous get request
    """
    async with session.get(url) as response:
        response_json = await response.json()
        return SimpleNamespace(**response_json)


async def fetch_many(loop, urls):
    """
    many asynchronous get requests, gathered
    """
    async with aiohttp.ClientSession() as session:
        tasks = [loop.create_task(fetch(url, session)) for url in urls]
        return await asyncio.gather(*tasks)

@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [SimpleNamespace(**session.get(url).json()) for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    loop = asyncio.get_event_loop()
    # use session to reduce network overhead
    session = requests.Session()

    async def async_get(url):
        return session.get(url)

    async_tasks = [loop.create_task(async_get(url)) for url in urls]
    return loop.run_until_complete(asyncio.gather(*async_tasks))


@timed
def asnyc_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(fetch_many(loop, urls))


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    sync_requests_get_all(urls)
    async_requests_get_all(urls)
    asnyc_aiohttp_get_all(urls)
    print('----------------------')
    [print(duration) for duration in durations]

On my machine, this is the output:

sync_requests_get_all          started
sync_requests_get_all          finished in 30.92 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.87 seconds
asnyc_aiohttp_get_all          started
asnyc_aiohttp_get_all          finished in 3.22 seconds
----------------------
sync_requests_get_all          finished in 30.92 seconds
async_requests_get_all         finished in 30.87 seconds
asnyc_aiohttp_get_all          finished in 3.22 seconds

  • Definitely a typo (`asnyc` for `async`) (4 upvotes)
  • @CpILL It wraps a function returning a coroutine (the result of asyncio.gather), so it can be called in a synchronous context. I like doing it that way. You could instead use [asyncio.run](https://docs.python.org/3/library/asyncio-task.html#asyncio.run) to execute the result of asyncio.gather directly. (2 upvotes)

Uri*_*Uri 10

You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
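
If you're calling this from a plain synchronous script rather than from an async REPL, a minimal sketch of wiring it up with asyncio.run (Python 3.7+):

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

async def main():
    urls = ["http://google.com", "http://wikipedia.org"]
    # gather runs both coroutines concurrently
    responses = await asyncio.gather(*map(get_async, urls))
    for response in responses:
        print(response.url, response.status_code)

asyncio.run(main())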

If you want a functional syntax, the gamla lib wraps this into get_async.

Then you can do:


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

The 10 is the timeout in seconds.

(Disclaimer: I am its author.)


Mon*_*son 7

I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

Documentation is here: http://pythonhosted.org/simple-requests/


Tom*_*tie 6

If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async
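
A minimal sketch of what that looks like, going by the project's README (the package mirrors the requests API as awaitables; the httpbin.org URL here is just a placeholder):

import asyncio
import requests_async as requests

async def main():
    # same call shape as requests.get, but awaitable
    response = await requests.get('http://httpbin.org/get')
    print(response.status_code)
    print(response.text)

asyncio.run(main())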

  • Confirmed, this works great. On the project page it says the work has been superseded by the following project: https://github.com/encode/httpx (3 upvotes)

小智 5

from threading import Thread
import urllib2

threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout = 600)
    o...
Run Code Online (Sandbox Code Playgroud)

  • This is just "regular" requests in threads. Not a bad example, but it's off-topic. (3 upvotes)
