Making multiple asynchronous requests at the same time

Cos*_*tin 2 python python-3.x python-requests python-asyncio

I'm trying to make ~300 API calls at the same time, so that I get the results within a couple of seconds at most.

My pseudo-code looks like this:

def function_1():
    colors = ['yellow', 'green', 'blue']  # plus ~300 other ones
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    res = loop.run_until_complete(get_color_info(colors))

async def get_color_info(colors):
    loop = asyncio.get_event_loop()
    responses = []
    for color in colors:
        print("getting color")
        url = "https://api.com/{}/".format(color)
        data = loop.run_in_executor(None, requests.get, url)
        r = await data
        responses.append(r.json())
    return responses

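Doing this, I see "getting color" printed roughly once per second and the code takes forever, so I'm pretty sure the calls don't run simultaneously. What am I doing wrong?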

Bra*_*mon 11

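aiohttp with Native Coroutines (async/await)

Here is a typical pattern that accomplishes what you're trying to do. (Python 3.7+.)

One major change is that you will need to move from requests, which is built for synchronous IO, to a package such as aiohttp that is built specifically to work with async/await (native coroutines):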

import asyncio
import aiohttp  # pip install aiohttp aiodns


async def get(
    session: aiohttp.ClientSession,
    color: str,
    **kwargs
) -> dict:
    url = f"https://api.com/{color}/"
    print(f"Requesting {url}")
    resp = await session.request('GET', url=url, **kwargs)
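    # Note: non-2xx statuses do not raise by default; call
    # resp.raise_for_status() if you want them to. resp.json() can raise
    # (e.g. aiohttp.ContentTypeError) when the body is not JSON.
    # You can either handle that here, or pass the exception through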
    data = await resp.json()
    print(f"Received data for {url}")
    return data


async def main(colors, **kwargs):
    # Asynchronous context manager.  Prefer this rather
    # than using a different session for each GET request
    async with aiohttp.ClientSession() as session:
        tasks = []
        for c in colors:
            tasks.append(get(session=session, color=c, **kwargs))
        # asyncio.gather() will wait on the entire task set to be
        # completed.  If you want to process results greedily as they come in,
        # loop over asyncio.as_completed()
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results


if __name__ == '__main__':
    colors = ['red', 'blue', 'green']  # ...
    # Either take colors from stdin or make some default here
    asyncio.run(main(colors))  # Python 3.7+

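There are two distinct elements to this: one is the asynchronous aspect of the coroutines, and the other is the concurrency introduced when you specify a container of tasks (futures):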

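  • You create one coroutine, get, that uses await with two awaitables: the first is .request and the second is .json. This is the async aspect. The purpose of awaiting these IO-bound responses is to tell the event loop that other get() calls can take their turns running through that same routine.
  • The concurrent aspect is encapsulated in await asyncio.gather(*tasks). This maps the awaitable get() call to each of your colors. The result is an aggregate list of returned values. Note that this wrapper waits until all of your responses have come in and .json() has been called on each. If you would rather process them greedily as they become ready, you can loop over asyncio.as_completed: each Future object returned represents the earliest result from the set of remaining awaitables.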
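
The greedy alternative mentioned in the second bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: fake_get, main_greedy, and the delay values are hypothetical stand-ins, and asyncio.sleep() replaces the real aiohttp request so the snippet runs without a network.

```python
import asyncio


async def fake_get(color: str, delay: float) -> dict:
    # Hypothetical stand-in for get(): sleeps instead of doing network IO
    await asyncio.sleep(delay)
    return {"color": color, "delay": delay}


async def main_greedy(delays: dict) -> list:
    coros = [fake_get(color, delay) for color, delay in delays.items()]
    finished = []
    # as_completed() yields awaitables in the order their results arrive,
    # not in the order the coroutines were submitted
    for next_done in asyncio.as_completed(coros):
        data = await next_done
        finished.append(data["color"])  # process each result right away
    return finished


def run_greedy_demo() -> list:
    # Illustrative delays only; the slowest request is submitted first
    return asyncio.run(main_greedy({"red": 0.3, "blue": 0.1, "green": 0.2}))


if __name__ == "__main__":
    print(run_greedy_demo())
```

Unlike asyncio.gather(), this does not preserve submission order, so each result should carry whatever identifies it (here, the color).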

Lastly, take note that asyncio.run() is a high-level "porcelain" function introduced in Python 3.7. In earlier versions, you can mimic it (roughly) like this:

# The "full" version makes a new event loop and calls
# loop.shutdown_asyncgens(), see link above
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main(colors))
finally:
    loop.close()

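Limiting Requests

There are a number of ways to limit the rate of concurrency. For instance, see "asyncio.semaphore in async-await function" or "large numbers of tasks with limited concurrency".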
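
One way to cap concurrency is an asyncio.Semaphore; a minimal sketch follows. The names limited_get, main_limited, and max_concurrent are hypothetical, the limit of 5 is arbitrary, and asyncio.sleep() stands in for the real request. The stats dictionary exists only to show that the cap holds.

```python
import asyncio


async def limited_get(sem: asyncio.Semaphore, color: str, stats: dict) -> str:
    # At most `max_concurrent` coroutines can be inside this block at once
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        stats["active"] -= 1
    return color


async def main_limited(colors: list, max_concurrent: int = 10):
    # Create the semaphore inside the running event loop
    sem = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited_get(sem, c, stats) for c in colors)
    )
    return results, stats["peak"]


def run_limited_demo(n: int = 50, max_concurrent: int = 5):
    colors = [f"color{i}" for i in range(n)]
    return asyncio.run(main_limited(colors, max_concurrent))


if __name__ == "__main__":
    results, peak = run_limited_demo()
    print(len(results), "results; peak concurrency:", peak)
```

All tasks are still created up front; the semaphore only bounds how many are doing IO at any one moment. This limits concurrency, not requests per second.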

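  • @user4815162342 Here is an article I wrote recently; any feedback and corrections are appreciated. https://realpython.com/async-io-python/ (2 upvotes)
  • Great article and answer, @BradSolomon! Any updates since 2019? (2 upvotes)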