Fetching multiple URLs with asyncio/aiohttp and retrying failures

Question

Lou*_*dox 4 python python-asyncio aiohttp

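I'm trying to write some asynchronous GET requests with the aiohttp package and have figured out most of the pieces, but I'm wondering what the standard approach is for handling failures (which are returned as exceptions).

The general idea of my code so far (after some trial and error, I am following the approach here):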

import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path('out')
    out_path.mkdir(exist_ok=True)
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                return_exceptions=True)
        return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))

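Now this runs just fine:

As expected, the results variable is populated with None entries where the corresponding URL (i.e. at the same index in the urls array variable, i.e. at the same line number in the input file urls.txt) was successfully requested, and the corresponding file is written to disk.
This means I can use the results variable to determine which URLs were not successful (those entries in results that are not equal to None).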
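
A minimal sketch of that bookkeeping, assuming results comes from asyncio.gather(..., return_exceptions=True) and therefore lines up index-for-index with urls. The fake_fetch coroutine, the failed_urls helper and the example URLs are hypothetical stand-ins so the snippet runs without a network:

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for fetch(): returns None on success and
    # raises for URLs containing "bad" to simulate a failed request.
    await asyncio.sleep(0)
    if "bad" in url:
        raise ConnectionError(f"could not connect to {url}")


async def fetch_all(urls):
    # return_exceptions=True stores each exception in the result list
    # at the same index as the URL that caused it.
    return await asyncio.gather(*(fake_fetch(u) for u in urls),
                                return_exceptions=True)


def failed_urls(urls, results):
    # Keep only the URLs whose result slot holds an exception.
    return [u for u, r in zip(urls, results) if isinstance(r, Exception)]


urls = ["https://example.com/a",
        "https://example.com/bad",
        "https://example.com/c"]
results = asyncio.run(fetch_all(urls))
print(failed_urls(urls, results))
```

Checking isinstance(r, Exception) rather than "r is not None" keeps the test valid even if fetch is later changed to return a value.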

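I have looked at a few different guides to using the various asynchronous Python packages (aiohttp, aiofiles, and asyncio), but I haven't seen a standard way to handle this final step.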

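Should the GET request be retried after the await statement has "finished"/"completed"?
...or should the retry be triggered by some sort of callback upon failure?
- The errors look like this: (ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)"), i.e. the request to IP address 000.XXX.XXX.XXX at port 443 failed, probably because the server imposes some limit that I should respect by waiting for a timeout before retrying.
Is there some sort of limit I should consider setting, so that the requests are processed in batches rather than all attempted at once?
I am getting about 40-60 successful requests when attempting a few hundred (over 500) URLs in my list.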
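
On the batching point, one common way to cap how many requests run at once is an asyncio.Semaphore. The sketch below is only an illustration under stated assumptions: MAX_CONCURRENT, limited and fake_fetch are hypothetical names, the limit of 3 is arbitrary, and fake_fetch stands in for a real request so the snippet runs offline.

```python
import asyncio

MAX_CONCURRENT = 3  # assumed limit; tune to what the server tolerates


async def limited(semaphore, coro_func, *args):
    # At most MAX_CONCURRENT callers hold the semaphore at a time;
    # the rest wait here until a slot is released.
    async with semaphore:
        return await coro_func(*args)


async def main():
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    in_flight = 0
    peak = 0

    async def fake_fetch(url):
        # Hypothetical stand-in for a real request; it records how
        # many calls are active at the same time.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return url

    urls = [f"https://example.com/{i}" for i in range(10)]
    results = await asyncio.gather(
        *(limited(semaphore, fake_fetch, u) for u in urls))
    return results, peak


results, peak = asyncio.run(main())
print(len(results), peak)
```

With aiohttp itself, the TCPConnector passed to ClientSession also accepts a limit argument that caps the number of simultaneous connections.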

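Naively, I was expecting run_until_complete to handle this in such a way that it would only finish once all URLs had been requested successfully, but this isn't the case.

I haven't worked with asynchronous Python and sessions/loops before, so I would appreciate any help in working out how to get the results. Please let me know if I can provide any more information to improve this question, thank you!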

Answer 1

use*_*342 6

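Should the GET request be retried after the await statement has "finished"/"completed"? ...or should the retry be triggered by some sort of callback upon failure?

You can do the former. You don't need any special callback: since you are executing inside a coroutine, a simple while loop will suffice, and it won't interfere with the execution of other coroutines. For example: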

import asyncio
from pathlib import Path

import aiofiles
import aiohttp

async def fetch(session, url):
    data = None
    # Keep retrying until the request succeeds; awaiting the sleep
    # lets the other coroutines run in the meantime.
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path('out')
    out_path.mkdir(exist_ok=True)
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

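The loop above retries forever, which may not be what you want for a URL that never recovers. As a hedged variation (not part of the original answer), the sketch below caps the number of attempts and doubles the pause after each failure. The names retry, flaky, attempts, base_delay and retry_on are hypothetical, and flaky simulates a failing request so the snippet runs offline; with aiohttp you would presumably pass retry_on=(aiohttp.ClientError,).

```python
import asyncio


async def retry(coro_func, *args, attempts=4, base_delay=0.01,
                retry_on=(Exception,)):
    # Call coro_func(*args) up to `attempts` times, doubling the
    # pause after each failure; re-raise the last error if all fail.
    for attempt in range(attempts):
        try:
            return await coro_func(*args)
        except retry_on:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = 0


async def flaky(url):
    # Hypothetical stand-in for a request that fails twice,
    # then succeeds on the third call.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("simulated failure")
    return f"body of {url}"


print(asyncio.run(retry(flaky, "https://example.com/a")))
```

Because the final failure is re-raised, a caller using asyncio.gather(..., return_exceptions=True) still sees the exception in the results list for URLs that never succeed.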

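Naively, I was expecting run_until_complete to handle this in such a way that it would only finish once all URLs had been requested successfully

The term "complete" is meant in the technical sense of a coroutine completing (running its course), which happens when the coroutine either returns or raises an exception.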
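
A small illustration of that meaning, using only the standard library. The succeeds and fails coroutines are hypothetical; the point is that both tasks count as done, even though one of them ended by raising:

```python
import asyncio


async def succeeds():
    return "ok"


async def fails():
    raise ValueError("simulated request failure")


async def main():
    tasks = [asyncio.create_task(succeeds()),
             asyncio.create_task(fails())]
    # By default asyncio.wait returns once every task has completed,
    # whether it returned a value or raised an exception.
    done, pending = await asyncio.wait(tasks)
    # exception() is None for a task that returned normally.
    return [(t.done(), t.exception()) for t in tasks], pending


states, pending = asyncio.run(main())
for is_done, exc in states:
    print(is_done, repr(exc))
```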
