Related questions and solutions (0)

Fetching multiple URLs with aiohttp in Python 3.5

Since Python 3.5 introduced async with, the syntax recommended in the aiohttp docs has changed. To fetch a single URL, they now suggest:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)


How can this be modified to fetch a collection of URLs instead of just one?

In the old asyncio examples, you would set up a list of tasks, for example:

    tasks = [
            fetch(session, 'http://cnn.com'),
            fetch(session, 'http://google.com'),
            fetch(session, 'http://twitter.com')
            ]


I tried to combine a list like this with the approach above, but failed.
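
One way such a task list might be combined with the async with style is to pass the coroutines to asyncio.gather inside a single shared session. The following is a minimal sketch, not the aiohttp documentation's own example. It assumes aiohttp 3.x, where ClientSession is used with async with and the timeout is given as a ClientTimeout object instead of the aiohttp.Timeout helper shown in the question. The names fetch_all and main are hypothetical.

```python
import asyncio


async def fetch(session, url):
    # One GET request; the response is released when the block exits.
    async with session.get(url) as response:
        return await response.text()


async def fetch_all(session, urls):
    # Run all requests concurrently on the shared session.
    # gather returns the results in the same order as `urls`.
    return await asyncio.gather(*(fetch(session, url) for url in urls))


async def main(urls):
    # Imported here so the helpers above do not need aiohttp at import time.
    import aiohttp

    # Assumption: aiohttp 3.x (ClientTimeout, async ClientSession).
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await fetch_all(session, urls)
```

On Python 3.7 or later this could be started with asyncio.run(main(['http://cnn.com', 'http://google.com', 'http://twitter.com'])). On Python 3.5, loop.run_until_complete(main(...)) plays the same role. Note that a single failing request makes gather raise, so the other results are lost unless return_exceptions=True is passed.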

python web-scraping python-3.x python-asyncio aiohttp

Han*_*ler

2018 02-11

11
votes

1
answer

5289
views

Fetching multiple URLs with asyncio/aiohttp and retrying failures

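I'm trying to write some asynchronous GET requests with the aiohttp package and have most of it figured out, but I'm wondering what the standard approach is for handling failures (which are returned as exceptions).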

The general idea of my code so far (after some trial and error, I'm following the approach described here):

import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, …

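A common pattern for this kind of failure handling, sketched here under assumptions and not taken from the truncated code above, is to wrap each request in a small retry loop and collect the remaining errors with asyncio.gather(..., return_exceptions=True). The helper names fetch_with_retry and fetch_all_with_retry, the attempts and delay parameters, and the linear backoff are all hypothetical. The fetch argument stands for any coroutine function that takes a URL and raises on failure, as response.raise_for_status() does in aiohttp.

```python
import asyncio


async def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    # `fetch` is any coroutine function taking a URL that raises on failure.
    for attempt in range(1, attempts + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            # Simple linear backoff before the next attempt.
            await asyncio.sleep(delay * attempt)


async def fetch_all_with_retry(fetch, urls, attempts=3, delay=1.0):
    # return_exceptions=True makes gather return exceptions as results
    # instead of raising the first one, so every URL gets an entry.
    results = await asyncio.gather(
        *(fetch_with_retry(fetch, url, attempts, delay) for url in urls),
        return_exceptions=True,
    )
    succeeded, failed = {}, {}
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            failed[url] = result
        else:
            succeeded[url] = result
    return succeeded, failed
```

With the fetch(session, url) coroutine from the question, the first argument could be functools.partial(fetch, session). The failed dictionary then maps each URL that still failed after the last attempt to its exception, which can be logged or fed into another pass.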

python python-asyncio aiohttp

Lou*_*dox

lucky-day

4
votes

1
answer

2616
views