Related questions and solutions (0)

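aiohttp: set the maximum number of requests per second

How can I set a maximum number of requests per second (rate-limit them) on the client side with aiohttp?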
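
One possible client-side approach, sketched here with plain asyncio so that it does not depend on any particular aiohttp feature: hand out evenly spaced "start slots" and make each request wait for its slot. `RateLimiter`, `fake_request`, and `demo` are hypothetical names made up for this illustration, and the `asyncio.sleep`-based stand-in is where a real client would call something like `session.get(url)`.

```python
import asyncio


class RateLimiter:
    """Allow at most `rate` starts per second by spacing them evenly."""

    def __init__(self, rate: float):
        self._interval = 1.0 / rate
        self._next_slot = 0.0  # loop time at which the next request may start

    async def wait(self) -> None:
        loop = asyncio.get_running_loop()
        now = loop.time()
        # Reserve the next free slot. There is no await between reading and
        # updating _next_slot, so no lock is needed on a single event loop.
        start = max(now, self._next_slot)
        self._next_slot = start + self._interval
        if start > now:
            await asyncio.sleep(start - now)


async def fake_request(limiter: RateLimiter, started: list) -> None:
    await limiter.wait()
    # A real client would issue the HTTP request here.
    started.append(asyncio.get_running_loop().time())


async def demo(rate: float = 50, n: int = 5) -> list:
    limiter = RateLimiter(rate)
    started: list = []
    await asyncio.gather(*(fake_request(limiter, started) for _ in range(n)))
    return started


starts = asyncio.run(demo())
```

This limits how often requests start, not how many are in flight at once; a slow server can still accumulate many open requests.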

python python-asyncio aiohttp

png*_*iko

2017 01-29

18
Votes

3
Answers

8554
Views

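How do I manage a single aiohttp.ClientSession?

As a learning exercise, I am trying to modify aiohttp's quickstart example to fetch multiple URLs with a single ClientSession (the docs suggest that one ClientSession should usually be created per application).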

import aiohttp
import asyncio

async def fetch(session, url):
  async with session.get(url) as response:
    return await response.text()

async def main(url, session):
  print(f"Starting '{url}'")
  html = await fetch(session, url)
  print(f"'{url}' done")

urls = (
  "https://python.org",
  "https://twitter.com",
  "https://tumblr.com",
  "https://example.com",
  "https://github.com",
)

loop = asyncio.get_event_loop()
session = aiohttp.ClientSession()
loop.run_until_complete(asyncio.gather(
  *(loop.create_task(main(url, session)) for url in urls)
))
# session.close()   <- this doesn't make a difference


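However, creating the ClientSession outside of a coroutine is clearly not the way to go:

$ python 1_async.py
1_async.py:30: UserWarning: Creating a client session outside of coroutine is a very dangerous idea
  session = aiohttp.ClientSession()
Creating a client session outside of coroutine
client_session: 
Starting 'https://python.org'
Starting 'https://twitter.com'
Starting 'https://tumblr.com'
Starting 'https://example.com'
Starting 'https://github.com'
'https://twitter.com' done
'https://example.com' done
'https://github.com' done …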
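
A minimal sketch of one way to avoid that warning, assuming a Python version with `asyncio.run` and an aiohttp version where `ClientSession` works as an async context manager: create the session inside a coroutine and let `async with` close it. The helper name `fetch_one` is made up for this illustration; `fetch` is the function from the question.

```python
import asyncio

import aiohttp


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def fetch_one(session, url):
    print(f"Starting '{url}'")
    html = await fetch(session, url)
    print(f"'{url}' done")
    return html


async def main(urls):
    # The session is created inside a coroutine, shared by all fetches,
    # and closed automatically when the `async with` block exits.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_one(session, url) for url in urls))


# To run it (this performs real network requests):
# pages = asyncio.run(main(["https://python.org", "https://example.com"]))
```

Passing the coroutines straight to `asyncio.gather` also removes the need for the explicit `loop.create_task` calls.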

python python-asyncio aiohttp

Tam*_*lei

lucky-day

7
Votes

1
Answers

2599
Views

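aiohttp: rate limiting requests per second by domain

I am writing a web crawler that runs parallel fetches for many different domains. I want to limit the number of requests per second made to each individual domain, but I do not care about the total number of open connections, or the total number of requests per second made across all domains. I want to maximize the number of open connections and requests per second overall, while limiting the requests per second made to any single domain.

All of the existing examples I can find either (1) limit the number of open connections or (2) limit the total number of requests per second made in the fetch loop. Examples include:

Neither of them does what I am asking for, which is to limit requests per second on a per-domain basis. The first question only answers how to limit requests per second overall. The second does not even have an answer to the actual question (the OP asks about requests per second, and the answers all talk about limiting the number of connections).

Here is the code I tried, using a simple rate limiter I made for the synchronous version; the DomainTimer code does not work when it runs in the async event loop: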

from collections import defaultdict
from datetime import datetime, timedelta
import asyncio
import async_timeout
import aiohttp
from urllib.parse import urlparse
from queue import Queue, Empty

from HTMLProcessing import processHTML
import URLFilters

SEED_URLS = ['http://www.bbc.co.uk', 'http://www.news.google.com']
url_queue = Queue()
for u in SEED_URLS:
    url_queue.put(u)

# number of pages to download per run of crawlConcurrent()
BATCH_SIZE = 100
DELAY = timedelta(seconds = 1.0) # delay …
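
For comparison, here is a rough sketch of a per-domain limiter that cooperates with the event loop. It is not the DomainTimer from the question (that code is truncated above); `PerDomainLimiter`, `fake_fetch`, and `demo` are hypothetical names, and the 0.05 second delay is chosen only to keep the demo short. The idea is to keep one "next allowed start time" per domain and to wait with `asyncio.sleep`, so a wait on one domain never blocks fetches for another.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse


class PerDomainLimiter:
    """Space out requests to the same domain by at least `delay` seconds."""

    def __init__(self, delay: float):
        self._delay = delay
        self._next_slot = defaultdict(float)  # domain -> earliest next start

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        now = asyncio.get_running_loop().time()
        # Reserve this domain's next slot; other domains are unaffected.
        start = max(now, self._next_slot[domain])
        self._next_slot[domain] = start + self._delay
        if start > now:
            await asyncio.sleep(start - now)  # yields to other tasks


async def fake_fetch(limiter: PerDomainLimiter, url: str, log: list) -> None:
    await limiter.wait(url)
    # A real crawler would perform the HTTP request here.
    log.append((urlparse(url).netloc, asyncio.get_running_loop().time()))


async def demo() -> list:
    limiter = PerDomainLimiter(delay=0.05)
    urls = [f"http://{host}/page{i}"
            for i in range(3) for host in ("a.example", "b.example")]
    log: list = []
    await asyncio.gather(*(fake_fetch(limiter, u, log) for u in urls))
    return log


log = asyncio.run(demo())
```

A blocking wait such as `time.sleep` inside a coroutine would stall every task on the loop, which is one common reason a synchronous rate limiter misbehaves under asyncio; whether that is the cause here cannot be confirmed from the truncated code.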


python asynchronous rate-limiting python-asyncio aiohttp

J. *_*lor

2018 04-08

5
Votes

1
Answers

1525
Views

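Does limiting the maximum number of concurrent connections also limit the number of concurrent requests?

This web tutorial and this SO answer suggest using a semaphore to limit the number of concurrent requests made with aiohttp.

I am confused, because aiohttp itself provides a facility for limiting the number of concurrent connections (limit and limit_per_host, documented here), so isn't using a semaphore reinventing the wheel?

Maybe not. Can each connection carry multiple concurrent requests? According to this Wikipedia article and this SO answer, it seems so. So, going by the documentation I linked on limiting concurrent connections, setting limit and/or limit_per_host in aiohttp may not have the effect of limiting concurrent requests.

I am still confused, because if that is the case, what are these parameters that aiohttp provides for? Why would a user want to limit connections rather than requests? But of course this kind of reasoning settles nothing, so I was ready to move on and use a semaphore.

Then I stumbled upon this question. It has two relatively highly upvoted answers. One of them again suggests using a semaphore. But the other suggests using the aiohttp facilities limit and limit_per_host. If that answer is correct, then limiting connections in aiohttp also limits requests, so no semaphore is needed (unless one also wants to limit the rate of requests per second, which is not what I am after here).

That is what I want to ask in this question. Does limiting concurrent connections in aiohttp via limit and/or limit_per_host also limit concurrent requests? I suppose the answer depends on whether aiohttp uses exactly one connection per request, which I do not know either.

Does aiohttp use exactly one connection per request? Does limiting concurrent connections also limit concurrent requests?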
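
To make the semaphore option concrete, here is a small asyncio-only sketch that caps the number of requests in flight and records the peak actually reached. `limited_request` and `demo` are hypothetical names, and `asyncio.sleep` stands in for a real `session.get(...)` call. The sketch does not answer whether `limit` and `limit_per_host` give the same guarantee; that depends on how aiohttp maps requests to connections and is best checked against the aiohttp connector documentation.

```python
import asyncio


async def limited_request(sem: asyncio.Semaphore, i: int, state: dict) -> int:
    async with sem:  # at most `limit` coroutines are inside this block
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the actual HTTP request
        state["active"] -= 1
    return i


async def demo(limit: int = 3, n: int = 10):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited_request(sem, i, state) for i in range(n))
    )
    return results, state["peak"]


results, peak = asyncio.run(demo())
```

The semaphore counts requests directly, so the cap holds regardless of how the HTTP client manages its connections.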

python python-asyncio aiohttp

Author

lucky-day

5
Votes

1
Answers

594
Views

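Why does my script stop while scraping images from the web?

I am currently trying to enrich a machine-learning dataset using a script that lets me download images from Google.

I first go through a dataframe containing the terms to search for on Google, then use the selenium webdriver to retrieve the URLs of the images to download, and save them in a specific folder according to the term via this function: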

import os
import requests

def download_image(file_path, url, file_name):
    try:
        response = requests.get(url)
        response.raise_for_status()
        with open(os.path.join(file_path, file_name), 'wb') as file:
            file.write(response.content)
        print(f"Image downloaded successfully to {os.path.join(file_path, file_name)}")
    except requests.exceptions.HTTPError as http_error:
        print(f"HTTP error occurred: {http_error}")
    except Exception as error:
        print(f"An error occurred: {error}")

It is called in this loop:

def enhanced_dataset_folder(name: str, tag: str, df):
    DRIVER_PATH = "chromedriver"
    wd = webdriver.Chrome(DRIVER_PATH)
    urls = get_images(tag, wd, 1, 2)
    folder_name = name.split('/')[0]
    props = tag.split(' ')
    test = []
    for i, url in enumerate(urls):
        try:
            img_name …

python machine-learning selenium-chromedriver selenium-webdriver jupyter-notebook

N7L*_*end

2023 03-22

2
Votes

1
Answers

228
Views

Tag statistics

python ×5

aiohttp ×4

python-asyncio ×4

asynchronous ×1

jupyter-notebook ×1

machine-learning ×1

rate-limiting ×1

selenium-chromedriver ×1

selenium-webdriver ×1
