I want to download/scrape 50 million log records from a website. Rather than fetching all 50 million at once, I tried the code below to download them in chunks of 10 million, but it can only handle about 20,000 at a time (anything more raises an error), so downloading that much data is extremely slow. Right now 20,000 records take 3-4 minutes:

100%|██████████| 20000/20000 [03:48<00:00, 87.41it/s]

How can I speed this up?
import asyncio
import aiohttp
import time
import tqdm
import nest_asyncio

nest_asyncio.apply()


async def make_numbers(numbers, _numbers):
    for i in range(numbers, _numbers):
        yield i


n = 0
q = 10000000


async def do_get(session, url, x):
    # fetch a single record by id (definition assumed; not shown in the original snippet)
    async with session.get(url + str(x)) as response:
        return await response.text()


async def fetch():
    # example
    url = "https://httpbin.org/anything/log?id="

    async with aiohttp.ClientSession() as session:
        post_tasks = []
        # prepare the coroutines that post
        async for x in make_numbers(n, q):
            post_tasks.append(do_get(session, url, x))
        # now execute them all at once
        responses = [await f for f in tqdm.tqdm(asyncio.as_completed(post_tasks),
                                                total=len(post_tasks))]


asyncio.run(fetch())
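For reference, the usual way to avoid the error above 20,000 (likely connection or file-descriptor exhaustion) is to cap how many requests are in flight at once rather than capping the total batch size. Below is a minimal sketch of that idea using `asyncio.Semaphore`; `CONCURRENCY`, `fetch_one`, and `fetch_range` are illustrative names, and the `asyncio.sleep(0)` stands in for the real `session.get` call so the sketch is self-contained:

```python
import asyncio

CONCURRENCY = 1000  # tune to what the server and your OS tolerate


async def fetch_one(sem: asyncio.Semaphore, i: int) -> int:
    # the semaphore ensures at most CONCURRENCY coroutines pass this
    # point at the same time; the rest wait here
    async with sem:
        # real code would do: async with session.get(url + str(i)) ...
        await asyncio.sleep(0)  # placeholder for the network round-trip
        return i


async def fetch_range(start: int, stop: int) -> list:
    sem = asyncio.Semaphore(CONCURRENCY)
    tasks = [asyncio.create_task(fetch_one(sem, i)) for i in range(start, stop)]
    # gather preserves the input order of the tasks
    return await asyncio.gather(*tasks)


results = asyncio.run(fetch_range(0, 5000))
print(len(results))
```

With aiohttp specifically, the same cap can be set natively via the connector, e.g. `aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=CONCURRENCY))`, which limits the number of simultaneous TCP connections without a manual semaphore.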