Python asyncio / aiohttp: ValueError: too many file descriptors in select() on Windows

Question

Jos*_*sep 4 python python-3.x async-await python-asyncio aiohttp

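Hello everyone, I'm having trouble understanding asyncio and aiohttp and getting the two to work together. Not only do I not fully understand what I'm doing, I have now run into a problem that I don't know how to solve.

I'm on Windows 10 64-bit with the latest updates.

The following code uses asyncio to return the list of pages whose Content-Type header does not contain html.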

import asyncio
import aiohttp

MAXitems = 30

async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def checkUrlsWithoutHtml(listOfUrls):
    headersWithoutHtml = set()
    while(len(listOfUrls) != 0):
        blockurls = []
        print(len(listOfUrls))
        items = 0
        for num in range(0, len(listOfUrls)):
            if num < MAXitems:
                blockurls.append(listOfUrls[num - items])
                listOfUrls.remove(listOfUrls[num - items])
                items +=1
        loop = asyncio.get_event_loop()
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if False == header[1]:
                headersWithoutHtml.add(header)
    return headersWithoutHtml


listOfUrls = ['http://www.google.com', 'http://www.reddit.com']
headersWithoutHtml=  checkUrlsWithoutHtml(listOfUrls)

for header in headersWithoutHtml:
    print(header[0])

When I run it with, say, 2000 URLs, it (sometimes) fails with the following:

data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 454, in run_until_complete
    self.run_forever()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 421, in run_forever
    self._run_once()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1390, in _run_once
    event_list = self._selector.select(timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()

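Note 1: I replaced my name with USER in the paths.

Note 2: For whatever reason, reddit.com is reported as not containing HTML. That is a completely separate problem that I will try to solve myself, but if you spot any other inconsistencies in my code, please point them out.

Note 3: My code is poorly structured because I changed many things while trying to debug this, without any luck.

I read somewhere that this is a Windows limitation that cannot be worked around. My problems are:

a) I simply don't understand what "too many file descriptors in select()" means.

b) What am I doing wrong that Windows can't handle? I've seen people push thousands of requests with asyncio and aiohttp, yet even in batches I can't push 30-50 without getting a ValueError?

Edit: It turns out that with MAXitems = 10 it hasn't crashed on me yet, but since I can't see a pattern, I don't know why, or what that tells me.

Edit 2: Never mind. It took longer, but it eventually crashed even with MAXitems = 10.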

Answer 1

Jam*_* Ko 7

I had the same problem. I'm not 100% sure this will work, but try replacing this:

session = aiohttp.ClientSession()

with this:

connector = aiohttp.TCPConnector(limit=60)
session = aiohttp.ClientSession(connector=connector)

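By default, limit is set to 100 (docs), which means the client can have 100 connections open simultaneously. As Andrew mentioned, Windows can only have 64 sockets open at a time, so we pass a number below 64.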
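
As a rough sketch of how the connector limit might be combined with a single shared session, assuming Python 3.7+ (for asyncio.run) and an installed aiohttp. The names MAX_CONNECTIONS, is_html, fetch_is_html and check_urls are illustrative and do not come from the original post; the value 60 is simply the one suggested above.

```python
import asyncio

import aiohttp

MAX_CONNECTIONS = 60  # assumption: the value suggested in this answer


def is_html(headers):
    """Return True if the Content-Type header mentions html."""
    return "html" in headers.get("Content-Type", "").lower()


async def fetch_is_html(session, url):
    """Send a HEAD request; treat client errors and timeouts as 'not HTML'."""
    try:
        async with session.head(url) as response:
            return url, is_html(response.headers)
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return url, False


async def check_urls(urls):
    """Check every URL through one session whose connector caps open connections."""
    connector = aiohttp.TCPConnector(limit=MAX_CONNECTIONS)
    # The session is opened once here and closed once, after all tasks finish.
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(fetch_is_html(session, url) for url in urls))
```

This could be driven with asyncio.run(check_urls(list_of_urls)). Opening the session once in check_urls, rather than entering "async with session:" inside every task, should keep the session from being closed while other requests are still pending.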

Answer 2

And*_*lov 5

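By default, Windows can use only 64 sockets in an asyncio loop. This is a limitation of the underlying select() API call.

To raise the limit, use ProactorEventLoop. Setup instructions can be found here.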
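
As a hedged illustration of this suggestion, the sketch below selects ProactorEventLoop explicitly on Windows and falls back to a default loop on other platforms. make_event_loop and demo are hypothetical names used only for this example. On Python 3.8 and later, ProactorEventLoop is already the default on Windows, so the explicit setup mainly matters for older versions such as the 3.6 shown in the traceback.

```python
import asyncio
import sys


def make_event_loop():
    """Return a ProactorEventLoop on Windows, a default loop elsewhere."""
    if sys.platform == "win32":
        # ProactorEventLoop is built on I/O completion ports instead of select().
        return asyncio.ProactorEventLoop()
    return asyncio.new_event_loop()


async def demo():
    """Placeholder coroutine standing in for the real request batch."""
    await asyncio.sleep(0)
    return "done"


loop = make_event_loop()
asyncio.set_event_loop(loop)
try:
    result = loop.run_until_complete(demo())
finally:
    loop.close()
```

In the question's code, the loop returned by make_event_loop would take the place of the one obtained from asyncio.get_event_loop().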

Archived:	8 years, 1 month ago
Views:	3526
Last activity:	6 years, 4 months ago