How to resolve urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url

Question

Sum*_*wal 5 selenium webdriver beautifulsoup web-scraping selenium-webdriver

I am trying to scrape a few pages of a website with Selenium and use the results, but when I run the function twice, the second call fails with:

[WinError 10061] No connection could be made because the target machine actively refused it

Here is my approach:

import os
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as soup

opts = webdriver.ChromeOptions()
opts.binary_location = os.environ.get('GOOGLE_CHROME_BIN', None)
opts.add_argument("--headless")
opts.add_argument("--disable-dev-shm-usage")
opts.add_argument("--no-sandbox")
browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts)

lst =[]
def search(st):
    for i in range(1,3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        req = browser.page_source
        sou = soup(req, "html.parser")
        title = sou.find('ul', class_ = "listing")
        title = title.find_all("li")
        for j in range(len(title)):
            lst.append(title[j].getText().lower()[1:])
    browser.quit()
    print(len(lst))
    
search("a")
search("a")

Output

272
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

Answer 1

Deb*_*anB 13

This error message...

raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

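...implies that a new connection could not be established, and that failure raised MaxRetryError.

A couple of things:

First and foremost, as per the discussion "max-retries-exceeded exceptions are confusing", the traceback is somewhat misleading. Requests wraps the exception for the user's convenience. The original exception is part of the message displayed.
Requests never retries (it sets retries=0 for urllib3's HTTPConnectionPool), so the error would have been much more canonical without the MaxRetryError and HTTPConnectionPool keywords. So an ideal traceback would have been: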

ConnectionError(<class 'socket.error'>: [Errno 1111] Connection refused)

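Root Cause and Solution

Once you have started the webdriver and web client session, within def search(st) you invoke get() to access a URL, and in the subsequent lines you also invoke browser.quit(), which calls the /shutdown endpoint. The webdriver and web client instances are then destroyed completely, closing all pages/tabs/windows. Hence no connection exists any more.

You can find a couple of relevant detailed discussions in:

PhantomJS web driver stays in memory

Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?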

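In this situation, when browser.get() is invoked again on the second call to search(), the driver session has already been shut down, so there is no active connection. Hence you see the error.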
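The "actively refused" part of the message can be reproduced without Selenium. The following is only an analogy, using the standard library: a plain TCP listener on 127.0.0.1 stands in for the driver process, and closing it stands in for browser.quit(). The helper name try_connect is made up for this sketch.

```python
import socket


def try_connect(port):
    """Return True if a TCP connection to 127.0.0.1:port succeeds, False if it is refused."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            return True
    except ConnectionRefusedError:
        # On Windows this is the same condition reported as WinError 10061.
        return False


# Stand-in for the driver process: a listener on an ephemeral local port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

before_quit = try_connect(port)  # listener is alive, so the connection is accepted
server.close()                   # analogous to browser.quit() stopping the driver
after_quit = try_connect(port)   # nothing is listening any more, so it is refused

print(before_quit, after_quit)
```

Selenium's client hits the same condition when it sends a command to a driver port that was closed by quit(); urllib3 then reports it as MaxRetryError.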

So a simple solution is to remove the line browser.quit() and invoke browser.get(url) within the same browsing context.
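A minimal sketch of that fix, under some assumptions: chromedriver is discoverable (on PATH, or through Selenium Manager in recent Selenium 4 releases), and the page still has the ul.listing element from the question. The names make_browser, parse_titles, main and the pages/parse parameters are illustrative, not part of the original code. The driver is created once, reused by every call, and quit only after all calls are done.

```python
def make_browser():
    """Create a headless Chrome driver.

    Selenium is imported here so the helpers below can be exercised
    without Selenium installed.
    """
    from selenium import webdriver

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless")
    opts.add_argument("--disable-dev-shm-usage")
    opts.add_argument("--no-sandbox")
    return webdriver.Chrome(options=opts)


def parse_titles(html):
    """Return the lower-cased text of every <li> inside <ul class="listing">."""
    from bs4 import BeautifulSoup

    listing = BeautifulSoup(html, "html.parser").find("ul", class_="listing")
    if listing is None:
        return []
    # strip() replaces the [1:] slice used in the question (an assumption
    # that the slice was only there to drop a leading newline).
    return [li.get_text().strip().lower() for li in listing.find_all("li")]


def search(browser, pages=range(1, 3), parse=parse_titles):
    """Collect titles from the listing pages using an already running driver."""
    titles = []
    for page in pages:
        browser.get("https://gogoanime.so/anime-list.html?page=" + str(page))
        titles.extend(parse(browser.page_source))
    return titles  # note: no browser.quit() here


def main():
    browser = make_browser()
    try:
        print(len(search(browser)))
        print(len(search(browser)))  # second call reuses the live session
    finally:
        browser.quit()  # shut the driver down once, after all calls
```

Calling main() runs both searches against the same session. Passing the browser in as an argument makes it explicit who owns the session and who is responsible for quitting it.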

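Conclusion

Once you upgrade to Selenium 3.14.1 you will be able to set the timeout, see canonical tracebacks, and take the required action.

References

You can find a relevant detailed discussion in:

MaxRetryError: HTTPConnectionPool: Max retries exceeded (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))

tl; dr

A couple of relevant discussions:

Adding max_retries as an argument

Removed the bundled charade and urllib3.

Third party libraries committed verbatim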

I think I solved both problems by bringing `browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts)` inside the function and using `browser.quit()` at its end. Thanks for the help. (2 upvotes)
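The approach described in this comment might look roughly like the sketch below. It is an illustration, not the commenter's actual code: chrome_factory and the make_browser/pages parameters are hypothetical names, and chromedriver is assumed to be discoverable without an explicit path. Each call starts its own driver and always quits it, so a later call never talks to a dead session.

```python
def chrome_factory():
    """Build a fresh headless Chrome driver (Selenium is imported lazily)."""
    from selenium import webdriver

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless")
    return webdriver.Chrome(options=opts)


def search(make_browser=chrome_factory, pages=range(1, 3)):
    """Fetch each listing page with a driver that lives only for this call."""
    browser = make_browser()  # new session per call
    try:
        sources = []
        for page in pages:
            browser.get("https://gogoanime.so/anime-list.html?page=" + str(page))
            sources.append(browser.page_source)
        return sources
    finally:
        browser.quit()  # always runs, even if get() raises
```

Starting a browser per call is slower than reusing one session, but it keeps each call independent and avoids leaving driver processes behind if a call fails.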

Asked:	5 years ago
Viewed:	11160 times
Last active:	4 years, 9 months ago