I'm scraping a website, and sometimes I get this error when running my script:
ReadTimeout: HTTPSConnectionPool(host='...', port=443): Read timed out. (read timeout=10)
My code:
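import requests
from time import sleep
from bs4 import BeautifulSoup

url = 'mysite.com'
all_links_page = []
# getHeaders() is a helper defined elsewhere in the script (not shown here)
page_one = requests.get(url, headers=getHeaders(), timeout=10)
sleep(2)
if page_one.status_code == requests.codes.ok:
    soup_one = BeautifulSoup(page_one.content.decode('utf-8'), 'lxml')
    page_links_one = soup_one.select("ul.product_list")
    # Collect the href of every list item in each product list
    for links_one in page_links_one:
        for li in links_one.select("li"):
            all_links_page.append(li.a.get("href").strip())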
The answers I found were not satisfactory.
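One possible way to keep the script running when this timeout occurs is to catch `requests.exceptions.ReadTimeout` and retry a few times. The following is only a minimal sketch: the function name `fetch_with_retries`, the `get` parameter (there so the request function can be swapped out), and the retry and backoff values are all assumptions for illustration, not part of the original script.

```python
import time

import requests


def fetch_with_retries(url, get=requests.get, retries=3, timeout=10,
                       backoff=2.0, **kwargs):
    """Return the response, or None if every attempt hit a read timeout.

    `get` defaults to requests.get; it is a parameter only so that a
    different callable can be substituted (hypothetical design choice).
    """
    for attempt in range(1, retries + 1):
        try:
            return get(url, timeout=timeout, **kwargs)
        except requests.exceptions.ReadTimeout:
            if attempt == retries:
                # Give up quietly instead of aborting the whole script
                return None
            # Wait a little longer after each failed attempt
            time.sleep(backoff * attempt)
    return None
```

With something like this, the original call could become `page_one = fetch_with_retries(url, headers=getHeaders())`, followed by a check for `None` before reading `status_code`.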
python beautifulsoup web-scraping python-3.x python-requests
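I apparently get this error when trying to access the URLs my script requests, not any one URL in particular. I don't understand the exact cause of the error, but I want to handle it so that the script doesn't abort when it occurs.
This was marked as a duplicate, but that doesn't solve my problem: How to avoid the error selenium.common.exceptions.SessionNotCreatedException: Message: session not created from tab crashed
Code: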
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--incognito')
chrome_options.add_argument('--headless')
driver = webdriver.Chrome("/driver/chromedriver", options=chrome_options)
Error:
Traceback (most recent call last):
  File "scripts/page11.py", line 15, in <module>
    driver = webdriver.Chrome(BASE_WEB_DRIVER, options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line …
I did a manual installation of Python 3.7.5 on Debian 8, and when I run my script I get this error:
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>
I've seen several questions on Stack Overflow about this error on macOS, but in my case it occurs on Linux.
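With a manually built Python, one thing worth checking is where the interpreter looks for CA certificates, since the compiled-in OpenSSL paths may not match the system's certificate store. The sketch below uses only the standard library. The bundle path `/etc/ssl/certs/ca-certificates.crt` is an assumption (it is where Debian's `ca-certificates` package normally writes its bundle), and the helper names `describe_ca_paths` and `make_context` are hypothetical.

```python
import os
import ssl

# Assumption: Debian's ca-certificates package is installed at this path
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def describe_ca_paths():
    """Report where this Python build looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


def make_context(cafile=DEBIAN_CA_BUNDLE):
    """Build a verifying SSL context.

    Uses `cafile` when that file exists; otherwise falls back to the
    interpreter's default certificate locations.
    """
    if cafile and os.path.isfile(cafile):
        return ssl.create_default_context(cafile=cafile)
    return ssl.create_default_context()


if __name__ == "__main__":
    print(describe_ca_paths())
```

If the reported `cafile` and `capath` are both `None` or point at directories that don't exist, passing the result of `make_context()` as the `context` argument of `urllib.request.urlopen` may be enough to get verification working, without disabling it.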