Selenium app redirects to a Cloudflare page when hosted on Heroku

Question


raf*_*u38 11 python selenium captcha heroku cloudflare

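I made a Discord bot that uses Selenium to visit a website and fetch information. When I run the code locally I have no problems, but when I deploy it to Heroku, the first URL I request redirects me to the page "Attention Required! | Cloudflare".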

I have tried:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

and many other answers that use the same settings I already have:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# This snippet runs inside a method of the bot class, hence self.driver.
options = Options()
options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
self.driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
# Override the user agent so it does not advertise headless Chrome.
self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
    "userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})


But this does not work; the code only works locally.

PS: locally I am on Windows.

Source of the page I am redirected to: https://gist.github.com/rafalou38/9ae95bd66e86d2171fc8a45cebd9720c

Answer 1

Deb*_*anB 11

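If the google-chrome browsing context launched by the Selenium-driven ChromeDriver is redirected to the page...

Attention Required! | Cloudflare

...it means that Cloudflare is blocking your program from accessing the AUT (Application Under Test).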
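To make this failure visible in the bot instead of silently scraping the block page, the page title can be checked after each navigation. The following is a minimal sketch; the helper names `is_cloudflare_block` and `get_or_raise` are hypothetical, and matching on the exact title quoted in the question is an assumption (Cloudflare may serve other titles for other challenge types).

```python
# Title of the Cloudflare block page, as quoted in the question.
CLOUDFLARE_BLOCK_TITLE = "Attention Required! | Cloudflare"


def is_cloudflare_block(title: str) -> bool:
    """Return True if a page title matches the Cloudflare block page."""
    return title.strip() == CLOUDFLARE_BLOCK_TITLE


def get_or_raise(driver, url: str) -> None:
    """Load `url` and raise if Cloudflare intercepted the request.

    `driver` is any Selenium WebDriver instance; only its `.get()`
    method and `.title` attribute are used here.
    """
    driver.get(url)
    if is_cloudflare_block(driver.title):
        raise RuntimeError(f"Blocked by Cloudflare while loading {url}")
```

Calling `get_or_raise(self.driver, url)` in place of `self.driver.get(url)` would surface the block as an exception in the Heroku logs.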

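Analysis

There can be several reasons why Cloudflare blocks access:

Cloudflare has identified your program as a bot and denied access. You can find a detailed discussion in "Can a website detect when you are using Selenium with chromedriver?".

Access may also be denied because:

Cloudflare is trying to counter a possible dictionary attack.
Your system's IP address has been blacklisted by Cloudflare because the system was being used to mine Bitcoin or Monero.

In these cases, you end up being redirected to the CAPTCHA page.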

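Solution

In these cases, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch that does not trigger anti-bot services such as Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

Code block: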

import undetected_chromedriver as uc

# Use the package's own ChromeOptions rather than selenium's.
options = uc.ChromeOptions()
options.add_argument("--start-maximized")
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
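On Heroku, the container flags and Chrome binary location from the question would presumably still be needed alongside undetected-chromedriver. Below is a rough sketch of how the two might be combined; the function names `heroku_chrome_args` and `make_driver` are hypothetical, and it assumes the `GOOGLE_CHROME_BIN` variable from the question is set by the Chrome buildpack.

```python
import os


def heroku_chrome_args() -> list:
    """Return the container-related flags used in the question."""
    return ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def make_driver():
    """Start undetected-chromedriver with the question's Heroku settings."""
    # Imported lazily so this module still loads where the package is absent.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    for arg in heroku_chrome_args():
        options.add_argument(arg)

    # Assumption: GOOGLE_CHROME_BIN points at the Chrome binary on the dyno,
    # as in the question. If it is unset, the library locates Chrome itself.
    chrome_bin = os.environ.get("GOOGLE_CHROME_BIN")
    if chrome_bin:
        options.binary_location = chrome_bin

    return uc.Chrome(options=options)
```

Headless mode may itself be easier for anti-bot services to detect, and the comment on this answer reports that the library did not help in the asker's case, so this is a starting point rather than a guaranteed fix.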


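Alternative solution

Another solution is to whitelist your IP address through the Project Honey Pot website. You can find a detailed end-to-end process in the video titled "Attention Required one more step CAPTCHA CloudFlare Error".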

It still doesn't work. I first tried exactly what you posted, then tried it with my previous arguments, but it didn't help; I still get this Cloudflare page. (2 upvotes)

Asked:	5 years, 7 months ago
Viewed:	21397 times
Last activity:	3 years, 9 months ago