Selenium with a proxy not working / wrong options?

Rap*_*898 6 python selenium proxies web-scraping

I have the following working test solution, which outputs the IP address and related info -


Now I want to use it with my ScraperAPI account and other proxies. But when I uncomment these two lines -

# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
# options.add_argument('--proxy-server=%s' % PROXY)

the solution no longer works -


How can I use my proxy with Selenium / this code?
(ScraperAPI suggests using the selenium-wire module, but I don't like it because it depends on specific versions of other tools - so I would like to do this without it.)


Is this possible?

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from dotenv import load_dotenv, find_dotenv

WAIT = 10

load_dotenv(find_dotenv())
SCRAPER_API = os.environ.get("SCRAPER_API")
# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'

srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
# options.add_argument('--proxy-server=%s' % PROXY)
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, 10)

link = "https://whatismyipaddress.com/"
driver.get(link)
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)

driver.quit()

Deb*_*anB 2

There are a couple of things you need to review:
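One likely cause worth ruling out first (my assumption, not stated in the question): Chrome's `--proxy-server` switch does not honor credentials embedded in the URL (`user:password@host`), so an authenticated proxy URL like the ScraperAPI one is effectively reduced to its scheme, host, and port. A quick standard-library sketch shows which parts of the URL Chrome would actually see (the key here is a made-up placeholder):

```python
from urllib.parse import urlsplit

# Hypothetical key for illustration - the real value comes from SCRAPER_API.
PROXY = 'http://scraperapi:SOME_KEY@proxy-server.scraperapi.com:8001'

parts = urlsplit(PROXY)
print(parts.username, parts.password)                     # credentials Chrome ignores
print(f'{parts.scheme}://{parts.hostname}:{parts.port}')  # what Chrome actually uses
```

If the proxy requires authentication, that is exactly why the unauthenticated request fails while the plain (no-proxy) run works.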

Make these small adjustments and tidy up your code:

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

WAIT = 10
srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, 10)

link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()
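The scraping loop can also be checked without launching Chrome at all. A stand-alone sketch (with made-up HTML that merely imitates the page's structure) exercises the same selectors:

```python
from bs4 import BeautifulSoup

# Minimal HTML imitating whatismyipaddress.com's markup - made up here
# purely to verify the selectors used in the script above.
html = """
<span id="ipv4">123.12.234.23</span>
<p class="information"><span>ISP:</span><span>Jio</span></p>
<p class="information"><span>City:</span><span>Pune</span></p>
"""
soup = BeautifulSoup(html, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
labels = [s.text for p in soup.find_all("p", {"class": "information"})
          for s in p.find_all("span")]
print(labels)      # ['ISP:', 'Jio', 'City:', 'Pune']
print(tmpIP.text)  # 123.12.234.23
```

This separates "are my selectors right?" from "is my proxy right?", which makes the proxy failure easier to isolate.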

Console output:

[WDM] -

[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 96.0.4664
[WDM] - Get LATEST driver version for 96.0.4664
[WDM] - Driver [C:\Users\Admin\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache
ISP:
Jio
City:
Pune
Region:
Maharashtra
Country:
India
123.12.234.23

Saved screenshot:

whatismyipaddress.png


Using a proxy

import time
import os, sys
from sys import platform
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
from dotenv import load_dotenv, find_dotenv

WAIT = 10

load_dotenv(find_dotenv()) 
SCRAPER_API = os.environ.get("SCRAPER_API")
PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'

srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
options.add_argument('--proxy-server={}'.format(PROXY))
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, 10)

link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()

Note: print(f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001') and make sure SCRAPER_API actually returns a value.
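A small helper (hypothetical, not part of the original answer) makes that check explicit and fails fast instead of silently building a URL whose password is the string "None":

```python
import os

def build_proxy_url(api_key=None):
    """Build the ScraperAPI proxy URL, failing fast if the key is missing."""
    key = api_key if api_key is not None else os.environ.get("SCRAPER_API")
    if not key:
        raise RuntimeError("SCRAPER_API is not set - check your .env file")
    return f'http://scraperapi:{key}@proxy-server.scraperapi.com:8001'
```

Call it once before constructing the driver and pass the result to the --proxy-server argument.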


References

You can find a couple of relevant detailed discussions in: