Rap*_*898 6 python selenium proxies web-scraping
I have the following working test solution which outputs the IP address and other information -

Now I want to use it together with my ScraperAPI account and different proxies.
But when I uncomment these two lines -
# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
# options.add_argument('--proxy-server=%s' % PROXY)

the solution no longer works -
How can I use my proxies with selenium / with this code?
(ScraperAPI suggests the selenium-wire module, but I don't like it because it pins some dependencies to specific versions of other tools - so I would like to do this without that module.)

Is that possible?
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from dotenv import load_dotenv, find_dotenv

WAIT = 10

load_dotenv(find_dotenv())
SCRAPER_API = os.environ.get("SCRAPER_API")
# PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'

srv=Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
# options.add_argument('--proxy-server=%s' % PROXY)
path = os.path.abspath (os.path.dirname (sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome (service=srv, options=options)
waitWebDriver = WebDriverWait (driver, 10)

link = "https://whatismyipaddress.com/"
driver.get (link)
time.sleep(WAIT)
soup = BeautifulSoup (driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)

driver.quit()
There are a couple of things you need to look at:

First, there seems to be a typo: the space character between get and (link) can lead to:
IndexError: list index out of range
Not sure what the following lines do, as I could execute the script without them. You might want to comment them out:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
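(For reference, load_dotenv(find_dotenv()) simply locates a .env file and copies its KEY=VALUE pairs into os.environ. A rough sketch of that behaviour, with an illustrative helper name, not python-dotenv's actual implementation:)

```python
import os
import tempfile
from pathlib import Path

# Rough re-implementation of what python-dotenv's load_dotenv() does:
# parse KEY=VALUE lines from a .env file and export them into os.environ.
def load_env_file(path: str) -> None:
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

with tempfile.TemporaryDirectory() as tmp:
    env_file = os.path.join(tmp, ".env")
    Path(env_file).write_text("SCRAPER_API_DEMO=abc123\n")
    os.environ.pop("SCRAPER_API_DEMO", None)  # ensure a clean slate
    load_env_file(env_file)
    print(os.environ["SCRAPER_API_DEMO"])  # abc123
```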
If you want to stop using SCRAPER_API, comment out this line as well:
SCRAPER_API = os.environ.get("SCRAPER_API")
With these minor tweaks, your optimized code would be:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
WAIT = 10
srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, 10)
link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()
Console output:
[WDM] -
[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 96.0.4664
[WDM] - Get LATEST driver version for 96.0.4664
[WDM] - Driver [C:\Users\Admin\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache
ISP:
Jio
City:
Pune
Region:
Maharashtra
Country:
India
123.12.234.23
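Incidentally, the BeautifulSoup part of the script can be checked in isolation against static HTML; the snippet below mimics the structure the script scrapes (the markup is illustrative, not the site's exact HTML):

```python
from bs4 import BeautifulSoup

# Minimal HTML with the same shape the script scrapes:
# a span#ipv4 plus p.information blocks containing spans.
html = """
<span id="ipv4">123.12.234.23</span>
<p class="information"><span>ISP:</span><span>Jio</span></p>
<p class="information"><span>City:</span><span>Pune</span></p>
"""

soup = BeautifulSoup(html, "html.parser")
tmpIP = soup.find("span", {"id": "ipv4"})
for p in soup.find_all("p", {"class": "information"}):
    for span in p.find_all("span"):
        print(span.text)
print(tmpIP.text)  # 123.12.234.23
```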
Saved screenshot:

With the proxy enabled, the script would be:

import time
import os, sys
from sys import platform
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
from dotenv import load_dotenv, find_dotenv
WAIT = 10
load_dotenv(find_dotenv())
SCRAPER_API = os.environ.get("SCRAPER_API")
PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
options.add_argument('--proxy-server={}'.format(PROXY))
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, 10)
link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()
Note:

Execute print(f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001') and make sure SCRAPER_API returns a value.
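A small guard can make that check automatic; the helper name below is illustrative, not part of any library:

```python
import os

def build_proxy_url(api_key):
    # Fail fast if the key is missing, so Chrome never receives a proxy
    # URL containing the literal string "None".
    if not api_key:
        raise ValueError("SCRAPER_API is not set - check your .env file")
    return f'http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001'

# In the real script the key would come from load_dotenv() / os.environ:
print(build_proxy_url(os.environ.get("SCRAPER_API", "dummy-key")))
```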
You can find a couple of relevant detailed discussions in: