Its*_*thn 1 proxy selenium python-3.x selenium-chromedriver
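I am writing a script that crawls a website to collect some data, but the site blocks me after too many requests. With a proxy I could send more requests than I can now. I have integrated the proxy through the Chrome option --proxy-server: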
options.add_argument('--proxy-server={}'.format('http://ip:port'))
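However, I am using a paid proxy that requires authentication, so Chrome shows an alert box asking for a username and password. I then tried putting the credentials in the proxy URL: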
options.add_argument('--proxy-server={}'.format('http://username:password@ip:port'))
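But this does not seem to work either. I looked for a solution and found the one below, which I tried both with the Proxy Auto Auth Chrome extension and without it: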
proxy = {'address': settings.PROXY,
         'username': settings.PROXY_USER,
         'password': settings.PROXY_PASSWORD}
capabilities = dict(DesiredCapabilities.CHROME)
capabilities['proxy'] = {'proxyType': 'MANUAL',
                         'httpProxy': proxy['address'],
                         'ftpProxy': proxy['address'],
                         'sslProxy': proxy['address'],
                         'noProxy': '',
                         'class': "org.openqa.selenium.Proxy",
                         'autodetect': False,
                         'socksUsername': proxy['username'],
                         'socksPassword': proxy['password']}
options.add_extension(os.path.join(settings.DIR, "extension_2_0.crx"))  # proxy auth extension
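Neither of the two approaches works properly: after the code above the proxy authentication alert disappears, but when I check my IP address by googling it, the result confirms that the proxy is not being used.
Can anyone help me authenticate to a proxy server with chromedriver?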
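The IP check described above can also be scripted rather than done through a search page. The sketch below assumes an endpoint such as https://httpbin.org/ip that returns a JSON body of the form {"origin": "..."}; the helper names exit_ips and is_proxied, and the idea of comparing against the machine's own public address (direct_ip), are illustrative assumptions, not part of any library.

```python
import json
from typing import List


def exit_ips(body: str) -> List[str]:
    """Return the addresses listed in an httpbin-style {"origin": ...} body."""
    origin = json.loads(body)["origin"]
    # Assumption: several addresses, if present, are comma-separated.
    return [ip.strip() for ip in origin.split(",") if ip.strip()]


def is_proxied(body: str, direct_ip: str) -> bool:
    """True when the machine's own public address is absent from the response."""
    return direct_ip not in exit_ips(body)
```

With Selenium, the body text could be read after driver.get('https://httpbin.org/ip') and passed to is_proxied together with the address recorded from an unproxied run. Comparing against the direct address rather than the proxy host avoids false alarms with rotating proxies, whose exit address may differ from the host you connect to.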
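Selenium Chrome proxy authentication
Setting a chromedriver proxy with Selenium in Python
If you need to use a proxy with Python and the Selenium library together with chromedriver, you usually use the following code (without any username or password):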
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % hostname + ":" + port)
driver = webdriver.Chrome(chrome_options=chrome_options)
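This works fine unless the proxy requires authentication. If the proxy requires you to log in with a username and password, it will not work, and you have to use the trickier solution described below. By the way, if you whitelist your server's IP address with your proxy provider or proxy server, it should not ask for proxy credentials.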
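Since the question reports that credentials embedded in the --proxy-server value have no effect, it can help to separate them from the server address up front and decide which route to take. This is a minimal sketch using only the standard library; split_proxy_url is a hypothetical helper name, and the URL shape scheme://user:pass@host:port is an assumption about how your provider hands out proxies.

```python
from urllib.parse import urlsplit


def split_proxy_url(proxy_url):
    """Split 'scheme://[user:pass@]host:port' into (server, credentials)."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected scheme://[user:pass@]host:port")
    # Credential-free value, suitable for '--proxy-server=<server>'.
    server = "{}://{}:{}".format(parts.scheme, parts.hostname, parts.port)
    credentials = None
    if parts.username is not None:
        credentials = (parts.username, parts.password or "")
    return server, credentials
```

If credentials comes back as None, the plain --proxy-server option is enough; otherwise the credentials have to be supplied some other way, such as the extension approach described next.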
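HTTP proxy authentication with chromedriver in Selenium
To set up proxy authentication, we generate a special file (a small Chrome extension) and load it into chromedriver dynamically using the code below. The code configures Selenium with chromedriver to use an HTTP proxy that requires authentication with a user/password pair.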
import os
import zipfile

from selenium import webdriver

PROXY_HOST = '192.168.3.2'  # rotating proxy or host
PROXY_PORT = 8080  # port
PROXY_USER = 'proxy-user'  # username
PROXY_PASS = 'proxy-password'  # password

# Manifest of the throwaway extension that configures the proxy.
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

# Background script: sets the proxy and answers the authentication challenge.
background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
    }
};

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)


def get_chromedriver(use_proxy=False, user_agent=None):
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        # Pack the two files into a zip and load it as an extension.
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    # Selenium 3 style call; Selenium 4 takes options=... and a Service object instead.
    driver = webdriver.Chrome(
        os.path.join(path, 'chromedriver'),
        chrome_options=chrome_options)
    return driver


def main():
    driver = get_chromedriver(use_proxy=True)
    #driver.get('https://www.google.com/search?q=my+ip+address')
    driver.get('https://httpbin.org/ip')


if __name__ == '__main__':
    main()
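The function get_chromedriver returns a configured Selenium webdriver that you can use in your application. This code has been tested and works fine.
Read more about the onAuthRequired event in Chrome.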
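One fragile spot in the code above is that the host and credentials are pasted into background.js with plain % formatting, so a password containing a double quote or a backslash would produce invalid JavaScript. A possible variation, sketched here under that assumption, serialises each value with json.dumps so it arrives as a properly escaped string literal. The names MANIFEST, BACKGROUND_TEMPLATE and write_proxy_extension are hypothetical; the manifest and script contents mirror the answer's.

```python
import json
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

# Placeholders are filled with JSON literals, so no quotes are written here.
BACKGROUND_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: "http", host: %(host)s, port: %(port)d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(user)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def write_proxy_extension(path, host, port, user, password):
    """Write the proxy-auth extension zip to `path` and return the path."""
    background_js = BACKGROUND_TEMPLATE % {
        "host": json.dumps(host),          # escaped JS string literal
        "port": int(port),
        "user": json.dumps(user),
        "password": json.dumps(password),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=2))
        zp.writestr("background.js", background_js)
    return path
```

The returned path can be passed to chrome_options.add_extension(...) exactly as pluginfile is in get_chromedriver.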
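Here is a quick, creative solution that does not require modifying Selenium's options or uploading a file to chromedriver. It uses pyautogui (any Python package that simulates key presses would do) to type in the proxy authentication details. It also uses threads to deal with the Chrome authentication popup, which would otherwise pause the script.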
import time
from threading import Thread

import pyautogui
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

hostname = "HOST_NAME"
port = "PORT"
proxy_username = "USERNAME"
proxy_password = "PASSWORD"

chrome_options = Options()
chrome_options.add_argument('--proxy-server={}'.format(hostname + ":" + port))
driver = webdriver.Chrome(options=chrome_options)


def enter_proxy_auth(proxy_username, proxy_password):
    # Give the authentication popup a moment to appear before typing.
    time.sleep(1)
    pyautogui.typewrite(proxy_username)
    pyautogui.press('tab')
    pyautogui.typewrite(proxy_password)
    pyautogui.press('enter')


def open_a_page(driver, url):
    driver.get(url)


# driver.get() blocks while the popup is open, so each step runs in its own thread.
Thread(target=open_a_page, args=(driver, "http://www.example.com/")).start()
Thread(target=enter_proxy_auth, args=(proxy_username, proxy_password)).start()
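Note: for any serious project or test suite I would recommend a more robust solution. But if you are just experimenting and need something quick that works, this is an option.
I was looking for the same answer, but for Java code only, so here is my variant of @itsmnthn's Python code.
Don't forget to change the String fields of the MainTest class to your IP, port, login, password and chromedriver path.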
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.io.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MainTest {
    private static final String PROXY_HOST = "127.0.0.1";
    private static final String PROXY_PORT = "8080";
    private static final String PROXY_USER = "login";
    private static final String PROXY_PASS = "password";
    private static final String CHROMEDRIVER_PATH = "chromeDriverPath";
    private static final String PROXY_OPTION_TEMPLATE = "--proxy-server=http://%s";

    public static void main(String[] args) throws IOException {
        System.setProperty("webdriver.chrome.driver", CHROMEDRIVER_PATH);
        ChromeOptions options = new ChromeOptions();
        String manifest_json = "{\n" +
                "  \"version\": \"1.0.0\",\n" +
                "  \"manifest_version\": 2,\n" +
                "  \"name\": \"Chrome Proxy\",\n" +
                "  \"permissions\": [\n" +
                "    \"proxy\",\n" +
                "    \"tabs\",\n" +
                "    \"unlimitedStorage\",\n" +
                "    \"storage\",\n" +
                "    \"<all_urls>\",\n" +
                "    \"webRequest\",\n" +
                "    \"webRequestBlocking\"\n" +
                "  ],\n" +
                "  \"background\": {\n" +
                "    \"scripts\": [\"background.js\"]\n" +
                "  },\n" +
                "  \"minimum_chrome_version\":\"22.0.0\"\n" +
                "}";
        String background_js = String.format("var config = {\n" +
                "  mode: \"fixed_servers\",\n" +
                "  rules: {\n" +
                "    singleProxy: {\n" +
                "      scheme: \"http\",\n" +
                "      host: \"%s\",\n" +
                "      port: parseInt(%s)\n" +
                "    },\n" +
                "    bypassList: [\"localhost\"]\n" +
                "  }\n" +
                "};\n" +
                "\n" +
                "chrome.proxy.settings.set({value: config, scope: \"regular\"}, function() {});\n" +
                "\n" +
                "function callbackFn(details) {\n" +
                "return {\n" +
                "authCredentials: {\n" +
                "username: \"%s\",\n" +
                "password: \"%s\"\n" +
                "}\n" +
                "};\n" +
                "}\n" +
                "\n" +
                "chrome.webRequest.onAuthRequired.addListener(\n" +
                "callbackFn,\n" +
                "{urls: [\"<all_urls>\"]},\n" +
                "['blocking']\n" +
                ");", PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS);

        FileOutputStream fos = new FileOutputStream("proxy_auth_plugin.zip");
        ZipOutputStream zipOS = new ZipOutputStream(fos);
        createFile("manifest.json", manifest_json);
        createFile("background.js", background_js);
        File file = new File("proxy_auth_plugin.zip");
        writeToZipFile("manifest.json", zipOS);
        writeToZipFile("background.js", zipOS);
        zipOS.close();
        fos.close();
        options.addExtensions(file);

        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://2ip.ru");
        } finally {
            driver.close();
        }
    }

    public static void writeToZipFile(String path, ZipOutputStream zipStream) throws FileNotFoundException, IOException {
        System.out.println("Writing file : '" + path + "' to zip file");
        File aFile = new File(path);
        FileInputStream fis = new FileInputStream(aFile);
        ZipEntry zipEntry = new ZipEntry(path);
        zipStream.putNextEntry(zipEntry);
        byte[] bytes = new byte[1024];
        int length;
        while ((length = fis.read(bytes)) >= 0) {
            zipStream.write(bytes, 0, length);
        }
        zipStream.closeEntry();
        fis.close();
    }

    public static void createFile(String filename, String text) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter(filename)) {
            out.println(text);
        }
    }
}
小智 5
Use selenium-wire.
Sample code from the documentation:
from seleniumwire import webdriver  # selenium-wire's drop-in replacement for selenium.webdriver

options = {
    'proxy': {
        'http': 'socks5://user:pass@192.168.10.100:8888',
        'https': 'socks5://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)