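I want to capture the traffic to the sites I am browsing using Selenium with Python, and since that traffic is HTTPS, using a proxy won't get me very far.
My idea was to run PhantomJS with Selenium and have PhantomJS execute a script (not on the page via webdriver.execute_script(), but on PhantomJS itself). I was thinking of the netlog.js script (from https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).
Since it works like this on the command line: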
phantomjs --cookies-file=/tmp/foo netlog.js https://google.com
there must be a similar way to do this with Selenium?
Thanks in advance.
Update:
Solved it with browsermob-proxy.
pip3 install browsermob-proxy
Python 3 code:
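from selenium import webdriver
from browsermobproxy import Server

# Start the BrowserMob Proxy server (replace the placeholder with the real path).
server = Server("<path to browsermob-proxy>")
server.start()
# Create a proxy that records headers and bodies, including binary content.
proxy = server.create_proxy({'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})

# Route PhantomJS through the proxy and ignore SSL errors caused by interception.
service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()  # start recording a new HTTP archive (HAR)
driver.get('https://google.com')
print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]

# Shut down the browser and the proxy server when done.
driver.quit()
server.stop()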
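Once the HAR is available, it is an ordinary nested dict, so it can be summarized without any further Selenium calls. The following is a minimal sketch, assuming the dict follows the usual HAR layout (`log` → `entries` → `request`/`response`); the helper names `summarize_har` and `requests_per_host` and the `sample_har` data are hypothetical and only illustrate the shape of what `proxy.har` returns.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har):
    """Return a (method, status, url) tuple for every entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), response.get("status"), request.get("url")))
    return rows


def requests_per_host(har):
    """Count the captured requests per host name."""
    return Counter(urlsplit(url).hostname for _, _, url in summarize_har(har))


# Hypothetical sample shaped like a HAR; in the script above this would be proxy.har.
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://www.google.com/"},
             "response": {"status": 200}},
            {"request": {"method": "GET", "url": "https://www.google.com/logo.png"},
             "response": {"status": 200}},
            {"request": {"method": "POST", "url": "https://example.com/track"},
             "response": {"status": 204}},
        ]
    }
}

for method, status, url in summarize_har(sample_har):
    print(method, status, url)
print(requests_per_host(sample_har))
```

With a real capture, passing `proxy.har` instead of `sample_har` should give the same kind of summary for the recorded page load.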
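Hi, I have a Selenium script running that should give me the performance logs. I have a method "printLog" that should (obviously) print the performance log. My code should explain in more depth what exactly I am trying to do.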
static void printLog(String type, RemoteWebDriver driver, String inputURL) {
    ChromeOptions cap = new ChromeOptions();
    LoggingPreferences logP = new LoggingPreferences();
    logP.enable(LogType.PERFORMANCE, Level.ALL);
    cap.setCapability(CapabilityType.LOGGING_PREFS, logP);
    List<LogEntry> entries = driver.manage().logs().get(type).getAll();
    System.out.println("\"Input URL\"," + "\"" + inputURL + "\"");
    for (LogEntry entry : entries) {
        // Checks whether this is a webtrends tag and whether it was accepted by the
        // server
        if (entry.getMessage().contains("statse") && entry.getMessage().contains("Network.responseReceived")) {
            String statseString = entry.getMessage();
            // regex for finding all wt tags: WT\..+?(?=&)
            // List<String> allMatches …
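The filtering step that the truncated method describes can be sketched on its own, independent of the browser. This is a rough Python illustration, assuming each performance log entry carries a JSON string under `message` whose inner `message` object has a `method` and, for `Network.responseReceived`, a `params.response.url`. The function name `webtrends_tags`, the sample entries, and the host `statse.example.com` are hypothetical. Two deliberate differences from the question's code: the `statse` check is applied to the response URL rather than to the whole log message, and the regex is extended with `|$` so that a tag at the very end of the URL is also matched.

```python
import json
import re

# The question's pattern, WT\..+?(?=&), extended so the last parameter also matches.
WT_TAG = re.compile(r"WT\..+?(?=&|$)")


def webtrends_tags(log_entries, host_marker="statse"):
    """Collect WT.* parameters from responses whose URL contains host_marker."""
    tags = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue  # only responses the server actually answered
        url = message.get("params", {}).get("response", {}).get("url", "")
        if host_marker in url:
            tags.extend(WT_TAG.findall(url))
    return tags


def _entry(method, params):
    """Build a hypothetical log entry shaped like a performance log record."""
    return {"level": "INFO", "message": json.dumps({"message": {"method": method, "params": params}})}


# Hypothetical sample data; real entries would come from the driver's performance log.
sample_entries = [
    _entry("Network.responseReceived",
           {"response": {"url": "https://statse.example.com/dcs.gif?WT.ti=Home&WT.cg_n=News&dcsuri=/index",
                         "status": 200}}),
    _entry("Network.requestWillBeSent",
           {"request": {"url": "https://statse.example.com/dcs.gif?WT.ti=Ignored"}}),
    _entry("Network.responseReceived",
           {"response": {"url": "https://other.example.com/pixel.gif?WT.ti=OtherHost", "status": 200}}),
]

print(webtrends_tags(sample_entries))
```

Only the first sample entry passes both checks, so only its two `WT.` parameters are collected.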