Selenium - python. How to capture network traffic's response

Ric*_*jah 3 python browser django selenium traffic

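I am building a web application with Python and Django. I use Selenium to launch a headless browser (PhantomJS) and click through a few times until I reach a specific page. I want to capture the network traffic and get the response of one particular network call. The response to that call is an HTML document.

Is there any way to do this?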

hel*_*err 5

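You can access the browser or chromedriver logs, which differ slightly in how they report network responses. The browser logs are called `performance` and the driver logs are called `driver`. They return a JSON-like object that you can parse to extract the events whose method contains `Network`: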

{'level': 'INFO',
  'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113832},
 {'level': 'INFO',
  'message': '{"message":{"method":"Page.frameDetached","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113838},
 {'level': 'INFO',
  'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"/sf/ask/3684358821/","frameId":"C2D13BD13CF743B6D0695B35E9CC935C","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"5331BFDC4F466FCED920CFC9F033D2EC","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"/sf/ask/3684358821/"},"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","timestamp":104499.729,"type":"Document","wallTime":1538607113.838206}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113839},...}
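Each entry in the output above carries its `message` field as a JSON string, so it has to be decoded before the DevTools event inside it can be inspected. A minimal sketch of that decoding step, using one entry copied from the sample output (the helper name `decode_entry` is made up for illustration):

```python
import json

# One entry copied verbatim from the sample log output above.
entry = {
    'level': 'INFO',
    'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
    'timestamp': 1538607113832,
}

def decode_entry(entry):
    """Return (method, params) for one performance log entry (hypothetical helper)."""
    payload = json.loads(entry['message'])  # 'message' is a JSON string, not a dict
    event = payload['message']              # the DevTools event itself
    return event['method'], event.get('params', {})

method, params = decode_entry(entry)
print(method)             # Page.frameStoppedLoading
print(params['frameId'])  # FB10764A3ABF7FFC83110C39C5F7BF77
```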

You need to enable logging through DesiredCapabilities, and then parse the result with the json module:

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the defaults so the shared CHROME dict is not mutated
caps = DesiredCapabilities.CHROME.copy()
# Ask chromedriver to record DevTools events in the 'performance' log
caps['loggingPrefs'] = {'performance': 'ALL'}
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get('/sf/ask/3684358821/')

def process_browser_log_entry(entry):
    # Each entry's 'message' is a JSON string wrapping the DevTools event
    response = json.loads(entry['message'])['message']
    return response

browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
# Keep only Network.response* events (e.g. Network.responseReceived)
events = [event for event in events if 'Network.response' in event['method']]

I don't know whether the response data itself can be accessed with this method, but you can get the URLs of the responses.
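As a rough illustration of pulling the URLs out of the filtered events: in the DevTools protocol, a `Network.responseReceived` event carries the URL, status, and MIME type under `params.response`. The sketch below assumes `events` has the shape produced by the code above; the sample events and the `example.com` URL are hand-written for illustration, not real log output, and `response_summaries` is a made-up helper name.

```python
def response_summaries(events):
    """Yield (requestId, url, status, mimeType) for each Network.responseReceived event."""
    for event in events:
        if event.get('method') != 'Network.responseReceived':
            continue
        params = event['params']
        response = params['response']
        yield (params['requestId'], response['url'],
               response['status'], response['mimeType'])

# Hypothetical events, already decoded from the performance log.
sample_events = [
    {'method': 'Network.requestWillBeSent',
     'params': {'requestId': '1', 'request': {'url': 'https://example.com/'}}},
    {'method': 'Network.responseReceived',
     'params': {'requestId': '1', 'type': 'Document',
                'response': {'url': 'https://example.com/',
                             'status': 200, 'mimeType': 'text/html'}}},
]

summaries = list(response_summaries(sample_events))
for request_id, url, status, mime_type in summaries:
    print(request_id, url, status, mime_type)
```

Filtering on `mimeType == 'text/html'` (or on the URL) would narrow this down to the HTML document the question asks about.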

  • Note: if you cannot get the performance logs on recent (~75+) Chrome, see here: /sf/answers/3957562311/. Basically, just change `loggingPrefs` to `goog:loggingPrefs` (10 upvotes)
  • To get the response data, you can run the following command: `driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': msg["message"]["params"]["requestId"]})` (3 upvotes)
  • Here is a working example that extracts JSON requests: https://gist.github.com/lorey/079c5e178c9c9d3c30ad87df7f70491d (3 upvotes)
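Putting the comments above together, here is one possible sketch that looks up response bodies through `Network.getResponseBody`. It assumes `driver` is a Chrome WebDriver started with performance logging enabled. The helper name `get_response_bodies` and the `url_part` filter are made up for illustration, and the broad `except` is a simplification for the case where Chrome no longer holds the body.

```python
import base64
import json

def get_response_bodies(driver, url_part):
    """Return {url: body} for logged responses whose URL contains url_part."""
    bodies = {}
    for entry in driver.get_log('performance'):
        event = json.loads(entry['message'])['message']
        if event['method'] != 'Network.responseReceived':
            continue
        params = event['params']
        url = params['response']['url']
        if url_part not in url:
            continue
        try:
            result = driver.execute_cdp_cmd(
                'Network.getResponseBody', {'requestId': params['requestId']})
        except Exception:
            # Assumption: the command can fail if the body is no longer available.
            continue
        body = result['body']
        if result.get('base64Encoded'):
            body = base64.b64decode(body)  # binary bodies arrive base64-encoded
        bodies[url] = body
    return bodies

# Usage sketch (needs Chrome and chromedriver; the URL is a placeholder):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
#   driver = webdriver.Chrome(options=options)
#   driver.get('https://example.com/')
#   html_by_url = get_response_bodies(driver, 'example.com')
```

Passing the driver in as an argument keeps the helper independent of how the browser was started, so it works with either capability name mentioned above.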