Seg*_*lin 11 python selenium http xmlhttprequest selenium-webdriver
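I am using Selenium to react to the data a website receives after it issues a GET request. The API the site calls is not public, so if I use the request URL to fetch the data directly, I get {"message":"Unauthenticated."}.
So far, all I have managed to do is retrieve the response headers.
I found here that driver.execute_cdp_cmd('Network.getResponseBody', {...}) might solve my problem.
Here is a sample of my code: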
import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
driver = webdriver.Chrome(
    r"./chromedriver",
    desired_capabilities=capabilities,
)

def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.response" in log["method"] and "params" in log.keys()):
        headers = log["params"]["response"]
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        print(json.dumps(body, indent=4, sort_keys=True))
        return log["params"]

logs = driver.get_log('performance')
responses = [processLog(log) for log in logs]
Unfortunately, driver.execute_cdp_cmd('Network.getResponseBody', {...}) returns:
unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}
Do you know what I am missing?
Do you know how to retrieve the response body?
Thanks for your help!
Seg*_*lin 11
To retrieve the response body, you have to listen specifically for Network.responseReceived:
def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.responseReceived" in log["method"] and "params" in log.keys()):
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
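As a rough illustration of the filtering step, the sketch below collects the request IDs of Network.responseReceived events from performance-log entries. The helper name response_request_ids and the hand-written sample entries are assumptions made for this example; real entries would come from driver.get_log('performance'). It compares the method name exactly, because a substring test would also match Network.responseReceivedExtraInfo.

```python
import json


def response_request_ids(entries):
    """Return request IDs of Network.responseReceived events, in log order."""
    request_ids = []
    for entry in entries:
        # Each performance-log entry wraps the DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        # Exact match: skips requestWillBeSent, responseReceivedExtraInfo, etc.
        if event.get("method") == "Network.responseReceived":
            request_ids.append(event["params"]["requestId"])
    return request_ids


# Hand-written entries imitating the shape of driver.get_log("performance").
sample_entries = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"requestId": "1000.1"}}})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "1000.1", "response": {"status": 200}}}})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceivedExtraInfo",
        "params": {"requestId": "1000.1"}}})},
]

print(response_request_ids(sample_entries))
```

Each returned ID could then be passed to driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': ...}).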
However, I ended up using a different approach that relies on requests. I copied the authorization token from the browser's developer tools (Network > Headers > Request Headers > Authorization) and used it to get the data I wanted:
import requests

def get_data():
    url = "<your_url>"
    headers = {
        "Authorization": "Bearer <your_access_token>",
        "Content-type": "application/json"
    }
    params = {
        key: value,
        ...
    }
    r = requests.get(url, headers=headers, params=params)
    if r.status_code == 200:
        return r.json()
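Instead of copying the token by hand, it may be possible to read it from the same performance log, since Network.requestWillBeSent events carry the request headers. The sketch below is only an illustration: the helper name find_authorization, the example URL, and the sample entry are assumptions, and whether the Authorization header actually appears in this event depends on how the page sets it.

```python
import json


def find_authorization(entries, url_part):
    """Return the first Authorization header sent to a URL containing url_part."""
    for entry in entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.requestWillBeSent":
            continue
        request = event["params"]["request"]
        if url_part not in request["url"]:
            continue
        # Header names may differ in case, so compare case-insensitively.
        for name, value in request.get("headers", {}).items():
            if name.lower() == "authorization":
                return value
    return None


# Hand-written entry imitating driver.get_log("performance") output.
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {
            "requestId": "1000.2",
            "request": {
                "url": "https://example.com/api/items",
                "headers": {"Authorization": "Bearer abc123"},
            },
        }}})},
]

print(find_authorization(sample_log, "/api/"))
```

The returned value could be placed in the headers dictionary passed to requests.get.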
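Some responses may have no body, in which case Selenium raises an error saying "No resource with given identifier found". The error message is somewhat vague here.
Try this: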
from selenium.common import exceptions

try:
    body = chromedriver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
    log['body'] = body
except exceptions.WebDriverException:
    print('response.body is null')
This way, responses without a body will not crash your script.
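The same idea can be wrapped in a small helper that returns None when no body is available. This is a minimal sketch under a few assumptions: read_body and fake_execute are names invented for this example, and the helper takes the command function as an argument so it can run without a browser. Network.getResponseBody returns a body string plus a base64Encoded flag, which the helper uses to decode binary content.

```python
import base64


def read_body(execute_cdp_cmd, request_id, error_type=Exception):
    """Return the response body as bytes, or None if the browser has none."""
    try:
        result = execute_cdp_cmd("Network.getResponseBody",
                                 {"requestId": request_id})
    except error_type:
        # For example "No resource with given identifier found".
        return None
    body = result.get("body", "")
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")


# Stand-in for driver.execute_cdp_cmd so the sketch runs without a browser.
def fake_execute(command, params):
    if params["requestId"] == "missing":
        raise RuntimeError("No resource with given identifier found")
    return {"body": "aGVsbG8=", "base64Encoded": True}


print(read_body(fake_execute, "1000.1"))
print(read_body(fake_execute, "missing"))
```

With a real driver, the call would look like read_body(driver.execute_cdp_cmd, request_id, exceptions.WebDriverException), which limits the except clause to Selenium errors.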