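I am trying to extract all URL network requests from a website and build a hierarchical relationship between them, i.e., record which request caused another request to be made. Something like a request chain.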
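One way to represent such a chain is a parent map in which each request URL points to the URL that initiated it, with None for top-level requests. The sketch below assumes those (url, parent) pairs have already been collected somehow. The helper names `build_children`, `chain_to_root` and `tree_lines` and the example.com URLs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical representation: each request URL maps to the URL that
# initiated it, or None for top-level requests such as the page itself.
ParentMap = Dict[str, Optional[str]]


def build_children(parents: ParentMap) -> Dict[Optional[str], List[str]]:
    """Invert child -> parent into parent -> [children]; roots sit under None."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for url, parent in parents.items():
        children[parent].append(url)
    return children


def chain_to_root(url: str, parents: ParentMap) -> List[str]:
    """Follow initiator links upward: [url, parent, grandparent, ...]."""
    chain = [url]
    seen = {url}
    while parents.get(chain[-1]) is not None:
        parent = parents[chain[-1]]
        if parent in seen:  # stop on a cycle in malformed data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def tree_lines(children: Dict[Optional[str], List[str]],
               node: Optional[str] = None, depth: int = 0) -> List[str]:
    """Render the tree below `node` as indented lines (two spaces per level)."""
    lines: List[str] = []
    for child in children.get(node, []):
        lines.append("  " * depth + child)
        lines.extend(tree_lines(children, child, depth + 1))
    return lines


if __name__ == "__main__":
    parents: ParentMap = {
        "https://example.com/": None,
        "https://example.com/app.js": "https://example.com/",
        "https://example.com/api/data": "https://example.com/app.js",
    }
    print(chain_to_root("https://example.com/api/data", parents))
    # ['https://example.com/api/data', 'https://example.com/app.js', 'https://example.com/']
    print("\n".join(tree_lines(build_children(parents))))
```

A parent map keeps each request's single initiator as the source of truth; the parent → children view is derived from it only when the whole tree needs to be printed or traversed.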
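As you may know, the Requests table in the Network panel has a field called "Initiator", which tells you the origin or parent request (if any) of a given request. Manually, I can open the browser, go to the Network panel in Developer Tools, load the website and download the generated HAR file. For example: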
{
  "startedDateTime": "2019-11-05T17:38:46.775Z",
  "time": 15.676000155508518,
  "request": {
    "method": "POST",
    "url": "https://www.google.com/gen_204?oq=&gs_l=psy-ab.22...0.0..847450...0.0..0.0.0.......0......gws-wiz.",
    "httpVersion": "http/2.0",
    "headers": [
      {
        "name": ":path",
        "value": "/gen_204?oq=&gs_l=psy-ab.22...0.0..847450...0.0..0.0.0.......0......gws-wiz."
      },
      {
        "name": "sec-fetch-mode",
        "value": "no-cors"
      },
      {
        "name": "origin",
        "value": "https://www.google.com"
      },
      {
        "name": "accept-encoding",
        "value": "gzip, deflate, br"
      },
      {
        "name": "accept-language",
        "value": "en-GB,en;q=0.9,en-US;q=0.8,es-US;q=0.7,es;q=0.6"
      },
      {
        "name": "user-agent",
        "value": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/76.0.3809.100 Chrome/76.0.3809.100 Safari/537.36"
      },
      {
        "name": "content-type",
        "value": "text/plain;charset=UTF-8" …
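Chrome's HAR export typically adds a non-standard `_initiator` object to each entry (the leading underscore marks a custom field outside the HAR 1.2 spec). The sketch below assumes that field is present, carrying either a direct `url` (parser-initiated requests) or a `stack` with `callFrames` and an optional async `parent` stack (script-initiated requests). That layout is an assumption based on Chrome's exports and may vary between browser versions. The function names `initiator_url`, `parent_map_from_har` and `load_har` are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def initiator_url(entry: Dict[str, Any]) -> Optional[str]:
    """Return the URL that initiated this HAR entry, or None if unknown.

    Assumes Chrome's non-standard `_initiator` field layout.
    """
    init = entry.get("_initiator") or {}
    # Parser-initiated requests (e.g. <img>, <script src>) usually carry a URL.
    if init.get("url"):
        return init["url"]
    # Script-initiated requests usually carry a JS call stack: take the first
    # frame that has a URL, following async parent stacks when a level is empty.
    stack = init.get("stack")
    while stack:
        for frame in stack.get("callFrames", []):
            if frame.get("url"):
                return frame["url"]
        stack = stack.get("parent")
    return None


def parent_map_from_har(har: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each request URL to its initiator URL (None for top-level requests)."""
    parents: Dict[str, Optional[str]] = {}
    for entry in har["log"]["entries"]:
        # setdefault keeps the first entry when the same URL is requested twice.
        parents.setdefault(entry["request"]["url"], initiator_url(entry))
    return parents


def load_har(path: str) -> Dict[str, Any]:
    """Read a HAR file saved from the Network panel."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For instance, `parent_map_from_har(load_har("example.har"))` (the file name is hypothetical) returns a plain dict from request URL to initiator URL, which can then be inverted into a parent → children tree. Note that for script-initiated requests the parent is the script's URL, which is itself one of the recorded requests, so the links still form a chain back to the page.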