d33*_*tah 2 python google-chrome headless headless-browser google-chrome-devtools
我正在使用Chromium的无头Web浏览器API。基于chrome_remote_shell源代码,我提出了以下代码:
#!/usr/bin/env python
import json
import requests
import pprint
import websocket
tablist = json.loads(requests.get("http://%s:%s/json" % ("localhost", 9222)).text)
print(tablist)
wsurl = tablist[0]['webSocketDebuggerUrl']
conn = websocket.create_connection(wsurl)
navcom = json.dumps({"id":0, "method":"Network.enable"})
conn.send(navcom)
navcom = json.dumps({"id":1, "method":"Page.navigate", "params":{"url":"https://news.ycombinator.com/"}})
conn.send(navcom)
while True:
packet = json.loads(conn.recv())
if 'method' in packet:
print(packet['method'])
else:
print(packet)
Run Code Online (Sandbox Code Playgroud)
这是示例输出:
[{u'description': u'', u'title': u'Hacker News', u'url': u'https://news.ycombinator.com/', u'webSocketDebuggerUrl': u'ws://localhost:9222/devtools/page/7d03a57d-77a9-4ceb-b645-3b85461de5be', u'type': u'page', u'id': u'7d03a57d-77a9-4ceb-b645-3b85461de5be', u'devtoolsFrontendUrl': u'/devtools/inspector.html?ws=localhost:9222/devtools/page/7d03a57d-77a9-4ceb-b645-3b85461de5be'}]
{u'id': 0, u'result': {}}
Network.requestWillBeSent
{u'id': 1, u'result': {u'frameId': u'21045.1'}}
Network.responseReceived
Network.dataReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Run Code Online (Sandbox Code Playgroud)
我注意到我收到了一长串消息,最后一条消息是Network.loadingFinished,但是我为多个requestIds收到了这条消息。如何修改脚本,使其在页面完全加载后终止,并且可以退出循环?
事实证明,我也应该通过Page.enable订阅页面事件:
#!/usr/bin/env python
import json
import requests
import pprint
import websocket
import sys
tablist = json.loads(requests.get("http://%s:%s/json" % ("localhost", 9222)).text)
print(tablist)
wsurl = tablist[0]['webSocketDebuggerUrl']
conn = websocket.create_connection(wsurl)
navcom = json.dumps({"id":0, "method":"Network.enable"})
conn.send(navcom)
navcom = json.dumps({"id":1, "method":"Page.enable"})
conn.send(navcom)
navcom = json.dumps({"id":2, "method":"Page.navigate", "params":{"url":sys.argv[1]}})
conn.send(navcom)
while True:
s = conn.recv()
packet = json.loads(s)
if packet.get('method') == 'Page.loadEventFired':
break
print(s)
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
821 次 |
| 最近记录: |