How can I tell when a page has finished loading?

d33*_*tah 2 python google-chrome headless headless-browser google-chrome-devtools

I am using Chromium's headless web browser API. Based on the chrome_remote_shell source code, I came up with the following code:

#!/usr/bin/env python

import json
import requests
import websocket

# Ask the browser's remote debugging endpoint for the list of open tabs.
tablist = json.loads(requests.get("http://%s:%s/json" % ("localhost", 9222)).text)
print(tablist)

# Attach to the first tab over its DevTools websocket.
wsurl = tablist[0]['webSocketDebuggerUrl']
conn = websocket.create_connection(wsurl)

# Subscribe to Network events, then navigate the tab.
navcom = json.dumps({"id":0, "method":"Network.enable"})
conn.send(navcom)
navcom = json.dumps({"id":1, "method":"Page.navigate", "params":{"url":"https://news.ycombinator.com/"}})
conn.send(navcom)

# Print event names as they arrive; command replies have no 'method' key.
while True:
    packet = json.loads(conn.recv())
    if 'method' in packet:
        print(packet['method'])
    else:
        print(packet)

Here is sample output:

[{u'description': u'', u'title': u'Hacker News', u'url': u'https://news.ycombinator.com/', u'webSocketDebuggerUrl': u'ws://localhost:9222/devtools/page/7d03a57d-77a9-4ceb-b645-3b85461de5be', u'type': u'page', u'id': u'7d03a57d-77a9-4ceb-b645-3b85461de5be', u'devtoolsFrontendUrl': u'/devtools/inspector.html?ws=localhost:9222/devtools/page/7d03a57d-77a9-4ceb-b645-3b85461de5be'}]
{u'id': 0, u'result': {}}
Network.requestWillBeSent
{u'id': 1, u'result': {u'frameId': u'21045.1'}}
Network.responseReceived
Network.dataReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.responseReceived
Network.dataReceived
Network.loadingFinished
Network.requestWillBeSent
Network.requestServedFromCache
Network.responseReceived
Network.dataReceived
Network.loadingFinished

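I notice that I receive a long series of messages and that the last one is Network.loadingFinished, but I receive that message for multiple requestIds, so it cannot tell me that the page as a whole is done. How can I modify the script so that it exits the loop and terminates once the page has fully loaded?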

d33*_*tah 5

It turns out I also needed to subscribe to page events via Page.enable, and then stop reading when Page.loadEventFired arrives:

#!/usr/bin/env python

import json
import requests
import websocket
import sys

# Ask the browser's remote debugging endpoint for the list of open tabs.
tablist = json.loads(requests.get("http://%s:%s/json" % ("localhost", 9222)).text)
print(tablist)

# Attach to the first tab over its DevTools websocket.
wsurl = tablist[0]['webSocketDebuggerUrl']
conn = websocket.create_connection(wsurl)

# Subscribe to both Network and Page events before navigating.
navcom = json.dumps({"id":0, "method":"Network.enable"})
conn.send(navcom)
navcom = json.dumps({"id":1, "method":"Page.enable"})
conn.send(navcom)

# The URL to load is taken from the first command-line argument.
navcom = json.dumps({"id":2, "method":"Page.navigate", "params":{"url":sys.argv[1]}})
conn.send(navcom)

# Print raw messages until the page's load event fires, then stop.
while True:
    s = conn.recv()
    packet = json.loads(s)
    if packet.get('method') == 'Page.loadEventFired':
        break
    print(s)
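One thing the loop above does not handle is the case where Page.loadEventFired never arrives, in which case it blocks forever. Below is a minimal sketch of the same wait wrapped in a helper with a deadline, assuming Python 3. The names wait_for_event and FakeConnection are my own and not part of any library; FakeConnection is only a stand-in so the helper can be tried without a running browser. The deadline is checked between messages only, so it will not interrupt a recv() that is already blocking; for that you would also want a socket timeout on the real connection.

```python
import json
import time


def wait_for_event(conn, event_name, timeout=30.0):
    """Read DevTools messages from conn until event_name arrives.

    conn is any object with a recv() method returning a JSON string.
    Returns the list of decoded messages seen before the event.
    Raises TimeoutError if the deadline passes between messages.
    """
    deadline = time.monotonic() + timeout
    seen = []
    while time.monotonic() < deadline:
        packet = json.loads(conn.recv())
        # Events carry a 'method' key; command replies carry 'id' instead.
        if packet.get("method") == event_name:
            return seen
        seen.append(packet)
    raise TimeoutError("no %s within %s seconds" % (event_name, timeout))


class FakeConnection:
    """Hypothetical stand-in for the websocket, used for a dry run only."""

    def __init__(self, messages):
        self._messages = iter(messages)

    def recv(self):
        return next(self._messages)


# Dry run with canned messages instead of a live browser.
fake = FakeConnection([
    json.dumps({"id": 0, "result": {}}),
    json.dumps({"method": "Network.requestWillBeSent", "params": {}}),
    json.dumps({"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}),
    json.dumps({"method": "Network.loadingFinished", "params": {}}),
])
earlier = wait_for_event(fake, "Page.loadEventFired")
print(len(earlier))
```

With the real connection from the script above, the call would be wait_for_event(conn, 'Page.loadEventFired'). Keep in mind that the load event only means the page's initial load has finished; requests the page starts afterwards from script will still show up as later Network events.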