How to capture a redirect performed by a web application using Playwright

chh*_*ing 5 playwright playwright-python

When you visit this link, the page runs some JavaScript and then automatically redirects to a PDF. I am having trouble getting the final URL from Playwright.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://scnv.io/760y", wait_until="networkidle")
    print(page.url)
    page.close()

Is there a way to get the final URL?

Cha*_*wal 2

There are several ways to do this. One is to use page.expect_response:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    
    # Wait for the first response whose URL ends in '.pdf'
    with page.expect_response('**/*.pdf') as response_info:
        page.goto("https://scnv.io/760y")

    # response_info.value is the matched Response object
    print(response_info.value.url)
    browser.close()

Output

https://qcg-media.s3.amazonaws.com/media/uploads/72778/2022/06/20220622_663043_221.pdf

See this section of the documentation, which covers in detail how Playwright handles network traffic.
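Another of those ways is to register a `response` listener with page.on and record every matching URL as the page loads. The following is only a minimal sketch: it assumes the redirect target has a path ending in `.pdf`, and the names `is_pdf_url`, `collect_pdf_urls` and the fixed `settle_ms` wait are illustrative choices, not part of Playwright or of the answer above. It keeps `headless=False` as in the answer, since PDF navigation may behave differently in headless mode.

```python
from typing import List
from urllib.parse import urlparse


def is_pdf_url(url: str) -> bool:
    """Return True when the URL path ends in .pdf (query string is ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")


def collect_pdf_urls(start_url: str, settle_ms: int = 5000) -> List[str]:
    """Open start_url and return the URLs of all PDF responses observed.

    settle_ms is an assumed, fixed grace period for the JavaScript redirect.
    """
    # Imported lazily so is_pdf_url can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    seen: List[str] = []

    def on_response(response) -> None:
        # Called by Playwright for every response the page receives.
        if is_pdf_url(response.url):
            seen.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(start_url)
        # Give the page's script time to issue the redirect.
        page.wait_for_timeout(settle_ms)
        browser.close()
    return seen


if __name__ == "__main__":
    print(collect_pdf_urls("https://scnv.io/760y"))
```

Compared with page.expect_response, the listener does not fail if no PDF ever shows up; it simply returns an empty list, at the cost of always waiting the full grace period.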

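Also note that I did not include wait_until='networkidle', because it does not suit this use case. For that event to fire, the network must stay idle for at least 500 ms, which does not happen on this site around the time the request for the PDF is made. So if you include it, the code will be inconsistent at best at capturing the request for the URL we want.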
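To make the wait depend only on the PDF response rather than on network idleness, page.expect_response can also take a predicate and an explicit timeout. The sketch below is an assumption-laden variant, not the answer's code: `looks_like_pdf` and `final_pdf_url` are hypothetical helper names, and matching on the `content-type` header assumes the server labels the file as `application/pdf`.

```python
from urllib.parse import urlparse


def looks_like_pdf(url: str, content_type: str) -> bool:
    """True if the URL path ends in .pdf or the content type says PDF."""
    path_is_pdf = urlparse(url).path.lower().endswith(".pdf")
    type_is_pdf = content_type.lower().startswith("application/pdf")
    return path_is_pdf or type_is_pdf


def final_pdf_url(start_url: str, timeout_ms: int = 30000) -> str:
    """Navigate to start_url and return the URL of the first PDF response."""
    # Imported lazily so looks_like_pdf can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            # The predicate receives each Response; header names are lower-case.
            with page.expect_response(
                lambda r: looks_like_pdf(r.url, r.headers.get("content-type", "")),
                timeout=timeout_ms,
            ) as response_info:
                # Default wait_until ("load") is used; no 'networkidle'.
                page.goto(start_url)
            return response_info.value.url
        finally:
            browser.close()


if __name__ == "__main__":
    print(final_pdf_url("https://scnv.io/760y"))
```

If no matching response arrives within `timeout_ms`, Playwright raises a timeout error instead of returning, which makes a missed redirect visible rather than silently printing the wrong URL.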