Puppeteer Sharp - get the HTML after JS has finished running

Din*_*naF 6 c# web-scraping puppeteer puppeteer-sharp

I am using .NET Core 3.1 and Puppeteer Sharp 2.0.4. I want to get the full HTML of a web page after its JavaScript has finished running. Here is my code:

await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = false
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 0;
var navigation = new NavigationOptions
{
    Timeout = 0,
    WaitUntil = new[] { WaitUntilNavigation.DOMContentLoaded }
};
await page.GoToAsync("https://someurl", navigation);
var content = await page.GetContentAsync();

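The content variable does not seem to contain the HTML as it is after the JS has finished running. Any suggestions on what I should change to make this work?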

小智 1

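Simply replace navigation with WaitUntilNavigation.Networkidle2 in the GoToAsync call. DOMContentLoaded fires as soon as the initial HTML has been parsed, before scripts have fetched and rendered their data, whereas Networkidle2 treats navigation as finished only once there have been no more than two open network connections for at least 500 ms. In practice that usually means the page's JavaScript has finished executing.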

using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true // false if you need to see the browser
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 5000; // in milliseconds; set to 0 to disable the timeout
await page.GoToAsync("https://www.google.com", WaitUntilNavigation.Networkidle2);
var content = await page.GetContentAsync();

Console.WriteLine(content);
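Network idleness is only a heuristic, so a page that renders on a timer or after a late request may still be incomplete when Networkidle2 fires. If you know of an element that appears only once rendering is done, waiting for it explicitly is likely to be more reliable. Below is a minimal sketch of that idea, assuming Puppeteer Sharp 2.x. The RenderedHtml class, its GetAsync method, and the readySelector parameter are hypothetical names used for illustration and are not part of the library.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class RenderedHtml
{
    // Hypothetical helper: loads url, waits until an element matching
    // readySelector exists in the DOM, then returns the page's HTML.
    public static async Task<string> GetAsync(string url, string readySelector)
    {
        // Download the bundled Chromium revision if it is not already cached.
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

        var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        try
        {
            var page = await browser.NewPageAsync();

            // First wait for the network to go (almost) idle...
            await page.GoToAsync(url, WaitUntilNavigation.Networkidle2);

            // ...then wait for an element that only exists after the scripts have rendered.
            await page.WaitForSelectorAsync(readySelector);

            return await page.GetContentAsync();
        }
        finally
        {
            // Always shut the browser down, even if navigation or the wait throws.
            await browser.CloseAsync();
        }
    }
}
```

It could be called as `var html = await RenderedHtml.GetAsync("https://someurl", "#results");`, where "#results" is a placeholder for whatever selector marks the rendered content on your page. WaitForSelectorAsync throws if the element does not appear within the timeout, so a failure shows up as an exception rather than as silently incomplete HTML.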