Din*_*naF 6 c# web-scraping puppeteer puppeteer-sharp
I am using .NET Core 3.1 and Puppeteer Sharp 2.0.4. I want to get the full HTML of a web page after its JavaScript has finished running. Here is my code:
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = false
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 0;
var navigation = new NavigationOptions
{
    Timeout = 0,
    WaitUntil = new[] { WaitUntilNavigation.DOMContentLoaded }
};
await page.GoToAsync("https://someurl", navigation);
content = await page.GetContentAsync();
The content variable does not seem to contain the HTML as it is after the JS has run. Any suggestions on what I should change to make this work?
小智 1
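Just replace navigation with WaitUntilNavigation.Networkidle2, which waits until the page's JavaScript has (in practice) finished running. Networkidle2 treats the navigation as complete once there have been no more than 2 network connections for at least 500 ms, whereas DOMContentLoaded fires as soon as the initial HTML has been parsed, before asynchronous scripts and requests have necessarily completed.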
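using System;
using PuppeteerSharp;

// Download the Chromium revision that Puppeteer Sharp expects (skipped if already present).
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true // false if you need to see the browser
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 5000; // in milliseconds; 0 disables the timeout
// Wait until the network has been nearly idle before treating the navigation as finished.
await page.GoToAsync("https://www.google.com", WaitUntilNavigation.Networkidle2);
var content = await page.GetContentAsync(); // HTML of the page as currently rendered
Console.WriteLine(content);
await browser.CloseAsync(); // shut down the browser process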
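Networkidle2 is a heuristic, so a page that keeps polling or renders late may still return incomplete HTML. A minimal sketch of one way to tighten this, assuming the page adds a known element once its scripts are done: wait for that element explicitly before reading the content. The selector "#app" and the 10-second timeout are hypothetical placeholders, not taken from the question; replace them with values that fit the target page.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class RenderedHtmlExample
{
    public static async Task Main()
    {
        // Download the Chromium revision that Puppeteer Sharp expects.
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        try
        {
            var page = await browser.NewPageAsync();
            await page.GoToAsync("https://someurl", WaitUntilNavigation.Networkidle2);

            // "#app" is a hypothetical selector for an element that the page's
            // JavaScript creates; the 10000 ms timeout is likewise an assumption.
            await page.WaitForSelectorAsync(
                "#app",
                new WaitForSelectorOptions { Timeout = 10000 });

            // Read the HTML only after the awaited element is present.
            var content = await page.GetContentAsync();
            Console.WriteLine(content);
        }
        finally
        {
            // Close the browser even if navigation or the wait throws.
            await browser.CloseAsync();
        }
    }
}
```

If the element never appears, WaitForSelectorAsync throws once the timeout elapses, which makes a page that did not finish rendering easier to detect than a silently incomplete HTML string.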