How do I reload a page in Puppeteer?

glh*_*e13 23 javascript chromium node.js puppeteer

I want to reload the page whenever it fails to load properly or runs into a problem. I tried page.reload(), but it did not work.

for (const sect of sections) {

    // Now collect all the URLs
    const appUrls = await page.$$eval('div.main > ul.app-list > li > div.app-info a.app-info-icon', links => links.map(link => link.href));

    // Visit each URL one by one and collect the data
    for (let appUrl of appUrls) {
        var count = i++;
        try {
            await page.goto(appUrl);
            const appName = await page.$eval('div.det-name-int', div => div.innerText.trim());
            console.log('\n' + count);
            console.log(appName);
        } catch (e) {
            console.log('\n' + count);
            console.log('ERROR', e);
            await page.reload();
        }

    }

}

It gives me this error:

    ERROR Error: Error: failed to find element matching selector "div.det-name-int"
    at ElementHandle.$eval (C:\Users\Administrator\node_modules\puppeteer\lib\JSHandle.js:418:13)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at ElementHandle.<anonymous> (C:\Users\Administrator\node_modules\puppeteer\lib\helper.js:108:27)
    at DOMWorld.$eval (C:\Users\Administrator\node_modules\puppeteer\lib\DOMWorld.js:149:21)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\Administrator\node_modules\puppeteer\lib\helper.js:108:27)
    at Page.$eval (C:\Users\Administrator\node_modules\puppeteer\lib\Page.js:329:29)
    at Page.<anonymous> (C:\Users\Administrator\node_modules\puppeteer\lib\helper.js:109:23)
    at main (C:\Users\Administrator\Desktop\webscrape\text.js:35:43)
    at process._tickCallback (internal/process/next_tick.js:68:7)

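Some of the links fail to load. When I refresh those pages manually, they work. So I am looking for a function or method that automatically reloads the page when an error occurs.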

小智 46

This worked for me:

await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });

For details, see the Puppeteer documentation: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagereloadoptions
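As a rough sketch of how that reload might be combined with a check that the target element actually appeared: the helper name `reloadAndRead` and the `timeoutMs` parameter below are hypothetical, introduced only for illustration, and `page` is assumed to be a Puppeteer Page. The selector is the one from the question.

```javascript
// Hypothetical helper (not part of Puppeteer): reload the page, wait for
// the element to appear, then read its text.
async function reloadAndRead(page, selector, timeoutMs = 10000) {
  // Reload and wait until the DOM is ready and the network has gone quiet.
  await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });

  // Rejects with a timeout error if the element never shows up.
  await page.waitForSelector(selector, { timeout: timeoutMs });

  return page.$eval(selector, el => el.innerText.trim());
}
```

It could be called from the catch block, for example `const appName = await reloadAndRead(page, 'div.det-name-int');`, bearing in mind that it still throws if the element is missing after the reload.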


Gry*_*ets 6

You can always reload the page through the DOM, like this:

await page.evaluate(() => {
   location.reload(true)
})

Alternatively, there are many other ways to reload the page from browser JavaScript through the DOM.
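One caveat with a DOM-triggered reload is that the evaluate call does not by itself wait for the new page to load. A minimal sketch of one way to handle that, assuming `page` is a Puppeteer Page, is to start `page.waitForNavigation()` alongside the reload. The helper name `reloadViaDom` is hypothetical.

```javascript
// Hypothetical helper: trigger a reload from inside the page and wait
// for the resulting navigation.
async function reloadViaDom(page) {
  await Promise.all([
    // Start listening before the reload fires so the navigation is not missed.
    page.waitForNavigation({ waitUntil: "domcontentloaded" }),
    page.evaluate(() => location.reload()),
  ]);
}
```

Starting both promises together avoids a race in which the navigation completes before the script begins waiting for it.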

You can also navigate back and forward with Puppeteer, like this:

await page.goBack();
await page.goForward();


glh*_*e13 2

I managed to solve it with a while loop.

for (let appUrl of appUrls) {
    var count = i++;

    // Keep retrying the same URL until the element is found
    while (true) {
        try {
            await page.goto(appUrl);

            const appName = await page.$eval('div.det-name-int', div => div.innerText.trim());

            console.log('\n' + count);
            console.log('Name: ', appName);

            break;
        } catch (e) {
            console.log('\n' + count);
            console.log('ERROR');
            // page.reload() takes an options object, not a URL
            await page.reload();

            continue;
        }
    }
}

  • Try to avoid while(true) statements: if you hit a page that has no links at all, you will lock the main thread in an infinite loop. (7 upvotes)
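One way to act on that advice is to cap the number of attempts. The following is only a sketch: `scrapeWithRetry` and `maxAttempts` are hypothetical names introduced for illustration, and `page` is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: visit a URL and read an element's text, retrying
// at most `maxAttempts` times instead of looping forever.
async function scrapeWithRetry(page, url, selector, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Navigating again acts as the reload for each retry.
      await page.goto(url);
      // $eval throws when nothing matches the selector, which triggers a retry.
      return await page.$eval(selector, el => el.innerText.trim());
    } catch (e) {
      lastError = e;
      console.log(`Attempt ${attempt} of ${maxAttempts} failed for ${url}`);
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Inside the loop over `appUrls`, the call might look like `const appName = await scrapeWithRetry(page, appUrl, 'div.det-name-int');`, wrapped in a try/catch so that one permanently broken page is logged and skipped rather than stopping the whole run.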