How do I return a value from page.evaluate() in Puppeteer?

roi*_*tmi 9 node.js puppeteer

I'm trying to get a value out of page.evaluate() in a YouTube scraper I'm building with Puppeteer, but I can't return a value from page.evaluate(). How do I do this? Here's the code:

let boxes2 = []
const getData = async() => {
    return await page.evaluate(async () => { // scroll till there's no more room to scroll or you get at least 50 boxes
        console.log(await new Promise(resolve => {

            var scrolledHeight = 0
            var distance = 100
            var timer = setInterval(() => {
                boxes = document.querySelectorAll("div.style-scope.ytd-item-section-renderer#contents > ytd-video-renderer > div.style-scope.ytd-video-renderer#dismissable")
                console.log(`${boxes.length} boxes`)
                var scrollHeight = document.documentElement.scrollHeight
                window.scrollBy(0, distance)
                scrolledHeight += distance
                if(scrolledHeight >= scrollHeight || boxes.length >= 50){
                    clearInterval(timer)
                    resolve(Array.from(boxes))
                }
            }, 500)
        }))
    })
}
boxes2 = await getData()
console.log(boxes2)

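The console.log wrapped around the promise prints the resulting array in the browser's console. I just can't get that array into boxes2, where I call getData(). I feel like I'm missing something small but can't figure out what it is. Any hints are appreciated.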

Vav*_*off 12

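The small problem is that you never actually return the data from inside page.evaluate(). The page function only passes the result to console.log(), so it resolves to undefined, and undefined is what page.evaluate() hands back to Node.js. Return the promise's result instead: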

const getData = () => {
    return page.evaluate(async () => {
        return await new Promise(resolve => { // <-- return the data to node.js from browser
            // scraping: call resolve(...) with the collected data when done
        })
    })
}
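The root cause can be reproduced in plain Node.js, with no browser involved. page.evaluate() resolves to whatever the page function returns (or resolves to), and an async function whose body only calls console.log(await ...) has no return statement, so it resolves to undefined. A minimal sketch, where fetchBoxes, logsOnly and returnsValue are hypothetical names and fetchBoxes merely stands in for the scrolling promise:

```javascript
// Hypothetical stand-in for the scrolling promise; resolves to an array after 10 ms.
const fetchBoxes = () =>
  new Promise(resolve => setTimeout(() => resolve([1, 2, 3]), 10));

// Shape of the question's page function: the array is logged but never
// returned, so the async function resolves to undefined.
async function logsOnly() {
  console.log(await fetchBoxes());
}

// Shape of the answer's page function: the awaited value is returned,
// so the async function resolves to the array.
async function returnsValue() {
  return await fetchBoxes();
}

Promise.all([logsOnly(), returnsValue()]).then(([a, b]) => {
  console.log(a); // undefined
  console.log(b); // [ 1, 2, 3 ]
});
```

page.evaluate() behaves like the caller of these functions: it can only pass along what the function gives back.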

Here is a complete, minimal working Puppeteer example that prints the array [ 1, 2, 3 ]:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();

  let boxes2 = [];

  const getData = async () => {
    // Whatever the page function returns (or resolves to) is serialized
    // and becomes the resolved value of page.evaluate() in Node.js.
    return await page.evaluate(async () => {
      return await new Promise(resolve => {
        setTimeout(() => {
          resolve([1, 2, 3]);
        }, 3000);
      });
    });
  };

  boxes2 = await getData();
  console.log(boxes2); // [ 1, 2, 3 ]

  await browser.close();
});
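Applied to the scraper in the question, a sketch might look like the following. It assumes page is a Puppeteer Page already on a YouTube results page; the selector, maxBoxes, distance and intervalMs parameters are hypothetical additions, passed into the browser through page.evaluate(pageFunction, ...args) because the page function cannot see Node.js variables. One more caveat: DOM elements are not serializable, so returning Array.from(boxes) will not give you usable elements in Node.js. This sketch returns each box's trimmed textContent as an assumed stand-in for whatever fields you actually need.

```javascript
// Hypothetical helper: scrolls until there is no more room or at least
// maxBoxes result boxes exist, then returns plain strings (not DOM nodes).
function getData(page, selector, maxBoxes = 50, distance = 100, intervalMs = 500) {
  // Returning the page.evaluate() promise hands its resolved value to the caller.
  return page.evaluate((selector, maxBoxes, distance, intervalMs) => {
    // Everything inside this function runs in the browser, not in Node.js.
    return new Promise(resolve => {
      let scrolledHeight = 0;
      const timer = setInterval(() => {
        const boxes = document.querySelectorAll(selector);
        const scrollHeight = document.documentElement.scrollHeight;
        window.scrollBy(0, distance);
        scrolledHeight += distance;
        if (scrolledHeight >= scrollHeight || boxes.length >= maxBoxes) {
          clearInterval(timer);
          // Map elements to plain, serializable data before resolving.
          resolve(Array.from(boxes, box => box.textContent.trim()));
        }
      }, intervalMs);
    });
  }, selector, maxBoxes, distance, intervalMs);
}
```

With the selector from the question, the call site would become boxes2 = await getData(page, "div.style-scope.ytd-item-section-renderer#contents > ytd-video-renderer > div.style-scope.ytd-video-renderer#dismissable"), and boxes2 then holds an array of strings.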