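I am trying to get a value out of page.evaluate() in a YouTube scraper that I built with Puppeteer, but I cannot get the result returned from page.evaluate(). How can I achieve this? Here is the code: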
    let boxes2 = []
    const getData = async() => {
      return await page.evaluate(async () => { // scroll till there's no more room to scroll or you get at least 250 boxes
        console.log(await new Promise(resolve => {
          var scrolledHeight = 0
          var distance = 100
          var timer = setInterval(() => {
            boxes = document.querySelectorAll("div.style-scope.ytd-item-section-renderer#contents > ytd-video-renderer > div.style-scope.ytd-video-renderer#dismissable")
            console.log(`${boxes.length} boxes`)
            var scrollHeight = document.documentElement.scrollHeight
            window.scrollBy(0, distance)
            scrolledHeight += distance
            if(scrolledHeight >= scrollHeight || boxes.length >= 50){
              clearInterval(timer)
              resolve(Array.from(boxes))
            }
          }, 500)
        }))
      })
    }
    boxes2 = await getData()
    console.log(boxes2)
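The console.log that wraps the promise prints the resulting array in the browser's console. I just cannot get that array into boxes2 at the point where I call getData(). I feel like I am missing something small, but I cannot figure out what it is. Any tips are appreciated.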
Vav*_*off 12
The small problem is that you are not actually returning the data from inside page.evaluate:
    const getData = () => {
      return page.evaluate(async () => {
        return await new Promise(resolve => { // <-- return the data to node.js from browser
          // scraping
        })
      })
    }
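To see why the missing return matters, here is a small plain Node.js analogy that does not need Puppeteer at all. The names logsOnly, returnsValue and comparison are made up for this illustration. It is only a sketch of the general async-function behavior, on the assumption that page.evaluate hands back whatever the evaluated function resolves to.

```javascript
// Plain Node.js analogy (no Puppeteer needed): an async function that only
// logs its awaited value resolves to undefined; one that returns it does not.
const logsOnly = async () => {
  console.log(await Promise.resolve([1, 2, 3])); // value is printed, then dropped
};
const returnsValue = async () => {
  return await Promise.resolve([1, 2, 3]); // value is handed back to the caller
};

// Run both and collect what each one resolves to.
const comparison = Promise.all([logsOnly(), returnsValue()]).then(
  ([dropped, kept]) => ({ dropped, kept })
);
comparison.then(result => console.log(result));
```

In the question's code the promise sits inside console.log(...), which corresponds to the logsOnly case: the array shows up in the browser console, but nothing comes back to the caller.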
Here is a complete minimal working example with Puppeteer that prints the array [ 1, 2, 3 ]:
    const puppeteer = require('puppeteer');
    puppeteer.launch().then(async browser => {
      const page = await browser.newPage();
      let boxes2 = []; // declared with let so it is not an implicit global
      const getData = async() => {
        return await page.evaluate(async () => {
          // the promise is returned, so its resolved value reaches node.js
          return await new Promise(resolve => {
            setTimeout(() => {
              resolve([1,2,3]);
            }, 3000)
          })
        })
      }
      boxes2 = await getData();
      console.log(boxes2)
      await browser.close();
    });
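Applied to the scrolling loop from the question, the same fix might look roughly like the sketch below. It has not been run against YouTube. The function name scrollAndCollect, the opts and win parameters, the shortened selector and the fakeWindow stand-in are all assumptions made for illustration. The sketch also resolves with plain strings instead of the elements themselves, because as far as I know DOM elements do not survive the trip from the browser back to Node.js as usable objects.

```javascript
// Sketch: scroll step by step, resolve with plain data, and RETURN the promise.
// `opts` and `win` are hypothetical parameters; `win` falls back to the browser's
// `window` under page.evaluate(), and a stand-in object is passed for the dry run.
function scrollAndCollect(opts = {}, win = window) {
  const {
    selector = 'ytd-video-renderer #dismissable', // simplified selector (assumption)
    maxBoxes = 50,
    distance = 100,
    intervalMs = 500,
  } = opts;
  return new Promise(resolve => {
    let scrolledHeight = 0;
    const timer = setInterval(() => {
      const boxes = win.document.querySelectorAll(selector);
      const scrollHeight = win.document.documentElement.scrollHeight;
      win.scrollBy(0, distance);
      scrolledHeight += distance;
      if (scrolledHeight >= scrollHeight || boxes.length >= maxBoxes) {
        clearInterval(timer);
        // Map each element to its trimmed text so the result is plain data.
        resolve(Array.from(boxes, box => (box.textContent || '').trim()));
      }
    }, intervalMs);
  });
}

// Dry run in plain Node.js with a stand-in for the browser window.
const fakeWindow = {
  scrolled: 0,
  scrollBy(x, y) { this.scrolled += y; },
  document: {
    documentElement: { scrollHeight: 300 },
    querySelectorAll: () => [{ textContent: ' A ' }, { textContent: 'B' }],
  },
};
const dryRun = scrollAndCollect({ intervalMs: 1 }, fakeWindow);
dryRun.then(result => console.log(result));
```

With Puppeteer, the call would presumably be boxes2 = await page.evaluate(scrollAndCollect, { maxBoxes: 50 }). The second parameter is left out there, so win falls back to the page's own window.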