mys*_*elf 3 javascript web-scraping puppeteer
I'm just trying to scrape some data from a website. My code looks like this:
const puppeteer = require("puppeteer")
const main = async () => {
  const browser = await puppeteer.launch({})
  const page = await browser.newPage()
  await page.goto("https://www.example.com")
  await page.waitForSelector(".example")
  const titleNode = await page.$$(".example")
  titleNode.forEach( el => {
    el.getProperties("textContent").then(el => {
      console.log(el)
    })
  })
  console.log( titleNode );
  browser.close()
}
main()
The result looks like this:
[
CDPElementHandle { handle: CDPJSHandle {} },
CDPElementHandle { handle: CDPJSHandle {} },
CDPElementHandle { handle: CDPJSHandle {} },
CDPElementHandle { handle: CDPJSHandle {} },
CDPElementHandle { handle: CDPJSHandle {} },
]
I want to get the actual text content of the elements with the class "example". How do I extract that value? I tried .getProperties and .jsonValue, but it didn't work. Any help would be appreciated.
小智 5
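Array.prototype.forEach is not designed for asynchronous code: it ignores the promises its callback returns, so nothing waits for them. Use for...of or map (with Promise.all) instead of .forEach.
Code: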
const puppeteer = require("puppeteer");

// Inline test page: only two of the six links carry the "example" class.
const html = `
<div>
  <a>text1</a>
  <a class='example'>text2</a>
  <a>text3</a>
  <a class='example'>text4</a>
  <a>text5</a>
  <a>text6</a>
</div>
`;

const main = async () => {
  const browser = await puppeteer.launch({});
  const page = await browser.newPage();
  await page.setContent(html);
  const titleNode = await page.$$(".example");

  // Option 1: for...of awaits each element handle in turn.
  const result = [];
  for (const t of titleNode) {
    result.push(await t.evaluate(x => x.textContent));
  }

  // Option 2: map each handle to a promise and await them all together.
  const result2 = await Promise.all(titleNode.map(t => t.evaluate(x => x.textContent)));

  console.log({ result, result2 });

  // Close the browser so the Node process can exit.
  await browser.close();
};

main();
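To see why .forEach is a poor fit here, the following sketch may help. It leaves Puppeteer out entirely and uses a hypothetical helper, fakeTextContent (a stand-in for an async call such as t.evaluate, not part of any library), so it runs in plain Node.js. The function names and the 10 ms delay are assumptions made for illustration only.

```javascript
// Hypothetical stand-in for an async call such as ElementHandle.evaluate().
const fakeTextContent = (label) =>
  new Promise((resolve) => setTimeout(() => resolve(label), 10));

const labels = ["text2", "text4"];

// forEach ignores the promises returned by its callback,
// so the code after it runs before any value has arrived.
const collectWithForEach = () => {
  const out = [];
  labels.forEach(async (label) => {
    out.push(await fakeTextContent(label));
  });
  return out.slice(); // snapshot taken immediately, before any push has run
};

// for...of inside an async function waits for each promise in turn.
const collectWithForOf = async () => {
  const out = [];
  for (const label of labels) {
    out.push(await fakeTextContent(label));
  }
  return out;
};

// map + Promise.all starts every call at once and waits for all of them.
const collectWithMap = () =>
  Promise.all(labels.map((label) => fakeTextContent(label)));

const demo = async () => {
  const viaForEach = collectWithForEach();
  const viaForOf = await collectWithForOf();
  const viaMap = await collectWithMap();
  return { viaForEach, viaForOf, viaMap };
};

demo().then((r) => console.log(r));
```

The for...of version runs the calls one after another, while the map version runs them concurrently. For a handful of element handles either should be fine; Promise.all preserves the input order in its result array regardless of which call finishes first.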