How to get a readable value from a CDPElementHandle

mys*_*elf 3 javascript web-scraping puppeteer

I'm just trying to scrape some things from a website, and my code looks like this:

const puppeteer = require("puppeteer") 
const main = async () => {
const browser = await puppeteer.launch({})
const page = await browser.newPage()
await page.goto("https://www.example.com")
await page.waitForSelector(".example") 
const titleNode = await page.$$(".example")
titleNode.forEach(  el => {
  el.getProperties("textContent").then(el => {
          console.log(el)
  })
})
 console.log( titleNode );
 browser.close()
}
main()

The result looks like this:

[
    CDPElementHandle { handle: CDPJSHandle {} },
    CDPElementHandle { handle: CDPJSHandle {} },
    CDPElementHandle { handle: CDPJSHandle {} },
    CDPElementHandle { handle: CDPJSHandle {} },
    CDPElementHandle { handle: CDPJSHandle {} },
]

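I want to get the actual text content inside the elements with class "example". How can I extract that value? I tried .getProperties and .jsonValue, but it didn't work. Any help would be appreciated.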

小智 5

Array.prototype.forEach is not designed for asynchronous code, so instead of .forEach, use for...of, or map combined with Promise.all.
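
To illustrate the point about forEach, here is a minimal sketch that runs without Puppeteer. fakeRead is a hypothetical stand-in for an async call such as handle.evaluate(...), and the function names are made up for this example only.

```javascript
// Hypothetical stand-in for an async call such as handle.evaluate(...)
const fakeRead = (value) =>
    new Promise((resolve) => setTimeout(() => resolve(value.toUpperCase()), 10));

// forEach ignores the promises returned by its callback, so this
// function returns before any result has been pushed.
const withForEach = async (items) => {
    const out = [];
    items.forEach(async (item) => {
        out.push(await fakeRead(item));
    });
    return out.slice(); // snapshot of what is available at return time
};

// for...of awaits each call in turn, so the results are complete.
const withForOf = async (items) => {
    const out = [];
    for (const item of items) {
        out.push(await fakeRead(item));
    }
    return out;
};

// map + Promise.all starts every call, then waits for all of them.
const withMap = (items) => Promise.all(items.map((item) => fakeRead(item)));

const demo = async () => {
    const items = ["a", "b"];
    console.log(await withForEach(items)); // []
    console.log(await withForOf(items));   // [ 'A', 'B' ]
    console.log(await withMap(items));     // [ 'A', 'B' ]
};

demo();
```

In the question's code the same thing happens: the forEach callbacks are still pending when the function moves on to browser.close().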

Code:

const puppeteer = require("puppeteer");

const html = `
    <div>
        <a>text1</a>
        <a class='example'>text2</a>
        <a>text3</a>
        <a class='example'>text4</a>
        <a>text5</a>
        <a>text6</a>
    </div>
`;

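const main = async () => {
    const browser = await puppeteer.launch({})
    const page = await browser.newPage()
    // Load the inline HTML instead of navigating to a real site
    await page.setContent(html);

    // Array of element handles matching the selector
    const titleNode = await page.$$(".example");

    // Option 1: for...of awaits each handle in turn
    let result = [];
    for(let t of titleNode) {
        result.push(await t.evaluate(x => x.textContent));
    }

    // Option 2: map + Promise.all evaluates all handles concurrently
    let result2 = await Promise.all(titleNode.map(async (t) => {
        return await t.evaluate(x => x.textContent);
    }))

    console.log({result : result, result2 : result2});

    // Close the browser so the Node process can exit
    await browser.close();
}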

main();
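
Since the question mentions .getProperties and .jsonValue, here is a hedged sketch of that route as an alternative to evaluate. It assumes each handle exposes getProperty(name), which resolves to a JS handle, and that this handle's jsonValue() resolves to the plain value. readTexts is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: reads textContent from each element handle via
// getProperty (singular) + jsonValue, instead of evaluate.
const readTexts = (handles) =>
    Promise.all(
        handles.map(async (handle) => {
            // Resolves to a handle for the property, not the value itself
            const prop = await handle.getProperty("textContent");
            // Converts the handle into a plain, serializable value
            return prop.jsonValue();
        })
    );
```

Inside main above, it could be called as: const texts = await readTexts(titleNode);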