My HTML looks like this:
```html
<div class="class1" data-id="id1">
  <span class="class2">
    "text1"
  </span>
</div>
<div class="class1" data-id="id2">
  <span class="class2">
    "text2"
  </span>
</div>
<div class="class1" data-id="id1">
  <span class="class2">
    "text1"
  </span>
</div>
<div class="class1" data-id="id3">
  <span class="class2">
    "text3"
  </span>
</div>
```
I'm trying to write Puppeteer code that gets pairs of data-id and the span's inner text, which would produce something like:
```text
id1: text1,
id2: text2,
id3: text3
```
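Note that the sample HTML contains id1 twice, while the desired output lists each id once. If that deduplication is intended, a minimal sketch of the final step is shown here. It assumes the pairs have already been pulled out of the page into an array of [id, text] arrays by whatever extraction approach you use; formatUniquePairs is a hypothetical helper name, and it keeps the first text seen for each id.

```javascript
'use strict';

// Hypothetical helper: turns [id, text] pairs into "id: text" lines,
// keeping only the first pair for each id (in first-seen order),
// joined with ",\n" to match the desired output format.
function formatUniquePairs(pairs) {
  const firstTextById = new Map();
  for (const [id, text] of pairs) {
    if (!firstTextById.has(id)) firstTextById.set(id, text);
  }
  return [...firstTextById]
    .map(([id, text]) => `${id}: ${text}`)
    .join(',\n');
}
```

A Map is used rather than a plain object so that insertion order is preserved for any id string, including ones that look like integers.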
What I tried:
```javascript
const allClass1InPage = await this.page.$$(".class1");
for (const class1El of allClass1InPage) {
  // ??? marks the two values I don't know how to get
  await class1El.$eval(".class2", (class2El) =>
    console.debug(`${???}: ${???}`)
  );
}
```
What I don't know:
How do I get data-id out of an ElementHandle such as class1El? (If it were an Element, I would normally just use .dataset.id.)

There are several ways to do this.
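Way 1, the closest to yours: a mixed approach, a bit tangled. The callback passed to $eval runs in the page and reaches data-id through the span's parentNode.
Way 2: uses only the Puppeteer (JSHandle/ElementHandle) API. It is more consistent, but very verbose.
Way 3: uses only the browser (Web) API inside page.evaluate(). This seems the simplest approach if all you need is some serializable data.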
```javascript
'use strict';

const html = `
  <!doctype html>
  <html>
  <head><meta charset='UTF-8'><title>Test</title></head>
  <body>
    <div class="class1" data-id="id1">
      <span class="class2">
        "text1"
      </span>
    </div>
    <div class="class1" data-id="id2">
      <span class="class2">
        "text2"
      </span>
    </div>
    <div class="class1" data-id="id1">
      <span class="class2">
        "text1"
      </span>
    </div>
    <div class="class1" data-id="id3">
      <span class="class2">
        "text3"
      </span>
    </div>
  </body>
  </html>`;

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    // Reuse the blank tab that the browser opens on launch.
    const [page] = await browser.pages();

    await page.goto(`data:text/html,${html}`);

    // Way 1: ElementHandle.$eval runs the callback in the page with the
    // span element; its parent <div> carries the data-id.
    {
      const allClass1InPage = await page.$$(".class1");
      for (const class1El of allClass1InPage) {
        console.debug(await class1El.$eval(".class2", class2El =>
          `${class2El.parentNode.dataset.id}: ${class2El.innerText}`
        ));
      }
    }

    console.log();

    // Way 2: read each property through handles, then convert to plain values.
    {
      const allClass1InPage = await page.$$('.class1');
      for (const class1El of allClass1InPage) {
        const datasetHandle = await class1El.getProperty('dataset');
        const idHandle = await datasetHandle.getProperty('id');
        const id = await idHandle.jsonValue();

        const spanHandle = await class1El.$('.class2');
        const textHandle = await spanHandle.getProperty('innerText');
        const text = await textHandle.jsonValue();

        console.log(`${id}: ${text}`);
      }
    }

    console.log();

    // Way 3: do all the work in the page and return plain strings.
    {
      const data = await page.evaluate(
        () => [...document.querySelectorAll('.class1')].map(element =>
          `${element.dataset.id}: ${element.querySelector('.class2').innerText}`)
      );
      console.log(data.join('\n'));
    }

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
```
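Another option, not used in the three ways above, answers the ElementHandle question more directly: Puppeteer's ElementHandle.evaluate() passes the underlying DOM element to the callback, so .dataset.id works there exactly as it does in the browser. A minimal sketch follows; readPair and readAllPairs are hypothetical helper names, and page is assumed to be a Puppeteer Page.

```javascript
'use strict';

// Hypothetical helper: reads data-id and the .class2 text from one
// ElementHandle. evaluate() runs the callback in the page with the real
// DOM element as its argument and returns the serializable result to Node.
async function readPair(class1El) {
  return class1El.evaluate(el => ({
    id: el.dataset.id,
    text: el.querySelector('.class2').innerText.trim(),
  }));
}

// Hypothetical wrapper: collects one { id, text } object per .class1 element,
// in document order (duplicates such as the second id1 are kept).
async function readAllPairs(page) {
  const handles = await page.$$('.class1');
  const pairs = [];
  for (const handle of handles) {
    pairs.push(await readPair(handle));
  }
  return pairs;
}
```

Inside the async main() function of the script, this would be called as const pairs = await readAllPairs(page);. Each evaluate() call is a separate round trip to the browser, so for many elements Way 3's single page.evaluate() call is likely cheaper.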