Collect elements by class name and then click each one - Puppeteer

Ric*_*wis 13 javascript node.js puppeteer

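Using Puppeteer, I would like to get all the elements on a page with a particular class name and then loop through and click each one.

Using jQuery I can achieve this with: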

var elements = $("a.showGoals").toArray();

for (i = 0; i < elements.length; i++) {
  $(elements[i]).click();
}

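How can I achieve this using Puppeteer?

Update

I tried chridam's answer below but couldn't get it to work (the answer was still helpful, so thanks for getting me there), so I tried the following, which works: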

 await page.evaluate(() => {
   let elements = $('a.showGoals').toArray();
   for (i = 0; i < elements.length; i++) {
     $(elements[i]).click();
   }
});

the*_*ton 19

Iterating puppeteer async methods in for loop vs. Array.map()/Array.forEach()

Because all puppeteer methods are asynchronous, it does matter how we iterate over them. I've made a comparison and a rating of the most commonly recommended and used options.

For this purpose, I have created a React.js example page with a lot of React buttons (I just call it Lot Of React Buttons). On that page (1) we are able to set how many buttons are rendered; (2) we can turn the black buttons green by clicking on them. I consider it an identical use case to the OP's, and it is also a general case of browser automation (we expect something to happen if we do something on the page). Let's say our use case is:

Scenario outline: click all the buttons with the same selector
  Given I have <no.> black buttons on the page
  When I click on all of them
  Then I should have <no.> green buttons on the page

There is a conservative and a rather extreme scenario: clicking no. = 132 buttons is not a heavy CPU task, while no. = 1320 can take a bit of time.


I. Array.map

In general, if we only want to perform async methods like elementHandle.click in an iteration and we don't need the returned array, using Array.map is bad practice. The map call finishes before the iteratees have completed, because Array iteration methods invoke the iteratees synchronously while the iteratees themselves (the puppeteer methods) are asynchronous.

Code example

const elHandleArray = await page.$$('button')

elHandleArray.map(async el => {
  await el.click()
})

await page.screenshot({ path: 'clicks_map.png' })
await browser.close()

Specialties

  • returns another array
  • parallel execution inside the .map method
  • fast

132 buttons scenario result: ❌

Duration: 891 ms

Watching the browser in headful mode, it looks like it works, but if we check when page.screenshot happened, we can see the clicks were still in progress. This is because the result of Array.map is not awaited by default. It is only luck that the script had enough time to resolve all the clicks on all elements before the browser was closed.

1320 buttons scenario result: ❌

Duration: 6868 ms

If we increase the number of elements of the same selector we will run into the following error: UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
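A common way to make the map variant awaitable, which was not part of the measurements above, is to hand the array of promises that map returns to Promise.all. The following is only a sketch under assumptions: makeFakeHandle is a hypothetical stand-in for an ElementHandle so the example runs without a browser; with puppeteer the handles would come from page.$$('button').

```javascript
// Hypothetical stand-in for an ElementHandle: click() resolves after a short delay.
function makeFakeHandle(log, id, delayMs) {
  return {
    click: () =>
      new Promise(resolve =>
        setTimeout(() => {
          log.push(`clicked ${id}`);
          resolve();
        }, delayMs)
      ),
  };
}

// Starts every click at once and resolves only when all of them have finished.
async function clickAllWithMap(handles) {
  await Promise.all(handles.map(el => el.click()));
}

async function demoMap() {
  const log = [];
  const handles = [1, 2, 3].map(id => makeFakeHandle(log, id, 10));
  await clickAllWithMap(handles);
  log.push('screenshot'); // reached only after all three clicks were logged
  return log;
}
```

Whether concurrent elementHandle.click calls behave well on a real page is a separate question, since they share one mouse; the sketch only shows how the waiting works.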


II. Array.forEach

All the iteratees will be executed, but forEach returns before they have finished, which is not the desired behavior in many cases with async functions. In terms of puppeteer it is very similar to Array.map, except that Array.forEach does not return a new array.

Code example

const elHandleArray = await page.$$('button')

elHandleArray.forEach(async el => {
  await el.click()
})

await page.screenshot({ path: 'clicks_foreach.png' })
await browser.close()

Specialties

  • parallel execution inside the .forEach method
  • fast

132 buttons scenario result: ❌

Duration: 1058 ms

By watching the browser in headful mode it looks like it works, but if we check when the page.screenshot happened: we can see the clicks were still in progress.

1320 buttons scenario result: ❌

Duration: 5111 ms

If we increase the number of elements with the same selector we will run into the following error: UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
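The ordering problem described in this section can be reproduced without a browser. The sketch below is illustrative only: fakeHandle is a hypothetical object whose click() resolves after a short timeout, and the log shows that the step after forEach runs before any click has completed.

```javascript
// Hypothetical stand-in for an ElementHandle whose click() takes a moment to finish.
const fakeHandle = (log, id) => ({
  click: () =>
    new Promise(resolve =>
      setTimeout(() => {
        log.push(`click ${id} done`);
        resolve();
      }, 10)
    ),
});

async function demoForEach() {
  const log = [];
  const handles = [1, 2].map(id => fakeHandle(log, id));

  // forEach discards the promises returned by the async callbacks...
  handles.forEach(async el => {
    await el.click();
  });

  // ...so this line runs while both clicks are still pending.
  log.push('screenshot');

  // Wait long enough for the pending clicks to settle so the log is complete.
  await new Promise(resolve => setTimeout(resolve, 50));
  return log;
}
```

In this sketch the 'screenshot' entry is logged first, which mirrors the screenshot being taken while the clicks are still in progress.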


III. page.$$eval + forEach

The best performing solution is a slightly modified version of bside's answer. page.$$eval (page.$$eval(selector, pageFunction[, ...args])) runs Array.from(document.querySelectorAll(selector)) within the page and passes the result as the first argument to pageFunction. The forEach therefore runs entirely inside the page, and the single promise returned by page.$$eval can be awaited reliably.

Code example

await page.$$eval('button', elHandles => elHandles.forEach(el => el.click()))

await page.screenshot({ path: 'clicks_eval_foreach.png' })
await browser.close()

Specialties

  • no side-effects of using async puppeteer method inside a .forEach method
  • parallel execution inside the .forEach method
  • extremely fast

132 buttons scenario result: ✅

Duration: 711 ms

By watching the browser in headful mode we see the effect is immediate, also the screenshot is taken only after every element has been clicked, every promise has been resolved.

1320 buttons scenario result: ✅

Duration: 3445 ms

Works just like in case of 132 buttons, extremely fast.


IV. for...of loop

The simplest option: not that fast, and executed in sequence. The script won't reach page.screenshot until the loop has finished.

Code example

const elHandleArray = await page.$$('button')

for (const el of elHandleArray) {
  await el.click()
}

await page.screenshot({ path: 'clicks_for_of.png' })
await browser.close()

Specialties

  • async behavior works as expected at first sight
  • execution in sequence inside the loop
  • slow

132 buttons scenario result: ✅

Duration: 2957 ms

By watching the browser in headful mode we can see the page clicks are happening in strict order, also the screenshot is taken only after every element has been clicked.

1320 buttons scenario result: ✅

Duration: 25 396 ms

Works just like in case of 132 buttons (but it takes more time).


Summary

  • Avoid using Array.map if you only want to perform async events and you aren't using the returned array; use forEach or for...of instead.
  • Array.forEach is an option, but you need to wrap it so that the next async method only starts after all the promises inside the forEach have resolved.
  • Combine Array.forEach with $$eval for best performance if the order of async events doesn't matter inside the iteration.
  • Use a for/for...of loop if speed is not vital and the order of the async events does matter inside the iteration.
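The last two recommendations could be wrapped in one small helper. This is a sketch, not a puppeteer API: clickAll and its inOrder option are hypothetical names, and page is assumed to provide puppeteer's page.$$ and page.$$eval.

```javascript
// Hypothetical helper, not part of puppeteer: chooses a strategy from the summary.
// `page` is expected to provide page.$$ and page.$$eval as puppeteer does.
async function clickAll(page, selector, { inOrder = false } = {}) {
  if (inOrder) {
    // Sequential: each click is awaited before the next one starts.
    const handles = await page.$$(selector);
    for (const handle of handles) {
      await handle.click();
    }
    return handles.length;
  }
  // Fast path: the loop runs inside the page and is covered by a single await.
  return page.$$eval(selector, elements => {
    elements.forEach(el => el.click());
    return elements.length;
  });
}
```

A call such as await clickAll(page, 'a.showGoals') would use the in-page path, and passing { inOrder: true } would use the sequential one. Note that the two paths are not fully equivalent: the in-page path calls the DOM click() method, while elementHandle.click drives the mouse.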



bsi*_*des 15

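To get all the elements, you should use the page.$$ method, which is the same as [...document.querySelectorAll] (spread into an array) from the regular browser API.

You can then loop through the result (map, for, whatever you like) and evaluate each link: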

const getThemAll = await page.$$('a.showGoals')
for (const link of getThemAll) {
  // Pass the ElementHandle as an argument so it resolves to the DOM element in the page
  await page.evaluate(el => el.click(), link)
}

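Since you also want to act on what you get, I recommend page.$$eval, which does the same as the above and then runs an evaluation function over the elements of the array, all in one line. For example: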

await page.$$eval('a.showGoals', links => links.forEach(link => link.click()))

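To explain the line above: $$eval collects an array of links, then runs a callback with links as its argument; the callback iterates over each link with forEach and finally calls click on each one.

Also check the official documentation; it has good examples.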


chr*_*dam 12

Use page.evaluate to execute JS:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
    const page = await browser.newPage();
    await page.evaluate(() => {
        let elements = document.getElementsByClassName('showGoals');
        for (let element of elements)
            element.click();
    });
    // browser.close();
});


Gra*_*ler 5

page.$$() / elementHandle.click()

You can use page.$$() to create an array of ElementHandles from a given selector, and then use elementHandle.click() to click each element:

const elements = await page.$$('a.showGoals');

elements.forEach(async element => {
  await element.click();
});

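Note: remember to await the click inside an async function. Otherwise, you will get the following error:

SyntaxError: await is only valid in async function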