Tag: puppeteer

How to get the text inside a div in Puppeteer

const puppeteer = require("puppeteer");

(async function main() {
    try {
        const browser = await puppeteer.launch({headless: false});
        const page = await browser.newPage();
        // setUserAgent returns a promise, so wait for it before navigating.
        await page.setUserAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

        await page.goto("https://www.qimai.cn/rank/index/brand/all/genre/6014/device/iphone/country/us/date/2019-03-19", {waitUntil: 'load', timeout: 0});
        await page.waitForSelector(".container");
        const sections = await page.$$(".container");

        const freeButton = await page.$('[href="/rank/index/brand/free/device/iphone/country/us/genre/6014/date/2019-03-19"]');
        await freeButton.click();


        // free list

        const appTable = await page.waitForSelector(".data-table");
        const lis = await page.$$(".data-table > tbody > tr > td");

        // go to app content
        const appInfo = …
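
Reading the text of an element, as the title asks, can be sketched with `page.$eval` and `page.$$eval`, which run a callback inside the page. This is a minimal sketch, not the asker's final code: the helper names `getDivText` and `getCellTexts` are hypothetical, and `page` is assumed to be a Puppeteer `Page` that has already navigated.

```javascript
// Hypothetical helper: text of the first element matching `selector`.
// page.$eval throws if nothing matches, so callers may want to
// await page.waitForSelector(selector) first.
async function getDivText(page, selector) {
  return page.$eval(selector, (el) => el.textContent.trim());
}

// Hypothetical helper: text of every element matching `selector`,
// for example all cells of a table.
async function getCellTexts(page, selector) {
  return page.$$eval(selector, (els) =>
    els.map((el) => el.textContent.trim())
  );
}
```

With the selector from the snippet above, a call might look like `await getCellTexts(page, ".data-table > tbody > tr > td")`.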


javascript puppeteer

Koh*_*Jin

2019-03-20

16
votes

5
answers

40k
views

Firebase function: Puppeteer could not find Chromium on GCP

I have been using GCP on Google Cloud for a long time, and I want to run a cloud function that uses Puppeteer, but unfortunately I get the following error.

Unhandled error Error: Could not find Chromium (rev. 1069273). This can occur if either

1. you did not perform an installation before running the script (e.g. `npm install`) or
2. your cache path is incorrectly configured (which is: /root/.cache/puppeteer).
For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.
    at ChromeLauncher.resolveExecutablePath (/workspace/node_modules/puppeteer-core/lib/cjs/puppeteer/node/ProductLauncher.js:120:27)
    at ChromeLauncher.executablePath (/workspace/node_modules/puppeteer-core/lib/cjs/puppeteer/node/ChromeLauncher.js:166:25)
    at ChromeLauncher.launch (/workspace/node_modules/puppeteer-core/lib/cjs/puppeteer/node/ChromeLauncher.js:70:37)
    at async /workspace/lib/index.js:122:21
    at async /workspace/node_modules/firebase-functions/lib/common/providers/https.js:407:26

My code is


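import * as functions from "firebase-functions";
import puppeteer from "puppeteer";

export const test = functions
  .runWith({
    timeoutSeconds: 120,
    // `"512MB" || "2GB"` always evaluates to "512MB", so state it directly.
    memory: "512MB",
  })
  .https.onCall(async (data, context) => {
    const browser = await puppeteer.launch({ args: ["--no-sandbox"] });
    const page = await browser.newPage();
    await page.goto("https://www.google.com/");

    // Wait for the browser to shut down before returning.
    await browser.close();
    return { msg: "all good", status: 200 };
  });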


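I copied an example of how to use Puppeteer in GCP functions from here (it works on my machine), and I also tried other functions that do not use Puppeteer, which work fine (so I am sure the problem is with Puppeteer). I also tried adding the flag "--disable-setuid-sandbox", but that did not help. I am writing the Firebase functions in TypeScript. My package.json …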

node.js firebase google-cloud-functions puppeteer

sha*_*ked

lucky-day

16
votes

2
answers

5085
views

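Puppeteer - How to fill in a form inside an iframe?

I have to fill out a form that is inside an iframe; here is the sample page. I cannot reach it by simply using page.focus() and page.type(). I tried to get the form iframe with const formFrame = page.mainFrame().childFrames()[0], but I cannot interact with the form iframe.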
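
One possible way to interact with a form inside an iframe is to locate the `Frame` object and call its own `waitForSelector` and `type` methods instead of the ones on `page`. The sketch below assumes the frame can be identified by part of its URL; the helper name `typeInFrame` and its parameters are hypothetical.

```javascript
// Hypothetical helper: type `value` into `selector` inside the first frame
// whose URL contains `urlPart`. Returns the frame that was used.
async function typeInFrame(page, urlPart, selector, value) {
  // page.frames() lists every frame attached to the page, nested ones included.
  const frame = page.frames().find((f) => f.url().includes(urlPart));
  if (!frame) {
    throw new Error(`No frame with a URL containing "${urlPart}"`);
  }
  // Wait inside the frame: the iframe content may load after the main page.
  await frame.waitForSelector(selector);
  await frame.type(selector, value);
  return frame;
}
```

Looking the frame up by URL is only one choice; picking it by index, as in the question, works the same way once the frame has finished loading.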

puppeteer

Raz*_*aza

2017-10-16

15
votes

3
answers

9093
views

Puppeteer error: Protocol error (Page.captureScreenshot): Target closed

I get this error when running puppeteer@1.12.2 in a node:8-slim container.

Full error:

Error: Protocol error (Page.captureScreenshot): Target closed.
    at Promise (/app/node_modules/puppeteer/lib/Connection.js:183:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/app/node_modules/puppeteer/lib/Connection.js:182:12)
    at Page._screenshotTask (/app/node_modules/puppeteer/lib/Page.js:903:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:189:7)
  -- ASYNC --
    at Page.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:108:27)
    at /app/test.js:9:15
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:189:7)


JS file (inspired by GoogleChrome/puppeteer/examples/screenshot.js):

const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu', '--disable-dev-shm-usage']
  });
  const page = await browser.newPage();
  await page.goto('http://google.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();


Dockerfile (inspired by Troubleshooting.md#running-puppeteer-in-docker): …

node.js docker puppeteer

Ale*_*lex

2019-02-13

15
votes

1
answer

1705
views

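How to get the body / JSON response from an XHR request with Puppeteer

I want to get JSON data from a website I am scraping with Puppeteer, but I cannot figure out how to get the body of the response back. Here is what I have tried: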

const puppeteer = require('puppeteer')
const results = [];
(async () => {
    const browser = await puppeteer.launch({
        headless: false
    })
    const page = await browser.newPage()
    await page.goto("https://capuk.org/i-want-help/courses/cap-money-course/introduction", {
        waitUntil: 'networkidle2'
    });

    await page.type('#search-form > input[type="text"]', 'bd14ew')  
    await page.click('#search-form > input[type="submit"]')

    await page.on('response', response => {    
        if (response.url() == "https://capuk.org/ajax_search/capmoneycourses"){
            console.log('XHR response received'); 
            console.log(response.json()); 
        } 
    }); 
})()


This just logs a pending promise. Any help would be great.
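
The pending promise appears because `response.json()` is asynchronous and has to be awaited. The snippet above also registers its listener only after the click, so the response may already have arrived. A minimal sketch of one alternative uses `page.waitForResponse`; the helper name `captureJson` and the `trigger` callback are hypothetical.

```javascript
// Hypothetical helper: run `trigger` (for example a click) and resolve with
// the parsed JSON body of the first response whose URL equals `url`.
async function captureJson(page, url, trigger) {
  // Start waiting before triggering so the response cannot be missed.
  const [response] = await Promise.all([
    page.waitForResponse((res) => res.url() === url),
    trigger(),
  ]);
  // json() returns a promise; awaiting it yields the parsed body.
  return await response.json();
}
```

For the page in the question, `trigger` could be `() => page.click('#search-form > input[type="submit"]')`, with the ajax_search URL passed as `url`.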

javascript webautomation node.js puppeteer

Rus*_*sty

2020-11-02

15
votes

1
answer

20k
views

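Convert HTML to PDF or PNG in NodeJS without a headless browser instance

TL;DR:

Any suggestions for converting HTML to PDF or PNG in NodeJS without using any headless browser instance?
Is anyone using puppeteer in a production environment? I would like to know the resource utilization and performance of running a headless browser in production.

Longer version:

In a NodeJS server, we need to convert an HTML string to PDF or PNG based on request parameters. We use puppeteer, deployed in a Google Cloud Function, to generate the PDFs and PNGs (screenshots). Running this application in my local Docker with memory usage limited to 100 MB seems to work. In Cloud Functions, however, it throws a memory limit exception when the function is set to 250 MB of memory. As a temporary workaround, we upgraded the cloud function to 1 GB.

We would like to try alternatives to puppeteer that do not rely on a headless browser. Another library, PDF-Kit, looks good, but it takes canvas-API-style input; we cannot feed it HTML directly.

Any thoughts or suggestions on this?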

html pdfkit node.js html-pdf puppeteer

Ana*_*rem

lucky-day

15
votes

1
answer

7983
views

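Web scraping TimeoutError: Navigation timeout of 30000 ms exceeded

I am trying to use puppeteer to extract some tables from a company website.

But I do not understand why it opens Chromium instead of my default Chrome, and then fails with "TimeoutError: Navigation timeout of 30000 ms exceeded", which does not leave me enough time to use a CSS selector. I could not find documentation covering this.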

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage()
  await page.goto('https://www....com');
  // search term
  await page.type("#search_term","Brazil");

  //await page.screenshot({path: 'sc2.png'});
  //await browser.close();
})();
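
The 30000 ms limit is Puppeteer's default navigation timeout. One way to loosen it, sketched here under assumptions, is to raise the timeout with `page.setDefaultNavigationTimeout` and to resolve `page.goto` at `domcontentloaded` rather than the full `load` event. The helper name `openPage` and the 60000 ms default are hypothetical, and whether this helps depends on why the site is slow.

```javascript
// Hypothetical helper: open `url` in a new tab with a longer navigation
// timeout, resolving once the DOM is parsed rather than on full load.
async function openPage(browser, url, timeoutMs = 60000) {
  const page = await browser.newPage();
  // Applies to goto, reload, waitForNavigation and similar calls.
  page.setDefaultNavigationTimeout(timeoutMs);
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return page;
}
```

As for Chromium versus Chrome: the `puppeteer` package launches the browser build it downloaded at install time. Passing an `executablePath` option to `puppeteer.launch()` is the usual way to point it at a locally installed Chrome instead.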


node.js platformio puppeteer

Tha*_*ang

2023-09-09

15
votes

2
answers

60k
views

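Cloud Functions: Puppeteer cannot open the browser

My setup in GCF:

Installed with npm install --save puppeteer from the project Cloud Shell
Edited package.json like this:

{ "dependencies": { "puppeteer": "^19.2.2" } }
Pasted the code from medium.com into index.js: https://gist.githubusercontent.com/Alezco/b9b7ce4ec7ee7f208818e395225fcbbe/raw/8554acc8b311a10e272f5d1b98dce3400945bb00/index.js
Deployed with 2 GB RAM, 0-3 instances, and a maximum timeout of 500 seconds

After building, or when opening the URL, I get the following error:

Internal Server Error
Could not find Chromium (rev. 1056772). This can occur if either 1. you did not perform an installation before running the script (e.g. `npm install`) or 2. your cache path is incorrectly configured (which is: /workspace/.cache/puppeteer). For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.

When I run npm list, both webdriver and puppeteer are installed. I suspect the problem is with this path, but I do not know where it should point. I could then pass an executablePath argument to puppeteer.launch(), which might solve the problem. I tried reinstalling puppeteer and changing the configuration. No luck.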

node.js google-cloud-functions puppeteer

smi*_*007

lucky-day

15
votes

1
answer

10k
views

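Puppeteer: Get innerHTML

Does anyone know how to get the innerHTML or text of an element? Or even better, how to click an element with a specific innerHTML? This is how it would be done in normal JavaScript: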

var found = false;
$(selector).each(function() {
    if (found) return;
    // Compare only the digits of the element's text.
    if ($(this).text().replace(/[^0-9]/g, '') === '5') {
        $(this).trigger('click');
        found = true;
    }
});


Thanks in advance for your help!
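
A rough Puppeteer counterpart of the jQuery loop above could fetch element handles with `page.$$`, read each element's text in the page context, and click the first match. This is a sketch assuming a recent Puppeteer version; the helper names `getInnerHTML` and `clickByDigits` are hypothetical.

```javascript
// Hypothetical helper: innerHTML of the first element matching `selector`.
async function getInnerHTML(page, selector) {
  return page.$eval(selector, (el) => el.innerHTML);
}

// Hypothetical helper: click the first element matching `selector` whose
// text, reduced to its digits, equals `digits`. Resolves to true if an
// element was clicked and false otherwise.
async function clickByDigits(page, selector, digits) {
  const handles = await page.$$(selector);
  for (const handle of handles) {
    // The callback runs in the browser; only the resulting string comes back.
    const text = await handle.evaluate((el) => el.textContent || '');
    if (text.replace(/[^0-9]/g, '') === digits) {
      await handle.click();
      return true;
    }
  }
  return false;
}
```

Calling `await clickByDigits(page, selector, '5')` mirrors the loop in the question, stopping at the first match as the `found` flag does.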

javascript selenium webautomation node.js puppeteer

Noa*_*oah

lucky-day

14
votes

4
answers

20k
views