How do I scrape JSON with Puppeteer?

Amy*_*oin 2 node.js scrape puppeteer

I log in to a site, which sets a browser cookie.

I then go to a URL whose response is JSON.

How do I scrape the page after calling await page.goto('blahblahblah.json');?

Rip*_*ppo 10

Another approach, one that avoids intermittent issues, is to evaluate the body once it is available and return it parsed as JSON, for example:

const puppeteer = require('puppeteer'); 

async function run() {

    const browser = await puppeteer.launch( {
        headless: false  //change to true in prod!
    }); 

    const page = await browser.newPage(); 

    await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json');

    // The browser renders the raw response text inside <body>, so parse that text.
    const innerText = await page.evaluate(() =>  {
        return JSON.parse(document.querySelector("body").innerText); 
    }); 

    console.log("innerText now contains the JSON");
    console.log(innerText);

    //I will leave this as an exercise for you to
    //  write out to FS...

    await browser.close(); 

}

run(); 
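
As an alternative to reading the body text, `page.goto` resolves to the main response object, which has a `json()` method. The sketch below assumes that approach and also covers the write-to-disk step left as an exercise above. `fetchJson` and `saveJson` are hypothetical helper names chosen for illustration, not part of Puppeteer.

```javascript
const fs = require('fs');

// Hypothetical helper: navigate to `url` and parse the main response as JSON.
// `page` is a Puppeteer Page (any object with a compatible goto() also works).
async function fetchJson(page, url) {
  const response = await page.goto(url);
  // goto() can resolve to null (e.g. a same-page hash navigation), so check first.
  if (!response || !response.ok()) {
    throw new Error(`Request for ${url} failed`);
  }
  return response.json(); // rejects if the body is not valid JSON
}

// Hypothetical helper: fetch the JSON and write it, pretty-printed, to `filePath`.
async function saveJson(page, url, filePath) {
  const data = await fetchJson(page, url);
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
  return data;
}
```

Inside `run()` this would be called as `const data = await saveJson(page, url, 'out.json')` after `browser.newPage()`. If the login happened earlier on the same page, its cookies are sent with this request as well.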


Pas*_*asi 3

You can intercept the network responses, like this:

const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  page.on('response', async response => {
    console.log('got response', response.url())
    const data = await response.buffer()
    fs.writeFileSync('/tmp/response.json', data)
  })
  await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json', {waitUntil: 'networkidle0'})
  await browser.close()
})()
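
The listener above fires for every response, so on a page that triggers several requests the file is overwritten each time. If only one particular response is wanted, waiting for just that one may be simpler. A minimal sketch, assuming `page.waitForResponse` with a predicate; `captureJson` and its `matches` parameter are hypothetical names used for illustration.

```javascript
// Hypothetical helper: navigate to `url` and resolve with the parsed body of
// the first successful response whose URL satisfies `matches`.
async function captureJson(page, url, matches) {
  const [response] = await Promise.all([
    // Register the wait before navigating so the response cannot be missed.
    page.waitForResponse(res => res.ok() && matches(res.url())),
    page.goto(url),
  ]);
  return response.json();
}
```

For the example URL this could be called as `const data = await captureJson(page, url, u => u.endsWith('package.json'))`, and the result written out with `fs.writeFileSync` as before.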