Puppeteer: cannot extract elements on some websites using page.evaluate(() => document.querySelectorAll())

Question

max*_*701 0 javascript node.js puppeteer

I am trying to select a NodeList of all the links on a website and console.log() it in the terminal. However, this does not work for some websites: google.com, facebook.com, instagram.com.

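I know the elements are there, because I can log them in the actual Chromium console by running document.querySelectorAll('a') on its own. But when I try to extract and log the links in the Node terminal, using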

const links = await page.evaluate(() => document.querySelectorAll('a'))
console.log(links)

I get undefined.

However, this is not the case for most websites (e.g. yahoo.com, linkedin.com). Here is the code:

const puppeteer = require('puppeteer');

const URL = 'https://instagram.com/';
const scrape = async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.setViewport({
        width: 1240,
        height: 680
    });
    await page.goto(URL, { waitUntil: 'domcontentloaded' });
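    // Fixed 6-second pause; page.waitFor was deprecated and later removed from Puppeteer.
    await new Promise((resolve) => setTimeout(resolve, 6000));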
    const links = await page.evaluate(() => document.querySelectorAll('a'));
    console.log(links);
    await page.screenshot({
        path: 'ig.png'
    });
    await browser.close();
};

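I tried adding the bypassBotDetectionSystem() function as suggested in this article, but it did not help. I do not think bot detection is the problem because, as I said, I can browse the content in Chromium without any trouble.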

Thanks for the help!

Answer 1

Yev*_*kov 5

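You are trying to return DOM elements from the page.evaluate method, but that is not possible: if the function passed to page.evaluate returns a non-serializable value, page.evaluate resolves to undefined, which is what happens in your case.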
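One way to keep using page.evaluate is to convert the elements to plain data inside the page, so that only serializable values cross back to Node. A minimal sketch, assuming `page` is a Puppeteer Page that has already navigated; the helper name `extractHrefs` is hypothetical:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function extractHrefs(page) {
  // The callback runs inside the browser. Mapping each <a> to its href
  // string produces a plain array of strings, which can be serialized
  // and sent back to the Node process.
  return page.evaluate(() =>
    Array.from(document.querySelectorAll('a'), (a) => a.href)
  );
}
```

Called as `const hrefs = await extractHrefs(page);`, this should resolve to an array of strings rather than undefined.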

If you want to get an array of ElementHandle, you can use the page.$$ method.

Example:

const links = await page.$$('a'); // page.$$ returns Promise<Array<ElementHandle>>; links is the resolved array

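An ElementHandle is a reference to an element that still lives in the browser, so logging it in Node does not show the link itself. A rough sketch of reading each href through the handles, assuming `page` is a Puppeteer Page; the helper name `hrefsFromHandles` is hypothetical:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function hrefsFromHandles(page) {
  const handles = await page.$$('a'); // array of ElementHandle
  const hrefs = [];
  for (const handle of handles) {
    // getProperty resolves to a JSHandle for the property;
    // jsonValue() serializes that value back to Node.
    const property = await handle.getProperty('href');
    hrefs.push(await property.jsonValue());
  }
  return hrefs;
}
```

This makes one round trip per element, so it is mainly worthwhile when the handles are also needed for something else, such as clicking.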

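However, if you only want the values of an attribute (for example href), you can use the page.$$eval method, which runs Array.from(document.querySelectorAll(selector)) within the page and passes the result as the first argument to pageFunction.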

Example:

const hrefs = await page.$$eval('a', links => links.map(link => link.href));
console.log(hrefs);

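Applied to the script from the question, the page.$$eval call might be wrapped as follows. This is only a sketch: `scrapeHrefs` is a hypothetical name, `browser` is assumed to be a Puppeteer Browser (for example the result of `await puppeteer.launch()`), and the fixed 6-second wait from the question is left out.

```javascript
// Hypothetical helper; `browser` is assumed to be a Puppeteer Browser.
async function scrapeHrefs(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Return plain href strings instead of DOM nodes, so the result
    // can be serialized back to Node.
    return await page.$$eval('a', (links) => links.map((link) => link.href));
  } finally {
    // Close the page even if navigation or the query fails.
    await page.close();
  }
}
```

The caller still owns the browser and should close it afterwards. Pages that render their links late may need an explicit wait before the query.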

Archived:	6 years, 3 months ago
Views:	106
Last activity:	6 years ago
