Posts by Lio*_*oRz

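Changing the user agent on puppeteer-extra doesn't seem to take effect

I'm trying to scrape different websites using puppeteer. Since I use puppeteer-extra (for its Stealth plugin), I decided to use its anonymize-ua plugin to randomly change the default user agent and further reduce detection.

I tried to follow their instructions, but when I log the browser's actual user agent, the change doesn't seem to take effect.

Below is an example of what I'm doing: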

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';

const scrape = async (url: string) => {
    // Set stealth plugin
    const stealthPlugin = StealthPlugin();
    puppeteer.use(stealthPlugin);

    // Create random user-agent to be set through plugin
    const userAgent = new UserAgent({ platform: 'MacIntel', deviceCategory: 'desktop' });
    const userAgentStr = userAgent.toString();
    console.log(`User Agent: ${userAgentStr}`);

    const anonymizeUserAgentPlugin = require('puppeteer-extra-plugin-anonymize-ua')({
        customFn: () => userAgentStr 
    });
    puppeteer.use(anonymizeUserAgentPlugin);

    puppeteer
        .launch({ headless: …


user-agent chromium typescript puppeteer

Lio*_*oRz

lucky-day

7
votes

1
answer

5001
views

Using AWS DynamoDBDocumentClient for Marshall/Unmarshall Complex Objects Throws an Error

I'm using AWS SDK (v3) in my NodeJS/Typescript application, specifically their DynamoDBDocumentClient to easily marshall/unmarshall my entities to reduce the amount of code needed to query the database.

My entities are complex objects, meaning that an instance holds, for example, another class type or an array of them. I couldn't find any tutorials online that explain what I'm missing (maybe I'm not missing anything and this is how it has to be done), as the document client makes me marshall them …
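To illustrate the kind of entity described in this excerpt, here is a minimal sketch of flattening nested class instances into plain objects before handing them to a document client. The `Address` and `Customer` classes and the `toPlain` helper are hypothetical and are not taken from the question. The sketch only handles primitives, arrays and objects, not `Date`, `Set` or binary values. The SDK's marshalling options also include `convertClassInstanceToMap`, which may be worth checking; whether it applies to the truncated error here is an assumption.

```typescript
// Hypothetical entities standing in for the "complex objects" in the question.
class Address {
    constructor(public city: string, public zip: string) {}
}

class Customer {
    constructor(public name: string, public addresses: Address[]) {}
}

type Plain = string | number | boolean | null | Plain[] | { [key: string]: Plain };

// Recursively copies own enumerable properties so that nested class
// instances and arrays of them become plain objects and arrays.
// Properties whose value is undefined are skipped.
function toPlain(value: unknown): Plain {
    if (value === null) {
        return null;
    }
    if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
        return value;
    }
    if (Array.isArray(value)) {
        return value.map((item) => toPlain(item));
    }
    if (typeof value === 'object') {
        const record = value as Record<string, unknown>;
        const out: { [key: string]: Plain } = {};
        for (const key of Object.keys(record)) {
            if (record[key] !== undefined) {
                out[key] = toPlain(record[key]);
            }
        }
        return out;
    }
    throw new TypeError(`Unsupported value type: ${typeof value}`);
}

const customer = new Customer('Ann', [new Address('Oslo', '0150')]);
const item = toPlain(customer);
console.log(JSON.stringify(item));
```

The resulting `item` carries no class prototypes, so it can be treated as ordinary JSON-like data. Rebuilding class instances when reading items back is a separate step that this sketch does not cover.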

node.js amazon-dynamodb typescript dynamodb-queries aws-sdk-js-v3

Lio*_*oRz

lucky-day

6
votes

0
answers

1686
views

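Puppeteer - Protocol error (Page.navigate): Target closed

As you can see in the sample code below, I'm using Puppeteer with a cluster of workers in Node to run multiple website screenshot requests for given URLs: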

const cluster = require('cluster');
const express = require('express');
const bodyParser = require('body-parser');
const puppeteer = require('puppeteer');

async function getScreenshot(domain) {
    let screenshot;
    const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'] });
    const page = await browser.newPage();

    try {
        await page.goto('http://' + domain + '/', { timeout: 60000, waitUntil: 'networkidle2' });
    } catch (error) {
        try {
            await page.goto('http://' + domain + '/', { timeout: 120000, waitUntil: 'networkidle2' });
            screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
        } …
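The excerpt above retries `page.goto` with a longer timeout after the first attempt fails. That pattern can be factored into a small helper, sketched below. `retryWithTimeouts` is a hypothetical name, not part of Puppeteer. It only illustrates the retry structure and does not address the cause of the "Target closed" error itself.

```typescript
// Hypothetical helper: calls `attempt` once per timeout, in order.
// Returns the first successful result; if every attempt fails,
// rethrows the error from the last attempt.
async function retryWithTimeouts<T>(
    timeouts: number[],
    attempt: (timeout: number) => Promise<T>
): Promise<T> {
    let lastError: unknown = new Error('no timeouts provided');
    for (const timeout of timeouts) {
        try {
            return await attempt(timeout);
        } catch (error) {
            // Remember the failure and move on to the next, longer timeout.
            lastError = error;
        }
    }
    throw lastError;
}
```

With the values from the excerpt, a call might look like `await retryWithTimeouts([60000, 120000], (timeout) => page.goto('http://' + domain + '/', { timeout, waitUntil: 'networkidle2' }))`, with the screenshot taken once after navigation succeeds.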


node.js web-scraping node-cluster google-chrome-headless puppeteer

Lio*_*oRz

lucky-day

5
votes

5
answers

5819
views