DUM*_*SER 4 javascript automation caching node.js puppeteer
I'm using puppeteer for automation and it works well, but pages take longer to load than they do in my normal browser. I tried the following to make use of caching:
const puppeteer = require('puppeteer');

// Start time, used to report how long launch plus navigation takes.
let time = new Date();

async function test() {
  const browser = await puppeteer.launch({
    headless: true,
    executablePath: "D:\\Desktop\\node_modules\\puppeteer\\.local-chromium\\win64-848005\\chrome-win\\chrome.exe",
    args: ['--no-sandbox'],
  });
  const page = await browser.newPage();
  const response = await page.goto('https://example.com/');
  // Elapsed time in milliseconds since the script started.
  console.log(`${new Date() - time}`);
  console.log(response);
  await browser.close();
}

// The snippet as posted never invoked the function; call it so it runs.
test();
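This works for example.com: the cache is stored and the page loads faster. My target website, however, does not seem to allow the cache to be stored.
Is there any other way to speed up the process?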
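If you just want the site to load faster while scraping, and you don't depend on certain images or JavaScript, you can block those resources.
Block by resource type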
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Intercept every request so it can be aborted or allowed through.
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (req.resourceType() === 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto('https://bbc.com');
  await page.screenshot({path: 'no-images.png', fullPage: true});
  await browser.close();
})();
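The example above aborts only images. If the page also works without fonts, media, or stylesheets, the same pattern extends to a set of resource types. The following is a minimal sketch, not part of the original answer: `BLOCKED_TYPES`, `shouldBlock`, and `blockResourceTypes` are hypothetical names, and the choice of types is an assumption to adjust per site (add `'script'` only if the data you need does not depend on JavaScript).

```javascript
// Resource types to skip. This list is an assumption; tune it for the target site.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Hypothetical helper: decides whether a request of this type should be aborted.
function shouldBlock(resourceType, blocked = BLOCKED_TYPES) {
  return blocked.has(resourceType);
}

// Hypothetical helper: enables interception on a Puppeteer page and
// aborts every request whose resource type is in the blocked set.
async function blockResourceTypes(page, blocked = BLOCKED_TYPES) {
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (shouldBlock(req.resourceType(), blocked)) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

It would be called once per page before navigating, for example `await blockResourceTypes(page);` followed by `await page.goto(...)`. Blocking stylesheets can change layout, so screenshots may look different.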
Block by domain
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  const options = {
    waitUntil: 'networkidle2',
    timeout: 30000,
  };

  // Before: Normal navigation
  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'before.png', fullPage: true});
  const metrics = await page.metrics();
  console.info(metrics);

  // After: Navigation with some domains blocked
  // Array of third-party domains to block
  const blockedDomains = [
    'https://pagead2.googlesyndication.com',
    'https://creativecdn.com',
    'https://www.googletagmanager.com',
    'https://cdn.krxd.net',
    'https://adservice.google.com',
    'https://cdn.concert.io',
    'https://z.moatads.com',
    'https://cdn.permutive.com'];

  // Abort any request whose URL starts with one of the blocked origins.
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const url = request.url();
    if (blockedDomains.some((d) => url.startsWith(d))) {
      request.abort();
    } else {
      request.continue();
    }
  });

  await page.goto('https://theverge.com', options);
  await page.screenshot({path: 'after.png', fullPage: true});
  const metricsAfter = await page.metrics();
  console.info(metricsAfter);
  await browser.close();
})();
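The `url.startsWith(d)` check above matches only the exact scheme and host listed, so a request to another subdomain of the same ad network, or one made over `http:`, would still go through. A possible variation, sketched here as an assumption and not taken from the original answer, compares parsed hostnames. `isBlockedHost` is a hypothetical helper, and its list holds bare host names, not full origins.

```javascript
// Hypothetical helper: returns true when the request URL's host equals a
// blocked host or is a subdomain of one. `blockedHosts` holds bare host
// names such as 'krxd.net', not full origins.
function isBlockedHost(requestUrl, blockedHosts) {
  let hostname;
  try {
    hostname = new URL(requestUrl).hostname;
  } catch {
    return false; // not a parseable absolute URL, so let it through
  }
  return blockedHosts.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}
```

Inside the request handler, the condition would become `isBlockedHost(request.url(), blockedHosts)`. Matching on the `'.' + host` suffix keeps unrelated domains that merely end in the same letters from being blocked.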
Source: https://github.com/addyosmani/puppeteer-webperf
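The domain-blocking example prints two `page.metrics()` objects, which are hard to compare by eye. A small helper can subtract one from the other. This is a sketch only: `diffMetrics` is a hypothetical name, and it assumes both arguments are plain objects with numeric fields, as `page.metrics()` returns (for example `TaskDuration`, `ScriptDuration`, `Nodes`).

```javascript
// Hypothetical helper: per-key difference (after - before) for every
// numeric field present in both metrics objects.
function diffMetrics(before, after) {
  const diff = {};
  for (const key of Object.keys(after)) {
    if (typeof before[key] === 'number' && typeof after[key] === 'number') {
      diff[key] = after[key] - before[key];
    }
  }
  return diff;
}
```

With the variables from that example it would be called as `console.info(diffMetrics(metrics, metricsAfter));`. Negative durations suggest less work after blocking. Because the second navigation reuses the same page, it may also benefit from the browser cache, so treat the numbers as indicative only.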