Cat*_*ton 85 javascript evaluate web-scraping puppeteer
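I am trying to pass a variable into the page.evaluate() function in Puppeteer, but when I use the following very simplified example, the variable evalVar is undefined.
I'm new to Puppeteer and can't find any examples to build on, so I need help passing that variable into the page.evaluate() function so that I can use it inside.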
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({headless: false});
      const page = await browser.newPage();
      const evalVar = 'WHUT??';
      try {
        await page.goto('https://www.google.com.au');
        await page.waitForSelector('#fbar');
        const links = await page.evaluate((evalVar) => {
          console.log('evalVar:', evalVar); // appears undefined
          const urls = [];
          hrefs = document.querySelectorAll('#fbar #fsl a');
          hrefs.forEach(function(el) {
            urls.push(el.href);
          });
          return urls;
        })
        console.log('links:', links);
      } catch (err) {
        console.log('ERR:', err.message);
      } finally {
        // browser.close();
      }
    })();
flo*_*zia 132
You have to pass the variable as an argument to the pageFunction, like this:
    const links = await page.evaluate((evalVar) => {
      console.log(evalVar); // should be defined now
      …
    }, evalVar);
The arguments must also be serializable (or be JSHandles of in-page objects): https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pageevaluatepagefunction-args
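To see why the extra argument matters, here is a rough simulation that needs no browser. It is only a sketch: simulateEvaluate is a made-up stand-in built on Node's vm module, not Puppeteer's actual implementation. It mimics the relevant behaviour by sending the function as source text and the arguments as JSON into a separate context, so nothing from the calling scope travels along.

```javascript
const vm = require('node:vm');

// Hypothetical stand-in for page.evaluate(), NOT Puppeteer's implementation.
// The function is shipped as source text and the arguments as JSON.
function simulateEvaluate(pageFunction, ...args) {
  const source = `(${pageFunction.toString()})(...${JSON.stringify(args)})`;
  // An empty sandbox: variables from the calling scope are not visible here.
  return vm.runInNewContext(source, {});
}

const evalVar = 'WHUT??';

// As in the question: a parameter is declared, but no argument is passed.
const withoutArgument = simulateEvaluate((evalVar) => typeof evalVar);

// As in the answer: the variable is passed after the function.
const withArgument = simulateEvaluate((evalVar) => evalVar, evalVar);

console.log(withoutArgument); // 'undefined'
console.log(withArgument);    // 'WHUT??'
```

The same reasoning explains the serializability requirement: in this sketch, anything that does not survive JSON.stringify (a function, for example) would not arrive intact on the other side.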
Meh*_*ash 42
I encourage you to stick to this style, because it is more convenient and readable.
    let name = 'jack';
    let age = 33;
    let location = 'Berlin/Germany';
    await page.evaluate(({name, age, location}) => {
      console.log(name);
      console.log(age);
      console.log(location);
    }, {name, age, location});
Gra*_*ler 25
You can pass one variable to page.evaluate() using the following syntax:
    await page.evaluate(example => { /* ... */ }, example);
Note: You do not need to enclose the variable in () unless you are passing multiple variables.
You can pass multiple variables to page.evaluate() using the following syntax:
    await page.evaluate((example_1, example_2) => { /* ... */ }, example_1, example_2);
Note: It is not necessary to wrap the variables in {}.
To pass a function, there are two ways to do it.
    // 1. Define the function in the evaluation context
    await page.evaluate(() => {
      window.yourFunc = function() {...};
    });
    const links = await page.evaluate(() => {
      const func = window.yourFunc;
      func();
    });

    // 2. Convert the function to a serializable string (a function cannot be serialized)
    const yourFunc = function() {...};
    const obj = {
      func: yourFunc.toString()
    };
    const otherObj = {
      foo: 'bar'
    };
    const links = await page.evaluate((obj, aObj) => {
      const funStr = obj.func;
      // Parentheses make the source an expression, so arrow functions also work
      const func = new Function(`return (${funStr}).apply(null, arguments)`);
      func();
      const foo = aObj.foo; // 'bar', from the second object
      window.foo = foo;
      debugger; // pauses only when DevTools is open
    }, obj, otherObj);
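The second approach can be tried without a browser. The sketch below uses Node's vm module as a stand-in for the page, which is an assumption made purely for illustration. The double function and the payload shape are hypothetical; only the toString() plus new Function round trip is the technique from the snippet above.

```javascript
const vm = require('node:vm');

// Hypothetical function to send across; it must not rely on outer variables.
function double(n) {
  return n * 2;
}

// Functions cannot be serialized directly, so send the source text instead.
const payload = JSON.stringify({ func: double.toString(), input: 21 });

// Stand-in for the page side: a separate context that only sees the JSON string.
const rebuilt = vm.runInNewContext(
  `
  const { func, input } = JSON.parse(payload);
  // Wrap the source in parentheses so it is parsed as an expression.
  const fn = new Function('return (' + func + ').apply(null, arguments)');
  fn(input);
  `,
  { payload }
);

console.log(rebuilt); // 42
```

Keep in mind that a function rebuilt from its source loses its closure, so any value it needs has to be passed in as an argument as well.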
You can add devtools: true to the launch options for testing.
It took me a long time to figure out that console.log() inside evaluate() does not show up in the Node console.
Ref: https://github.com/GoogleChrome/puppeteer/issues/1944
everything that is run inside the page.evaluate function is done in the context of the browser page. The script is running in the browser not in node.js so if you log it will show in the browsers console which if you are running headless you will not see. You also can't set a node breakpoint inside the function.
Hope this can help.
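If you do want those messages in the Node terminal, one option is to listen for the page's 'console' event and print each message yourself. A minimal sketch: forwardConsole is a hypothetical helper name, and the sink parameter is only there so the output can be redirected; page.on('console', ...), msg.type() and msg.text() are the Puppeteer calls it relies on.

```javascript
// Forward messages logged inside the page to a Node-side sink.
// `page` is expected to be a Puppeteer Page; forwardConsole is a hypothetical helper.
function forwardConsole(page, sink = console.log) {
  page.on('console', (msg) => {
    sink(`[page ${msg.type()}] ${msg.text()}`);
  });
}

// Usage sketch (needs puppeteer and a running browser, so it is not executed here):
//   const page = await browser.newPage();
//   forwardConsole(page);
//   await page.evaluate((evalVar) => console.log('evalVar:', evalVar), 'WHUT??');
```

Attach the listener before calling page.evaluate(), otherwise messages logged earlier are missed.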