Sal*_*yeb 4 javascript node.js web-scraping puppeteer
I am facing this problem in Puppeteer: inside a for loop, when I go to another page to get data and then go back, I get this error:
Error "We have an error Error: the execution context was destroyed, probably because of a navigation."
It's a directory site that lists 15 companies per page, and I want to visit each company's page to get its information.
try {
  const browser = await pupputer.launch({
    headless: false,
    devtools: true,
    defaultViewport: {
      width: 1100,
      height: 1000
    }
  });
  const page = await browser.newPage();
  await page.goto('MyLink');
  await page.waitForSelector('.list-firms');
  for (var i = 1; i < 10; i++) {
    const listeCompanies = await page.$$('.list-firms > div.firm');
    for (const companie of listeCompanies) {
      const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
      const link = await companie.$eval('.listing-body > h3 > a', link => link.href);
      await Promise.all([
        page.waitForNavigation(),
        page.goto(link),
        page.waitForSelector('.firm-panel'),
      ]);
      const info = await page.$eval('#info', e => e.innerText);
      const data = [{
        name: name,
        information: info,
      }];
      await page.goBack();
    }
    await Promise.all([
      page.waitForNavigation(),
      page.click('span.page > a[rel="next"]')
    ]);
  }
} catch (e) {
  console.log('We have error', e);
}
I managed to only get the data of the first company.
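The error means that you are accessing data that has become stale/invalid because of a navigation. In your script, the error refers to the variable listeCompanies: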
const listeCompanies = await page.$$('.list-firms > div.firm');
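You first use this variable in a loop, then you navigate via page.goto, and after that your loop tries to get the next item out of listeCompanies. But once the navigation has happened, the element handles in that variable no longer exist, so the error is thrown. That is also why the first iteration works.
There are multiple ways to fix this.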
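Option 1: Extract the data before entering the loop
This is the cleanest approach. You extract the information from the first page in one go and then iterate over the extracted data. nameLinkList will be an array of objects with name and link values (e.g. [{name: '..', link: '..'}, {name: '..', link: '..'}]). There is also no need to call page.goBack at the end of the loop, as the data has already been extracted.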
// Extract plain strings (not element handles) for every company on the page
const nameLinkList = await page.$$eval(
  '.list-firms > div.firm',
  (firms => firms.map(firm => {
    const a = firm.querySelector('.listing-body > h3 > a');
    return {
      name: a.innerText,
      link: a.href
    };
  }))
);
// Iterate over the extracted data; navigating cannot invalidate it
for (const {name, link} of nameLinkList) {
  await Promise.all([
    page.waitForNavigation(),
    page.goto(link),
    page.waitForSelector('.firm-panel'),
  ]);
  const info = await page.$eval('#info', e => e.innerText);
  const data = [{
    name: name,
    information: info,
  }];
}
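Option 1 can also be wrapped in a function that collects the results. The following is only a sketch under a few assumptions: the function name scrapeCurrentDirectoryPage and the variables results and listingUrl are hypothetical, page is assumed to be a Puppeteer Page already showing a directory listing, and the selectors are the ones from the question. It awaits page.goto on its own instead of combining it with page.waitForNavigation, since page.goto already resolves once the navigation has finished.

```javascript
// Hypothetical helper: scrape every company linked from the current listing.
// Assumes `page` is a Puppeteer Page that currently shows a directory page.
async function scrapeCurrentDirectoryPage(page) {
  // Remember where the listing lives so we can return to it afterwards.
  const listingUrl = page.url();

  // Pull plain strings out of the DOM; strings cannot go stale on navigation.
  const nameLinkList = await page.$$eval('.list-firms > div.firm', firms =>
    firms.map(firm => {
      const a = firm.querySelector('.listing-body > h3 > a');
      return { name: a.innerText, link: a.href };
    })
  );

  const results = [];
  for (const { name, link } of nameLinkList) {
    await page.goto(link);                       // resolves after navigation
    await page.waitForSelector('.firm-panel');   // wait for the detail panel
    const information = await page.$eval('#info', e => e.innerText);
    results.push({ name, information });
  }

  // Go back to the listing so a "next page" link can be clicked afterwards.
  await page.goto(listingUrl);
  return results;
}
```

If the outer pagination loop from the question is kept, the final page.goto(listingUrl) matters: after the inner loop the page shows the last company, not the listing, so the a[rel="next"] link would not be found otherwise.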
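Option 2: Use a second page for the loop navigation
In this case, your browser will have two open pages. The first is used only to read the data, the second is used for navigation.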
const page2 = await browser.newPage();
for (const companie of listeCompanies) {
  const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
  const link = await companie.$eval('.listing-body > h3 > a', link => link.href);
  await Promise.all([
    page2.goto(link),
    page2.waitForSelector('.firm-panel'),
  ]);
  const info = await page2.$eval('#info', e => e.innerText);
  // ...
}
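A slightly fuller sketch of the two-page approach might look like the following. The function name scrapeWithSecondPage and the results array are hypothetical, browser and page are assumed to be the Puppeteer objects from the question, and closing page2 in a finally block is an addition of this sketch rather than part of the original snippet.

```javascript
// Hypothetical helper: read handles from `page`, navigate only with `page2`.
// Assumes `browser` is a Puppeteer Browser and `page` shows the listing.
async function scrapeWithSecondPage(browser, page) {
  // These handles stay valid because `page` itself never navigates.
  const listeCompanies = await page.$$('.list-firms > div.firm');
  const page2 = await browser.newPage();
  const results = [];
  try {
    for (const companie of listeCompanies) {
      const name = await companie.$eval('.listing-body > h3 > a', a => a.innerText);
      const link = await companie.$eval('.listing-body > h3 > a', a => a.href);
      await page2.goto(link);
      await page2.waitForSelector('.firm-panel');
      const information = await page2.$eval('#info', e => e.innerText);
      results.push({ name, information });
    }
  } finally {
    // Close the helper page even if one of the company pages fails.
    await page2.close();
  }
  return results;
}
```

The trade-off compared with option 1 is an extra open tab, in exchange for leaving the original element-handle code almost untouched.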
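Option 3: Re-query the elements
After going back to the "main page", you simply re-execute the selector. Note that the for..of loop has to be changed to a counter-based (index) loop, because the array is being replaced.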
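let listeCompanies = await page.$$('.list-firms > div.firm');
for (let i = 0; i < listeCompanies.length; i++) {
  const companie = listeCompanies[i]; // handle from the most recent query
  // ...
  await page.goBack();
  // Re-query: the previous handles were invalidated by the navigation
  listeCompanies = await page.$$('.list-firms > div.firm');
}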
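Filled in with the extraction steps from the question, the re-query approach could be sketched as below. The function name scrapeByRequerying, the firmSelector constant and the results array are hypothetical. The sketch also assumes that the listing shows the companies in the same order after every page.goBack, since it relies on the index i pointing at the same company in each freshly queried array.

```javascript
// Hypothetical helper: navigate with the same page, re-query after going back.
// Assumes `page` is a Puppeteer Page that currently shows the listing.
async function scrapeByRequerying(page) {
  const firmSelector = '.list-firms > div.firm';
  let listeCompanies = await page.$$(firmSelector);
  const results = [];

  for (let i = 0; i < listeCompanies.length; i++) {
    // Always take the handle from the most recent query.
    const companie = listeCompanies[i];
    const name = await companie.$eval('.listing-body > h3 > a', a => a.innerText);
    const link = await companie.$eval('.listing-body > h3 > a', a => a.href);

    await page.goto(link);
    await page.waitForSelector('.firm-panel');
    const information = await page.$eval('#info', e => e.innerText);
    results.push({ name, information });

    // Return to the listing and replace the now-invalid handles.
    await page.goBack();
    await page.waitForSelector('.list-firms');
    listeCompanies = await page.$$(firmSelector);
  }
  return results;
}
```

This costs two navigations per company (one forward, one back), which is the main reason it tends to be slower than extracting the links up front.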
I recommend going with option 1, as it also reduces the number of navigation requests needed and therefore speeds up your script.