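I'm trying to scrape product information from Amazon, but I've run into a problem. The spider stops when it reaches the end of a page, and I'd like to add a way for my program to also crawl the next 3 pages of search results. I tried editing start_urls, but I can't do that from inside the parse function. Also, it's not a big deal, but for some reason the program requests the same information twice. Thanks in advance.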
import scrapy
from scrapy import Spider
from scrapy import Request

class ProductSpider(scrapy.Spider):
    product = input("What product are you looking for? Keywords help for specific products: ")
    name = "Product_spider"
    allowed_domains=['www.amazon.ca']
    start_urls = ['https://www.amazon.ca/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords='+product]
    #so that websites will not block access to the spider
    download_delay = 30

    def parse(self, response):
        temp_url_list = []
        for i in range(3,6):
            next_url = response.xpath('//*[@id="pagn"]/span['+str(i)+']/a/@href').extract()
            next_url_final = response.urljoin(str(next_url[0]))
            start_urls.append(str(next_url_final))
        # xpath is similar to an address that is used to find certain …
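
For what it's worth, here is a minimal sketch of one common way to follow pagination links in Scrapy: instead of appending to start_urls (inside parse the bare name start_urls is not defined, so that line raises a NameError), the callback yields new Request objects for the next pages. The helper name iter_next_page_urls and its span_indexes parameter are hypothetical, made up for this illustration. The XPath is copied from the question and may not match Amazon's current markup. The helper only assumes the response object offers .xpath(...).get() and .urljoin(...), which Scrapy responses do.

```python
def iter_next_page_urls(response, span_indexes=range(3, 6)):
    """Yield absolute URLs for the pagination links on a results page.

    Hypothetical helper: `response` only needs `.xpath(...).get()` and
    `.urljoin(...)`. The XPath below is the one from the question and is
    an assumption about the page layout.
    """
    for i in span_indexes:
        href = response.xpath('//*[@id="pagn"]/span[%d]/a/@href' % i).get()
        if href:  # skip positions with no link, e.g. on the last page
            yield response.urljoin(href)


# Possible use inside the spider (requires Scrapy):
#
#     def parse(self, response):
#         # extract product data here
#         for url in iter_next_page_urls(response):
#             yield scrapy.Request(url, callback=self.parse)
```

Because every results page is handled by parse again, the same page links will be found repeatedly. By default Scrapy filters out requests for URLs it has already seen, so this should not cause endless re-crawling, though you may still want an explicit page limit.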