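I created a simple Scrapy project in which I get the total number of pages from the start page example.com/full. Now I need to scrape every page from example.com/page-2 up to example.com/page-100 (if the total is 100). How can I do that?
Any advice would be helpful.
Code: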
import scrapy


class AllSpider(scrapy.Spider):
    name = 'all'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com/full/']
    total_pages = 0

    def parse(self, response):
        # Text of the last pagination link, e.g. '100': a string, or None if nothing matches
        total_pages = response.xpath("//body/section/div/section/div/div/ul/li[6]/a/text()").extract_first()
        #urls = ('https://example.com/page-{}'.format(i) for i in range(1,total_pages))
        print(total_pages)
Update #1:
I tried using urls = ('https://example.com/page-{}'.format(i) for i in range(1,total_pages)), but it doesn't work. Maybe I'm doing something wrong.
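The generator-expression attempt fails for reasons that can be reproduced in plain Python: extract_first() returns a string (or None), the outermost iterable of a generator expression is evaluated immediately, so range(1, '100') raises a TypeError before any URL is built, and range() excludes its end value. The sketch below assumes the example.com/page-N pattern from the question; page_urls is a hypothetical helper, not part of Scrapy. Building the URLs alone downloads nothing: Scrapy only fetches requests that a callback yields.

```python
def page_urls(total_pages_text, first_page=2,
              template="https://example.com/page-{}"):
    """Hypothetical helper: URLs for pages first_page..total_pages, inclusive."""
    if total_pages_text is None:
        # extract_first() returns None when the XPath matches nothing
        return []
    total_pages = int(total_pages_text.strip())  # "100" -> 100
    # range() stops before its end value, so +1 includes the last page
    return [template.format(i) for i in range(first_page, total_pages + 1)]


# Reproducing the failure: the outermost iterable of a generator expression
# is evaluated right away, and range() does not accept a str.
try:
    urls = ("https://example.com/page-{}".format(i) for i in range(1, "100"))
except TypeError as exc:
    print("TypeError:", exc)

urls = page_urls("100")
print(urls[0], urls[-1], len(urls))  # https://example.com/page-2 https://example.com/page-100 99
```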
Update #2: I changed my code like this:
import scrapy


class AllSpider(scrapy.Spider):
    name = 'all'
    allowed_domains = ['sanet.st']
    start_urls = ['https://sanet.st/full/']
    total_pages = 0

    def parse(self, response):
        total_pages = response.xpath("//body/section/div/section/div/div/ul/li[6]/a/text()").extract_first()
        # note: range() stops before int(total_pages), so the last page is not included
        for page in range(2, int(total_pages)):
            url = …