Scrapy does not crawl the next page URL

Question


My spider does not crawl page 2, even though the XPath returns the correct next-page link, which is an absolute link to the next page.

Here is my code:

from scrapy import Spider
from scrapy.http import Request, FormRequest



class MintSpiderSpider(Spider):

    name = 'Mint_spider'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        urls =  response.xpath('//div[@class = "post-inner post-hover"]/h2/a/@href').extract()

        for url in urls:
            yield Request(url, callback=self.parse_lyrics)

        next_page_url = response.xpath('//li[@class="next right"]/a/@href').extract_first()
        if next_page_url:
            yield scrapy.Request(next_page_url, callback=self.parse)


    def parse_foo(self, response):
        info = response.xpath('//*[@class="songinfo"]/p/text()').extract()
        name =  response.xpath('//*[@id="lyric"]/h2/text()').extract()

        yield{
            'name' : name,
            'info': info
        }


Answer 1

Adr*_*uer 5

The problem is that next_page_url is a list, but it needs to be a URL in string form. Use extract_first() instead of extract() in next_page_url = response.xpath('//li[@class="next right"]/a/@href').extract().
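To make the list-versus-string difference concrete, here is a minimal sketch that uses only the standard library. It is an analogy, not Scrapy itself: xml.etree.ElementTree stands in for the selector, findall() plays the role of extract() (always a list), and find() plays the role of extract_first() (one value or None). The markup and the helper names all_next_links and first_next_link are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, shaped like the XPath in the question.
PAGE = """
<ul>
  <li class="prev left"><a href="http://www.example.com/page/1/">Prev</a></li>
  <li class="next right"><a href="http://www.example.com/page/3/">Next</a></li>
</ul>
"""

root = ET.fromstring(PAGE)


def all_next_links(node):
    """Analogous to .extract(): always returns a list, possibly empty."""
    return [a.get("href") for a in node.findall(".//li[@class='next right']/a")]


def first_next_link(node):
    """Analogous to .extract_first(): a single string, or None if no match."""
    a = node.find(".//li[@class='next right']/a")
    return a.get("href") if a is not None else None


print(type(all_next_links(root)).__name__)   # a list, even with one match
print(type(first_next_link(root)).__name__)  # a plain string
```

A request needs the single string, so the list form has to be unwrapped first. The None case is also why the if next_page_url: check in the question is worth keeping: on the last page there is no match and nothing should be requested.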

Update

You also have to add import scrapy, because you are using yield scrapy.Request(next_page_url, callback=self.parse).
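The missing import also explains why page 1 seems to work while page 2 is never requested. Python resolves a name only when the line that uses it runs, so a generator callback yields everything before the bad line and fails only once it reaches it. The sketch below shows this in plain Python without Scrapy. The name scrapy_not_imported is a deliberate placeholder for scrapy in a module that only did from scrapy.http import Request, and run is a hypothetical helper that drains the generator.

```python
def parse(urls, next_page_url):
    """Stand-in for the spider's parse(); yields tuples, not real requests."""
    for url in urls:
        yield ("request", url)
    if next_page_url:
        # This name is never defined, just like `scrapy` without `import scrapy`.
        # The NameError is raised only when execution reaches this line.
        yield scrapy_not_imported.Request(next_page_url)


def run(gen):
    """Drain a generator, returning what it yielded and any NameError raised."""
    collected, error = [], None
    try:
        for item in gen:
            collected.append(item)
    except NameError as exc:
        error = exc
    return collected, error


collected, error = run(parse(["/song/1", "/song/2"], "/page/2"))
print(len(collected), "requests yielded before the failure")
print("error:", type(error).__name__)
```

As far as I know, Scrapy logs an exception like this as a spider error and carries on with the requests that were already yielded, so the symptom is easy to miss unless you read the log. Since Request is already imported in the question, writing yield Request(next_page_url, callback=self.parse) would fix it just as well as adding import scrapy.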
