Posts by jac*_*ite

Relative URL to absolute URL in Scrapy

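I need help converting relative URLs to absolute URLs in a Scrapy spider. The links on my start page have to become absolute URLs so that I can fetch the images of the scraped items, and those images are also on the start page. I have tried several approaches without success and I'm stuck. Any suggestions?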
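As a minimal sketch of the underlying idea, Python's standard `urllib.parse.urljoin` resolves a relative reference against the URL of the page it was found on. The helper name `to_absolute` and the sample hrefs are illustrative assumptions, not taken from the site in the question.

```python
from urllib.parse import urljoin


def to_absolute(page_url, hrefs):
    """Resolve each href (relative or already absolute) against page_url."""
    return [urljoin(page_url, href) for href in hrefs]


page = "http://www.example.com/billboard?page=1"
hrefs = [
    "item/42",                     # path relative to the page's directory
    "/images/poster.jpg",          # root-relative path
    "//cdn.example.com/a.png",     # protocol-relative: inherits "http"
    "http://other.example.org/x",  # already absolute: returned unchanged
    "?page=2",                     # query-only: keeps the page's path
]
for url in to_absolute(page, hrefs):
    print(url)
```

Because `/billboard` has no trailing slash, `item/42` resolves to `http://www.example.com/item/42`, not `.../billboard/item/42`. Inside a spider, `response.urljoin(href)` does the same kind of resolution against the response's URL.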

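import scrapy

from example.items import ExampleItem  # hypothetical module path for the project's item class


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = [
        "http://www.example.com/billboard",
        "http://www.example.com/billboard?page=1",
    ]

    def parse(self, response):
        image_urls = response.xpath('//div[@class="content"]/section[2]/div[2]/div/div/div/a/article/img/@src').extract()
        relative_urls = response.xpath('''//div[contains(concat(" ", normalize-space(@class), " "), " content ")]/a/@href''').extract()

        for image_url, relative_url in zip(image_urls, relative_urls):
            # Create a fresh item per pair so earlier items are not overwritten
            item = ExampleItem()
            # response.urljoin() resolves a relative URL against the page URL;
            # the images pipeline expects a list of absolute URLs
            item['image_urls'] = [response.urljoin(image_url)]

            # Build and yield one request per link, inside the loop
            request = scrapy.Request(response.urljoin(relative_url),
                                     callback=self.parse_dir_contents)
            request.meta['item'] = item
            yield request

    # parse_dir_contents (not shown in the question) handles the linked page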
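The loop's logic can be checked without Scrapy installed. In this sketch, `pair_requests` is a hypothetical helper, plain dicts stand in for `ExampleItem`, and `(url, item)` tuples stand in for `Request` objects; only the pairing and URL resolution are modeled.

```python
from urllib.parse import urljoin


def pair_requests(page_url, image_srcs, hrefs):
    """Return one (absolute link URL, item dict) pair per image/link pair.

    A new dict is created on every iteration (mirroring a new ExampleItem),
    and both the image URL and the link URL are resolved against page_url.
    """
    pairs = []
    for src, href in zip(image_srcs, hrefs):
        item = {"image_urls": [urljoin(page_url, src)]}  # fresh item each time
        pairs.append((urljoin(page_url, href), item))
    return pairs


page = "http://www.example.com/billboard"
srcs = ["/img/a.jpg", "/img/b.jpg"]
links = ["/billboard/a", "/billboard/b"]
for url, item in pair_requests(page, srcs, links):
    print(url, item["image_urls"])
```

`zip` stops at the shorter list, so if the two XPath queries return different numbers of results, the extra images or links are silently dropped; that is worth checking if items seem to go missing.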

scrapy
