Posts by Sam*_*ikh

MongoDB: replace a word in a string

So I have a MongoDB document containing a field like this:

Image : http://static14.com/p/Inc.5-Black-Sandals-5131-2713231-7-zoom.jpg

I want to replace "zoom" in that string with some other text, so that it becomes:

   Image : http://static14.com/p/Inc.5-Black-Sandals-5131-2713231-7-product2.jpg

Is this possible?
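A minimal sketch of one common approach, using pymongo (the connection string and the database/collection names are hypothetical; only the Image field name and the "zoom"/"product2" values come from the snippet above):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical connection string
coll = client["mydb"]["products"]                   # hypothetical database/collection names

# Find documents whose Image field contains "zoom", rewrite the string in Python,
# and write the modified value back with $set.
for doc in coll.find({"Image": {"$regex": "zoom"}}):
    coll.update_one(
        {"_id": doc["_id"]},
        {"$set": {"Image": doc["Image"].replace("zoom", "product2")}},
    )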

mongodb mongodb-query

5 votes · 1 answer · 6803 views

Scrapy: calling another URL

I am using Scrapy to crawl a website. I get all the products from a listing page, and now I want to visit each product's URL, but I am not getting satisfactory results. Here is my code:

import scrapy
from scrapy.http import Request

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domain = ["test.com"]
    start_urls = [
            "http://www.test.com/?page=1"
        ]

    page_index = 1

    def parse(self, response):
        products = response.xpath('//li')
        items = []
        if products:
            for product in products:
                item = DmozItem()
                item['link'] = product.xpath('@data-url').extract()
                item['sku'] = product.xpath('@data-sku').extract()
                item['brand'] = product.xpath('.//span[contains(@class, "qa-brandName")]/text()').extract()
                item['img'] = product.xpath('.//img[contains(@class, "itm-img")]/@src').extract()
                page_url = "http://www.jabong.com/Lara-Karen-Black-Sweaters-893039.html"
                request = Request(url=page_url, callback=self.parse_page2,
                                  headers={"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"})
                request.meta['item'] = item
                item['other'] = request
                yield item
        else:
            return
        self.page_index += 1
        if self.page_index: …
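For reference, here is a minimal sketch of the usual way to follow each product link and merge detail-page data into the same item; the spider name, the parse_page2 callback body, and the detail-page XPath are illustrative assumptions, not the original code:

import scrapy
from scrapy.http import Request

class ProductSpider(scrapy.Spider):
    name = "products"                      # hypothetical spider name
    start_urls = ["http://www.test.com/?page=1"]

    def parse(self, response):
        for product in response.xpath('//li'):
            item = {'link': product.xpath('@data-url').extract_first()}
            if item['link']:
                # Hand the partially filled item to the detail-page callback via meta,
                # and yield the request itself so Scrapy actually schedules it.
                yield Request(
                    url=response.urljoin(item['link']),
                    callback=self.parse_page2,
                    meta={'item': item},
                )

    def parse_page2(self, response):
        item = response.meta['item']
        # Fill in fields that only exist on the product page (XPath is a placeholder).
        item['name'] = response.xpath('//h1/text()').extract_first()
        yield item

The key difference from the snippet above is that the request is yielded, rather than being stored inside the item that gets yielded.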

python scrapy web-scraping

2 votes · 1 answer · 2061 views

How to stop a Scrapy crawler

I want to stop the spider when a certain condition is met. I tried doing it like this: raise CloseSpider('Some Text')

sys.exit("SHUT DOWN EVERYTHING!")

But it does not stop. Here is the code; raising the exception instead of returning does not work either, because the spider keeps crawling:

import scrapy
from scrapy.http import Request

from tutorial.items import DmozItem
from scrapy.exceptions import CloseSpider
import sys

class DmozSpider(scrapy.Spider):
    name = "tutorial"
    allowed_domain = ["jabong.com"]
    start_urls = [
            "http://www.jabong.com/women/shoes/sandals/?page=1"
        ]

    page_index = 1

    def parse(self, response):
        products = response.xpath('//li')

        if products:
            for product in products:
                item = DmozItem()
                item_url = product.xpath('@data-url').extract()
                item_url = "http://www.jabong.com/" + item_url[0] if item_url else ''
                if item_url:
                    request = Request(url=item_url, callback=self.parse_page2, meta={"item": item},
                                      headers={"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"})
                    request.meta['item'] = item
                    yield request
        else:
            return

        self.page_index …
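For reference, a minimal sketch of how CloseSpider is normally used to stop crawling; the stop condition, spider name, and page limit are illustrative assumptions. sys.exit() typically has no effect here because callbacks run inside Twisted's reactor, and requests that are already scheduled may still finish before the spider shuts down:

import scrapy
from scrapy.exceptions import CloseSpider

class StoppableSpider(scrapy.Spider):
    name = "stoppable"                     # hypothetical spider name
    start_urls = ["http://www.test.com/?page=1"]
    page_index = 1
    max_pages = 5                          # hypothetical stop condition

    def parse(self, response):
        products = response.xpath('//li')
        if not products:
            # Raising CloseSpider inside a callback asks Scrapy to shut the spider
            # down; requests already in flight may still complete first.
            raise CloseSpider('no more products')

        for product in products:
            yield {'link': product.xpath('@data-url').extract_first()}

        self.page_index += 1
        if self.page_index <= self.max_pages:
            yield scrapy.Request(
                url="http://www.test.com/?page=%d" % self.page_index,
                callback=self.parse,
            )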

python scrapy

1 vote · 1 answer · 3344 views

Tag statistics

python ×2

scrapy ×2

mongodb ×1

mongodb-query ×1

web-scraping ×1