Posts by War*_*ord

Xpath Error - Spider error processing

So I'm building this spider, and it crawls fine: I can open the Scrapy shell, browse the HTML page, and test my XPath queries there.
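The same XPath queries can also be checked outside the Scrapy shell against a small hand-written snippet. This is a minimal sketch under several assumptions: it uses Python's standard-library xml.etree.ElementTree instead of Scrapy's selectors, the markup is invented to mirror the structure the queries expect (the real page is not shown), and because ElementTree supports only a subset of XPath, text() is replaced by reading each element's .text. The helpers texts and parse_sites are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mimicking the layout the spider's XPaths expect.
# It must be well-formed XML, because ElementTree is an XML parser.
SAMPLE = """
<html><body>
  <div id="search_results">
    <div>
      <div>
        <div>thumbnail</div>
        <div><h2><a>First title</a></h2><span><a>Author A</a></span></div>
        <div><div><div><div><b>$10</b></div></div></div></div>
      </div>
      <div>
        <div>thumbnail</div>
        <div><h2><a>Second title</a></h2><span><a>Author B</a></span></div>
        <div><div><div><div><b>$12</b></div></div></div></div>
      </div>
    </div>
  </div>
</body></html>
"""


def texts(element, path):
    """Text of every element matching `path`, similar to .../text() plus extract()."""
    return [e.text for e in element.findall(path)]


def parse_sites(markup):
    """Apply the spider's relative paths to every search result in `markup`."""
    root = ET.fromstring(markup.strip())
    # '//*[@id="search_results"]/div[1]/div' rewritten relative to the root element
    sites = root.findall('.//*[@id="search_results"]/div[1]/div')
    return [
        {
            "title": texts(site, "div[2]/h2/a"),
            "author": texts(site, "div[2]/span/a"),
            "price": texts(site, "div[3]/div[1]/div[1]/div/b"),
        }
        for site in sites
    ]


for item in parse_sites(SAMPLE):
    print(item)
```

Real pages are rarely well-formed XML, so this only works as a quick structural check; for actual HTML, Scrapy's own selectors are the better fit.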

I'm not sure what I'm doing wrong. Any help would be appreciated. I've already reinstalled Twisted, but that didn't help.

My spider looks like this:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from spider_scrap.items import spiderItem

class spider(BaseSpider):
    name = "spider1"
    #allowed_domains = ["example.com"]
    start_urls = [
        "http://www.example.com"
    ]

    def parse(self, response):
        items = []
        hxs = HtmlXPathSelector(response)
        # One node per search result
        sites = hxs.select('//*[@id="search_results"]/div[1]/div')

        for site in sites:
            item = spiderItem()
            # extract() must be called; without () the field holds a method object
            item['title'] = site.select('div[2]/h2/a/text()').extract()
            item['author'] = site.select('div[2]/span/a/text()').extract()
            item['price'] = site.select('div[3]/div[1]/div[1]/div/b/text()').extract()
            # Append inside the loop so every result is kept, not just the last one
            items.append(item)
        return items
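Two details in the parse loop matter: each field should hold the result of calling extract(), and items.append(item) has to sit inside the for body. The plain-Python sketch below shows what happens otherwise; FakeSelectorList is a hypothetical stand-in for the selector list returned by site.select(...), not a Scrapy class. Since the log above is truncated, this illustrates the two mistakes rather than diagnosing the exact error.

```python
class FakeSelectorList:
    """Hypothetical stand-in for the object returned by site.select(...)."""

    def __init__(self, values):
        self.values = values

    def extract(self):
        return list(self.values)


results = [FakeSelectorList(["First"]), FakeSelectorList(["Second"])]

# Buggy version: extract has no parentheses, and append() sits after the loop.
buggy_items = []
for sel in results:
    item = {"title": sel.extract}  # stores the bound method, not the data
buggy_items.append(item)  # runs once, after the loop has finished

# Fixed version: call extract() and append inside the loop body.
fixed_items = []
for sel in results:
    item = {"title": sel.extract()}
    fixed_items.append(item)

print(len(buggy_items), callable(buggy_items[0]["title"]))  # 1 True
print(fixed_items)  # [{'title': ['First']}, {'title': ['Second']}]
```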

When I run the spider (scrapy crawl Spider1), I get the following error:

    2012-09-25 17:56:12-0400 [scrapy] DEBUG: Enabled item pipelines:
    2012-09-25 17:56:12-0400 [Spider1] INFO: Spider opened
    2012-09-25 17:56:12-0400 [Spider1] INFO: Crawled …
