Posted by Lan*_*son

Scrapy: recursive crawl produces DEBUG: Crawled (200) and no item output

I'm trying to get my first recursive Scrapy spider running on a very simple site, but the log only shows DEBUG: Crawled (200) messages and no items appear in the JSON output file.

I adapted an example I found online and tried it, but I really can't tell where the problem is. Can anyone help me with this?

Spider code:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field

# SasItem was used below but never defined; it must be defined
# (or imported) before the spider can populate it
class SasItem(Item):
    agent = Field()
    org = Field()
    link = Field()
    produkt = Field()

class rgfMedlem(CrawlSpider):
    name = "rgfMedlem"
    allowed_domains = ["rgf.no"]
    start_urls = ["http://rgf.no/medlem/index.php"]

    rules = (
        Rule(SgmlLinkExtractor(allow=('index.php', ))),

        Rule(SgmlLinkExtractor(allow=('\?s=', )), callback='parse_item'),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        rows = hxs.select('//span[@class="innhold"]/table/tr')
        items = []

        for row in rows:
            print("har ar jag")
            # create a fresh item per row; instantiating it once outside
            # the loop makes every list entry reference the same object
            item = SasItem()
            item['agent'] = row.select('td/b/text()').extract()
            item['org'] = row.select('td/b/text()').extract()
            # these two selected on "rows" (the whole row set) instead of
            # the current "row"
            item['link'] = row.select('td/a/@href').extract()
            item['produkt'] = row.select('td/b/text()').extract()
            items.append(item)

        return items
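One bug worth isolating: the original code created `item = SasItem()` once before the loop and appended that same instance on every iteration, so the output would contain N copies of the last row's values rather than N distinct rows. A minimal plain-Python sketch of the effect (using dicts in place of Scrapy items, and made-up row data):

```python
# Two fake "rows", standing in for the table rows the spider extracts
rows = [{"name": "a"}, {"name": "b"}]

# Buggy version: one shared object, mutated and appended each time
items_buggy = []
item = {}  # created once, like item = SasItem() outside the loop
for row in rows:
    item["agent"] = row["name"]
    items_buggy.append(item)

# Fixed version: a fresh object per iteration
items_fixed = []
for row in rows:
    item = {}
    item["agent"] = row["name"]
    items_fixed.append(item)

print(items_buggy)  # [{'agent': 'b'}, {'agent': 'b'}] -- both entries alias one dict
print(items_fixed)  # [{'agent': 'a'}, {'agent': 'b'}]
```

This alone wouldn't explain zero items in the JSON file (that points at the link-extraction rules or the XPath matching nothing), but it would corrupt the output once items do get scraped.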

Spider crawl log:

2014-02-22 …

python recursion web-crawler scrapy

4 votes · 1 answer · 2120 views
