我正在尝试让我的第一个Scrapy递归蜘蛛在一个非常简单的站点上运行但是在JSON文件中得到了DEBUG:Crawled(200)问题.
我从网上抽了一个例子并尝试过.我真的不知道,问题出在哪里.谁能帮我这个?
蜘蛛代码:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
class rgfMedlem(CrawlSpider):
name = "rgfMedlem"
allowed_domains = ["rgf.no"]
start_urls = ["http://rgf.no/medlem/index.php"]
rules = (
Rule(SgmlLinkExtractor(allow=('index.php', ))),
Rule(SgmlLinkExtractor(allow=('\?s=', )), callback='parse_item'),
)
def parse_item(self, response):
hxs = HtmlXPathSelector(response)
rows = hxs.select('//span[@class="innhold"]/table/tr')
items = []
item = SasItem()
for row in rows:
print "har ar jag"
item['agent'] = row.select('td/b/text()').extract()
item['org'] = row.select('td/b/text()').extract()
item['link'] = rows.select('td/a/@href').extract()
item['produkt'] = rows.select('td/b/text()').extract()
items.append(item)
return items
Run Code Online (Sandbox Code Playgroud)
蜘蛛爬行日志文件
2014-02-22 …Run Code Online (Sandbox Code Playgroud)