How to get stats from a Scrapy run?

Question

Ani*_*ish 2 python mysql scrapy web-scraping

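Following the example in the Scrapy docs, I am running a Scrapy spider from an external script. I want to get the stats provided by the Core API and store them in a MySQL table once the crawl has finished.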

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from test.spiders.myspider import *
from scrapy.utils.project import get_project_settings
from test.pipelines import MySQLStorePipeline
import datetime

spider = MySpider()


def run_spider(spider):        
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()
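    # reactor.run() blocks until the spider closes; then write one stats row.
    # The values below are placeholders for the stats I want to fill in.
    mysql_insert = MySQLStorePipeline()
    mysql_insert.cursor.execute(
        'insert into crawler_stats(sites_id, start_time, end_time, page_scraped, finish_reason) '
        'values (%s, %s, %s, %s, %s)',
        (1, datetime.datetime.now(), datetime.datetime.now(), 100, 'test'))
    mysql_insert.conn.commit()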

run_spider(spider)


How can I get the values of stats such as start_time, end_time, pages_scraped and finish_reason in the code above?

Answer 1

ale*_*cxe 5

Get them from the crawler.stats collector:

stats = crawler.stats.get_stats()

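Because get_stats() returns a plain dict, mapping it onto the question's crawler_stats columns is ordinary dictionary access. Below is a minimal sketch, assuming the default CoreStats extension is enabled (it records start_time, finish_time, finish_reason, item_scraped_count and response_received_count). stats_to_row is a hypothetical helper, counting response_received_count as "pages scraped" is an assumption (item_scraped_count may suit you better), and the sample dict is invented. CoreStats sets finish_time and finish_reason in its own spider_closed handler, so inside your handler they may not be present yet; the sketch therefore uses .get() and prefers the reason argument the handler already receives.

```python
import datetime


def stats_to_row(site_id, stats, reason=None):
    """Hypothetical helper: build the (sites_id, start_time, end_time,
    page_scraped, finish_reason) tuple for the crawler_stats insert.

    Assumes the key names recorded by Scrapy's CoreStats extension.
    finish_time/finish_reason may be missing if this runs before CoreStats'
    own spider_closed handler, so fall back to "now" and to `reason`.
    """
    start_time = stats.get('start_time')
    end_time = stats.get('finish_time') or datetime.datetime.now()
    # Assumption: count downloaded responses as "pages scraped".
    pages = stats.get('response_received_count', 0)
    finish_reason = reason or stats.get('finish_reason', 'unknown')
    return (site_id, start_time, end_time, pages, finish_reason)


# Hypothetical stats dict shaped like get_stats() output (values invented).
sample_stats = {
    'start_time': datetime.datetime(2014, 1, 1, 12, 0, 0),
    'finish_time': datetime.datetime(2014, 1, 1, 12, 5, 0),
    'finish_reason': 'finished',
    'response_received_count': 100,
    'item_scraped_count': 95,
}

row = stats_to_row(1, sample_stats)
print(row)
```

Inside the spider_closed handler this would be called as stats_to_row(1, spider.crawler.stats.get_stats(), reason).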

Example code (collecting the stats in a spider_closed signal handler):

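# Same imports as the question's script
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings
from test.spiders.myspider import MySpider


def callback(spider, reason):
    # spider_closed handlers receive the spider and the close reason
    stats = spider.crawler.stats.get_stats()  # stats is a dictionary

    # write stats to the database here

    reactor.stop()


def run_spider(spider):
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(callback, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()


spider = MySpider()
run_spider(spider)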


Archived: 10 years, 11 months ago
Views: 2,893
Last activity: 10 years, 11 months ago