Can someone explain to me how Scrapy's pause/resume feature works?
The version of scrapy I am using is 0.24.5.
The documentation does not provide much detail.
I have the following simple spider:
from scrapy.http import Request
from scrapy.spider import Spider


class SampleSpider(Spider):
    name = 'sample'

    def start_requests(self):
        yield Request(url='https://colostate.textbookrack.com/listingDetails?lst_id=1053')
        yield Request(url='https://colostate.textbookrack.com/listingDetails?lst_id=1054')
        yield Request(url='https://colostate.textbookrack.com/listingDetails?lst_id=1055')

    def parse(self, response):
        with open('responses.txt', 'a') as f:
            f.write(response.url + '\n')
I am running it with:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

from scrapyproject.spiders.sample_spider import SampleSpider

spider = SampleSpider()
settings = get_project_settings()
settings.set('JOBDIR', '/some/path/scrapy_cache')
settings.set('DOWNLOAD_DELAY', 10)

crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
As you can see, I have the JOBDIR option enabled so that the crawl state is saved.
I set DOWNLOAD_DELAY to 10 seconds so that I can stop the spider before the requests are processed. I expected that the next time I run the spider, the requests would not be regenerated. That is not the case.
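As I understand it, on resume Scrapy's duplicate filter reloads previously seen request fingerprints from the requests.seen file inside JOBDIR, so the URLs yielded again by start_requests are filtered out rather than re-downloaded. A fingerprint is essentially a SHA1 hash over the request's method, (canonicalized) URL and body. The following is only a rough pure-Python sketch of that idea, not Scrapy's actual implementation:

```python
import hashlib


def rough_fingerprint(method, url, body=b''):
    """Toy request fingerprint: SHA1 over method + URL + body.

    Scrapy's real fingerprint also canonicalizes the URL (sorted query
    parameters, etc.); this only illustrates why identical requests
    can be recognized across runs.
    """
    h = hashlib.sha1()
    h.update(method.encode())
    h.update(url.encode())
    h.update(body)
    return h.hexdigest()


# Identical requests hash identically, so a resumed run can skip them.
a = rough_fingerprint('GET', 'https://colostate.textbookrack.com/listingDetails?lst_id=1053')
b = rough_fingerprint('GET', 'https://colostate.textbookrack.com/listingDetails?lst_id=1053')
print(a == b)  # True
```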
I see a folder named requests.queue inside the scrapy_cache folder, but it is always empty. …
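To check what state actually gets persisted between runs, I can list the job directory contents; in my runs it holds the dupefilter's requests.seen file, a spider.state file, and the requests.queue directory. A small helper for inspecting it (the path is just my local JOBDIR from the settings above):

```python
import os


def dump_jobdir(jobdir):
    """Walk a Scrapy JOBDIR and print each file with its size in bytes,
    to see which pieces of crawl state were actually written to disk."""
    for root, dirs, files in os.walk(jobdir):
        for name in sorted(files):
            path = os.path.join(root, name)
            print(path, os.path.getsize(path))


# Example: dump_jobdir('/some/path/scrapy_cache')
```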