I have two spiders now, and what I want to do is:
1. Go to url1; if url2 shows up there, call spider 2 with url2, and also save url1's content through a pipeline.
2. Go to url2 and do something.
Because of the complexity of the two spiders, I would like to keep them separate.
This is what I tried from inside scrapy crawl:
import multiprocessing

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings

def parse(self, response):
    # pass the method itself, not the result of calling it
    p = multiprocessing.Process(target=self.testfunc)
    p.start()   # start() has to come before join()
    p.join()

def testfunc(self):
    settings = get_project_settings()
    crawler = CrawlerRunner(settings)
    crawler.crawl(<spidername>, <arguments>)
It does load the settings, but nothing is actually crawled:
2015-08-24 14:13:32 [scrapy] INFO: Enabled extensions: CloseSpider, LogStats, CoreStats, SpiderState
2015-08-24 14:13:32 [scrapy] INFO: Enabled downloader middlewares: DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, HttpAuthMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-08-24 14:13:32 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, …
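Even with the process started correctly, CrawlerRunner.crawl() only schedules the crawl and returns a Deferred; nothing runs unless a Twisted reactor is started in that process. Below is a minimal sketch of one common workaround, running the second spider in its own process with CrawlerProcess, which manages its own reactor; the spider name 'spider2' and the start_url argument are hypothetical:

import multiprocessing

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def run_spider2(url):
    # CrawlerProcess starts (and stops) its own Twisted reactor,
    # which is why it must live in a fresh process rather than
    # inside the already-running crawl.
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider2', start_url=url)  # hypothetical spider name and argument
    process.start()  # blocks until the crawl finishes

# inside spider 1's parse(), once url2 has been found:
# p = multiprocessing.Process(target=run_spider2, args=(url2,))
# p.start()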
While using Python's built-in max() method, I noticed something interesting....

input_one = u'A????;B??;??;D??????;E??????????????????'
input_two = u'????;??;??;??????;??????????????????'
input_en = u'test;test,test,test;testtesttest;testtesttesttest'
input_ja = u'??????;???;???????;????????????'
input_ja_mixed = u'a??????;b???;c???????;d????????????'
input_ascii = u'egfwergreger;@#@$fgdfdfdfdsfsdfsdf;sdfsdfsfsdfs233'
def test_length(input):
    lengths = []
    for i in input:
        lengths.append(len(i))
    index = find_index(input, max(lengths))
    return input[index]

def find_index(input, to_find):
    for index, value in enumerate(input):
        print('index: %s, length: %s, value: %s' % (index, len(value), value))
        if len(value) == to_find:
            return index

def test_one(input):
    input = input.split(';')
    print('input:', input)
    print('using test_length: ', test_length(input))
    print('using max():', max(input))
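For reference, running test_one() on the ASCII-only sample above already shows the two approaches disagreeing once symbols are involved, because '@' (U+0040) sorts before every letter:

test_one(input_ascii)
# using test_length: @#@$fgdfdfdfdsfsdfsdf  (the longest element)
# using max(): sdfsdfsfsdfs233  ('s' is the highest leading code point)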
When max() is used to find the largest element of a list that contains only English letters, it works as expected.
However, when the elements are mixed with symbols (such as @ …
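The underlying reason is that max() compares strings lexicographically, code point by code point, rather than by length, so characters with high code points (symbols beyond the letters, CJK text) dominate the result. A minimal sketch with made-up English words showing the difference, and how key=len picks the longest element instead:

words = ['apple', 'zz', 'banana']

# Lexicographic comparison: 'z' (U+007A) sorts after 'a' and 'b',
# so max() returns 'zz' even though it is the shortest element.
print(max(words))           # -> 'zz'

# Comparing by length returns the longest element instead, which
# makes the test_length()/find_index() helpers above unnecessary.
print(max(words, key=len))  # -> 'banana'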