I have:
from twisted.internet import reactor
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
and I have always run this process successfully:
process = CrawlerProcess(get_project_settings())
process.crawl(*args)
# the script will block here until the crawling is finished
process.start()
But since I moved this code into a web_crawler(self) method, as shown below:
def web_crawler(self):
    # set up a crawler
    process = CrawlerProcess(get_project_settings())
    process.crawl(*args)
    # the script will block here until the crawling is finished
    process.start()
    # (...)
    return (result1, result2)
and started calling the method through a class instance, like this:
def __call__(self):
    results1 = test.web_crawler()[1]
    results2 = test.web_crawler()[0]
and running:
test()
I get the following error:
Traceback (most recent call last):
  File "test.py", line 573, in <module> …
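
One thing that stands out in the snippets above: `__call__` invokes `web_crawler()` twice, so `process.start()` runs twice in the same Python process. `CrawlerProcess.start()` runs the Twisted reactor, and a reactor that has stopped cannot be started again (Twisted raises `ReactorNotRestartable`). The traceback is cut off, so whether that is the actual error here is an assumption. If it is, a minimal sketch of one possible workaround is to call the method once and unpack the returned tuple. `FakeProcess` and the local `ReactorNotRestartable` class below are hypothetical stand-ins so the example runs without Scrapy; they are not part of Scrapy or Twisted.

```python
class ReactorNotRestartable(Exception):
    """Stand-in for twisted.internet.error.ReactorNotRestartable."""


class FakeProcess:
    """Hypothetical stand-in for CrawlerProcess: start() works once per process."""

    _started = False  # class-level flag, mimicking the single global reactor

    def crawl(self, *args):
        # The real CrawlerProcess.crawl() schedules a spider; nothing to do here.
        pass

    def start(self):
        # A second start in the same process fails, as a stopped reactor would.
        if FakeProcess._started:
            raise ReactorNotRestartable()
        FakeProcess._started = True


class Test:
    def web_crawler(self):
        process = FakeProcess()
        process.crawl()
        process.start()
        # Placeholder values standing in for the real crawl results.
        return ("result1", "result2")

    def __call__(self):
        # Call web_crawler() only once and unpack both values,
        # keeping the original mapping: results1 = [1], results2 = [0].
        results2, results1 = self.web_crawler()
        return results1, results2
```

With this shape, `Test()()` starts the (stand-in) process a single time. Calling the instance a second time in the same process would still fail, which matches how a stopped reactor behaves.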