Bj *_*icz 6 twisted nose scrapy python-2.7
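Is it possible to create an integration test for a Scrapy pipeline? I can't figure out how to do this. In particular, I am trying to write a test for FilesPipeline, and I also want it to persist my mocked response to Amazon S3.
Here is my test: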
def _mocked_download_func(request, info):
    # Echo the request URL back in a fake 200 response.
    return Response(url=request.url, status=200, body="test", request=request)


class FilesPipelineTests(unittest.TestCase):

    def setUp(self):
        self.settings = get_project_settings()
        crawler = Crawler(self.settings)
        crawler.configure()
        self.pipeline = FilesPipeline.from_crawler(crawler)
        self.pipeline.open_spider(None)
        # Replace the real downloader so no network request is made.
        self.pipeline.download_func = _mocked_download_func

    @defer.inlineCallbacks
    def test_file_should_be_directly_available_from_s3_when_processed(self):
        item = CrawlResult()
        item['id'] = "test"
        item['file_urls'] = ['http://localhost/test']
        result = yield self.pipeline.process_item(item, None)
        self.assertEqual(result['files'][0]['path'], "full/002338a87aab86c6b37ffa22100504ad1262f21b")
I always run into the following error:
DirtyReactorAggregateError: Reactor was unclean.
How do I create a proper test using Twisted and Scrapy?
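For now, I have written my pipeline tests without calling from_crawler. They are not ideal, because they do not exercise the from_crawler code path, but they work.
I wrote them using an empty Spider instance: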
from scrapy.spiders import Spider
# some other imports for my own stuff and standard libs


@pytest.fixture
def mqtt_client():
    client = mock.Mock()
    return client


def test_mqtt_pipeline_does_return_item_after_process(mqtt_client):
    spider = Spider(name='spider')
    pipeline = MqttOutputPipeline(mqtt_client, 'dummy-namespace')
    item = BasicItem()
    item['url'] = 'http://example.com/'
    item['source'] = 'dummy source'
    ret = pipeline.process_item(item, spider)
    assert ret is not None
(Actually, I forgot to call open_spider() there.)
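For reference, here is a self-contained sketch of the same pattern that runs without Scrapy installed. The `MqttOutputPipeline` below is only a hypothetical stand-in for my own class (its `publish` call and the `<namespace>/items` topic are assumptions made for illustration), and a plain `object()` takes the place of the Spider. This version does call `open_spider()` before processing the item.

```python
from unittest import mock


class MqttOutputPipeline:
    """Hypothetical stand-in for the pipeline under test."""

    def __init__(self, client, namespace):
        self.client = client
        self.namespace = namespace
        self.opened = False

    def open_spider(self, spider):
        self.opened = True

    def process_item(self, item, spider):
        # Publish the item under an assumed "<namespace>/items" topic,
        # then hand it on to the next pipeline stage.
        self.client.publish(self.namespace + "/items", dict(item))
        return item


def run_pipeline_once():
    client = mock.Mock()
    spider = object()  # placeholder; the real test passes a scrapy Spider
    pipeline = MqttOutputPipeline(client, "dummy-namespace")
    pipeline.open_spider(spider)
    item = {"url": "http://example.com/", "source": "dummy source"}
    returned = pipeline.process_item(item, spider)
    return client, item, returned


client, item, returned = run_pipeline_once()
assert returned is item
client.publish.assert_called_once_with("dummy-namespace/items", item)
```

Checking the call recorded on the mock, not only the return value, makes the test fail if the pipeline silently stops publishing.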
You can also look at how Scrapy itself tests pipelines, for example MediaPipeline:
class BaseMediaPipelineTestCase(unittest.TestCase):

    pipeline_class = MediaPipeline
    settings = None

    def setUp(self):
        self.spider = Spider('media.com')
        self.pipe = self.pipeline_class(download_func=_mocked_download_func,
                                        settings=Settings(self.settings))
        self.pipe.open_spider(self.spider)
        self.info = self.pipe.spiderinfo

    def test_default_media_to_download(self):
        request = Request('http://url')
        assert self.pipe.media_to_download(request, self.info) is None
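You can also look through their other unit tests. For me, they are always a good source of inspiration for how to unit test Scrapy components.
If you also want to test the from_crawler function, have a look at their middleware tests. Those tests often use from_crawler to create the middleware, for example OffsiteMiddleware: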
from scrapy.spiders import Spider
from scrapy.utils.test import get_crawler


class TestOffsiteMiddleware(TestCase):

    def setUp(self):
        crawler = get_crawler(Spider)
        self.spider = crawler._create_spider(**self._get_spiderargs())
        self.mw = OffsiteMiddleware.from_crawler(crawler)
        self.mw.spider_opened(self.spider)
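I assume the key component here is the call to get_crawler from scrapy.utils.test. It seems they have factored out the calls you need to make in order to have a test environment.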
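If a real crawler is more than you need, a pipeline whose from_crawler only reads settings can probably be exercised with a stub instead. The sketch below is not what Scrapy's own tests do: `NamespacePipeline`, the `MQTT_NAMESPACE` setting, and `make_stub_crawler` are all hypothetical names, and the stub exposes nothing except a `settings` mapping, so it will not work for components that touch signals or stats.

```python
from types import SimpleNamespace


class NamespacePipeline:
    """Hypothetical pipeline that is configured through from_crawler."""

    def __init__(self, namespace):
        self.namespace = namespace

    @classmethod
    def from_crawler(cls, crawler):
        # Only crawler.settings is read here; MQTT_NAMESPACE is an assumed key.
        return cls(crawler.settings.get("MQTT_NAMESPACE", "default"))


def make_stub_crawler(settings):
    # Stand-in for get_crawler(): exposes nothing but a settings mapping.
    return SimpleNamespace(settings=dict(settings))


pipeline = NamespacePipeline.from_crawler(
    make_stub_crawler({"MQTT_NAMESPACE": "dummy-namespace"})
)
assert pipeline.namespace == "dummy-namespace"
```

For anything that depends on the crawler's signals, stats, or real Settings object, get_crawler as used in the middleware tests is the safer choice.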