Integration test for a Scrapy pipeline that returns a Deferred

Bj *_*icz 6 twisted nose scrapy python-2.7

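Is it possible to write an integration test for a Scrapy pipeline? I can't figure out how to do it. Specifically, I am trying to write a test for FilesPipeline, and I also want it to persist my mocked response to Amazon S3.

Here is my test: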

def _mocked_download_func(request, info):
    # Build the fake response from the request; no `response` name exists in this scope.
    return Response(url=request.url, status=200, body="test", request=request)

class FilesPipelineTests(unittest.TestCase):

    def setUp(self):
        self.settings = get_project_settings()
        crawler = Crawler(self.settings)
        crawler.configure()
        self.pipeline = FilesPipeline.from_crawler(crawler)
        self.pipeline.open_spider(None)
        self.pipeline.download_func = _mocked_download_func

    @defer.inlineCallbacks
    def test_file_should_be_directly_available_from_s3_when_processed(self):
        item = CrawlResult()
        item['id'] = "test"
        item['file_urls'] = ['http://localhost/test']
        result = yield self.pipeline.process_item(item, None)
        self.assertEqual(result['files'][0]['path'], "full/002338a87aab86c6b37ffa22100504ad1262f21b")

I always run into the following error:

DirtyReactorAggregateError: Reactor was unclean.

How do I write a proper test with Twisted and Scrapy?

Auf*_*gel 2

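For now, I have written my pipeline tests without calling from_crawler. That is not ideal, because they do not test the from_crawler functionality, but they work.

I do this by using an empty Spider instance: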

from scrapy.spiders import Spider
# some other imports for my own stuff and standard libs

@pytest.fixture
def mqtt_client():
    client = mock.Mock()

    return client

def test_mqtt_pipeline_does_return_item_after_process(mqtt_client):
    spider = Spider(name='spider')
    pipeline = MqttOutputPipeline(mqtt_client, 'dummy-namespace')

    item = BasicItem()
    item['url'] = 'http://example.com/'
    item['source'] = 'dummy source'

    ret = pipeline.process_item(item, spider)

    assert ret is not None

(Actually, I forgot to call open_spider().)
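The same pattern, with the open_spider() and close_spider() calls included, can be sketched in a self-contained form. This is only an illustration under assumptions: PublishPipeline, run_pipeline_once, and the client methods connect/publish/disconnect are hypothetical names, and a SimpleNamespace stands in for a real Spider so the sketch runs without Scrapy installed.

```python
from types import SimpleNamespace
from unittest import mock


class PublishPipeline:
    """Hypothetical pipeline that publishes each item through an injected client."""

    def __init__(self, client, namespace):
        self.client = client
        self.namespace = namespace

    def open_spider(self, spider):
        # Acquire resources once per spider run.
        self.client.connect()

    def process_item(self, item, spider):
        topic = "{}/{}".format(self.namespace, spider.name)
        self.client.publish(topic, item["url"])
        return item  # hand the item on to the next pipeline stage

    def close_spider(self, spider):
        # Release resources when the spider finishes.
        self.client.disconnect()


def run_pipeline_once(item):
    """Drive the open -> process -> close lifecycle with stand-in objects."""
    client = mock.Mock()
    spider = SimpleNamespace(name="spider")  # assumption: only .name is needed
    pipeline = PublishPipeline(client, "dummy-namespace")

    pipeline.open_spider(spider)
    try:
        result = pipeline.process_item(item, spider)
    finally:
        pipeline.close_spider(spider)
    return result, client


result, client = run_pipeline_once({"url": "http://example.com/"})
assert result is not None
client.publish.assert_called_once_with("dummy-namespace/spider", "http://example.com/")
```

Wrapping process_item in try/finally keeps close_spider() from being skipped when an assertion or exception fires mid-test, which matters once the pipeline holds real connections.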

You can also look at how Scrapy itself tests its pipelines, for example MediaPipeline:

class BaseMediaPipelineTestCase(unittest.TestCase):

    pipeline_class = MediaPipeline
    settings = None

    def setUp(self):
        self.spider = Spider('media.com')
        self.pipe = self.pipeline_class(download_func=_mocked_download_func,
                                        settings=Settings(self.settings))
        self.pipe.open_spider(self.spider)
        self.info = self.pipe.spiderinfo

    def test_default_media_to_download(self):
        request = Request('http://url')
        assert self.pipe.media_to_download(request, self.info) is None

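You can also look through their other unit tests. For me, they have always been good inspiration for how to unit test Scrapy components.

If you also want to test the from_crawler functionality, take a look at their middleware tests. In those tests they often use from_crawler to create the middleware, for example for OffsiteMiddleware: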

from scrapy.spiders import Spider
from scrapy.utils.test import get_crawler

class TestOffsiteMiddleware(TestCase):

    def setUp(self):
        crawler = get_crawler(Spider)
        self.spider = crawler._create_spider(**self._get_spiderargs())
        self.mw = OffsiteMiddleware.from_crawler(crawler)
        self.mw.spider_opened(self.spider)

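I assume the key component here is the call to get_crawler from scrapy.utils.test. It seems they have factored out the calls you need to make in order to have a working test environment.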
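For components whose from_crawler only reads settings, a lighter alternative is to pass in a stand-in object instead of a real crawler. Note that this is a swapped-in technique, not what get_crawler or the Scrapy test suite does. The names PrefixPipeline, make_stub_crawler, and the ITEM_PREFIX setting are hypothetical and exist only for this sketch.

```python
from types import SimpleNamespace


class PrefixPipeline:
    """Hypothetical pipeline that is configured from crawler settings."""

    def __init__(self, prefix):
        self.prefix = prefix

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes the crawler in; configuration is read from crawler.settings.
        return cls(prefix=crawler.settings.get("ITEM_PREFIX", ""))

    def process_item(self, item, spider):
        item["id"] = self.prefix + item["id"]
        return item


def make_stub_crawler(settings):
    """Stand-in crawler: it provides a .settings attribute and nothing else."""
    return SimpleNamespace(settings=dict(settings))


crawler = make_stub_crawler({"ITEM_PREFIX": "test-"})  # ITEM_PREFIX is made up
pipeline = PrefixPipeline.from_crawler(crawler)
item = pipeline.process_item({"id": "42"}, spider=None)
assert item["id"] == "test-42"
```

A stub like this only goes as far as the settings lookup. A component that connects to signals or uses stats in from_crawler would still need the real crawler that get_crawler builds.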