Related troubleshooting questions (0)

How to integrate Flask & Scrapy?

I am using Scrapy to fetch data, and I want to display the results on a web page using the Flask web framework. But I don't know how to call the spiders from within the Flask application. I tried using CrawlerProcess to call my spider, but I got an error like this:

ValueError
ValueError: signal only works in main thread

Traceback (most recent call last)
File "/Library/Python/2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/Library/Python/2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/Library/Python/2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/Library/Python/2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/Library/Python/2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Library/Python/2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Library/Python/2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request() …
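The error itself comes from CPython's `signal` module, not from Scrapy: signal handlers can only be installed from the main thread, and `CrawlerProcess` installs shutdown handlers, while Flask's development server dispatches each request on a worker thread. A minimal stdlib reproduction of the failure (no Flask or Scrapy involved):

```python
import signal
import threading

# Try to install a signal handler from a worker thread, the way a
# Flask request handler would when it constructs a CrawlerProcess.
captured = {}

def install_handler():
    try:
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        captured["error"] = None
    except ValueError as exc:
        captured["error"] = str(exc)

t = threading.Thread(target=install_handler)
t.start()
t.join()
print(captured["error"])  # the same "signal only works in main thread" ValueError
```

This is why the usual advice is either to use `CrawlerRunner` (which installs no signal handlers) or to run the crawl outside the web worker entirely.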

python scrapy flask

17
votes
3
answers
8858
views

Building a RESTful Flask API for Scrapy

The API should accept arbitrary HTTP GET requests containing the URLs the user wants scraped, and Flask should then return the results of the scrape.

The following code works for the first HTTP request, but after the Twisted reactor stops, it won't restart. I may not even be approaching this the right way; I just want to put a RESTful Scrapy API on Heroku, and this is all I've been able to come up with so far.

Is there a better way to architect this solution? Or how can I allow scrape_it to return without stopping the Twisted reactor (which cannot be started again)?

from flask import Flask
import os
import sys
import json

from n_grams.spiders.n_gram_spider import NGramsSpider

# scrapy api
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals

app = Flask(__name__)


def scrape_it(url):
    items = []
    def add_item(item):
        items.append(item)

    runner = CrawlerRunner()

    d = runner.crawl(NGramsSpider, [url])
    d.addBoth(lambda _: reactor.stop()) # <<< TROUBLES HERE ???

    dispatcher.connect(add_item, signal=signals.item_passed)

    reactor.run(installSignalHandlers=0) # the script will block here until the …
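One common workaround is to give each request its own child process: the reactor inside the child starts fresh every time, so it can be stopped freely and `ReactorNotRestartable` never reaches the web process. A sketch of the pattern, with `run_crawl` as a placeholder for the real crawl (in a real app it would run `CrawlerProcess(...).crawl(NGramsSpider, [url])` and push the collected items onto the queue; the names here are illustrative, not from the original code):

```python
import multiprocessing

# "fork" keeps the example deterministic on POSIX systems
ctx = multiprocessing.get_context("fork")

def run_crawl(url, queue):
    # Placeholder for the actual Scrapy crawl: start a fresh reactor,
    # collect items, and report them back through the queue.
    queue.put([{"url": url}])

def scrape_it(url):
    queue = ctx.Queue()
    proc = ctx.Process(target=run_crawl, args=(url, queue))
    proc.start()
    items = queue.get()  # block until the child reports its results
    proc.join()
    return items

if __name__ == "__main__":
    print(scrape_it("http://example.com"))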

python twisted heroku scrapy flask

8
votes
2
answers
4351
views

ReactorNotRestartable when running two equivalent unit tests with twisted and trial

I have two test classes (TrialTest1 and TrialTest2) written in two files (test_trial1.py and test_trial2.py) that are mostly identical (the only difference being the class name):

from twisted.internet import reactor
from twisted.trial import unittest


class TrialTest1(unittest.TestCase):

    def setUp(self):
        print("setUp()")

    def test_main(self):
        print("test_main")
        reactor.callLater(1, self._called_by_deffered1)
        reactor.run()

    def _called_by_deffered1(self):
        print("_called_by_deffered1")
        reactor.callLater(1, self._called_by_deffered2)

    def _called_by_deffered2(self):
        print("_called_by_deffered2")
        reactor.stop()

    def tearDown(self):
        print("tearDown()")

When I run each test on its own, everything works fine. But when I run them together, I get the following output:

setUp()
test_main
_called_by_deffered1
_called_by_deffered2
tearDown()
setUp()
test_main
tearDown()

Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 137, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/internet/utils.py", line 203, in runWithWarningsSuppressed
    reraise(exc_info[1], exc_info[2])
  File "/usr/lib/python2.7/site-packages/twisted/internet/utils.py", line 199, in runWithWarningsSuppressed
    result …
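Under trial, a test should return a Deferred and let trial drive the reactor, rather than calling `reactor.run()` and `reactor.stop()` itself: once the first test stops the reactor, the second test cannot start it again. The same restartability constraint can be seen with the stdlib asyncio loop (shown here only as an analogy, not the actual Twisted API):

```python
import asyncio

# A closed asyncio loop refuses to run again, much like Twisted's
# stopped reactor refuses to restart.
loop = asyncio.new_event_loop()
loop.call_soon(loop.stop)
loop.run_forever()   # first run: fine, stops as soon as the callback fires
loop.close()

try:
    loop.run_forever()  # second run: the loop is already closed
    message = None
except RuntimeError as exc:
    message = str(exc)
print(message)
```

The fix in the trial case is to remove the `reactor.run()` / `reactor.stop()` calls and have `test_main` return a Deferred that the callback chain fires when the work is done.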

python testing unit-testing twisted trial

6
votes
1
answer
6778
views

Tag statistics

python ×3

flask ×2

scrapy ×2

twisted ×2

heroku ×1

testing ×1

trial ×1

unit-testing ×1