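During my crawl, some pages fail because of an unexpected redirect, and no response is returned. How can I catch this kind of error and reschedule the request with the original URL rather than the redirected one?
I searched Google a lot before asking here. There seem to be two ways to solve this: one is to catch the exception in a downloader middleware, the other is to handle the download exception in an errback set on the request in the spider. I have questions about both approaches.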
from scrapy import log  # legacy Scrapy logging module used by this snippet

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Route every request through the proxy.
        request.meta['proxy'] = "http://192.168.10.10"
        log.msg('>>>> Proxy %s' % (request.meta['proxy'] if request.meta['proxy'] else ""), level=log.DEBUG)

    def process_exception(self, request, exception, spider):
        # The proxy set in process_request is stored on request.meta.
        proxy = request.meta.get('proxy')
        log.msg('Failed to request url %s with proxy %s with exception %s' % (request.url, proxy if proxy else 'nil', str(exception)))
        # retry again.
        return request
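For method 2, I don't know how to pass extra arguments to the errback function in the spider, and I don't know how to retrieve the original URL inside that errback in order to reschedule the request.
Below is an example of my attempt with method 2: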
class ProxytestSpider(Spider):
    name = "proxytest"
    allowed_domains = ["baidu.com"]
    start_urls = (
        'http://www.baidu.com/',
    )

    def make_requests_from_url(self, url):
        starturl = url
        request = Request(url, dont_filter=True, callback=self.parse, errback=self.download_errback)
        print "make …
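On the errback question for method 2, one possible approach is to bind the extra values with `functools.partial` before handing the errback to the request, since Scrapy calls an errback with a single `Failure` argument. The sketch below is plain Python so it runs without Scrapy installed. The names `original_url`, `start_url` and `retries_left` are hypothetical, and the errback returns a tuple only as a stand-in for building a new `Request`. It also assumes the redirect was handled by Scrapy's RedirectMiddleware, which records the URLs a request was redirected away from under the `redirect_urls` key of `request.meta`.

```python
from functools import partial


def original_url(url, meta):
    """Return the URL the request chain started from.

    Assumes meta['redirect_urls'], when present, lists the URLs the request
    was redirected away from, first one first. Falls back to `url`.
    """
    redirect_urls = meta.get("redirect_urls") or [url]
    return redirect_urls[0]


def download_errback(failure, start_url, retries_left):
    """Errback whose extra arguments were bound ahead of time.

    Returns the (url, retries_left) pair to reschedule, or None to give up.
    A real spider would return a new Request for start_url instead.
    """
    if retries_left <= 0:
        return None
    return start_url, retries_left - 1


# The caller supplies only the failure; the rest is bound by keyword here.
errback = partial(download_errback, start_url="http://www.baidu.com/", retries_left=2)
```

In a spider this might be wired up as `Request(url, dont_filter=True, callback=self.parse, errback=partial(self.download_errback, start_url=url, retries_left=2))`, so the original URL travels with the errback and does not depend on what the redirected request looks like.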
I am trying to connect to MS SQL Server 2008 R2 from Django 1.4.2 using django-mssql. These are my database settings:
DATABASE_ENGINE = 'sqlserver_ado'
DATABASE_NAME = 'dbtest'
DATABASE_USER = 'App'
DATABASE_PASSWORD = '*********'
DATABASE_HOST = 'localhost'
DATABASE_OPTIONS = {
    'provider': 'SQLNCLI10',
    'extra_params': 'DataTypeCompatibility=80;MARS Connection=True;',
}
DATABASES = {
    'default': {
        'ENGINE': DATABASE_ENGINE,
        'NAME': DATABASE_NAME,
        'USER': DATABASE_USER,
        'PASSWORD': DATABASE_PASSWORD,
        'HOST': DATABASE_HOST,
        'OPTIONS': DATABASE_OPTIONS,
    },
}
This is the error I get when I run syncdb:
Traceback (most recent call last):
  File "C:\Python27\DataSatellite\manage.py", line 11, in <module>
    execute_manager(settings)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 459, in execute_manager
    utility.execute()
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line …