我的互联网连接是通过具有身份验证的代理,当我尝试运行 scraoy 库以制作更简单的示例时,例如:
scrapy shell http://stackoverflow.com
Run Code Online (Sandbox Code Playgroud)
一切正常,直到您使用 XPath 选择器请求某些内容,响应是下一个:
>>> hxs.select('//title')
[<HtmlXPathSelector xpath='//title' data=u'<title>ERROR: Cache Access Denied</title'>]
Run Code Online (Sandbox Code Playgroud)
或者,如果您尝试运行在项目中创建的任何蜘蛛,则会出现以下错误:
C:\Users\Victor\Desktop\test\test>scrapy crawl test
2012-08-11 17:38:02-0400 [scrapy] INFO: Scrapy 0.16.5 started (bot: test)
2012-08-11 17:38:02-0400 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetCon
sole, CloseSpider, WebService, CoreStats, SpiderState
2012-08-11 17:38:02-0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAut
hMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, De
faultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddlewa
re, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-08-11 17:38:02-0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMi
ddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddle
ware
2012-08-11 17:38:02-0400 …Run Code Online (Sandbox Code Playgroud)