Moh*_*umi 7 python scrapy web-scraping python-3.x
Question:
How do I proxy Scrapy requests through SOCKS5?
I know I could use polipo to convert a SOCKS proxy into an HTTP proxy, but I would rather set up a middleware or change something about scrapy.Request:
import scrapy


class BaseSpider(scrapy.Spider):
    """A base class that implements the major functionality of the crawling application."""
    start_urls = ('https://google.com',)

    def start_requests(self):
        proxies = {
            'http': 'socks5://127.0.0.1:1080',
            'https': 'socks5://127.0.0.1:1080'
        }
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={'proxy': proxies}  # proxy should be a string, not a dict
            )

    def parse(self, response):
        # do ...
        pass
What should I assign to the proxies variable?
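For reference, Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy'] as a single proxy URL string, unlike the `requests` library, which takes a per-scheme dict. A minimal sketch of the meta shape Scrapy expects (`proxy_meta` is a hypothetical helper, not part of Scrapy):

```python
# Sketch: Scrapy expects meta['proxy'] to be one URL string,
# not a {'http': ..., 'https': ...} dict as in the `requests` library.
def proxy_meta(proxy_url):
    """Hypothetical helper: build the per-request meta dict Scrapy reads."""
    return {'proxy': proxy_url}

# In a spider you would then write, e.g.:
# yield scrapy.Request(url, callback=self.parse, meta=proxy_meta('http://127.0.0.1:8181'))
print(proxy_meta('http://127.0.0.1:8181'))  # {'proxy': 'http://127.0.0.1:8181'}
```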
小智 9
It is possible. First, install pproxy:
$ pip3 install pproxy
Run:
$ pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
Create a middleware (middlewares.py):
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "http://127.0.0.1:8181"
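Since the middleware only writes request.meta['proxy'], it can be exercised without a running crawl. A sketch using a stand-in request object (`DummyRequest` is an assumption for illustration, not a Scrapy class):

```python
# The middleware from the answer: sets the per-request proxy meta key.
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "http://127.0.0.1:8181"


# DummyRequest is a stand-in: all the middleware needs is a `meta` dict.
class DummyRequest:
    def __init__(self):
        self.meta = {}


req = DummyRequest()
ProxyMiddleware().process_request(req, spider=None)
print(req.meta['proxy'])  # http://127.0.0.1:8181
```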
Register it in DOWNLOADER_MIDDLEWARES (settings.py):
DOWNLOADER_MIDDLEWARES = {
    'PROJECT_NAME_HERE.middlewares.ProxyMiddleware': 350
}
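A note on the priority value: in DOWNLOADER_MIDDLEWARES the keys are dotted import paths and the values are priorities; lower numbers sit closer to the engine and have their process_request called first. Scrapy's built-in HttpProxyMiddleware runs at 750 by default, so 350 ensures the 'proxy' meta key is already set when the built-in middleware applies it. A quick sanity check of that ordering:

```python
# The settings fragment from the answer ('PROJECT_NAME_HERE' is the
# original placeholder for the actual project package name).
DOWNLOADER_MIDDLEWARES = {
    'PROJECT_NAME_HERE.middlewares.ProxyMiddleware': 350
}

# Scrapy's built-in HttpProxyMiddleware has default priority 750;
# our 350 must be lower so our process_request runs first.
HTTPPROXY_DEFAULT_PRIORITY = 750
assert DOWNLOADER_MIDDLEWARES['PROJECT_NAME_HERE.middlewares.ProxyMiddleware'] < HTTPPROXY_DEFAULT_PRIORITY
```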