I am using scrapy-splash in my code to render JavaScript-generated HTML.
Splash returns this response from render.html:
{
    "error": 400,
    "type": "BadOption",
    "description": "Incorrect HTTP API arguments",
    "info": {
        "type": "argument_required",
        "argument": "url",
        "description": "Required argument is missing: url"
    }
}
I also cannot get a response that contains the JavaScript-generated HTML. Here is my spider.py:
import scrapy
from scrapy.utils.response import open_in_browser
from scrapy_splash import SplashRequest


class ThespiderSpider(scrapy.Spider):
    name = 'thespider'
    #allowed_domains = ['https://www.empresia.es/empresa/repsol/']
    start_urls = ['https://www.empresia.es/empresa/repsol/']

    def start_requests(self):
        yield scrapy.Request('http://example.com', self.fake_start_requests)

    def fake_start_requests(self, response):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,
                args={'wait': 1.5, 'http_method': 'POST'},
                endpoint='render.html'
            )

    def parse(self, response):
        open_in_browser(response)
        title = response.css("title").extract()
        # har …
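
For reference, the error description above says the call to render.html reached Splash without a url argument. Below is a minimal sketch of what a well-formed render.html call looks like when built by hand, useful for checking Splash outside of Scrapy. The base address http://localhost:8050 and the helper name build_render_html_url are assumptions for illustration, not part of my project.

```python
from urllib.parse import urlencode

# Assumption: Splash is listening on its usual local address.
SPLASH_BASE = "http://localhost:8050"


def build_render_html_url(target_url, wait=1.5):
    """Hypothetical helper: build a GET URL for Splash's render.html endpoint.

    render.html requires a `url` argument; `wait` is optional.
    """
    if not target_url:
        # Mirrors the condition Splash reports as "Required argument is missing: url".
        raise ValueError("render.html requires a 'url' argument")
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_BASE}/render.html?{query}"


if __name__ == "__main__":
    # Paste the printed URL into a browser or curl to test Splash directly.
    print(build_render_html_url("https://www.empresia.es/empresa/repsol/"))
```

If this URL returns the rendered page while the spider still gets the 400 response, the problem is probably in how the request reaches Splash from Scrapy rather than in Splash itself.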