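I used to log in to one of the sites with a form request sent from `start_requests`. However, the developers changed the login form and added more JavaScript, and now I can't tell what I'm doing wrong. I've included the JavaScript the site uses as well.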
import scrapy
from scrapy_splash import SplashRequest, SplashFormRequest


class MySpider(scrapy.Spider):
    name = "lost"
    start_urls = ["mysite"]  # changed main login form

    def start_requests(self):
        for url in self.start_urls:
            # Render the page in Splash so its JavaScript runs first.
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 1},
            )

    def parse(self, response):
        # Fill in the login form found in the rendered page and submit it.
        return SplashFormRequest.from_response(
            response,
            formdata={'mail': 'mymail', 'pass': 'mypasswd'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # response.body is bytes in Python 3; response.text is the decoded str.
        print('This is body ' + response.text + ' The end of body')
        # Going to film list
        if "Username" in response.text:
            self.logger.error("##Success##")
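If the new form is submitted by the page's own `login()` function (the Enter-key handler in the site's JavaScript calls it), `SplashFormRequest.from_response` may no longer have a plain form POST to replay. One option is to let Splash run that JavaScript itself through its `execute` endpoint with a Lua script. This is a minimal sketch under several assumptions: Splash 2.3+ (for `splash:select` and `element:send_text`), that `login()` reads the values of the `mail` and `pass` inputs, and that two seconds are enough for it to finish. `LOGIN_LUA`, `build_login_args` and `login_wait` are hypothetical names, not part of scrapy-splash.

```python
# Hypothetical helper (not part of scrapy-splash): builds the ``args`` for a
# SplashRequest sent to Splash's "execute" endpoint.
LOGIN_LUA = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(1))
    -- Type the credentials into the fields the page's jQuery reads.
    splash:select('input[name="mail"]'):send_text(args.mail)
    splash:select('input[name="pass"]'):send_text(args.pass)
    -- Call the page's own login() instead of replaying a form POST.
    splash:runjs("login()")
    assert(splash:wait(args.login_wait))
    return {html = splash:html(), cookies = splash:get_cookies()}
end
"""


def build_login_args(mail, password, login_wait=2):
    """Return Splash args; the Lua script sees them as args.mail, args.pass, ..."""
    return {
        "lua_source": LOGIN_LUA,
        "mail": mail,
        "pass": password,
        # Assumed delay for login() to finish; tune it for the real site.
        "login_wait": login_wait,
    }


# Usage inside the spider (needs scrapy-splash, so it is left as a comment):
# yield SplashRequest(url, self.after_login, endpoint="execute",
#                     args=build_login_args("mymail", "mypasswd"))
```

The Python side only assembles the arguments, so it runs without Splash; scrapy-splash adds `url` to the args itself, which is why the script can use `args.url`.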
JavaScript:
$(document).ready(function () {
    $('input[name="mail"],input[name="pass"]').keydown(function (e) {
        if (e.keyCode == 13) {
            login();
        }
    });
});

function login() {
    mail = $('input[name="mail"]').val(); …

It seems I've hit a dead end. Is there a way to run a Scrapy spider inside an asyncio loop? For example, in the code below:
import asyncio
from scrapy.crawler import CrawlerProcess
from myscrapy import MySpider
import scrapy


async def do_some_work():
    process = CrawlerProcess()
    # process.crawl() returns a Twisted Deferred
    await process.crawl(MySpider)

loop = asyncio.get_event_loop()
loop.run_until_complete(do_some_work())
This gives me the following error:
raise TypeError('A Future, a coroutine or an awaitable is required')
TypeError: A Future, a coroutine or an awaitable is required
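I do understand that `await` has to be followed by another coroutine (an awaitable). Is there any way around this so that it still works asynchronously? Thanks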
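The usual way around this is to convert the callback-based result into something asyncio can await. For real Twisted objects, `Deferred.asFuture(loop)` performs that conversion, but it only helps when Twisted itself runs on the asyncio event loop (Twisted's `asyncioreactor`, which Scrapy 2.0 and later can be told to use through the `TWISTED_REACTOR` setting). The stdlib-only sketch below shows just the bridging idea. It uses a toy class instead of Twisted, and `ToyDeferred`, `as_future` and `fake_crawl` are hypothetical names, not Scrapy or Twisted APIs. It also shows that awaiting the toy object directly fails with a `TypeError`.

```python
import asyncio


class ToyDeferred:
    """Hypothetical callback-style result (NOT Twisted's Deferred).

    Consumers register callbacks; the producer fires them once with a result.
    It defines no __await__, so asyncio cannot await it directly.
    """

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        if self._fired:
            fn(self._result)
        else:
            self._callbacks.append(fn)

    def fire(self, result):
        self._fired = True
        self._result = result
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(result)


def as_future(d, loop):
    """Bridge: return an asyncio.Future that completes when d fires.

    Error propagation is omitted to keep the sketch short.
    """
    fut = loop.create_future()

    def _on_result(result):
        if not fut.done():
            fut.set_result(result)

    d.add_callback(_on_result)
    return fut


def fake_crawl(loop):
    """Hypothetical stand-in for process.crawl(): fires shortly afterwards."""
    d = ToyDeferred()
    loop.call_later(0.01, d.fire, "crawl finished")
    return d


async def await_directly():
    # Fails with TypeError: ToyDeferred is not awaitable.
    return await fake_crawl(asyncio.get_running_loop())


async def await_bridged():
    loop = asyncio.get_running_loop()
    return await as_future(fake_crawl(loop), loop)


if __name__ == "__main__":
    try:
        asyncio.run(await_directly())
    except TypeError as exc:
        print("direct await failed:", exc)
    print(asyncio.run(await_bridged()))  # prints "crawl finished"
```

The wrapper only makes the result awaitable; a real crawl additionally needs the Twisted reactor to be running, which `CrawlerProcess` normally arranges itself through `process.start()`.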