How to log in with Scrapy when the page has no form element

Tags: python, scrapy (score: 0)

I'm trying to log in to some websites, but they don't seem to use a form for the login dialog. So when I use `FormRequest`, I get this error:

```python
raise ValueError("No <form> element found in %s" % response)
```

So how can I log in with Scrapy in this case?


I tried to find a form element on this site (with Chrome DevTools and the XPath `//form`), but got zero results.


Its login markup looks like this:

```html
<div class="loginModalBody">
    <div class="coverLoginModal">
        <p class="loginModalTitle">Login </p>

        <div class=""><p class="login-msg"></p></div>

        <!-- Email -->
        <div class="loginCoverInputText">
            <input class="loginInputText" id="email-login" role="presentation" autocomplete="off" type="email" name="loginEmail" placeholder="E-mail">
            <span class="loginNameInputText">E-mail</span>
            <span class="loginLineInputText"></span>
            <!-- Error email -->
            <div class="dontEnterEmail loginErrorInput"><p class="loginError">Vui lòng nhập email<span class="loginIconError"></span></p></div>
            <div class="loginEmailInvalid loginErrorInput"><p class="loginError">Invalid email<span class="loginIconError"></span></p></div>
        </div>

        <!-- Password -->
        <div class="loginCoverInputText">
            <input class="loginInputText" id="password-login" autocomplete="new-password" type="password" name="loginPassword" placeholder="Password">
            <span class="loginNameInputText">Password</span>
            <span class="loginLineInputText"></span>
            <!-- Error password -->
            <div class="dontEnterPassword loginErrorInput"><p class="loginError">Enter password<span class="loginIconError"></span></p></div>
        </div>

        <!-- Remember password -->
        <label class="loginRememberPassword" id="login-remember-pass" for="loginRememberPassword"><input id="loginRememberPassword" type="checkbox" name="loginRememberPassword"><span></span>Ghi nhớ mật khẩu</label>
        <p class="loginForgotPassword forgot-password"> <a href="javascript:void(0)" data-dismiss="modal"><span></span>forgot pass</a></p>

        <button class="loginButtonSubmit btn-login" id="btn-login-system" type="button">Login</button>

        <p class="loginDontAccount">Do not have account? <a class="not-account" href="javascript:void(0)" data-dismiss="modal" data-toggle="modal" data-target="#modal-signup-system">Register!</a></p>
        <p class="loginOr">Or</p>

        <button type="button" class="loginByGoogle" onclick="open_login_g()">Login with Google</button>
        <button type="button" class="loginByFacebook" onclick="open_login_f()">Login with Facebook</button>
    </div>
</div>
```

Here is the code I'm using:

```python
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class Spider(scrapy.Spider):
    name = "card"
    start_urls = ["https://website/auth/signin"]
    login_user = "foo"
    login_pass = "bar"

    def parse(self, response):
        """Parse the login page."""
        open_in_browser(response)
        # from_response() raises ValueError here because the page has no <form>.
        return FormRequest.from_response(
            response,
            formdata={
                "email": "username",
                "password": "pass",
            },
            callback=self.parse_home,
        )

    def parse_home(self, response):
        open_in_browser(response)
        print(response)
```

Answer by Dan*_*kin (score: 5)

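Web scraping is all about requests and responses, so all you need to do is reproduce the requests the user's browser sends. `FormRequest` only saves you the extra work of handling a form. In this case, you need to send the correct login request yourself: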

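  1. Go to the page you need and open your browser's developer tools (e.g. Chrome DevTools).
  2. In the Network tab, enable the Preserve log option.
  3. Fill in the credentials on the page and click the login button.
  4. Find the login request (the one sent after you click the button).
  5. Open that request's Headers tab and note the request method and parameters: it may be a GET with query string parameters or a POST with Form Data (newer Chrome versions show the parameters in a separate Payload tab).
  6. In your code, reproduce that login request with a plain Scrapy `Request` instead of `FormRequest`.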