Win*_*ton 6 python forms scrapy web-scraping scrapy-spider
I am trying to use Scrapy to log in to GitHub and collect the commit counts of my projects. Here is the code:
from scrapy.item import Item, Field
from scrapy.http import FormRequest
from scrapy.spiders import Spider  # scrapy.spider in old Scrapy releases
from scrapy.utils.response import open_in_browser


class GitSpider(Spider):
    name = "github"
    allowed_domains = ["github.com"]
    start_urls = ["https://www.github.com/login"]

    def parse(self, response):
        # Placeholder credentials, deliberately invalid
        formdata = {'login': 'username',
                    'password': 'password'}
        yield FormRequest.from_response(response,
                                        formdata=formdata,
                                        clickdata={'name': 'commit'},
                                        callback=self.parse1)

    def parse1(self, response):
        # Show the page returned by the form submission
        open_in_browser(response)
After running the code with
scrapy runspider github.py
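it should show the page that results from submitting the form. Since the username and password are fake, that should be the same login page with a failed-login message. Instead, it shows the search page. The log file is on pastebin.
How can I fix the code? Thanks in advance.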
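Your problem is that FormRequest.from_response() is using a different form, the "search form", whereas you want it to use the "login form". By default it picks the first form in the response (formnumber=0). Provide a formnumber argument: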
yield FormRequest.from_response(response,
                                formnumber=1,
                                formdata=formdata,
                                clickdata={'name': 'commit'},
                                callback=self.parse1)
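If it is unclear which index the login form has, one way to check is to list the forms in the page and look for the one that contains the login field. The following is only a minimal sketch using the standard library: `FormLister`, `find_form_number`, and the sample HTML are made up for illustration and do not reproduce GitHub's real markup.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the input names of every <form>, in document order."""

    def __init__(self):
        super().__init__()
        self.forms = []          # one list of input names per form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([])
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.forms[-1].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def find_form_number(html, field_name):
    """Return the zero-based index of the first form that has an
    input named field_name, or None if no form has one."""
    lister = FormLister()
    lister.feed(html)
    for index, names in enumerate(lister.forms):
        if field_name in names:
            return index
    return None


# Made-up page (an assumption): a search form first, then a login form.
SAMPLE_HTML = """
<html><body>
  <form action="/search"><input type="text" name="q"></form>
  <form action="/session">
    <input type="text" name="login">
    <input type="password" name="password">
    <input type="submit" name="commit" value="Sign in">
  </form>
</body></html>
"""

print(find_form_number(SAMPLE_HTML, "login"))  # 1
```

Inside the spider, the same helper could be called on the response body and its result passed as formnumber, so the index is not hard-coded. FormRequest.from_response() also accepts a formxpath argument, which selects the form by an XPath expression instead of by position.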
Here is what I see in the browser after applying the change (using the "fake" user):
[screenshot of the resulting page]

A solution using a web driver:
import time

from scrapy.spiders import Spider
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


# A plain Spider is used instead of CrawlSpider: no crawl rules are
# needed, and CrawlSpider reserves parse() for its own logic.
class GitSpider(Spider):

    name = "gitscrape"
    allowed_domains = ["github.com"]
    start_urls = ["https://www.github.com/login"]

    def __init__(self, *args, **kwargs):
        # Let Scrapy finish its own setup before starting the browser
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # find_element(By.NAME, ...) replaces find_element_by_name,
        # which was removed in Selenium 4
        login_form = self.driver.find_element(By.NAME, 'login')
        password_form = self.driver.find_element(By.NAME, 'password')
        commit = self.driver.find_element(By.NAME, 'commit')
        login_form.send_keys("yourlogin")
        password_form.send_keys("yourpassword")
        actions = ActionChains(self.driver)
        actions.click(commit)
        actions.perform()
        # by this point you are logged in to GitHub and have access
        # to all data in the main menu
        time.sleep(3)
        self.driver.close()