Simulating a JavaScript button click with Scrapy

Dár*_*eto 0 javascript python web-crawler scrapy

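My goal is to run a Scrapy spider on this page: http://visit.rio/en/o-que-fazer/outdoors/ . However, some of the resources inside id="container" are only loaded when a JavaScript button ("VER MAIS") is clicked. I've read a little about Selenium, but I haven't gotten anywhere with it.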

Raf*_*ida 7

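You read correctly: your best option is Scrapy + Selenium driving a Firefox browser, or a headless browser such as PhantomJS to speed up the crawl. (PhantomJS is no longer maintained and current Selenium releases no longer support it; headless Firefox or Chrome fills the same role.)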

Example adapted from /sf/answers/1258549981/

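import time

import scrapy
from selenium import webdriver
from selenium.common.exceptions import (
    ElementNotInteractableException,
    NoSuchElementException,
)
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['visit.rio']
    start_urls = ['http://visit.rio/en/o-que-fazer/outdoors']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # A real browser runs the page's JavaScript, which Scrapy cannot do itself
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        # Click "VER MAIS" until the button disappears or can no longer be clicked
        while True:
            try:
                show_more = self.driver.find_element(By.XPATH, '//div[@id="show_more"]/a')
                show_more.click()
                time.sleep(1)  # crude pause so the new items can load
            except (NoSuchElementException, ElementNotInteractableException):
                break

        # Parse the fully expanded page with Scrapy's selectors
        sel = scrapy.Selector(text=self.driver.page_source)
        # get the data from sel and write it to scrapy items (yield them here)

    def closed(self, reason):
        # Scrapy calls this when the spider finishes; shut the browser down
        self.driver.quit()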