Posts by beb*_*boy

Clicking a button on a website with Scrapy

How can I use Scrapy to click the Next button on this site (which changes the page number) and keep crawling until the last page?

I tried combining Scrapy with Selenium, but it still fails with "line 22 self.driver = webdriver.Firefox() ^ IndentationError: expected an indented block"

I don't know why this happens; my code looks fine to me. Can anyone help me fix this?
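
A minimal sketch of how this kind of error typically arises, assuming the line after `def __init__(self):` sits at the same column as the `def` itself (mixed tabs and spaces can produce a similar failure). The one-method class below is a hypothetical stand-in for the spider, not the original file. It uses the built-in `compile()` to check both variants without running them:

```python
# Hypothetical stand-in for the spider: the body of __init__ starts at the
# same column as its "def", so Python finds no indented block to use as a body.
BROKEN = (
    "class MySpider:\n"
    "    def __init__(self):\n"
    "    self.driver = None\n"
)

# Same code with the body indented one level deeper than the "def".
FIXED = (
    "class MySpider:\n"
    "    def __init__(self):\n"
    "        self.driver = None\n"
)


def check_indentation(source):
    """Compile source without executing it and report any indentation error."""
    try:
        compile(source, "<spider>", "exec")
    except IndentationError as exc:
        return f"IndentationError on line {exc.lineno}: {exc.msg}"
    return "ok"


print(check_indentation(BROKEN))
print(check_indentation(FIXED))
```

The reported line is the first one that should have been indented, so the fix usually belongs on that line or on the `def` just above it. The same check can be run on a whole file with `python -m py_compile spider.py`, where `spider.py` is a placeholder name.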

Here is my source:

from selenium import webdriver
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from now.items import NowItem
class MySpider(BaseSpider):
    name = "nowhere"
    allowed_domains = ["n0where.net"]
    start_urls = ["https://n0where.net/"]

    def parse(self, response):
        for article in response.css('.loop-panel'):
            item = NowItem()
            item['title'] = article.css('.article-title::text').extract_first()
            item['link'] = article.css('.loop-panel>a::attr(href)').extract_first()
            item['body'] = ''.join(article.css('.excerpt p::text').extract()).strip()
            #item['date'] = article.css('[itemprop="datePublished"]::attr(content)').extract_first()
            yield item

    def __init__(self):
        self.driver = webdriver.Firefox()

        def parse2(self, response):
        self.driver.get(response.url)

        while True: …
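
For the paging itself, here is a minimal sketch of one way to drive a "next" button with Selenium. It is not the truncated `parse2` loop above. The helper names `iter_page_sources`, `find_next_button`, and `crawl_with_firefox`, the `.next` CSS selector, and the `max_pages` cap are all assumptions for illustration; the real selector has to be taken from the site's HTML. The sketch also assumes each click triggers a full page load; a dynamically updated page would likely need an explicit wait (for example `WebDriverWait`) before reading `page_source`.

```python
def iter_page_sources(driver, start_url, find_next, max_pages=50):
    """Yield each page's HTML, clicking "next" until no button is found.

    driver    -- any object with get(), page_source (Selenium WebDriver shape)
    find_next -- callable returning a clickable element, or None on the last page
    max_pages -- safety cap so a misbehaving site cannot loop forever
    """
    driver.get(start_url)
    for _ in range(max_pages):
        yield driver.page_source
        next_button = find_next(driver)
        if next_button is None:
            break  # no "next" button: this was the last page
        next_button.click()


def find_next_button(driver):
    """Return the first element matching the (hypothetical) ".next" selector."""
    # Imported lazily so the generic helper above has no Selenium dependency.
    from selenium.webdriver.common.by import By

    matches = driver.find_elements(By.CSS_SELECTOR, ".next")
    return matches[0] if matches else None


def crawl_with_firefox(start_url="https://n0where.net/"):
    """Open Firefox, walk through the pages, and print the size of each one."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        for html in iter_page_sources(driver, start_url, find_next_button):
            print(len(html))
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling `crawl_with_firefox()` starts the browser. Each `html` string could be wrapped in `scrapy.Selector(text=html)` to reuse the CSS queries from `parse`. Keeping the loop in a plain generator means it can be exercised with a fake driver, without launching Firefox.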

python selenium web-crawler scrapy

