I'd like to ask how to crawl this site by clicking its Next button (which changes the page number) and keep crawling until the last page.
I tried combining Scrapy with Selenium, but it still fails with this error:

line 22
    self.driver = webdriver.Firefox()
    ^
IndentationError: expected an indented block
I don't know why this happens; the code looks fine to me. Can anyone help me fix it?
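For reference, Python raises `IndentationError: expected an indented block` when the line after a `def ...:` header is not indented further than the header itself. The sketch below compiles a deliberately broken and a fixed version of the `__init__` method with the built-in `compile()`. It swaps `webdriver.Firefox()` for `None` so it runs without Selenium. Because the post doesn't show the file's real whitespace, it is only an assumption that this is the cause here. A frequent hidden variant is mixing tabs and spaces, which can look correct in an editor (Python 3 reports that case as `TabError`, a subclass of `IndentationError`).

```python
# The body of __init__ sits at the same indentation as its `def` line: invalid.
BAD_SOURCE = (
    "class MySpider(object):\n"
    "    def __init__(self):\n"
    "    self.driver = None\n"
)

# The body is indented one level deeper than `def`: valid.
GOOD_SOURCE = (
    "class MySpider(object):\n"
    "    def __init__(self):\n"
    "        self.driver = None\n"
)


def indentation_problem(source):
    """Compile `source`; return (error class name, line number), or None if it compiles."""
    try:
        compile(source, "<spider>", "exec")
    except IndentationError as exc:  # TabError is a subclass, so it is caught too
        return type(exc).__name__, exc.lineno
    return None


if __name__ == "__main__":
    print(indentation_problem(BAD_SOURCE))
    print(indentation_problem(GOOD_SOURCE))
```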
Here is my source:
from selenium import webdriver
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from now.items import NowItem


class MySpider(BaseSpider):
    name = "nowhere"
    allowed_domains = ["n0where.net"]
    start_urls = ["https://n0where.net/"]

    def parse(self, response):
        for article in response.css('.loop-panel'):
            item = NowItem()
            item['title'] = article.css('.article-title::text').extract_first()
            item['link'] = article.css('.loop-panel>a::attr(href)').extract_first()
            item['body'] = ''.join(article.css('.excerpt p::text').extract()).strip()
            #item['date'] = article.css('[itemprop="datePublished"]::attr(content)').extract_first()
            yield item

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse2(self, response):
        self.driver.get(response.url)
        while True: …
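The pagination part of the question boils down to a loop: scrape the current page, look for a Next link, and stop when there is none. Below is a minimal, framework-free sketch of that loop. `crawl_all_pages`, the `fetch_page` callback and the `example.com` pages are hypothetical stand-ins for the real Scrapy/Selenium calls, because the post doesn't show the site's Next-button markup.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher type: given a URL, return the items scraped from that
# page and the URL behind its "Next" button, or None on the last page.
PageFetcher = Callable[[str], Tuple[List[dict], Optional[str]]]


def crawl_all_pages(fetch_page: PageFetcher, start_url: str) -> Iterator[dict]:
    """Yield items from start_url, then keep following Next links until none is left."""
    url: Optional[str] = start_url
    visited = set()
    # Stop on the last page (no Next link) or if a Next link points back to a seen page.
    while url is not None and url not in visited:
        visited.add(url)
        items, url = fetch_page(url)
        yield from items


# Hypothetical three-page site used only for illustration.
FAKE_SITE = {
    "https://example.com/page/1": ([{"title": "a"}, {"title": "b"}], "https://example.com/page/2"),
    "https://example.com/page/2": ([{"title": "c"}], "https://example.com/page/3"),
    "https://example.com/page/3": ([{"title": "d"}], None),
}

if __name__ == "__main__":
    for item in crawl_all_pages(FAKE_SITE.__getitem__, "https://example.com/page/1"):
        print(item["title"])
```

In Scrapy itself the same idea is usually written without Selenium: `parse` yields its items and then, if a next link is found, yields `response.follow(next_href, callback=self.parse)` (available since Scrapy 1.4), so Scrapy keeps calling `parse` until the last page. With Selenium, the equivalent is a `while True:` loop that clicks the Next element and breaks once looking it up raises `NoSuchElementException`.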