How do I run a Scrapy project in Jupyter?

4th*_*ace 8 python scrapy jupyter

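On a Mac, I have Jupyter installed. When I type jupyter notebook from the root folder of my Scrapy project, it opens the notebook, and from there I can browse all of the project files.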

How do I execute the project from the notebook?

If I click the Running tab, under Terminals I see:

There are no terminals running.

Pau*_*ira 7

There are two main ways to achieve this:

1. Under the Files tab, open a new terminal: New > Terminal
Then simply run your spider: scrapy crawl [options] <spider>

2. Create a new notebook and use the CrawlerProcess or CrawlerRunner class to run the spider in a cell:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl('your-spider')
process.start() # the script will block here until the crawling is finished

Scrapy docs - Run Scrapy from a script
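One caveat with calling process.start() in a notebook: the Twisted reactor cannot be restarted, so re-running that cell in the same kernel typically fails with ReactorNotRestartable until the kernel is restarted. A possible workaround is to drive the first option (scrapy crawl) from a cell through a child process. The sketch below assumes the scrapy command is on the PATH and that project_dir points at the folder containing scrapy.cfg; build_crawl_command and run_spider are hypothetical helper names, not part of Scrapy.

```python
import subprocess


def build_crawl_command(spider_name, settings=None, output=None):
    """Return the argv list for `scrapy crawl <spider>` (hypothetical helper)."""
    cmd = ["scrapy", "crawl", spider_name]
    # Each settings override becomes a `-s NAME=VALUE` option.
    for name, value in (settings or {}).items():
        cmd += ["-s", f"{name}={value}"]
    # `-o FILE` asks Scrapy to export the scraped items to FILE.
    if output:
        cmd += ["-o", output]
    return cmd


def run_spider(spider_name, project_dir=".", **options):
    """Run the spider in a child process and return the CompletedProcess."""
    return subprocess.run(
        build_crawl_command(spider_name, **options),
        cwd=project_dir,        # assumed to be the Scrapy project root
        capture_output=True,    # keep the crawl log available to the notebook
        text=True,
    )
```

In a cell, something like result = run_spider("your-spider", output="items.json") followed by print(result.stderr) should show the crawl log, since Scrapy normally logs to standard error. Because every call starts a fresh process, the cell can be re-run without restarting the kernel.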


sus*_*097 7

You don't need a terminal to run a spider class. Just add the following code to a cell in your Jupyter notebook:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

For more information, see here.
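The settings dictionary passed to CrawlerProcess above can carry more than a user agent. As a rough sketch, the dictionary below also lowers the log noise and asks Scrapy to export the scraped items to a file so they can be loaded back into the notebook afterwards. The name notebook_settings and the file name items.json are illustrative assumptions, and the FEEDS setting is only available in reasonably recent Scrapy releases.

```python
# Hypothetical settings for a notebook run; pass them as
# CrawlerProcess(settings=notebook_settings) in place of the dict above.
notebook_settings = {
    # Same user agent as in the answer's example.
    "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    # Show only warnings and errors in the cell output.
    "LOG_LEVEL": "WARNING",
    # Export scraped items as JSON (file name is an assumption).
    "FEEDS": {"items.json": {"format": "json"}},
}
```

After the crawl finishes, the exported file can be read in a later cell, for example with the json module.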