4th*_*ace 8 python scrapy jupyter
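On a Mac, I have Jupyter installed. When I type jupyter notebook from the root folder of my Scrapy project, it opens the notebook, and from there I can browse all the project files.
How do I execute the project from the notebook?
If I click the Running tab, under Terminals, I see: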
There are no terminals running.
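There are two main ways to do this:
1. Under the Files tab, open a new terminal: New > Terminal
Then simply run the spider: scrapy crawl [options] <spider>
2. Create a new notebook and run the spider from a cell using the CrawlerProcess or CrawlerRunner class: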
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('your-spider')
process.start() # the script will block here until the crawling is finished
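A third variation combines the two approaches: launching scrapy crawl as a child process from a notebook cell. Because process.start() runs the Twisted reactor, which cannot be restarted within the same kernel, a child process is one way to re-run a crawl without restarting the notebook. The following is a minimal sketch, not part of the original answer; the helper names build_crawl_command and run_spider are hypothetical, and it assumes the scrapy executable is on the PATH and that project_dir points at the folder containing scrapy.cfg.

```python
import subprocess


def build_crawl_command(spider, output=None, spider_args=None):
    """Assemble the argument list for `scrapy crawl` (hypothetical helper)."""
    cmd = ["scrapy", "crawl", spider]
    if output is not None:
        # -o writes the scraped items to the given feed file
        cmd += ["-o", output]
    for key, value in (spider_args or {}).items():
        # -a passes NAME=VALUE arguments through to the spider
        cmd += ["-a", f"{key}={value}"]
    return cmd


def run_spider(spider, project_dir=".", **kwargs):
    """Run the spider in a child process and return the CompletedProcess."""
    return subprocess.run(
        build_crawl_command(spider, **kwargs),
        cwd=project_dir,        # assumed to be the Scrapy project root
        capture_output=True,    # keep the crawl log out of the cell output
        text=True,
        check=False,            # inspect returncode instead of raising
    )
```

In a cell, something like result = run_spider('your-spider', output='items.json') would start the crawl; Scrapy writes its log to standard error by default, so result.stderr is the place to look if the crawl misbehaves.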
You don't need a terminal to run a spider class. Just add the following code to a jupyter-notebook cell:
import scrapy
from scrapy.crawler import CrawlerProcess
class MySpider(scrapy.Spider):
    # Your spider definition
    ...
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
For more information, see here
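Once the crawl finishes, the scraped items usually need to get back into the notebook. One possible way, sketched here under assumptions rather than taken from the answer above, is to add Scrapy's FEEDS setting to the settings dict so items are exported as JSON Lines, then read that file in a later cell. The names FEED_PATH, crawl_settings, and load_items are hypothetical, and the FEEDS setting requires a reasonably recent Scrapy release.

```python
import json
from pathlib import Path

# Hypothetical output location for the exported items
FEED_PATH = Path("items.jsonl")


def crawl_settings(feed_path):
    """Build a settings dict that also exports items as JSON Lines."""
    return {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        # FEEDS maps an output path to its export options
        "FEEDS": {str(feed_path): {"format": "jsonlines"}},
    }


def load_items(feed_path):
    """Read the exported feed back in as a list of dicts, one per line."""
    with open(feed_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# In the notebook, assuming MySpider is defined as in the cell above:
#   process = CrawlerProcess(crawl_settings(FEED_PATH))
#   process.crawl(MySpider)
#   process.start()
#   items = load_items(FEED_PATH)
```

JSON Lines is chosen here because each item sits on its own line, which keeps the loader trivial and makes a partially written file still readable.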