SMP*_*GRP 4 python macos lxml scrapy web-scraping
I am unable to run the following import:
from scrapy.selector import Selector
The error is:
File "/Desktop/KSL/KSL/spiders/spider.py", line 1, in <module>
from scrapy.selector import Selector
ImportError: cannot import name Selector
It looks as if lxml isn't installed on my machine, but it is. Also, I thought Selector was a default module built into Scrapy. Maybe not?
Any thoughts?
use*_*604 10
Try importing HtmlXPathSelector instead.
from scrapy.selector import HtmlXPathSelector
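A likely explanation, assuming the installed Scrapy is older than 0.20 (the release that introduced `Selector`), is that the tutorial targets a newer version than the one on your machine. Here is a rough sketch for checking what is installed; `scrapy_version` is only an illustrative helper name, not part of Scrapy:

```python
def scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy cannot be imported."""
    try:
        import scrapy
    except ImportError:
        return None
    # Fall back to "unknown" in case this build does not expose __version__.
    return getattr(scrapy, "__version__", "unknown")


print(scrapy_version())
```

If this reports a version below 0.20, either upgrade Scrapy or stick with HtmlXPathSelector.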
Then use the .select() method to parse your HTML. For example,
sel = HtmlXPathSelector(response)
site_names = sel.select('//ul/li')
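If the same spider has to run against both older and newer Scrapy installs, one option is to pick whichever selector class is available at import time. This is only a sketch: `first_available` is a hypothetical helper, and it assumes the newer `Selector` is queried with `.xpath()` while `HtmlXPathSelector` is queried with `.select()`.

```python
import importlib


def first_available(module_name, candidates):
    """Return (name, obj) for the first name in `candidates` that the module provides.

    Returns (None, None) if the module cannot be imported or has none of the names.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, None
    for name in candidates:
        obj = getattr(module, name, None)
        if obj is not None:
            return name, obj
    return None, None


# Prefer the newer Selector, fall back to the older HtmlXPathSelector.
name, selector_cls = first_available(
    "scrapy.selector", ["Selector", "HtmlXPathSelector"]
)

# Name of the query method that goes with the chosen class (None if neither was found).
query_method = {"Selector": "xpath", "HtmlXPathSelector": "select"}.get(name)
```

Inside `parse` you could then call `getattr(selector_cls(response), query_method)('//ul/li')`, though simply upgrading Scrapy is usually the cleaner fix.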
If you are following the tutorial on the Scrapy site (http://doc.scrapy.org/en/latest/intro/tutorial.html), the updated example would look like this:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        sites = sel.select('//ul/li')
        for site in sites:
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc
Hope this helps!