NGl*_*oom 0 python xpath web-crawler scrapy
I want to extract content from a news site's RSS feed, which looks like this:
<item>
<title>BPS: Kartu Bansos Bantu Turunkan Angka Gini Ratio</title>
<media:content url="/image.jpg" expression="full" type="image/jpeg"/> </item>
However, parsing it with an XPath that contains a namespaced name such as media:content, for example media.xpath('//media:content'), raises an error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 183, in xpath
six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 179, in xpath
smart_strings=self._lxml_smart_strings)
File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:57923)
File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:167084)
File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:166043)
ValueError: XPath error: Undefined namespace prefix in //media:content
Does anyone know what I should do? Thanks :)
You first need to tell XPath which namespace the media prefix maps to, by calling register_namespace(prefix, namespace) on the selector, for example:
selector.register_namespace('media', 'http://the.namespace.of/media')
Or, if you only want to match on the local name, you can use:
item.xpath("//*[local-name()='content']")