sil*_*nJa 12 python windows parsing lxml
I installed lxml 2.2.2 on Windows (with Python 2.6.5). I tried this simple command:
from lxml.html import parse
p = parse('http://www.google.com').getroot()
But I get the following error:
Traceback (most recent call last):
  File "", line 1, in
    p=parse('http://www.google.com').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"
I have no idea what to do next, as I am new to Python. Please guide me in resolving this error. Thanks in advance!! :)
Mat*_*ttH 13
lxml.html.parse does not fetch URLs.
Here is how to do it with urllib2:
>>> from urllib2 import urlopen
>>> from lxml.html import parse
>>> page = urlopen('http://www.google.com')
>>> p = parse(page)
>>> p.getroot()
<Element html at 1304050>
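The snippet above works because parse() accepts any file-like object, not only a filename. A minimal sketch of the same call without network access, assuming an in-memory BytesIO buffer as a stand-in for the urlopen() response (the HTML content here is made up for illustration):

```python
from io import BytesIO

from lxml.html import parse

# Stand-in for the response object returned by urlopen(); any object
# with a read() method is handled the same way by parse().
page = BytesIO(
    b"<html><head><title>Example</title></head>"
    b"<body><p>Hello</p></body></html>"
)

# parse() returns an ElementTree; getroot() gives the <html> element.
root = parse(page).getroot()
title = root.findtext(".//title")
paragraph = root.findtext(".//p")
print(root.tag, title, paragraph)
```

If this runs but the URL version fails, the lxml installation itself is probably fine and the problem lies in fetching the page.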
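Update
Steven is right. lxml.etree.parse should accept and load URLs. I missed that. I tried to delete this answer, but I am not allowed to.
I retract my statement that it does not fetch URLs.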
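Since the "failed to load external entity" message does not say why the URL could not be read, one way to narrow it down is to keep the download and the parse as separate steps. The following is only a sketch: load_root and its opener parameter are hypothetical names introduced here, and the guess that the cause is network-related (proxy, firewall, DNS) is an assumption, not something the traceback confirms.

```python
try:
    from urllib2 import urlopen, URLError  # Python 2, as in the answer
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.error import URLError

from lxml.html import document_fromstring


def load_root(url, opener=urlopen):
    """Fetch url with opener, then parse the bytes with lxml.

    Returns (root, error); exactly one of the two is None.
    """
    try:
        data = opener(url).read()
    except (URLError, IOError) as exc:
        # The download itself failed, so lxml was never involved.
        return None, "fetch failed: %s" % exc
    if not data.strip():
        # document_fromstring() rejects empty input, so report it here.
        return None, "fetch returned an empty body"
    return document_fromstring(data), None
```

Calling load_root('http://www.google.com') and printing the second value shows whether the failure happens before lxml ever sees the page. The opener argument exists only so the function can be tried with a fake response instead of a real connection.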