Mar*_*333 16 python lxml beautifulsoup easy-install
I am using Python 2.7.5 and BeautifulSoup 4.2.1 on Mac OS X 10.7.5. I want to parse an XML page with the lxml library, as described in the BeautifulSoup tutorial. However, when I run my code, it shows:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested:
lxml,xml. Do you need to install a parser library?
I am sure I have installed lxml by every method available: easy_install, pip, port, and so on. I tried adding a line to my code to check whether lxml is installed:
import lxml
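Python gets past this line without complaint and then shows the same error message again, raised at the same line as before.
So I am fairly sure lxml is installed, just not installed correctly. I therefore decided to uninstall lxml and reinstall it the "right" way. But when I type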
easy_install -m lxml
it shows:
Searching for lxml
Best match: lxml 3.2.1
Processing lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:
pkg_resources.require("lxml") # latest installed version
pkg_resources.require("lxml==3.2.1") # this exact version
pkg_resources.require("lxml>=3.2.1") # this version or higher
Processing dependencies for lxml
Finished processing dependencies for lxml
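So I don't know how to go on with the uninstall...
I have read many posts about this problem on Google, but I could not find any useful information.
Here is my code: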
import sys  # needed for sys.argv below

import mechanize
from bs4 import BeautifulSoup
import lxml  # imported only to confirm that the package can be found


class count:
    def __init__(self, protein):
        self.proteinCode = protein
        self.br = mechanize.Browser()

    def first_search(self):
        # Test 0: fetch the GenBank page and parse it with lxml's XML builder
        soup = BeautifulSoup(self.br.open("http://www.ncbi.nlm.nih.gov/protein/21225921?report=genbank&log$=prottop&blast_rank=1&RID=YGJHMSET015"), ['lxml', 'xml'])
        return


if __name__ == '__main__':
    proteinCode = sys.argv[1]
    gogogo = count(proteinCode)
I would like to know:
osa*_*osa 13
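I am using BeautifulSoup 4.3.2 on OS X 10.6.8. I also had a problem with an improperly installed lxml. Here are some things I found out:
First, check this related question: "Removed MacPorts, now Python is broken".
Now, to check which builders your BeautifulSoup 4 installation has registered, try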
>>> import bs4
>>> bs4.builder.builder_registry.builders
If you don't see your favorite builder in that list, it is not installed, and you will get the error above ("Couldn't find a tree builder...").
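The same check can be wrapped in a small script that also prints the features each builder advertises. This is only a sketch: `list_builders` is a name made up here, and it assumes each builder class exposes a `features` attribute, which is worth verifying against your installed bs4 version.

```python
def list_builders():
    """Return (class name, features) pairs for each registered tree builder.

    Returns an empty list when bs4 itself cannot be imported.
    """
    try:
        from bs4.builder import builder_registry
    except ImportError:
        return []
    return [
        # getattr guards against a builder class without a features list
        (cls.__name__, list(getattr(cls, "features", [])))
        for cls in builder_registry.builders
    ]


if __name__ == "__main__":
    for name, features in list_builders():
        print(name, features)
```

If no builder in the output lists both `lxml` and `xml`, a request for `['lxml', 'xml']` has nothing to match.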
Also, just because you can import lxml doesn't mean everything is perfect.
Try
>>> import lxml
>>> import lxml.etree
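To run such import checks in one go and keep the error text, something like the following may help. It is a rough sketch, not part of bs4 or lxml: `probe` is a hypothetical helper, and the module names passed to it are just the ones discussed in this answer.

```python
import importlib


def probe(module_names):
    """Try to import each module; map its name to None or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # a broken C extension may raise more than ImportError
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in probe(["lxml", "lxml.etree", "bs4.builder._lxml"]).items():
        print(name, "OK" if error is None else error)
```

A package that imports fine while its `etree` submodule fails usually points at a broken or mismatched compiled extension rather than a missing install.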
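To understand what is going on, go to the bs4 installation and open the egg (tar -xvzf). Note the module bs4.builder. Inside it you should see files such as _lxml.py and _html5lib.py. So you can also try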
>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib
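If something is wrong, you will see why a particular module cannot be loaded. Note how the end of builder/__init__.py loads all of these modules and silently ignores any that fail to import: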
# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
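The effect of registering in reverse priority order, and of a builder silently missing, can be shown with a toy registry. This is a deliberately simplified sketch: `ToyRegistry` and its methods are invented for illustration and are not bs4's real classes or lookup logic.

```python
class ToyRegistry:
    """Hypothetical stand-in for a builder registry; not bs4's real class."""

    def __init__(self):
        self.builders = []

    def register(self, name, features):
        # Later registrations go to the front of the list, so they win lookups.
        self.builders.insert(0, (name, set(features)))

    def lookup(self, *features):
        """Return the first builder offering every requested feature, else None."""
        wanted = set(features)
        for name, offered in self.builders:
            if wanted <= offered:
                return name
        return None


registry = ToyRegistry()
registry.register("html.parser", ["html.parser", "html"])  # always available
registry.register("lxml", ["lxml", "html"])                # registered last, highest priority

print(registry.lookup("html"))         # lxml
print(registry.lookup("html.parser"))  # html.parser
print(registry.lookup("lxml", "xml"))  # None: no registered builder offers both
```

The last lookup mirrors the situation in the question: when the lxml-backed XML builder never gets registered because its import failed, nothing satisfies `lxml` plus `xml`.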
If you are using Python 2.7 on Ubuntu/Debian, this worked for me:
$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml
Test it like this:
mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
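FWIW, I ran into a similar problem (Python 3.6, OS X 10.12.6) and was able to fix it simply by running the following (the first command only indicates that I was working in a conda virtualenv):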
$ source activate ml-general
$ pip uninstall lxml
$ pip install lxml
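I first tried more complicated things, because BeautifulSoup worked fine with the same commands through Jupyter+IPython but not through PyCharm's terminal in the same virtualenv. Simply reinstalling lxml as described above solved the problem.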
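When two front ends in what should be the same environment behave differently, it can help to print which interpreter is running and where it would load the packages from. A minimal sketch for Python 3, with `locate` being a helper name invented here:

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("lxml", "bs4"):
        print(name, "->", locate(name))
```

Running it from both Jupyter and the PyCharm terminal and comparing the output should show whether the two are really using the same interpreter and the same copy of lxml.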