Problem with Python pandas: read_html and python3-lxml installation

wow*_*ers 5 python lxml pandas

I am trying to run the following code, but to no avail. As far as I can tell, there are no syntax errors.

import quandl
import pandas as pd

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fifty_states)

When I run this code, I get the following error:

Traceback (most recent call last):
  File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 15, in <module>
    fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/List_of_U.S._states')
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 874, in read_html
    parse_dates, tupleize_cols, thousands, attrs, encoding)
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 726, in _parse
    parser = _parser_dispatch(flav)
  File "C:\Python35\lib\site-packages\pandas\io\html.py", line 685, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

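I'm not sure why this is happening, since I (should) have all the packages needed to run this code. I have had trouble installing lxml and python3-lxml, because the packages fail to install. As a fallback, I installed the following: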

python-dev libxml2-dev libxslt1-dev zlib1g-dev

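in addition to 'html5lib', which I have read is a suitable substitute for lxml.

I don't know what else to try at this point, since the fixes suggested for similar problems (i.e. installing lxml) don't work for me (I can't install lxml in any form through pip on the command line).

Any help is greatly appreciated.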

EDIT: It seems that lxml was never installed on my computer. Oddly, I am unable to install it with pip install lxml. Here is the error log I get when I try to install it:

Collecting lxml
  Using cached lxml-3.6.4.tar.gz
Building wheels for collected packages: lxml
  Running setup.py bdist_wheel for lxml ... error
  Complete output from command c:\python35\python.exe -u -c "import setuptools,
tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\l
xml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().rep
lace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d C:\Users\Dwang\AppData\Lo
cal\Temp\tmpm9z4yol6pip-wheel- --python-tag cp35:
  Building lxml version 3.6.4.
  Building without Cython.
  ERROR: b"'xslt-config' is not recognized as an internal or external command,\r
\noperable program or batch file.\r\n"
  ** make sure the development packages of libxml2 and libxslt are installed **

  Using build configuration of libxslt
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.5
  creating build\lib.win-amd64-3.5\lxml
  copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
  creating build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includes

  creating build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
  copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
  creating build\lib.win-amd64-3.5\lxml\isoschematron
  copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\iso
schematron
  copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
  copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
  copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includes

  copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
  copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
  copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
  copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\include
s
  copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inclu
des
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
  copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.w
in-amd64-3.5\lxml\isoschematron\resources\rng
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
  creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematr
on-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract
_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sche
matron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_inc
lude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schemat
ron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-s
chematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resource
s\xsl\iso-schematron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for
_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
  copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -
> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
  running build_ext
  building 'lxml.etree' extension
  error: Unable to find vcvarsall.bat

  ----------------------------------------
  Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml
  Running setup.py install for lxml ... error
    Complete output from command c:\python35\python.exe -u -c "import setuptools
, tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\
\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().r
eplace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\Dwang\AppDat
a\Local\Temp\pip-4_tf2u3a-record\install-record.txt --single-version-externally-
managed --compile:
    Building lxml version 3.6.4.
    Building without Cython.
    ERROR: b"'xslt-config' is not recognized as an internal or external command,
\r\noperable program or batch file.\r\n"
    ** make sure the development packages of libxml2 and libxslt are installed *
*

    Using build configuration of libxslt
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.5
    creating build\lib.win-amd64-3.5\lxml
    copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
    creating build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includ
es
    creating build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
    copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
    creating build\lib.win-amd64-3.5\lxml\isoschematron
    copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\i
soschematron
    copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
    copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\include
s
    copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\in
cludes
    copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
    copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
    copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
    copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes

    copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
    copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\inclu
des
    copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inc
ludes
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
    copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib
.win-amd64-3.5\lxml\isoschematron\resources\rng
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
    creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schema
tron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstra
ct_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sc
hematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_i
nclude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso
-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resour
ces\xsl\iso-schematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_f
or_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sch
ematron-xslt1
    copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt
 -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt
1
    running build_ext
    building 'lxml.etree' extension
    error: Unable to find vcvarsall.bat

    ----------------------------------------
Command "c:\python35\python.exe -u -c "import setuptools, tokenize;__file__='C:\
\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\lxml\\setup.py';exec(co
mpile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __
file__, 'exec'))" install --record C:\Users\Dwang\AppData\Local\Temp\pip-4_tf2u3
a-record\install-record.txt --single-version-externally-managed --compile" faile
d with error code 1 in C:\Users\Dwang\AppData\Local\Temp\pip-build-738bf61u\lxml
\

ale*_*cxe 6

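From what I understand of the documentation, read_html() should fall back to html5lib when lxml is not available, but that doesn't seem to be happening in your case, and an error is thrown instead.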

Try specifying the flavor explicitly:

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
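Before passing a flavor, it can help to check which parser libraries Python can actually find. The sketch below is only an illustration: the helper names is_installed, pick_flavor and read_tables are hypothetical, not part of pandas. It relies on the 'lxml' flavor needing the lxml package and the 'html5lib' flavor needing both bs4 (BeautifulSoup4) and html5lib.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_flavor():
    """Pick a read_html flavor from the parser libraries that are present.

    Returns 'lxml', 'html5lib', or None when neither parser stack is usable.
    """
    if is_installed("lxml"):
        return "lxml"
    # The 'html5lib' flavor is backed by BeautifulSoup4, so both are required.
    if is_installed("bs4") and is_installed("html5lib"):
        return "html5lib"
    return None


def read_tables(url):
    """Read all HTML tables at `url` using whichever flavor is available."""
    flavor = pick_flavor()
    if flavor is None:
        raise ImportError(
            "No HTML parser found; try: pip install beautifulsoup4 html5lib"
        )
    import pandas as pd  # imported here so the checks above work without pandas

    # read_html returns a list of DataFrames, one per table found on the page.
    return pd.read_html(url, flavor=flavor)
```

With this in place, read_tables('https://simple.wikipedia.org/wiki/List_of_U.S._states') would replace the direct pd.read_html call. Both beautifulsoup4 and html5lib are pure-Python packages, so installing them with pip should not hit the compiler error shown in the lxml build log.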