No module named 'pyarrow._orc'

rwi*_*atr 6 python anaconda conda pyarrow

I am having trouble using the pyarrow.orc module in Anaconda on Windows 10.

import pyarrow.orc as orc

This throws an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\apps\Anaconda3\envs\ws\lib\site-packages\pyarrow\orc.py", line 23, in <module>
    import pyarrow._orc as _orc
ModuleNotFoundError: No module named 'pyarrow._orc'

On the other hand, import pyarrow works without any problems.

conda list
# packages in environment at C:\apps\Anaconda3\envs\ws:
#
# Name                    Version                   Build  Channel
arrow-cpp                 0.13.0           py37h49ee12d_0
...
numpy                     1.17.3           py37h4ceb530_0
numpy-base                1.17.3           py37hc3f5095_0
...
pip                       19.3.1                   py37_0
pyarrow                   0.13.0           py37ha925a31_0
...
python                    3.7.5                h8c8aaf0_0
...

I have tried other versions of pyarrow, with the same result.

conda -V
conda 4.7.12

Wes*_*ney 6

The ORC reader is not supported on Windows at all, and as far as I know it never has been. It is unclear whether Apache ORC in C++ can be built with the Visual Studio C++ compiler.

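  • Posting for visibility: `pyarrow.orc` support is disabled in the `pip` wheels because of linking issues, and there is a ticket ([ARROW-7811](https://issues.apache.org/jira/browse/ARROW-7811)) asking for community help with a fix. Installing a version earlier than 0.15.0, or installing via conda, works as a workaround on macOS/Linux. (2 upvotes)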
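
Since some pyarrow builds ship without the compiled ORC extension, code that has to run on several platforms can check for it before importing. The following is a minimal sketch, not part of the original answer: the helper name `extension_available` is hypothetical, and it only checks that the module can be located, which does not guarantee that the import itself will succeed.

```python
import importlib.util


def extension_available(name):
    """Return True if the module `name` can be located by the import system.

    Hypothetical helper for illustration. Note that find_spec() imports the
    parent package (here: pyarrow) in order to search its path.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # The parent package is missing or failed to import.
        return False


if extension_available("pyarrow._orc"):
    import pyarrow.orc as orc
else:
    orc = None
    print("ORC support is not available in this pyarrow build")
```

Code that needs ORC can then test `orc is not None` and fall back to another format or raise a clearer error message.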

Phy*_*hy6 5

Bottom line up front: I ran into the same error. This was the solution for me:

!pip install pyarrow==0.13.0

I am not sure this is specific to Windows 10; I have hit the same error in AWS Sagemaker over the last few days. It previously worked fine on an earlier Sagemaker instance.


Using the Conda Packages menu in Jupyter, the conda_python3 kernel shows that it has pyarrow 0.13.0 installed from https://repo.anaconda.com/pkgs/main/linux-64, build py36he6710b0_0.


However, a subsequent call to

!conda list

did not show pyarrow as present in the Jupyter conda_python3 kernel, even after restarting the kernel.


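Usually in a Sagemaker [Jupyter Notebook] instance I use !pip commands, because they seem to work better and do not hit the timeout errors I sometimes see in the Conda Packages menu. (And I do not need to worry about passing the -y flag; the install just happens.)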
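
One possible reason for the menu and the kernel disagreeing is that a shell command such as !pip may resolve to a different environment than the one the kernel is running in. The sketch below is not part of the original answer: it shows a common way to tie pip to the kernel's own interpreter through `sys.executable`. The helper names `pip_command` and `run_pip` are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip in the kernel's own environment and return its exit code."""
    return subprocess.run(pip_command(*args)).returncode


# Example (commented out so that nothing is installed by accident):
# run_pip("install", "pyarrow==0.13.0")
```

Printing `sys.executable` in a cell also shows which environment the kernel actually uses, which helps when the package list and the kernel seem to disagree.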


A plain !pip install pyarrow works, but I noticed that it installs pyarrow 0.15.1, dated November 1, 2019.


Perhaps that version has a bug when loading the _orc package, or some other conflicting library.


My gut feeling is that there is a problem with the conda build of pyarrow 0.13.0 and with pyarrow 0.15.1.


In a Jupyter cell I tried the following:

!pip uninstall pyarrow -y
!pip install pyarrow
from pyarrow import orc

Output:

Uninstalling pyarrow-0.15.1:
  Successfully uninstalled pyarrow-0.15.1
Collecting pyarrow
  Downloading https://files.pythonhosted.org/packages/6c/32/ce1926f05679ea5448fd3b98fbd9419d8c7a65f87d1a12ee5fb9577e3a8e/pyarrow-0.15.1-cp36-cp36m-manylinux2010_x86_64.whl (59.2MB)
     |████████████████████████████████| 59.2MB 381kB/s  eta 0:00:01
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.15.1
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-36378dee5a25> in <module>()
      1 get_ipython().system('pip uninstall pyarrow -y')
      2 get_ipython().system('pip install pyarrow')
----> 3 from pyarrow import orc

~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarrow/orc.py in <module>()
     23 from pyarrow import types
     24 from pyarrow.lib import Schema
---> 25 import pyarrow._orc as _orc
     26 
     27 

ModuleNotFoundError: No module named 'pyarrow._orc'

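Note that if you uninstall pyarrow 0.15.1 and install a specific older version such as 0.13.0, you should restart the kernel after uninstalling. Some incompatible binaries are left behind otherwise. I have not posted that output because it is too long.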

pip uninstall pyarrow -y

Restart the kernel, then:

!pip install pyarrow==0.13.0
from pyarrow import orc

Output:

Collecting pyarrow==0.13.0
  Using cached https://files.pythonhosted.org/packages/ad/25/094b122d828d24b58202712a74e661e36cd551ca62d331e388ff68bae91d/pyarrow-0.13.0-cp36-cp36m-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.14 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.14.3)
Requirement already satisfied: six>=1.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from pyarrow==0.13.0) (1.11.0)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.13.0

The import now runs without errors, and ORC files can be read again.
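
To tell whether a kernel restart is still pending after reinstalling, one option is to compare the version already loaded in the running process with the version installed on disk. This is a rough sketch under several assumptions and was not part of the steps above: the helper name `kernel_needs_restart` is hypothetical, `importlib.metadata` needs Python 3.8 or newer (the kernel above runs Python 3.6), and the check assumes the import name matches the distribution name, as it does for pyarrow.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward


def kernel_needs_restart(package):
    """Return True if `package` is loaded with a version other than the installed one.

    Hypothetical helper for illustration only.
    """
    module = sys.modules.get(package)
    if module is None:
        # Nothing has been imported yet, so nothing stale is loaded.
        return False
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Imported earlier, but uninstalled since then.
        return True
    loaded = getattr(module, "__version__", None)
    return loaded is not None and loaded != installed


print(kernel_needs_restart("pyarrow"))
```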
