pandas version is not updated after installing a new version on Databricks

Question

use*_*011 5 python python-3.x pandas databricks

I am trying to resolve a pandas problem that appears when I run Python 3.7 code on Databricks.

The error is:

 ImportError: cannot import name 'roperator' from 'pandas.core.ops' (/databricks/python/lib/python3.7/site-packages/pandas/core/ops.py)

The pandas version on Databricks:

pd.__version__
0.24.2

I can run

 from pandas.core.ops import roperator

without problems on my laptop, which has

pandas 0.25.1

So I tried to upgrade pandas on Databricks.

%sh pip uninstall -y pandas
Successfully uninstalled pandas-1.1.2

%sh pip install pandas==0.25.1
 Collecting pandas==0.25.1
 Downloading pandas-0.25.1-cp37-cp37m-manylinux1_x86_64.whl (10.4 MB)
 Requirement already satisfied: python-dateutil>=2.6.1 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2.8.0)
 Requirement already satisfied: numpy>=1.13.3 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (1.16.2)
 Requirement already satisfied: pytz>=2017.2 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2018.9)
 Requirement already satisfied: six>=1.5 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas==0.25.1) (1.12.0)
 Installing collected packages: pandas
 ERROR: After October 2020 you may experience errors when installing or updating packages. 
  This is because pip will change the way that it resolves dependency conflicts.

  We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

  mlflow 1.8.0 requires alembic, which is not installed.
  mlflow 1.8.0 requires prometheus-flask-exporter, which is not installed.
  mlflow 1.8.0 requires sqlalchemy<=1.3.13, which is not installed.
  sklearn-pandas 2.0.1 requires numpy>=1.18.1, but you'll have numpy 1.16.2 which is incompatible.
   sklearn-pandas 2.0.1 requires pandas>=1.0.5, but you'll have pandas 0.25.1 which is incompatible.
   sklearn-pandas 2.0.1 requires scikit-learn>=0.23.0, but you'll have scikit-learn 0.20.3 which is incompatible.
   sklearn-pandas 2.0.1 requires scipy>=1.4.1, but you'll have scipy 1.2.1 which is incompatible.
   Successfully installed pandas-0.25.1

But when I run:

 import pandas as pd
 pd.__version__

it is still:

 0.24.2

Am I missing something?

Thanks

Answer 1

Ale*_*Ott 8

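It is strongly recommended to install libraries through a cluster init script. The %sh command runs only on the driver node, not on the executor nodes, and it does not affect the Python instance that is already running.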
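To see which environment the running notebook process is actually using, a small diagnostic along these lines may help. This is only a sketch: `describe_module` is a hypothetical helper name, not a Databricks or pandas API. It reports the interpreter path, whether the module was already loaded in this process, and the file the module was loaded from. In the question, pip installed into `/databricks/conda/envs/databricks-ml/...` while the traceback points at `/databricks/python/lib/python3.7/...`; whether those resolve to the same environment is not shown in the question, so comparing the reported path against pip's target is one way to check.

```python
import importlib
import sys


def describe_module(name):
    """Report where the current Python process loads `name` from.

    Hypothetical helper for illustration; it uses only the standard library.
    """
    # Record this before importing: a module that is already in sys.modules
    # is not re-read from disk, so a later pip install cannot change it
    # until the Python process is restarted.
    already_loaded = name in sys.modules
    module = importlib.import_module(name)
    return {
        "interpreter": sys.executable,
        "already_loaded": already_loaded,
        "version": getattr(module, "__version__", None),
        "path": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # In the notebook you would pass "pandas"; "json" is used here only so
    # the sketch runs in any Python installation.
    for key, value in describe_module("json").items():
        print(f"{key}: {value}")
```

If `already_loaded` is true or `path` lies outside the directory that pip reported, a `%sh pip install` on its own would not be expected to change what `pd.__version__` returns.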

The correct solution is to use the dbutils.library commands, like this:

dbutils.library.installPyPI("pandas", "1.0.1")
dbutils.library.restartPython()

This installs the library on all nodes, but Python has to be restarted to pick up the new library.
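After the restart, it can be useful to fail fast if the notebook still sees an old version instead of hitting the ImportError later. The following is a minimal sketch, not part of the original answer: `parse_version` and `require_at_least` are hypothetical helpers, and the `0.25.1` minimum is taken from the question (the version on which the import works), not from pandas documentation.

```python
def parse_version(text):
    """Turn '0.25.1' into (0, 25, 1).

    Parsing stops at the first non-numeric component, so a suffix such as
    'rc1' is ignored. This is a simplification, not a full version parser.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def require_at_least(installed, minimum, package="pandas"):
    """Raise RuntimeError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"{package} {installed} is loaded, but {minimum} or newer is "
            "required; the install may not have reached this Python process."
        )
    return installed


if __name__ == "__main__":
    # Version strings copied from the question.
    print(require_at_least("0.25.1", "0.25.1"))
    try:
        require_at_least("0.24.2", "0.25.1")
    except RuntimeError as exc:
        print(exc)
```

In a notebook, the call would be `require_at_least(pd.__version__, "0.25.1")` placed right after `import pandas as pd`.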

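Also, although you can specify only the package name, it is recommended to pin the version explicitly, because some library versions may be incompatible with the runtime. Consider, too, using a newer runtime that already ships updated library versions; check the runtime's release notes to find out which library versions are installed out of the box.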

On newer Databricks runtimes, you can use the new magic commands %pip and %conda to install dependencies. See the documentation for more details.
