Jac*_*ski · score 5 · tags: apache-spark, apache-spark-sql, pyspark
I'm using PySpark 2.4.0, and when I run the following code in the pyspark shell:
$ ./bin/pyspark
Python 2.7.16 (default, Mar 25 2019, 15:07:04)
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Python version 2.7.16 (default, Mar 25 2019 15:07:04)
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import pandas_udf
>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> from pyspark.sql.types import IntegerType, StringType
>>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/x/spark/python/pyspark/sql/functions.py", line 2922, in pandas_udf
    return _create_udf(f=f, returnType=return_type, evalType=eval_type)
  File "/Users/x/spark/python/pyspark/sql/udf.py", line 47, in _create_udf
    require_minimum_pyarrow_version()
  File "/Users/x/spark/python/pyspark/sql/utils.py", line 149, in require_minimum_pyarrow_version
    "it was not found." % minimum_pyarrow_version)
ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
How can I fix it?
In this case the error message, though it may look misleading, is accurate: pyarrow was simply not installed.
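To confirm that diagnosis from the same interpreter the pyspark shell uses, here is a minimal sketch that roughly mirrors the check behind the traceback (try to import pyarrow, then compare its version against 0.8.0). It is not PySpark's own code: the helper names module_version, version_tuple and meets_minimum are hypothetical, and the comparison only looks at the leading numeric components of each version string. It runs on both Python 2.7 and Python 3.

```python
import importlib
import re

# Hypothetical helpers for illustration; these are not part of PySpark.


def module_version(name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_tuple(version):
    """Turn '0.15.1' or '1.0.0.dev123' into a tuple of leading integers, e.g. (0, 15, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev123'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum="0.8.0"):
    """True if version >= minimum, padding with zeros so '0.8' equals '0.8.0'."""
    have, need = version_tuple(version), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    found = module_version("pyarrow")
    if found is None:
        print("pyarrow is not importable from this interpreter")
    elif meets_minimum(found):
        print("pyarrow %s satisfies >= 0.8.0" % found)
    else:
        print("pyarrow %s is older than 0.8.0" % found)
```

Run it with the same python binary that the pyspark shell reports in its banner; a "not importable" result there matches the ImportError above.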
Following the official Spark SQL Guide (which points to the PyArrow installation instructions), you should simply run one of the following commands:
$ conda install -c conda-forge pyarrow
or
$ pip install pyarrow
It is also important to run the command as the correct user and for the correct Python version. For example, if you use Zeppelin with Python 3 as root, you may need to run the following instead:
# pip3 install pyarrow
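Whether pip installed pyarrow for the "correct user and Python version" can be checked directly. A minimal sketch, assuming a standard Spark launch where the driver interpreter comes from PYSPARK_DRIVER_PYTHON (falling back to PYSPARK_PYTHON); Zeppelin may configure its interpreter through its own settings instead. The helper names pip_install_command and describe_interpreters are hypothetical. The idea is to invoke pip as `python -m pip` through the exact interpreter Spark uses, so the package lands where that interpreter looks.

```python
import os
import sys

# Hypothetical helpers for illustration only.


def pip_install_command(package, python=None):
    """Build a pip command bound to a specific interpreter (default: the current one)."""
    return [python or sys.executable, "-m", "pip", "install", package]


def describe_interpreters():
    """Report the running interpreter and the Spark-related Python env vars, if set."""
    return {
        "current": sys.executable,
        "version": sys.version.split()[0],
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }


if __name__ == "__main__":
    info = describe_interpreters()
    for key in ("current", "version", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        print("%s: %s" % (key, info[key]))
    # Print (rather than run) the command so it can be executed as the right user.
    print(" ".join(pip_install_command("pyarrow")))
```

If PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON points at a different binary than the one pip installed into, pass that path as the python argument and run the printed command as the user that launches Spark.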