Key not found: _PYSPARK_DRIVER_CALLBACK_HOST

bbo*_*boy 8 python apache-spark pyspark

I am trying to run this code:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .master("local") \
        .appName("Word Count") \
        .getOrCreate()

df = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, 5.4, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, 144.5, 5.9, 33, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, 5.2, 23, 'F'),
    (5, 129.2, 5.3, 42, 'M'),
   ], ['id', 'weight', 'height', 'age', 'gender'])

df.show()
print('Count of Rows: {0}'.format(df.count()))
print('Count of distinct Rows: {0}'.format((df.distinct().count())))

spark.stop()

and I get this error:

18/06/22 11:58:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
    ...
Exception: Java gateway process exited before sending its port number

I am using PyCharm on macOS, with Python 3.6 and Spark 2.3.1.

What is the possible cause of this error?

hi-*_*zir 13

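This error is the result of a version mismatch. The environment variable referenced in the traceback (_PYSPARK_DRIVER_CALLBACK_HOST) was removed when the Py4j dependency was updated to 0.10.7, a change that was backported to the 2.3 branch in 2.3.1.

Considering the version information:

I am using PyCharm on macOS, with Python 3.6 and Spark 2.3.1.

it looks like you have the 2.3.1 package installed, but SPARK_HOME points to an older (2.3.0 or earlier) installation.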
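One way to check for this kind of mismatch is to compare the Py4j version importable from Python with the one bundled under SPARK_HOME. The snippet below is only a rough sketch: the helper names are made up for illustration, and it assumes a standard Spark binary layout where the Py4j sources sit in `$SPARK_HOME/python/lib/py4j-<version>-src.zip`.

```python
import glob
import os
import re


def py4j_versions_in_spark_home(spark_home):
    """Return the py4j versions bundled under <spark_home>/python/lib.

    Assumes the usual Spark layout (py4j-<version>-src.zip); hypothetical helper.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    versions = []
    for path in sorted(glob.glob(pattern)):
        match = re.match(r"py4j-(.+)-src\.zip$", os.path.basename(path))
        if match:
            versions.append(match.group(1))
    return versions


def describe_mismatch(spark_home, installed_py4j):
    """Compare the py4j seen by Python with the one shipped in SPARK_HOME."""
    bundled = py4j_versions_in_spark_home(spark_home)
    if not bundled:
        return "no py4j zip found under {0}".format(spark_home)
    if installed_py4j in bundled:
        return "ok: py4j {0} matches SPARK_HOME".format(installed_py4j)
    return "mismatch: Python sees py4j {0}, SPARK_HOME bundles {1}".format(
        installed_py4j, ", ".join(bundled))


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set")
    else:
        try:
            # py4j exposes its version string in py4j.version
            from py4j.version import __version__ as installed_py4j
        except ImportError:
            installed_py4j = None
        print(describe_mismatch(spark_home, installed_py4j))
```

Run it with the same interpreter and environment that PyCharm uses for the failing script. A "mismatch" line suggests SPARK_HOME points at a different Spark installation than the pyspark package that gets imported.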


SCO*_*EIL 11

The solution I am about to present also takes care of the "key not found: _PYSPARK_DRIVER_CALLBACK_HOST / Java Gateway / PySpark 2.3.1" error. Add the following to your bashrc, /etc/environment, or /etc/profile:

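# The py4j zip name must match the file shipped in $SPARK_HOME/python/lib
# (py4j-0.10.7-src.zip for Spark 2.3.1).
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH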

That should do the doobie right there. You can thank me in advance. #thumbsup :)
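If editing shell profiles is awkward (for example, PyCharm may not pick them up), roughly the same effect as the two PYTHONPATH exports can be had from inside Python before importing pyspark. This is just a sketch: the function names are invented for illustration, and it assumes the standard `$SPARK_HOME/python/lib/py4j-*-src.zip` layout instead of hard-coding a Py4j version.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Build the sys.path entries equivalent to the PYTHONPATH exports.

    Hypothetical helper; globbing avoids hard-coding the py4j version.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(
        glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Put SPARK_HOME's Python sources ahead of any other pyspark install."""
    # Insert in reverse so the final order matches spark_python_paths().
    for entry in reversed(spark_python_paths(spark_home)):
        if entry in sys.path:
            sys.path.remove(entry)
        sys.path.insert(0, entry)


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        add_spark_to_sys_path(spark_home)
        print("\n".join(sys.path[:3]))
    else:
        print("SPARK_HOME is not set")
```

Call `add_spark_to_sys_path(os.environ["SPARK_HOME"])` at the top of the script, before `import pyspark`, so that the Python sources and the Py4j zip both come from the same Spark installation.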