KeyError: SPARK_HOME during SparkConf initialization

Dir*_*bar 3 python apache-spark pyspark

I'm new to Spark and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create sc:

File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'

zer*_*323 10

It seems there are two problems here.

The first is the path you are using. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
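As a quick, purely illustrative sanity check (using the path from the question), the installation root is the directory that contains bin/, python/, conf/ and so on; it should not itself end in /bin:

import os

spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
# A correct SPARK_HOME contains the bin/ directory (among others).
print(os.path.isdir(os.path.join(spark_home, "bin")))   # True for a correct root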

The second problem is how you use setSparkHome. If you check its docstring, its goal is to

Set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME is already set on the master. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
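You can see the relevant line in the traceback above; roughly, launch_gateway does the equivalent of this simplified sketch, which is why a missing variable surfaces as a KeyError:

import os

# Simplified sketch of what pyspark.java_gateway.launch_gateway does:
# it reads the variable directly, so a missing SPARK_HOME raises KeyError('SPARK_HOME').
try:
    SPARK_HOME = os.environ["SPARK_HOME"]
except KeyError:
    print("SPARK_HOME is not set in this process's environment")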

To handle this, you should set SPARK_HOME before creating the SparkConf.

import os
from pyspark import SparkConf

# SPARK_HOME must be in the environment before SparkConf is constructed
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
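For completeness, a minimal end-to-end sketch along these lines (run with plain python; it assumes pyspark and py4j are importable, e.g. via PYTHONPATH, as they evidently already are in the question's traceback) might look like:

import os
from pyspark import SparkConf, SparkContext

# Set SPARK_HOME first, using the installation root from the question.
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

conf = SparkConf().setMaster('local').setAppName('a')
sc = SparkContext(conf=conf)   # launch_gateway reads SPARK_HOME at this point
print(sc.version)
sc.stop()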