Dir*_*bar 3 python apache-spark pyspark
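I'm new to Spark and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when I try to create sc: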
  File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'
zer*_*323 10
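There seem to be two problems here.
The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.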
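As a small illustration of the root-versus-bin distinction, here is a minimal sketch of a hypothetical helper, `spark_root_from`, that is not part of PySpark. It assumes the standard binary distribution layout, where the launcher scripts live in `<root>/bin`, and simply steps up one level when the given path ends in `bin`.

```python
import os


def spark_root_from(path):
    """Hypothetical helper (not a PySpark API): return a SPARK_HOME candidate.

    Assumes the standard binary distribution layout, in which launcher
    scripts live in <root>/bin. If the path ends in 'bin', its parent
    directory is returned; otherwise the normalized path is returned as-is.
    """
    normalized = os.path.normpath(path)  # also drops a trailing slash
    if os.path.basename(normalized) == "bin":
        return os.path.dirname(normalized)
    return normalized


# The path from the question, pointing at bin/ instead of the root:
print(spark_root_from("/home/dirk/spark-1.4.1-bin-hadoop2.6/bin"))
# prints /home/dirk/spark-1.4.1-bin-hadoop2.6
```

Note that the directory name `spark-1.4.1-bin-hadoop2.6` itself contains "bin", which is why the check compares the last path component exactly rather than searching for a substring.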
The second problem is how you use setSparkHome. If you check its docstring, its purpose is to
Set path where Spark is installed on worker nodes
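The SparkConf constructor assumes that SPARK_HOME is already set on the master. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
To handle this, you should set SPARK_HOME before creating the SparkConf.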
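The failure at the bottom of the traceback is ordinary mapping behaviour: `os.environ["SPARK_HOME"]` is a plain subscript, so it raises `KeyError` when the variable is absent. The sketch below reproduces that without Spark; `read_spark_home` is a hypothetical stand-in for the lookup in `launch_gateway`, and it takes a plain dict instead of `os.environ` so the real environment is left untouched.

```python
def read_spark_home(env):
    # Hypothetical stand-in for the lookup shown in the traceback
    # (java_gateway.py: SPARK_HOME = os.environ["SPARK_HOME"]).
    # A plain subscript raises KeyError when the key is missing.
    return env["SPARK_HOME"]


try:
    read_spark_home({})  # simulates an environment without SPARK_HOME
except KeyError as exc:
    print("KeyError:", exc)  # prints KeyError: 'SPARK_HOME'

print(read_spark_home({"SPARK_HOME": "/home/dirk/spark-1.4.1-bin-hadoop2.6"}))
```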
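import os
from pyspark import SparkConf

# launch_gateway reads SPARK_HOME when SparkConf() is constructed, so set it first.
# Point it at the installation root, not at its bin/ subdirectory.
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))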
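The snippet above overwrites any SPARK_HOME that is already exported in the shell. If you would rather keep an existing value and only fall back to a hard-coded path, a minimal sketch could use `setdefault` on the environment mapping. `configure_spark_home` is a hypothetical helper, not a PySpark API, and the `env` parameter exists only so it can be exercised with a plain dict.

```python
import os


def configure_spark_home(default_root, env=None):
    """Hypothetical helper: ensure SPARK_HOME is set before SparkConf().

    Keeps an existing SPARK_HOME (e.g. exported in the shell) and only
    falls back to default_root when the variable is missing. Returns the
    value that ends up in the mapping.
    """
    if env is None:
        env = os.environ
    return env.setdefault("SPARK_HOME", default_root)


# Usage with a throwaway dict; in a real script call it with no env argument
# and only then construct SparkConf (from pyspark import SparkConf).
fake_env = {}
print(configure_spark_home("/home/dirk/spark-1.4.1-bin-hadoop2.6", fake_env))
```

The trade-off is that a stale or wrong exported value silently wins; plain assignment, as in the answer's snippet, is more predictable when the script should always use one specific installation.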