Tin*_*esh asked (tags: python, pyspark, jupyter):
I want to run PySpark from a Jupyter notebook. I downloaded and installed Anaconda, which includes Jupyter. I wrote the following lines:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
I get the following error:
ImportError Traceback (most recent call last)
<ipython-input-3-98c83f0bd5ff> in <module>()
----> 1 from pyspark import SparkConf, SparkContext
2 conf = SparkConf().setMaster("local").setAppName("My App")
3 sc = SparkContext(conf = conf)
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\__init__.py in <module>()
39
40 from pyspark.conf import SparkConf
---> 41 from pyspark.context import SparkContext
42 from pyspark.rdd import RDD
43 from pyspark.files import SparkFiles
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\context.py in <module>()
26 from tempfile import NamedTemporaryFile
27
---> 28 from pyspark import accumulators
29 from pyspark.accumulators import Accumulator
30 from pyspark.broadcast import Broadcast
ImportError: cannot import name accumulators
I tried setting the PYTHONPATH environment variable to point at the spark/python directory, following the Stack Overflow answer about importing pyspark into a Python shell, but that did not help.
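A likely cause on Spark 1.6 is that spark/python alone is not enough: pyspark imports the bundled Py4J package at import time, and Py4J ships as a separate zip under python/lib, so the import of pyspark.context fails partway through and surfaces as the "cannot import name accumulators" error. A minimal sketch of the full setup, assuming the install path shown in the traceback above (the py4j zip name varies by Spark release):

import os
import sys

# Install path taken from the traceback above; adjust to your machine.
spark_home = r"C:\software\spark\spark-1.6.2-bin-hadoop2.6"

# pyspark needs both its own package directory and the bundled py4j zip.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

# pyspark later uses SPARK_HOME to locate spark-submit when launching the JVM.
os.environ["SPARK_HOME"] = spark_home

from pyspark import SparkConf, SparkContext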
小智 answered (score 7):
This worked for me:
import os
import sys

# Use a raw string so the backslash is not treated as an escape sequence.
spark_path = r"D:\spark"

os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

# Make the pyspark package and the bundled py4j sources importable.
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.9-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local", "test")
To verify:
In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x707ccf8>
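As an alternative not in the original answer: the same path wiring can be delegated to the third-party findspark package (installed with pip install findspark), which locates a Spark install and adds it to sys.path. A minimal sketch, assuming SPARK_HOME is set (or pass the path explicitly):

import findspark

# Reads SPARK_HOME; alternatively pass the path, e.g. findspark.init("D:/spark")
findspark.init()

from pyspark import SparkContext
sc = SparkContext("local", "test")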