How to access the SparkContext from inside a pyspark script

jav*_*dba 11 python apache-spark pyspark

The following SO question, How to run a script in Pyspark and drop into an IPython shell when done?, describes how to launch a pyspark script:

 %run -d myscript.py

But how do we access the existing Spark context from within that script?

Simply creating a new one does not work:

 ---->  sc = SparkContext("local", 1)

 ValueError: Cannot run multiple SparkContexts at once; existing 
 SparkContext(app=PySparkShell, master=local) created by <module> at 
 /Library/Python/2.7/site-packages/IPython/utils/py3compat.py:204

But how do I use the existing one... and which existing one is it?

In [50]: for s in filter(lambda x: 'SparkContext' in repr(x[1]) and len(repr(x[1])) < 150, locals().iteritems()):
    print s
('SparkContext', <class 'pyspark.context.SparkContext'>)

That is, there is no variable holding a SparkContext instance; only the class itself shows up.

Tec*_*ent 41

from pyspark.context import SparkContext

and then invoke the static method getOrCreate on SparkContext:

sc = SparkContext.getOrCreate()
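For illustration, a minimal sketch of how this might look inside a script launched with %run from the pyspark shell; the file name and sample data are placeholders, not from the original question:

# myscript.py -- run from the pyspark shell, e.g. %run myscript.py
from pyspark.context import SparkContext

# getOrCreate() returns the shell's already-running context instead of
# raising "Cannot run multiple SparkContexts at once"
sc = SparkContext.getOrCreate()

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())  # prints 4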


Ren*_* B. 9

If you have already created a SparkSession:

spark = SparkSession \
    .builder \
    .appName("StreamKafka_Test") \
    .getOrCreate()

Then you can access the "existing" SparkContext like this:

sc = spark.sparkContext
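As a usage note, spark.sparkContext should hand back the same active context that SparkContext.getOrCreate() returns, so both answers end up with the same object. A minimal sketch putting the two lines together (the app name is taken from the snippet above, the sample data is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StreamKafka_Test").getOrCreate()
sc = spark.sparkContext

# the context obtained this way behaves like any other SparkContext
print(sc.parallelize(range(10)).sum())  # prints 45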