hit*_*_hk 2 python networking machine-learning pyspark jupyter-notebook
Creating a Spark session throws an error
Unable to create a Spark session
I am using PySpark. Code snippet:
ValueError Traceback (most recent call last)
<ipython-input-13-2262882856df> in <module>()
37 if __name__ == "__main__":
38 conf = SparkConf()
---> 39 sc = SparkContext(conf=conf)
40 # print(sc.version)
41 # sc = SparkContext(conf=conf)
~/anaconda3/lib/python3.5/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
131 " note this option will be removed in Spark 3.0")
132
--> 133 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
134 try:
135 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
~/anaconda3/lib/python3.5/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
330 " created by %s at %s:%s "
331 % (currentAppName, currentMaster,
--> 332 callsite.function, callsite.file, callsite.linenum))
333 else:
334 SparkContext._active_spark_context = instance
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=pyspark-shell, master=local[*]) created by __init__ at <ipython-input-7-edf43bdce70a>:33
from pyspark import SparkConf, SparkContext
spark = SparkSession(sc).builder.appName("Detecting-Malicious-URL App").getOrCreate()
This raises another error, shown below:
NameError: name 'SparkSession' is not defined
Try this:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Detecting-Malicious-URL App").getOrCreate()
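A note on why a get-or-create call sidesteps the ValueError from the question: the traceback shows that constructing a second SparkContext is rejected while one is already active, whereas `getOrCreate()` hands back the running one. The sketch below is a toy, pure-Python model of that guard, not PySpark's actual code; `MiniContext`, `get_or_create`, and `_active` are hypothetical names made up for illustration.

```python
class MiniContext:
    """Toy stand-in for a class that allows one live instance (not PySpark code)."""

    _active = None  # the single live instance, if any

    def __init__(self, app_name):
        # Direct construction fails while another instance is live, mirroring
        # the "Cannot run multiple SparkContexts at once" check in the traceback.
        if MiniContext._active is not None:
            raise ValueError(
                "Cannot run multiple contexts at once; existing context(app=%s)"
                % MiniContext._active.app_name
            )
        self.app_name = app_name
        MiniContext._active = self

    @classmethod
    def get_or_create(cls, app_name):
        # Reuse the live instance instead of constructing a second one.
        if cls._active is None:
            cls(app_name)
        return cls._active

    def stop(self):
        # Release the slot so a new instance can be created later.
        if MiniContext._active is self:
            MiniContext._active = None


first = MiniContext("pyspark-shell")  # e.g. created by an earlier notebook cell

try:
    MiniContext("Detecting-Malicious-URL App")  # calling the constructor again
except ValueError as err:
    error_message = str(err)

# The get-or-create path returns the instance that already exists.
reused = MiniContext.get_or_create("Detecting-Malicious-URL App")
```

In this toy model the reused instance keeps the name it was first created with. In a real notebook, calling `sc.stop()` on the old context before building a new one is the other way out of the error.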
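Before Spark 2.0, we had to create a SparkConf and a SparkContext to interact with Spark.
From Spark 2.0 onward, SparkSession is the entry point to Spark SQL. You no longer need to create a SparkConf, SparkContext, or SQLContext explicitly, because they are encapsulated in the SparkSession.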
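If you still need the lower-level context (for RDD work, say), it can be read off the session rather than constructed separately. A minimal sketch, assuming pyspark 2.x or later is installed; `get_spark` and `get_context` are hypothetical helper names, and the import sits inside the helper only so that the cell defining it runs even before pyspark is importable.

```python
def get_spark(app_name="Detecting-Malicious-URL App"):
    """Return the active SparkSession, creating one on first use."""
    # Assumption: pyspark is installed; imported lazily (see the note above).
    from pyspark.sql import SparkSession

    # getOrCreate() reuses a session/context that is already running.
    return SparkSession.builder.appName(app_name).getOrCreate()


def get_context(spark):
    """Return the SparkContext wrapped by the session; no second context is built."""
    return spark.sparkContext
```

In a notebook this would be used as `spark = get_spark()` followed by `sc = get_context(spark)`. Re-running the cell should return the same session instead of raising the "multiple SparkContexts" error.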
For more details, see this blog post: How to use SparkSession in Apache Spark 2.0