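I'm new to Spark. I'm trying to create a Spark session with pyspark.sql to load a .csv file. However, every time I run the second line (shown below), the command keeps running for hours and never seems to reach the remaining lines of code. The code is as follows: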
from pyspark.sql import SparkSession
sp = SparkSession.builder.appName("solution").config("spark.some.config.option", "some-value").getOrCreate()
df = sp.read.csv('walmart_stock.csv', header=True, inferSchema=True)
df.columns
Also, if I kill the kernel after waiting a long time, I get the following exception:
<ipython-input-23-16c3797ce83f> in <module>
----> 1 sp = SparkSession.builder.appName("solution").config("spark.some.config.option", "some-value").getOrCreate()
~\anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
184 sparkConf.set(key, value)
185 # This SparkContext may be an existing one.
--> 186 sc = SparkContext.getOrCreate(sparkConf)
187 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
188 # by all sessions.
~\anaconda3\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
369 with SparkContext._lock:
370 …
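Since `getOrCreate()` has to launch a JVM before it returns, one thing worth ruling out is the Java setup. Below is a minimal sketch, assuming the hang is environment-related (the traceback alone does not confirm that). `spark_env_report` is a hypothetical helper name, and it only uses the Python standard library, so it can run even while Spark itself will not start:

```python
import os
import shutil
import sys


def spark_env_report(environ=None):
    """Collect the settings PySpark commonly relies on to start its JVM.

    `environ` defaults to os.environ; passing a dict is only for testing.
    """
    env = os.environ if environ is None else environ
    return {
        # Interpreter running this notebook or script
        "python": sys.executable,
        # Full path to the `java` executable, or None if it is not on PATH
        "java_on_path": shutil.which("java", path=env.get("PATH")),
        # Environment variables Spark's launcher scripts look at
        "JAVA_HOME": env.get("JAVA_HOME"),
        "SPARK_HOME": env.get("SPARK_HOME"),
        "PYSPARK_PYTHON": env.get("PYSPARK_PYTHON"),
    }


if __name__ == "__main__":
    for name, value in spark_env_report().items():
        print(f"{name}: {value if value else '<not set>'}")
```

If `java_on_path` and `JAVA_HOME` both come back empty, the JVM probably cannot be launched at all, which would be consistent with the call never returning. If they look fine, the cause is likely elsewhere.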