I am trying to write a DataFrame to a Parquet file in a local directory from a Jupyter notebook using the following code:
from pyspark.sql.types import StructType, StructField, DateType, FloatType, StringType

rdd1 = rdd.coalesce(partitions)
schema1 = StructType([StructField('date', DateType()), StructField('open', FloatType()),
                      StructField('high', FloatType()), StructField('low', FloatType()),
                      StructField('close', FloatType()), StructField('adj_close', FloatType()),
                      StructField('volume', FloatType()), StructField('stock', StringType())])
rddDF = spark.createDataFrame(rdd1, schema=schema1)
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
rddDF.write.parquet("C:/Users/User/Documents/File/Output/rddDF")
I get the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-11-7b2aeb627267> in <module>
16
17 #rddDF.to_parquet("C:/Users/Sabihah/Documents/6. Processing Big Data/Output/rddDF")
---> 18 rddDF.write.parquet("C:/Users/Sabihah/Documents/6. Processing Big Data/Output/rddDF")
19 #rddDF.write.format("parquet").save("C:/Users/Sabihah/Documents/6. Processing Big Data/Output/rddDF")
~\anaconda3\lib\site-packages\pyspark\sql\readwriter.py in parquet(self, path, mode, partitionBy, compression)
883 self.partitionBy(partitionBy)
884 self._set_opts(compression=compression)
--> 885 self._jwrite.parquet(path)
886
887 def text(self, path, compression=None, lineSep=None):
~\anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, …