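I am trying to write a Spark DataFrame to a Parquet file in PySpark (Spark 2.4.4, running in a Jupyter Notebook under Anaconda 3) with the command below, and I get a very strange error message that I cannot resolve. I would appreciate any insight.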
df.write.mode("overwrite").parquet("test/")
The error message is as follows:
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-37-2b4a1d75a5f6> in <module>()
      1 # df.write.partitionBy("AB").parquet("C:/test.parquet",mode='overwrite')
----> 2 df.write.mode("overwrite").parquet("test/")
      3 # df.write.mode('SaveMode.Overwrite').parquet("C:/test.parquet")

C:\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\readwriter.py in parquet(self, path, mode, partitionBy, compression)
    841             self.partitionBy(partitionBy)
    842         self._set_opts(compression=compression)
--> 843         self._jwrite.parquet(path)
    844
    845     @since(1.6)

C:\spark-2.4.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

C:\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     61     def deco(*a, **kw): …