# Create the SparkSession and configure S3 credentials
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("XMLParser").getOrCreate()
sc = spark.sparkContext
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoop_conf.set("fs.s3n.awsAccessKeyId", aws_key)
hadoop_conf.set("fs.s3n.awsSecretAccessKey", aws_secret)
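As an aside on the connector used above: the s3n filesystem is deprecated in recent Hadoop releases in favor of s3a. A minimal sketch of the equivalent credential setup with the s3a connector (key names per the Hadoop S3A documentation; reads and writes would then use s3a:// URIs instead of s3n://):

```python
# Equivalent credential setup for the s3a connector (Hadoop 2.7+).
# Assumes the same `hadoop_conf` object created above; this is a
# configuration sketch, not a drop-in fix for the Delta error below.
hadoop_conf.set("fs.s3a.access.key", aws_key)
hadoop_conf.set("fs.s3a.secret.key", aws_secret)
```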
Then I can read the file from my S3 bucket with the following code:
df = spark.read.format("xml").options(rootTag='returnResult', rowTag="query").load("s3n://bucketName/folder/file.xml")
But when I try to write back to S3 as a Delta Lake table (Parquet files) with this code:
df.write.format("delta").mode('overwrite').save("s3n://bucket/folder/file")
I get this error:
Py4JJavaError: An error occurred while calling o778.save.
: java.io.IOException: The error typically occurs when the default LogStore implementation, that
is, HDFSLogStore, is used to write into a Delta table on a non-HDFS storage system.
In order to get the transactional ACID guarantees on table updates, you have …
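The truncated message refers to Delta Lake's storage configuration: the default HDFSLogStore cannot make atomic commits on S3, so an S3-aware LogStore must be selected. A sketch of the fix documented for Delta Lake releases before 1.0, assuming one of those versions is in use:

```python
# Configuration sketch (pre-1.0 Delta Lake): select the S3-aware LogStore
# before any Delta writes. On Delta 1.0+ the documented key is instead
# spark.delta.logStore.s3.impl pointing at an io.delta.storage class.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("XMLParser")
    .config("spark.delta.logStore.class",
            "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
    .getOrCreate()
)
```

Note that S3SingleDriverLogStore only guarantees correctness when a single Spark driver writes to the table, which is why the Delta docs call this out for S3 specifically.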