Tags: hadoop, scala, amazon-s3, amazon-emr, apache-spark
I have a map-reduce application running on AWS EMR that writes some output to a different (another AWS account's) S3 bucket. I have the permissions set up and the job can write to the external bucket, but the owner of the objects is still root from the account running the Hadoop job. I would like to change the owner to the external account that owns the bucket.
I found that I can set fs.s3a.acl.default to bucket-owner-full-control, but that doesn't seem to be working. This is what I'm doing:
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
FileSystem fileSystem = FileSystem.get(URI.create(s3Path), conf);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(filePath));
PrintWriter writer = new PrintWriter(fsDataOutputStream);
writer.write(contentAsString);
writer.close();
fsDataOutputStream.close();
Any help is appreciated.
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
is the correct property you are setting.
This is the core-site.xml property that gives the bucket owner full control over newly written objects:
<property>
  <name>fs.s3a.acl.default</name>
  <value>bucket-owner-full-control</value>
  <description>Set a canned ACL for newly created and copied objects. Value may be private,
    public-read, public-read-write, authenticated-read, log-delivery-write,
    bucket-owner-read, or bucket-owner-full-control.</description>
</property>
BucketOwnerFullControl
Specifies that the owner of the bucket is granted Permission.FullControl. The owner of the bucket is not necessarily the same as the owner of the object.
I would suggest also setting fs.s3.canned.acl (the EMRFS property used for s3:// paths) to the value BucketOwnerFullControl.
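One way to pass that property through Spark, sketched here on the assumption that the job is submitted as a Spark application, is the spark.hadoop. prefix, which Spark copies into the job's Hadoop configuration (the application name below is just illustrative):
import org.apache.spark.sql.SparkSession

// Sketch only: keys prefixed with "spark.hadoop." are copied into the
// Hadoop configuration, so EMRFS (s3:// paths) picks up the canned ACL.
val spark = SparkSession.builder()
  .appName("cross-account-write")
  .config("spark.hadoop.fs.s3.canned.acl", "BucketOwnerFullControl")
  .getOrCreate()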
For debugging, you can use the following snippet to see which parameters are actually being passed:
for (Entry<String, String> entry : conf) {
    System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
}
For testing purposes, run this command from the command line:
aws s3 cp s3://bucket/source/dummyfile.txt s3://bucket/target/dummyfile.txt --sse --acl bucket-owner-full-control
If that works, then it should work through the API as well.
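To confirm that the grant actually landed on an object, you can also read back its ACL. A minimal sketch, assuming the AWS SDK for Java v1 is on the classpath and reusing the bucket/key from the copy command above:
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import scala.collection.JavaConverters._

// Fetch the object's ACL and print the owner plus each grantee/permission pair.
val s3 = AmazonS3ClientBuilder.defaultClient()
val acl = s3.getObjectAcl("bucket", "target/dummyfile.txt")
println(s"Owner: ${acl.getOwner.getDisplayName}")
acl.getGrantsAsList.asScala.foreach { grant =>
  println(s"${grant.getGrantee.getIdentifier} -> ${grant.getPermission}")
}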
For Spark to access the S3 filesystem with the appropriate configuration, set it as in the example below:
val hadoopConf = spark.sparkContext.hadoopConfiguration
// Speed up S3 uploads and use the v2 file output committer
hadoopConf.set("fs.s3a.fast.upload", "true")
hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
// Server-side encryption with S3-managed keys (SSE-S3)
hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256")
// Grant the bucket owner full control of every object the job writes
hadoopConf.set("fs.s3a.canned.acl", "BucketOwnerFullControl")
hadoopConf.set("fs.s3a.acl.default", "BucketOwnerFullControl")