Car*_*cas (4) | tags: amazon-s3, amazon-web-services, apache-spark
I'm using Spark on EMR 5.5.0. If I write a simple file to S3 using an s3://... URL, the write works fine. But if I use an s3a://... address, it fails with Service: Amazon S3; Status Code: 403; Error Code: AccessDenied.

Using the AWS CLI, I can cp, mv, and rm any file in the path I'm writing to. But from Spark, the s3a PUT fails.

We have server-side encryption enabled, and I know Spark is aware of it because the s3:// URL works. Any ideas?

DEBUG log of the failed PUT is here. One thing that may be worth noting: I'm just doing rdd.saveAsTextFile(path), but the PUT says it's trying to write to /my-bucket/tmp/carlos/testWrite/4/_temporary/0/, which I thought only happened with Parquet? Not sure whether that detail is relevant, but I wanted to mention it.
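If the bucket enforces server-side encryption, one thing worth checking is that s3a, unlike EMRFS, does not pick the encryption requirement up automatically; it has to be told explicitly or every PUT can be rejected with a 403. A minimal configuration sketch, assuming SSE-S3 (AES256) and a hadoop-aws version that supports this property:

```xml
<!-- core-site.xml (sketch): make s3a request SSE-S3 on every PUT.
     Assumes the bucket policy requires AES256 server-side encryption;
     not verified against the cluster in the question. -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```

The same setting can also be passed per job, e.g. `--conf spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256` on spark-submit, since Spark forwards `spark.hadoop.*` properties into the Hadoop configuration.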
s3a is the actively maintained S3 client in Apache Hadoop. AWS forked their own client off from the Apache s3n:// client many years ago and have (presumably) massively reworked theirs since.

They can read and write the same data, but some parts of EMR expect extra methods in the filesystem client which only EMR's s3 client supports, so you cannot safely use s3a there.

There's also the original ASF s3:// client, which is incompatible with everything else but was the first code used to connect Hadoop with S3, long before EMR was an Amazon product.
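The scheme-to-client mapping above is just Hadoop configuration; on a typical EMR cluster the relevant core-site.xml entries look roughly like this (class names are the usual defaults, shown as an illustration rather than taken from the asker's cluster):

```xml
<!-- On EMR, s3:// is bound to Amazon's EMRFS client... -->
<property>
  <name>fs.s3.impl</name>
  <value>com.amazon.ws.emr.hadoop.fs.EmrFileSystem</value>
</property>
<!-- ...while s3a:// is bound to the Apache Hadoop client. -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```

This is why the same path can behave differently depending only on the URL scheme: each scheme routes through an entirely different client, with its own credential, encryption, and committer behavior.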
Which is better? As of August 2017, S3A is probably faster at read-heavy IO on columnar formats like ORC and Parquet. EMR S3 with EMRFS may have the edge on resilience and consistency, though the open-source ASF S3A client is actively working on those fronts.
Viewed 4,018 times.