I have a huge S3 bucket of files that I want to put on HDFS. Given the number of files involved, my preferred solution is to use 'distributed copy'. However, for some reason I can't get hadoop distcp to take my Amazon S3 credentials. The command I use is:
hadoop distcp -update s3a://[bucket]/[folder]/[filename] hdfs:///some/path/ -D fs.s3a.awsAccessKeyId=[keyid] -D fs.s3a.awsSecretAccessKey=[secretkey] -D fs.s3a.fast.upload=true
However, it behaves exactly as if the '-D' arguments weren't there:
ERROR tools.DistCp: Exception encountered
java.io.InterruptedIOException: doesBucketExist on [bucket]: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
I've looked at the hadoop distcp documentation, but couldn't find an explanation there for why this isn't working. I've also tried -Dfs.s3n.awsAccessKeyId as a flag, which didn't work either. I've read that explicitly passing credentials isn't good practice, so maybe this is just a gentle suggestion to do it some other way?
How is one supposed to pass S3 credentials with distcp? Does anyone know?
The format of the credentials flags seems to have changed since the previous release. The following command works:
hadoop distcp \
-Dfs.s3a.access.key=[accesskey] \
-Dfs.s3a.secret.key=[secretkey] \
-Dfs.s3a.fast.upload=true \
-update \
s3a://[bucket]/[folder]/[filename] hdfs:///some/path
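For what it's worth, the same s3a properties can also be set in core-site.xml, so the keys don't have to appear on the command line (or in shell history) at all. A minimal sketch, assuming a standard Hadoop client configuration directory and the bracketed placeholders replaced with your own keys:

<!-- core-site.xml: s3a credentials picked up by distcp and other Hadoop tools -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>[accesskey]</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>[secretkey]</value>
  </property>
  <property>
    <name>fs.s3a.fast.upload</name>
    <value>true</value>
  </property>
</configuration>

Note also that -D generic options are only picked up by Hadoop's GenericOptionsParser when they come before the command-specific arguments, which is likely part of why the flags placed after the paths in the original command were ignored. Exporting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment should also work, since the EnvironmentVariableCredentialsProvider listed in the error message is part of the default credential chain.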