Google Cloud: downloading data from AWS S3 into GCS using gsutil

bsm*_*ith 1 google-cloud-storage gsutil

One of our collaborators has made some data available on AWS, and I'm trying to pull it into our Google Cloud Storage bucket using gsutil (only some of the files are useful to us, so I don't want to use the GUI transfer option available on GCS). The collaborator has given us the AWS bucket ID, the AWS access key ID, and the AWS secret access key.

I went through the documentation on GCE and edited the ~/.boto file to include the access keys. I restarted the terminal and tried an ls, but got the following error:

gsutil ls s3://cccc-ffff-03210/
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied

Is there something else I need to configure or run?

Thanks!

Edit:

Thanks for the replies!

I installed the Cloud SDK, and I can access and run all gsutil commands on my Google Cloud Storage project. My problem is when trying to access (e.g. with an ls) the Amazon S3 bucket that has been shared with me.


  1. I uncommented the two lines in the ~/.boto file and put in the access keys:


    # To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
    # following two lines:
    aws_access_key_id = my_access_key
    aws_secret_access_key = my_secret_access_key
    

  2. Output of "gsutil version -l":


    | => gsutil version -l
    
    my_gc_id
    gsutil version: 4.27
    checksum: 5224e55e2df3a2d37eefde57 (OK)
    boto version: 2.47.0
    python version: 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
    OS: Darwin 15.4.0
    multiprocessing available: True
    using cloud sdk: True
    pass cloud sdk credentials to gsutil: True
    config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
    gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
    compiled crcmod: True
    installed via package manager: False
    editable install: False
    

  3. Output with the -DD option:


    => gsutil -DD ls s3://my_amazon_bucket_id
    
    multiprocessing available: True
    using cloud sdk: True
    pass cloud sdk credentials to gsutil: True
    config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
    gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
    compiled crcmod: True
    installed via package manager: False
    editable install: False
    Command being run: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=my_gc_id -DD ls s3://my_amazon_bucket_id
    config_file_list: ['/Users/pc/.boto', '/Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto']
    config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'True'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('content_language', 'en'), ('default_api_version', '2'), ('default_project_id', 'my_gc_id')]
    DEBUG 1103 08:42:34.664643 provider.py] Using access key found in shared credential file.
    DEBUG 1103 08:42:34.664919 provider.py] Using secret key found in shared credential file.
    DEBUG 1103 08:42:34.665841 connection.py] path=/
    DEBUG 1103 08:42:34.665967 connection.py] auth_path=/my_amazon_bucket_id/
    DEBUG 1103 08:42:34.666115 connection.py] path=/?delimiter=/
    DEBUG 1103 08:42:34.666200 connection.py] auth_path=/my_amazon_bucket_id/?delimiter=/
    DEBUG 1103 08:42:34.666504 connection.py] Method: GET
    DEBUG 1103 08:42:34.666589 connection.py] Path: /?delimiter=/
    DEBUG 1103 08:42:34.666668 connection.py] Data: 
    DEBUG 1103 08:42:34.666724 connection.py] Headers: {}
    DEBUG 1103 08:42:34.666776 connection.py] Host: my_amazon_bucket_id.s3.amazonaws.com
    DEBUG 1103 08:42:34.666831 connection.py] Port: 443
    DEBUG 1103 08:42:34.666882 connection.py] Params: {}
    DEBUG 1103 08:42:34.666975 connection.py] establishing HTTPS connection: host=my_amazon_bucket_id.s3.amazonaws.com, kwargs={'port': 443, 'timeout': 70}
    DEBUG 1103 08:42:34.667128 connection.py] Token: None
    DEBUG 1103 08:42:34.667476 auth.py] StringToSign:
    GET
    
    
    Fri, 03 Nov 2017 12:42:34 GMT
    /my_amazon_bucket_id/
    DEBUG 1103 08:42:34.667600 auth.py] Signature:
    AWS RN8=
    DEBUG 1103 08:42:34.667705 connection.py] Final headers: {'Date': 'Fri, 03 Nov 2017 12:42:34 GMT', 'Content-Length': '0', 'Authorization': u'AWS AK6GJQ:EFVB8F7rtGN8=', 'User-Agent': 'Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0'}
    DEBUG 1103 08:42:35.179369 https_connection.py] wrapping ssl socket; CA certificate file=/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/third_party/boto/boto/cacerts/cacerts.txt
    DEBUG 1103 08:42:35.247599 https_connection.py] validating server certificate: hostname=my_amazon_bucket_id.s3.amazonaws.com, certificate hosts=['*.s3.amazonaws.com', 's3.amazonaws.com']
    send: u'GET /?delimiter=/ HTTP/1.1\r\nHost: my_amazon_bucket_id.s3.amazonaws.com\r\nAccept-Encoding: identity\r\nDate: Fri, 03 Nov 2017 12:42:34 GMT\r\nContent-Length: 0\r\nAuthorization: AWS AN8=\r\nUser-Agent: Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0\r\n\r\n'
    reply: 'HTTP/1.1 403 Forbidden\r\n'
    header: x-amz-bucket-region: us-east-1
    header: x-amz-request-id: 60A164AAB3971508
    header: x-amz-id-2: +iPxKzrW8MiqDkWZ0E=
    header: Content-Type: application/xml
    header: Transfer-Encoding: chunked
    header: Date: Fri, 03 Nov 2017 12:42:34 GMT
    header: Server: AmazonS3
    DEBUG 1103 08:42:35.326652 connection.py] Response headers: [('date', 'Fri, 03 Nov 2017 12:42:34 GMT'), ('x-amz-id-2', '+iPxKz1dPdgDxpnWZ0E='), ('server', 'AmazonS3'), ('transfer-encoding', 'chunked'), ('x-amz-request-id', '60A164AAB3971508'), ('x-amz-bucket-region', 'us-east-1'), ('content-type', 'application/xml')]
    DEBUG 1103 08:42:35.327029 bucket.py] <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>6097164508</RequestId><HostId>+iPxKzrWWZ0E=</HostId></Error>
    DEBUG: Exception stack trace:
    Traceback (most recent call last):
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 577, in _RunNamedCommandAndHandleExceptions
        collect_analytics=True)
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 317, in RunNamedCommand
        return_code = command_inst.RunCommand()
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 548, in RunCommand
        exp_dirs, exp_objs, exp_bytes = ls_helper.ExpandUrlAndPrint(storage_url)
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 180, in ExpandUrlAndPrint
        print_initial_newline=False)
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 252, in _RecurseExpandUrlAndPrint
        bucket_listing_fields=self.bucket_listing_fields):
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 476, in IterAll
        expand_top_level_buckets=expand_top_level_buckets):
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 157, in __iter__
        fields=bucket_listing_fields):
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 413, in ListObjects
        self._TranslateExceptionAndRaise(e, bucket_name=bucket_name)
      File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1471, in _TranslateExceptionAndRaise
        raise translated_exception
    AccessDeniedException: AccessDeniedException: 403 AccessDenied
    
    
    AccessDeniedException: 403 AccessDenied
    

小智 8

I assume that you can set up your gcloud credentials with gcloud init and gcloud auth login, or with gcloud auth activate-service-account, and can successfully list/write objects in GCS.

From there, you need two things: a correctly configured AWS IAM policy applied to the AWS user you are using, and a correctly configured ~/.boto file.

AWS S3 IAM policy for bucket access

A policy like the following must be applied, either through a role granted to the user or through an inline policy attached to the user.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::some-s3-bucket/*",
                "arn:aws:s3:::some-s3-bucket"
            ]
        }
    ]
}

The important parts are the ListBucket and GetObject actions, with a resource scope that includes at least the bucket (or a prefix within it) that you want to read.
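As a sketch of how your collaborator might attach the policy above, assuming they use the AWS CLI (the user name, policy name, and file path below are made-up placeholders, not values from this question):

```shell
# Hypothetical: attach the JSON policy above as an inline policy to
# the IAM user whose access keys were shared. All names are
# placeholders for illustration only.
aws iam put-user-policy \
  --user-name collaborator-reader \
  --policy-name gcs-transfer-read \
  --policy-document file://s3-read-policy.json
```

This requires AWS admin credentials on the collaborator's side, so it is something they run, not you.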

.boto file configuration

Interoperating between service providers is always a bit tricky. At the time of writing, in order to support AWS Signature V4 (the only signature version supported universally by all AWS regions), you have to add a couple of extra properties to your ~/.boto file in an [s3] group, in addition to the credentials.

[Credentials]
aws_access_key_id = [YOUR AKID]
aws_secret_access_key = [YOUR SECRET AK]
[s3]
use-sigv4=True
host=s3.us-east-2.amazonaws.com

The use-sigv4 property tells Boto, via gsutil, to use AWS Signature V4 for requests. Currently, this unfortunately also requires the host to be specified in the configuration. Figuring out the host name is easy, as it follows the pattern s3.[BUCKET REGION].amazonaws.com.
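For instance, the -DD trace earlier in the question includes the header x-amz-bucket-region: us-east-1, so the matching host value can be built mechanically from the region name (a minimal sketch):

```shell
# Build the S3 host name from a bucket's region, following the
# s3.[BUCKET REGION].amazonaws.com pattern described above.
region="us-east-1"
host="s3.${region}.amazonaws.com"
echo "$host"
```

The printed value is what goes on the host= line of the [s3] section.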

If you have rsync/cp jobs involving multiple S3 regions, you can handle that in a couple of ways. You can set the BOTO_CONFIG environment variable to switch between multiple files before running the command. Or you can override the setting on each run with a top-level argument, such as:

gsutil -o s3:host=s3.us-east-2.amazonaws.com ls s3://some-s3-bucket
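The BOTO_CONFIG approach could look like the following, assuming you keep one boto file per region (the file names and bucket names here are made up):

```shell
# Each hypothetical config holds the same [Credentials] section but a
# different [s3] host entry for its region.
BOTO_CONFIG=~/.boto-us-east-2 gsutil ls s3://some-s3-bucket
BOTO_CONFIG=~/.boto-eu-west-1 gsutil cp -r s3://other-s3-bucket/prefix gs://my-gcs-bucket
```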


mho*_*lum 5

1. Generate your GCS credentials

If you download the Cloud SDK and then run gcloud init and gcloud auth login, gcloud should configure OAuth2 credentials for the account you log in with, allowing you to access your GCS buckets (it does this by creating a separate boto file that is loaded in addition to your ~/.boto file, if you have one).

If you are using standalone gsutil, run gsutil config to generate a config file in ~/.boto.

2. Add your AWS credentials to the ~/.boto file

The [Credentials] section of your ~/.boto file should have these two lines populated and uncommented:

aws_access_key_id = IDHERE
aws_secret_access_key = KEYHERE

If you've already done this:

  • Make sure you didn't accidentally swap the values for the key and the id.
  • Verify that you're loading the correct boto file(s) - you can do this by running gsutil version -l and looking for the "config path(s):" line.
  • If you're still getting a 403, it's possible they gave you either the wrong bucket name, or a key and id corresponding to an account that doesn't have permission to list the contents of that bucket.
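One quick way to check for the swapped-keys mistake in the first bullet: AWS access key IDs conventionally begin with "AKIA", so you can grep the boto file for that prefix (the sketch below writes a throwaway sample file with placeholder values rather than touching your real ~/.boto):

```shell
# Stand-in for ~/.boto with placeholder values (not real credentials).
cat > /tmp/boto-sample <<'EOF'
[Credentials]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = examplesecretaccesskey
EOF

# If this prints nothing, the id and secret are likely swapped:
# the id line should start with AKIA, the secret should not.
grep '^aws_access_key_id = AKIA' /tmp/boto-sample
```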