Reading a file from S3 into SageMaker on AWS gives a 403 Forbidden error, but other operations on the file work

bil*_*n44 5 python amazon-s3 amazon-web-services pandas amazon-sagemaker

This command:

import pandas as pd  # reading s3:// URLs with pandas also requires the s3fs package

BUCKET_TO_READ = 'my-bucket'
FILE_TO_READ = 'myFile'
data_location = 's3://{}/{}'.format(BUCKET_TO_READ, FILE_TO_READ)
df = pd.read_csv(data_location)

fails with

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

and I can't figure out why. According to /sf/answers/3517142821/ this should work.

These are my permissions on the bucket:

            "Action": [
                "s3:ListMultipartUploadParts",
                "s3:ListBucket",
                "s3:GetObjectVersionTorrent",
                "s3:GetObjectVersionTagging",
                "s3:GetObjectVersionAcl",
                "s3:GetObjectVersion",
                "s3:GetObjectTorrent",
                "s3:GetObjectTagging",
                "s3:GetObjectAcl",
                "s3:GetObject"

These commands work as expected:

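import boto3
from sagemaker import get_execution_role

role = get_execution_role()
region = boto3.Session().region_name
print(role)
print(region)

s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_TO_READ)
print(bucket.creation_date)

# Listing keys only needs s3:ListBucket on the bucket itself
for my_bucket_object in bucket.objects.all():
    print(my_bucket_object)
    FILE_TO_READ = my_bucket_object.key
    break

# Creating the Object resource is lazy: no request is sent here,
# so no object-level permission is checked yet
obj = s3.Object(BUCKET_TO_READ, FILE_TO_READ)
print(obj)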


All of these print statements work fine.

I'm not sure whether it matters, but every file is inside a folder, so my FILE_TO_READ looks like folder/file.
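For what it's worth, S3 has no real folders: folder/ is simply part of the object key, so the prefix by itself should not cause a 403. What does matter for permissions is the object ARN that IAM evaluates for object-level actions, which always includes the full key. A minimal stdlib sketch; parse_s3_uri and object_arn are hypothetical helpers, not part of boto3 or pandas:

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); a 'folder/' prefix stays in the key.

    Hypothetical helper. Keys containing '?' or '#' would need extra handling,
    because urlparse treats those as query/fragment delimiters.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # parsed.path is '' or starts with '/', so dropping one character gives the key
    return parsed.netloc, parsed.path[1:]


def object_arn(bucket, key):
    """Hypothetical helper: the ARN checked for object actions such as s3:GetObject."""
    return f"arn:aws:s3:::{bucket}/{key}"


bucket, key = parse_s3_uri("s3://my-bucket/folder/file")
print(bucket, key)              # my-bucket folder/file
print(object_arn(bucket, key))  # arn:aws:s3:::my-bucket/folder/file
```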

This command, which should download the file to SageMaker, also fails with a 403:

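import os
import boto3

s3 = boto3.resource('s3')
# Save under the bare file name, so a local 'folder/' directory doesn't have to exist
s3.Object(BUCKET_TO_READ, FILE_TO_READ).download_file(os.path.basename(FILE_TO_READ))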

The same thing happens when I open a terminal and run

aws s3 cp AWSURI local_file_name

bil*_*n44 3

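The reason was that we had granted the permissions on the bucket rather than on the objects in it: the policy had "Resource": "arn:aws:s3:::bucket-name/" but not "Resource": "arn:aws:s3:::bucket-name/*". Bucket-level actions such as s3:ListBucket are checked against the bucket ARN, which is why listing the keys worked. Object-level actions such as s3:GetObject (which HeadObject also requires) are only granted by a resource that matches the object's ARN, so reading or downloading the file returned 403.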
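To make the fix concrete, here is a minimal sketch, assuming the bucket is literally named bucket-name as in the answer, of a policy that splits the question's actions into a bucket statement and an object statement. The split (only s3:ListBucket is bucket-level, the rest apply to objects) is my reading of AWS's list of S3 actions and resource types, so double-check it there. build_policy, resource_covers and actions_allowed_on are hypothetical helpers, and resource_covers uses fnmatch only as a rough stand-in for IAM's wildcard matching, not the real policy evaluator.

```python
import json
from fnmatch import fnmatchcase

BUCKET = "bucket-name"  # placeholder name taken from the answer

# Assumed split of the question's actions by resource type (verify against AWS docs)
BUCKET_ACTIONS = ["s3:ListBucket"]
OBJECT_ACTIONS = [
    "s3:ListMultipartUploadParts",
    "s3:GetObjectVersionTorrent",
    "s3:GetObjectVersionTagging",
    "s3:GetObjectVersionAcl",
    "s3:GetObjectVersion",
    "s3:GetObjectTorrent",
    "s3:GetObjectTagging",
    "s3:GetObjectAcl",
    "s3:GetObject",
]


def build_policy(bucket):
    """Hypothetical helper: bucket ARN for ListBucket, bucket/* for object actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": BUCKET_ACTIONS,
             "Resource": f"arn:aws:s3:::{bucket}"},
            {"Effect": "Allow", "Action": OBJECT_ACTIONS,
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


def resource_covers(pattern, arn):
    """Rough approximation of IAM resource matching.

    IAM's '*' matches any run of characters, '/' included, and fnmatchcase
    behaves the same way; unlike IAM, fnmatch also treats [...] as a class.
    """
    return fnmatchcase(arn, pattern)


def actions_allowed_on(policy, arn):
    """Collect the actions that Allow statements grant for one resource ARN."""
    allowed = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow" and resource_covers(stmt["Resource"], arn):
            allowed.update(stmt["Action"])
    return allowed


if __name__ == "__main__":
    policy = build_policy(BUCKET)
    print(json.dumps(policy, indent=2))
    obj_arn = f"arn:aws:s3:::{BUCKET}/folder/file"
    print(resource_covers(f"arn:aws:s3:::{BUCKET}/", obj_arn))    # False
    print(resource_covers(f"arn:aws:s3:::{BUCKET}/*", obj_arn))   # True
    print("s3:GetObject" in actions_allowed_on(policy, obj_arn))  # True
```

In this model the "bucket-name/" resource from the original policy never matches folder/file, while "bucket-name/*" does, which mirrors why listing succeeded and HeadObject did not.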