Kit*_*ers 2 amazon-s3 amazon-web-services aws-glue
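I just set up an AWS Glue crawler to crawl an S3 bucket. I created an IAM role for the crawler and attached the managed policies "AWSGlueServiceRole" and "AmazonS3FullAccess" to it. I have confirmed that the crawler is using this role. However, every time I run the crawler, I get an error message like the following in the logs: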
ERROR : Error Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: <omitted>; S3 Extended Request ID: <omitted>) retrieving file at s3://my-bucket/snapshots/snapshot-1/mydb/mydb.mytable/11/part-00000-ffffffff-ffff-ffff-ffff-ffffffffffff-c000.gz.parquet. Tables created did not infer schemas from this file.
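I have confirmed that a Lambda whose execution role has "AmazonS3ReadOnlyAccess" attached can access the bucket. What am I doing wrong?
EDIT: Enabling or disabling "Block all public access" has no noticeable effect.
EDIT2: The managed policy documents attached to the IAM role are shown below. There are no inline policies.
AWSGlueServiceRole: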
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:*",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListAllMyBuckets",
                "s3:GetBucketAcl",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeRouteTables",
                "ec2:CreateNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "iam:ListRolePolicies",
                "iam:GetRole",
                "iam:GetRolePolicy",
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": [
                "arn:aws:s3:::aws-glue-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::aws-glue-*/*",
                "arn:aws:s3:::*/*aws-glue-*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::crawler-public*",
                "arn:aws:s3:::aws-glue-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:/aws-glue/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "aws-glue-service-resource"
                    ]
                }
            },
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:security-group/*",
                "arn:aws:ec2:*:*:instance/*"
            ]
        }
    ]
}
AmazonS3FullAccess:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
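It turned out that the problem was KMS. The bucket contains an export of an Aurora RDS snapshot, and the export was evidently written encrypted. S3 permissions alone are therefore not enough: the role must also be allowed to decrypt with the KMS key. Once I added the following policy, everything worked: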
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:<region>:<my account id>:key/<my key id>"
        ]
    }
}
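If you hit the same 403 and want to check whether encryption is involved, one option is to inspect the metadata of the object named in the crawler's error message. The sketch below is not part of the original answer. The function names (`split_s3_uri`, `kms_key_from_head`, `find_kms_key`) are hypothetical, and it assumes that boto3 is installed and that the credentials you run it with are allowed to call HeadObject on the object.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and a key: {uri!r}")
    return bucket, key


def kms_key_from_head(head_response):
    """Return the KMS key ID/ARN from a HeadObject response dict,
    or None if the object is not encrypted with a KMS key."""
    sse = head_response.get("ServerSideEncryption", "")
    if not sse.startswith("aws:kms"):
        return None
    return head_response.get("SSEKMSKeyId")


def find_kms_key(uri):
    """Look up the KMS key (if any) that protects the object at `uri`."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    bucket, key = split_s3_uri(uri)
    s3 = boto3.client("s3")
    return kms_key_from_head(s3.head_object(Bucket=bucket, Key=key))
```

Calling `find_kms_key` with the `s3://...` path from the error message returns a key identifier if the object uses SSE-KMS. That identifier is what goes into the `Resource` list of the `kms:Decrypt` statement.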
Here is the entire managed policy that I attached to the role (note that the role also has AWSGlueServiceRole attached):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket/snapshots*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:<region>:<my account id>:key/<my key id>"
            ]
        }
    ]
}
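For anyone who prefers to script this rather than use the console, the policy above could be generated and attached roughly as follows. This is a minimal sketch under stated assumptions, not the method the answer's author used. `build_crawler_policy` and `attach_managed_policy` are hypothetical helper names, the role and policy names are whatever you choose, and boto3 must be installed with credentials that permit `iam:CreatePolicy` and `iam:AttachRolePolicy`.

```python
import json


def build_crawler_policy(bucket, prefix, kms_key_arn):
    """Build the policy document from the answer: S3 object access under
    one prefix, plus kms:Decrypt on the key that encrypted the export."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            },
        ],
    }


def attach_managed_policy(role_name, policy_name, policy):
    """Create a customer managed policy and attach it to the role."""
    import boto3  # third-party; imported lazily so the builder works without it

    iam = boto3.client("iam")
    created = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=created["Policy"]["Arn"],
    )
```

Scoping the `kms:Decrypt` statement to a single key ARN, as the answer does, keeps the grant narrow. If the key belongs to a different account, the key's own policy may also need to allow the role.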