Ste*_*ven 6 amazon-web-services terraform aws-glue terraform-provider-aws aws-glue-data-catalog
我对应该如何使用terraform将Athena连接到我的Glue Catalog数据库感到困惑。
我用
resource "aws_glue_catalog_database" "catalog_database" {
name = "${var.glue_db_name}"
}
resource "aws_glue_crawler" "datalake_crawler" {
database_name = "${var.glue_db_name}"
name = "${var.crawler_name}"
role = "${aws_iam_role.crawler_iam_role.name}"
description = "${var.crawler_description}"
table_prefix = "${var.table_prefix}"
schedule = "${var.schedule}"
s3_target {
path = "s3://${var.data_bucket_name[0]}"
}
s3_target {
path = "s3://${var.data_bucket_name[1]}"
}
}
Run Code Online (Sandbox Code Playgroud)
创建一个Glue数据库,然后使用搜寻器来爬行s3存储桶(这里只有两个),但是我不知道如何将Athena查询服务链接到Glue DB。在的terraform文档中Athena,似乎没有一种方法可以将Athena连接到Glue目录,而只能连接到S3存储桶。但是,显然,雅典娜可以与Glue集成在一起。
如何构建Athena数据库以使用我的Glue目录作为数据源而不是S3存储桶?
我们当前让 Glue 抓取一个 S3 存储桶并在 Glue DB 中创建/更新表(然后可以在 Athena 中查询)的基本设置如下所示:
爬虫角色和角色策略:
resource "aws_iam_role" "glue_crawler_role" {
name = "analytics_glue_crawler_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role_policy" "glue_crawler_role_policy" {
name = "analytics_glue_crawler_role_policy"
role = "${aws_iam_role.glue_crawler_role.id}"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:*",
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetBucketAcl",
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::analytics-product-data",
"arn:aws:s3:::analytics-product-data/*",
]
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:*:*:/aws-glue/*"
]
}
]
}
EOF
}
Run Code Online (Sandbox Code Playgroud)
S3 存储桶、胶水数据库和爬虫:
resource "aws_s3_bucket" "product_bucket" {
bucket = "analytics-product-data"
acl = "private"
}
resource "aws_glue_catalog_database" "analytics_db" {
name = "inventory-analytics-db"
}
resource "aws_glue_crawler" "product_crawler" {
database_name = "${aws_glue_catalog_database.analytics_db.name}"
name = "analytics-product-crawler"
role = "${aws_iam_role.glue_crawler_role.arn}"
schedule = "cron(0 0 * * ? *)"
configuration = "{\"Version\": 1.0, \"CrawlerOutput\": { \"Partitions\": { \"AddOrUpdateBehavior\": \"InheritFromTable\" }, \"Tables\": {\"AddOrUpdateBehavior\": \"MergeNewColumns\" } } }"
schema_change_policy {
delete_behavior = "DELETE_FROM_DATABASE"
}
s3_target {
path = "s3://${aws_s3_bucket.product_bucket.bucket}/products"
}
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
765 次 |
| 最近记录: |