sil*_*ger 2 amazon-web-services aws-cloudformation amazon-athena
我有一个使用 cloudformation 中指定的模板定义的 Athena 表:
云信息创建
EventsTable:
Type: AWS::Glue::Table
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseName: !Ref DatabaseName
TableInput:
Description: "My Table"
Name: !Ref TableName
TableType: EXTERNAL_TABLE
StorageDescriptor:
Compressed: True
Columns:
- Name: account_id
Type: string
Comment: "Account Id of the account making the request"
...
InputFormat: org.apache.hadoop.mapred.TextInputFormat
SerdeInfo:
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
Location: !Sub "s3://${EventsBucketName}/events/"
Run Code Online (Sandbox Code Playgroud)
这运行良好并且可以部署。我还发现我可以根据此文档和此文档创建分区投影
并且可以通过直接创建表来实现这一点,大致如下:
SQL创建
CREATE EXTERNAL TABLE `performance_data.events`
(
`account_id` string,
...
)
PARTITIONED BY (
`day` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
's3://my-bucket/events/'
TBLPROPERTIES (
'has_encrypted_data' = 'false',
'projection.enabled' = 'true',
'projection.day.type' = 'date',
'projection.day.format' = 'yyyy/MM/dd',
'projection.day.range' = '2020/01/01,NOW',
'projection.day.interval' = '1',
'projection.day.interval.unit' = 'DAYS',
'storage.location.template' = 's3://my-bucket/events/${day}/'
)
Run Code Online (Sandbox Code Playgroud)
但我找不到转换为云形成结构的文档。所以我的问题是,如何实现cloudformation中SQL代码所示的分区投影?
我现在有一个可行的解决方案。缺少的部分实际上是缺少参数,解决方案如下:
MyTableResource:
Type: AWS::Glue::Table
Properties:
CatalogId: MyAccountId
DatabaseName: MyDatabase
TableInput:
Description: "My Table"
Name: mytable
TableType: EXTERNAL_TABLE
PartitionKeys:
- Name: day
Type: string
Comment: Day partition
Parameters:
"projection.enabled": "true"
"projection.day.type": "date"
"projection.day.format": "yyyy/MM/dd"
"projection.day.range": "2020/01/01,NOW"
"projection.day.interval": "1"
"projection.day.interval.unit": "DAYS"
"storage.location.template": "s3://my-bucket/events/${day}/"
StorageDescriptor:
Compressed: True
Columns:
...
InputFormat: org.apache.hadoop.mapred.TextInputFormat
SerdeInfo:
Parameters:
serialization.format: '1'
SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
Location: "s3://my-bucket/events/"
Run Code Online (Sandbox Code Playgroud)
关键的补充是:
serialization.format: '1'
Run Code Online (Sandbox Code Playgroud)
现在这完全可以工作,并且可以使用分区进行查询:
serialization.format: '1'
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1565 次 |
| 最近记录: |