aid*_*ald 1 | Tags: amazon-s3, amazon-web-services, aws-cloudformation, amazon-athena
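From the documentation for AWS::Athena::NamedQuery, it is unclear how to attach Athena to an S3 bucket specified in the same stack.
If I had to guess from the example, I'd imagine you could write a template like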
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
    # ... other params ...
  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
        CREATE EXTERNAL TABLE db_name.test_table
        (...) LOCATION s3://.../path/to/folder/
Would a template like the one above work? After the stack is created, would the table db_name.test_table be available for running queries?
Turns out the way to connect S3 and Athena is to make a Glue table! How silly of me!! Of course Glue is how you connect things!
Sarcasm aside, this is a template that worked for me, using AWS::Glue::Database and AWS::Glue::Table:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
  MyGlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: my-glue-database
        Description: "Glue beats tape"
      CatalogId: !Ref AWS::AccountId
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref MyGlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: my-glue-table
        Parameters: { "classification" : "csv" }
        StorageDescriptor:
          Location:
            Fn::Sub: "s3://${MyS3Bucket}/"
          InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
          OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
          SerdeInfo:
            Parameters: { "separatorChar" : "," }
            SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
          StoredAsSubDirectories: false
          Columns:
            - Name: column0
              Type: string
            - Name: column1
              Type: string
After this, the database and table were in the AWS Athena Console!
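To tie this back to the AWS::Athena::NamedQuery resource from the question: with the Glue database and table in the same stack, a saved query could probably point at them along these lines. This is an untested sketch, not part of the template that worked for me. The logical ID SampleNamedQuery and the query name PreviewMyGlueTable are made up for illustration, and it assumes that !Ref (and ${...} inside !Sub) on the Glue database and table resources resolves to their names. The fragment goes under the existing Resources: section.

```yaml
  # Hypothetical resource: a saved Athena query against the Glue table above.
  SampleNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      # Assumes !Ref on AWS::Glue::Database yields the database name.
      Database: !Ref MyGlueDatabase
      Name: "PreviewMyGlueTable"
      # Names are double-quoted in the SQL because they contain hyphens.
      QueryString: !Sub 'SELECT * FROM "${MyGlueDatabase}"."${MyGlueTable}" LIMIT 10'
```

Note that a named query is only saved in Athena, it is not executed when the stack is created, which is why the table itself is defined through Glue rather than through a CREATE EXTERNAL TABLE statement in the query.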