Tra*_*nan 5 amazon-web-services aws-cloudformation aws-glue
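I have been looking for an example of how to set up CloudFormation for a Glue workflow that contains triggers, jobs, and crawlers, but I could not find much information about it.
This is the only information I could find from AWS: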
{
  "Type" : "AWS::Glue::Workflow",
  "Properties" : {
    "DefaultRunProperties" : Json,
    "Description" : String,
    "Name" : String,
    "Tags" : Json
  }
}
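Below is an example workflow containing one crawler and one job that runs after the crawler finishes.
The workflow is defined by setting WorkflowName on each trigger that belongs to it.
I believe a workflow can have only one SCHEDULED or ON_DEMAND trigger to start it. Every other trigger in the workflow needs a condition on a job or crawler. This is probably how CloudFormation knows how to build the DAG.
Also note how the workflow parameters are defined as JSON in DefaultRunProperties.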
---
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  BaseBucket:
    Description: Bucket used by my workflow jobs
    Type: String
Resources:
  MyWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      # Workflow parameters, written as inline JSON
      DefaultRunProperties:
        {
          "workflowParameter1": "Foo",
          "workflowParameter2": "Bar",
          "bucket": { "Fn::Sub": "${BaseBucket}" }
        }
      Description: Workflow for orchestrating my jobs
      Name: MyWorkflowName
  WorkflowCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: MyCrawler
      Role: MyCrawlerRole
      Description: A crawler to run as the first step in the workflow
      DatabaseName: MyDatabase
      Targets:
        S3Targets:
          - Path: !Sub "s3://${BaseBucket}/"
  WorkflowJob:
    Type: AWS::Glue::Job
    Properties:
      Description: Glue job to run after the crawler
      Name: MyWorkflowJob
      Role: MyJobRole
      Command:
        Name: pythonshell
        PythonVersion: "3"
        ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"
  # The single trigger that starts the workflow
  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: ON_DEMAND
      Description: Trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow
  # Conditional trigger: runs the job once the crawler has succeeded
  WorkflowJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: CrawlerSuccessfulTrigger
      Type: CONDITIONAL
      StartOnCreation: True
      Description: Trigger to start the glue job
      Actions:
        - JobName: !Ref WorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            CrawlerName: !Ref WorkflowCrawler
            CrawlState: SUCCEEDED
      WorkflowName: !Ref MyWorkflow
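Since the start trigger may apparently be SCHEDULED instead of ON_DEMAND, here is a minimal sketch of what a scheduled variant of WorkflowStartTrigger could look like. It is meant to replace the ON_DEMAND trigger rather than sit alongside it. The resource name WorkflowScheduledStartTrigger, the trigger name, and the cron expression are illustrative assumptions and not part of the original template; I have not deployed this exact fragment.

```yaml
Resources:
  # Hypothetical replacement for WorkflowStartTrigger (names and schedule are assumptions)
  WorkflowScheduledStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: ScheduledStartTrigger
      Type: SCHEDULED
      # Glue cron format: minutes hours day-of-month month day-of-week year (UTC)
      Schedule: "cron(0 2 * * ? *)"
      # Activate the trigger as soon as the stack creates it
      StartOnCreation: true
      Description: Trigger for starting the workflow on a schedule
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow
```

The conditional WorkflowJobTrigger should not need any changes, since it depends only on the crawler's state and not on how the workflow was started.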