Can anyone provide an example of an AWS CloudFormation AWS::Glue::Workflow template?

Tra*_*nan 5 amazon-web-services aws-cloudformation aws-glue

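I have been looking for an example of how to set up CloudFormation for a Glue workflow that includes triggers, jobs, and crawlers, but I can't find much information on it.

This is the only information I could find from AWS: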

{
  "Type" : "AWS::Glue::Workflow",
  "Properties" : {
      "DefaultRunProperties" : Json,
      "Description" : String,
      "Name" : String,
      "Tags" : Json
    }
}

Ant*_*tti 8

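Below is an example workflow containing one crawler and one job that runs after the crawler completes.

The workflow is defined by tagging each trigger with WorkflowName.

I believe there can be only one SCHEDULED or ON_DEMAND trigger to start the workflow. All other triggers in the workflow need to be conditional on jobs or crawlers. This is probably how CloudFormation knows how to build the DAG.

Also note how the workflow parameters are defined as JSON in DefaultRunProperties.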
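As a rough sketch of the scheduled variant, the ON_DEMAND start trigger in the template below could be swapped for a SCHEDULED one. The logical ID WorkflowScheduledStartTrigger, the trigger name, and the cron expression are placeholders made up for illustration; MyWorkflow and WorkflowCrawler refer to resources defined in the template below. I have not deployed this exact fragment, so treat it as a starting point.

```yaml
Resources:
  # Hypothetical replacement for WorkflowStartTrigger in the template below.
  WorkflowScheduledStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: ScheduledStartTrigger            # placeholder name
      Type: SCHEDULED
      Schedule: "cron(0 2 * * ? *)"          # assumed schedule: every day at 02:00 UTC
      StartOnCreation: true                  # activate the trigger when it is created
      Description: Scheduled trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler  # first step of the workflow
      WorkflowName: !Ref MyWorkflow
```

The rest of the workflow stays the same; only the start trigger changes.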

---
AWSTemplateFormatVersion: '2010-09-09'

Parameters:
  BaseBucket:
    Description: Bucket used by my workflow jobs
    Type: String

Resources:
  MyWorkflow:
    Type: AWS::Glue::Workflow
    Properties: 
      DefaultRunProperties:
        {
          "workflowParameter1": "Foo",
          "workflowParameter2": "Bar",
          "bucket": { "Fn::Sub": "${BaseBucket}" }
        }
      Description: Workflow for orchestrating my jobs
      Name: MyWorkflowName

  WorkflowCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: MyCrawler
      Role: MyCrawlerRole
      Description: A crawler to run as the first step in the workflow
      DatabaseName: MyDatabase
      Targets:
        S3Targets:
          - Path: !Sub "s3://${BaseBucket}/"

  WorkflowJob:
    Type: AWS::Glue::Job
    Properties:
      Description: Glue job to run after the crawler
      Name: MyWorkflowJob
      Role: MyJobRole
      Command:
        Name: pythonshell
        PythonVersion: "3"
        ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"

  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: ON_DEMAND
      Description: Trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow

  WorkflowJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: CrawlerSuccessfulTrigger
      Type: CONDITIONAL
      StartOnCreation: true
      Description: Trigger to start the glue job
      Actions:
        - JobName: !Ref WorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            CrawlerName: !Ref WorkflowCrawler
            CrawlState: SUCCEEDED
      WorkflowName: !Ref MyWorkflow
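
To extend the DAG with a further step, another CONDITIONAL trigger can watch the job instead of the crawler. This is a minimal sketch, assuming a second AWS::Glue::Job with the hypothetical logical ID SecondWorkflowJob is defined elsewhere in the same template; the trigger's logical ID and name are also made up for illustration.

```yaml
Resources:
  # Hypothetical extra trigger; SecondWorkflowJob is an assumed job resource.
  WorkflowSecondJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: FirstJobSuccessfulTrigger        # placeholder name
      Type: CONDITIONAL
      StartOnCreation: true
      Description: Trigger to start the second job after the first one succeeds
      Actions:
        - JobName: !Ref SecondWorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            JobName: !Ref WorkflowJob        # watch the job from the template above
            State: SUCCEEDED                 # job conditions use State, not CrawlState
      WorkflowName: !Ref MyWorkflow
```

Conditions on a job use JobName and State, while conditions on a crawler use CrawlerName and CrawlState, as in the template above.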