Using date variables in AWS Step Functions

tal*_*alR 1 amazon-web-services amazon-emr aws-step-functions

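I created a Step Functions state machine that creates an EMR cluster, and I want the date in the steps to change according to the day on which I execute the state machine. (If I run it today, June 13, 2023, I want it to run with June 12, 2023.) How can I do that? Here is my code: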

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "IOretrieve",
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ReleaseLabel": "emr-6.8.0",
        "Applications": [
          {
            "Name": "Spark"
          }
        ],
        "LogUri": "s3://",
        "VisibleToAllUsers": true,
        "Instances": {
          "Ec2SubnetId": "subnet",
          "Ec2KeyName": "",
          "EmrManagedMasterSecurityGroup": "",
          "EmrManagedSlaveSecurityGroup": "",
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "Name": "Master",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m5.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "Name": "CORE",
              "TargetOnDemandCapacity": 5,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "r5.2xlarge"
                }
              ]
            }
          ]
        },
        "BootstrapActions": [
          {
            "Name": "Custom action",
            "ScriptBootstrapAction": {
              "Path": "s3://",
              "Args": []
            }
          }
        ],
        "Configurations": [
          {
            "Classification": "core-site",
            "Properties": {
              "fs.s3a.connection.maximum": "1000"
            }
          },
          {
            "Classification": "spark",
            "Properties": {
              "maximizeResourceAllocation": "true"
            }
          }
        ]
      },
      "ResultPath": "$.cluster",
      "Next": "Run first step"
    },
    "Run first step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My first EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.firstStep",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId"
      },
      "End": true
    }
  }
}

The date path is what I want to change: "--date_path", "year=2023/month=06/day=12/"

Den*_*aub 5

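AWS Step Functions provides a few simple intrinsic functions for math operations, such as States.MathRandom and States.MathAdd.

However, at the time of writing (June 2023), more complex calculations such as getting the previous day's date are not possible out of the box and require calling an external process, i.e. a Lambda function.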
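
For the previous day's date, such a Lambda function could look roughly like the Python sketch below. The `executionTime` input key and the `datePath` output key are assumptions made for illustration and are not part of the original state machine; the timestamp is expected in the ISO 8601 format that the context object uses.

```python
from datetime import datetime, timedelta, timezone


def lambda_handler(event, context):
    """Return the date path for the day before the given timestamp.

    `executionTime` is a hypothetical input key; the state machine could
    fill it from the context object, e.g. $$.Execution.StartTime.
    Falls back to the current UTC time when the key is missing.
    """
    raw = event.get("executionTime")
    if raw:
        # Context object timestamps look like 2019-03-26T20:14:13.192Z;
        # only the date part (first 10 characters) is needed here.
        day = datetime.strptime(raw[:10], "%Y-%m-%d").date()
    else:
        day = datetime.now(timezone.utc).date()

    previous = day - timedelta(days=1)
    # Trailing slash matches the --date_path value used in the question.
    return {"datePath": previous.strftime("year=%Y/month=%m/day=%d/")}
```

Called with `{"executionTime": "2023-06-13T08:00:00.000Z"}`, this returns `{"datePath": "year=2023/month=06/day=12/"}`.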


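That said, you can retrieve and format the current date and time with the following steps.

Step 1:

Retrieve the time at which the current state was entered from the context object with: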

$$.State.EnteredTime

This returns the date and time in the following format:

2019-03-26T20:14:13.192Z

Step 2:

Split the timestamp into an array with States.StringSplit:

States.StringSplit($$.State.EnteredTime, '-,T')

This returns the following array:

[
  "2019",
  "03",
  "26",
  "20:14:13.192Z"
]

Step 3:

Use States.Format to build the date path string from the first three elements of the array:

States.Format('year={}/month={}/day={}', States.ArrayGetItem($.date.splitDate, 0), States.ArrayGetItem($.date.splitDate, 1), States.ArrayGetItem($.date.splitDate, 2))

Step 4:

Create the Args array with States.Array:

States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path',$.datePath)
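
To check locally what steps 2 to 4 should produce for a given timestamp, a rough Python emulation can help. This is only a sketch: `string_split` and `build_args` are hypothetical helpers, not Step Functions code, and the emulation only mirrors the intrinsic functions for well-formed timestamps like the one above.

```python
import re


def string_split(value, delimiters):
    """Rough stand-in for States.StringSplit: every character in
    `delimiters` is treated as a separator."""
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, value)


def build_args(entered_time):
    # Step 2: split the timestamp on '-', ',' and 'T'.
    parts = string_split(entered_time, "-,T")
    # Step 3: same template as the States.Format call.
    date_path = "year={}/month={}/day={}".format(parts[0], parts[1], parts[2])
    # Step 4: same list as the States.Array call.
    return ["spark-submit", "--deploy-mode", "client", "s3://",
            "--local_run", "False", "--date_path", date_path]


print(build_args("2019-03-26T20:14:13.192Z"))
```

Note that the resulting path has no trailing slash, unlike the hard-coded value in the question; append one to the format template if the Spark job expects it.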

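To show how this works in the context of a state machine, I added an extra Pass state named "Format date path": {...} to the state machine and replaced the HadoopJarStep.Args property in the Task state ("Run second step": {...}):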

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      ...,
      "Next": "Format date path"
    },
    "Format date path": {
      "Type": "Pass",
      "Parameters": {
        "datePath.$": "States.Format('year={}/month={}/day={}', States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 0), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 1), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 2))"
      },
      "ResultPath": "$.date",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.date.datePath)"
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      ...
    }
  }
}
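
If the previous day is needed rather than the current one, the Pass state could be swapped for a Task state that invokes a Lambda function. The sketch below builds such a state as a Python dict and prints it as JSON. The function name `previous-day-date-path`, the state name "Get previous day", and the `executionTime` and `datePath` keys are all assumptions for illustration; the function is assumed to return an object of the form `{"datePath": "..."}`.

```python
import json

# Hypothetical function name; replace with the name or ARN of your own Lambda.
FUNCTION_NAME = "previous-day-date-path"

previous_day_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": FUNCTION_NAME,
        # Pass the execution start time from the context object.
        "Payload": {"executionTime.$": "$$.Execution.StartTime"},
    },
    # Keep only the datePath field of the function's return value ...
    "ResultSelector": {"datePath.$": "$.Payload.datePath"},
    # ... and store it under $.date so the rest of the input (e.g. $.cluster) survives.
    "ResultPath": "$.date",
    "Next": "Run second step",
}

print(json.dumps({"Get previous day": previous_day_state}, indent=2))
```

With a state like this, the Args expression would read the value from `$.date.datePath`.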