AWS Batch job stuck in RUNNABLE state when a launch template is configured

Geo*_*ani 5 templates launch amazon-web-services aws-step-functions aws-batch

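I have configured a Step Function with AWS Batch jobs. Everything in the configuration works well, but I need to customize the instances that get launched. To do that, I used the Launch Template service and built a simple (empty) configuration based on the instance type used in the AWS Batch configuration. When the compute environment is built with the launch template, the Batch job gets stuck in the RUNNABLE state. When I run the AWS Batch job without a launch template, everything works fine. Launching an instance from the template also works fine. Can anyone suggest what is wrong or missing? Below are the definitions of all the stack elements.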

Launch template definition (image not available)

Compute environment details overview

Compute environment name senet-cluster-r5ad-2xlarge-v3-4
Compute environment ARN arn:aws:batch:eu-central-1:xxxxxxxxxxx:compute-environment/senet-cluster-r5ad-2xlarge-v3-4
ECS Cluster name arn:aws:ecs:eu-central-1:xxxxxxxxxxxx:cluster/senet-cluster-r5ad-2xlarge-v3-4_Batch_3323aafe-d7a4-3cfe-91e5-c1079ee9d02e
Type MANAGED
Status VALID
State ENABLED
Service role arn:aws:iam::xxxxxxxxxxx:role/service-role/AWSBatchServiceRole
Compute resources
Minimum vCPUs 0
Desired vCPUs 0
Maximum vCPUs 25
Instance types r5ad.2xlarge
Allocation strategy BEST_FIT
Launch template lt-023ebdcd5df6073df
Launch template version $Default
Instance role arn:aws:iam::xxxxxxxxxxx:instance-profile/ecsInstanceRole
Spot fleet role
EC2 Keypair senet-test-keys
AMI id ami-0b418580298265d5c
vpcId vpc-0917ea63
Subnets subnet-49332034, subnet-8902a7e3, subnet-9de503d1
Security groups sg-cdbbd9af, sg-047ea19daf36aa269

AWS Batch job definition

{
    "jobDefinitionName": "senet-cluster-job-def-3",
    "jobDefinitionArn": "arn:aws:batch:eu-central-1:xxxxxxxxxxxxxx:job-definition/senet-cluster-job-def-3:9",
    "revision": 9,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "containerProperties": {
        "image": "xxxxxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/senet/batch-process:latest",
        "vcpus": 4,
        "memory": 60000,
        "command": [],
        "jobRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/AWSS3BatchFullAccess-senet",
        "volumes": [],
        "environment": [
            {
                "name": "BATCH_FILE_S3_URL",
                "value": "s3://senet-batch/senet_jobs.sh"
            },
            {
                "name": "AWS_DEFAULT_REGION",
                "value": "eu-central-1"
            },
            {
                "name": "BATCH_FILE_TYPE",
                "value": "script"
            }
        ],
        "mountPoints": [],
        "ulimits": [],
        "user": "root",
        "resourceRequirements": [],
        "linuxParameters": {
            "devices": []
        }
    }
}

小智 1

For anyone who runs into the same issue, here is the solution that worked for me. It took me a few days to figure out.

The snapshot behind the default AWS AMI requires at least 30 GB of storage. When you don't have a launch template, CloudFormation uses the correct storage size.

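In my case, I had defined only 8 GB of storage in the launch template. With that launch template in use, the job got stuck in the RUNNABLE state.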

Simply change the storage in the launch template to 30 GB or more, and it will work.
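As a rough sketch of the change described above, the snippet below builds launch template data with a 30 GiB root volume. The device name `/dev/xvda`, the `gp2` volume type, and the helper name `build_launch_template_data` are assumptions for illustration, so check the root device name of your own AMI. The resulting dict has the shape that the EC2 `CreateLaunchTemplateVersion` API takes as `LaunchTemplateData`; the API call itself is not made here.

```python
# Minimum root volume size reported in this answer (not an official constant).
MIN_ROOT_VOLUME_GIB = 30


def build_launch_template_data(root_device="/dev/xvda", size_gib=MIN_ROOT_VOLUME_GIB):
    """Return LaunchTemplateData with a root volume of at least 30 GiB.

    root_device is an assumption: it must match the AMI's root device name.
    """
    if size_gib < MIN_ROOT_VOLUME_GIB:
        raise ValueError(
            f"root volume of {size_gib} GiB is below {MIN_ROOT_VOLUME_GIB} GiB"
        )
    return {
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                "Ebs": {
                    "VolumeSize": size_gib,  # size in GiB
                    "VolumeType": "gp2",  # assumed volume type
                    "DeleteOnTermination": True,
                },
            }
        ]
    }
```

After creating a new template version from this data, the compute environment also has to pick up that version; here it points at `$Default`, so the new version would need to become the default.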

Also, don't forget that IamInstanceProfile and SecurityGroupIds are required in the launch template for the job to start.
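The points in this answer can be turned into a small pre-flight check. The following is only a sketch: the function name `find_launch_template_problems`, the default root device `/dev/xvda`, and the 30 GiB threshold are assumptions taken from this answer, not from AWS documentation. It inspects a dict shaped like EC2 `LaunchTemplateData` and returns a list of problems.

```python
# Keys this answer reports as required in the launch template.
REQUIRED_KEYS = ("IamInstanceProfile", "SecurityGroupIds")


def find_launch_template_problems(data, root_device="/dev/xvda", min_root_gib=30):
    """Return a list of human-readable problems found in launch template data."""
    problems = []
    # Flag required keys that are absent or empty.
    for key in REQUIRED_KEYS:
        if not data.get(key):
            problems.append(f"missing {key}")
    # Only the mapping for the (assumed) root device is size-checked.
    for mapping in data.get("BlockDeviceMappings", []):
        if mapping.get("DeviceName") != root_device:
            continue
        size = mapping.get("Ebs", {}).get("VolumeSize")
        if size is not None and size < min_root_gib:
            problems.append(
                f"root volume is {size} GiB, expected at least {min_root_gib}"
            )
    return problems
```

An empty list means none of the issues mentioned here were detected; it does not guarantee that the job will leave the RUNNABLE state.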