How to stop cloud-init at the first error

chr*_*tfr 5 provisioning configuration-management cloud-init

When I launch a Linux server with cloud-init, I have a few scripts in /etc/cloud/cloud.cfg.d/ that run in reverse alphabetical order:

# ll /etc/cloud/cloud.cfg.d/
total 28
-rw-r--r-- 1 root root  173 Dec 10 12:38 00-cloudinit-lifecycle-hook.cfg
-rw-r--r-- 1 root root 2120 Jun  1  2021 05_logging.cfg
-rw-r--r-- 1 root root  590 Oct 26 17:55 10_aws_yumvars.cfg
-rw-r--r-- 1 root root   29 Dec  1 18:22 20_amazonlinux_repo_https.cfg
-rw-r--r-- 1 root root  586 Dec 10 12:38 50-cloudinit-tomcat.cfg
-rw-r--r-- 1 root root  585 Dec 10 12:40 60-cloudinit-newrelic.cfg

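The last one to run is 00-cloudinit-lifecycle-hook.cfg, which I use to complete the Auto Scaling group lifecycle action with CONTINUE. If the ASG does not receive this signal before the given timeout, it fails.

The problem is that even when 50-cloudinit-tomcat.cfg hits an error, cloud-init still runs 00-cloudinit-lifecycle-hook.cfg instead of stopping.

How can I make sure cloud-init stops and never reaches the last script? If any error occurs, I want the ASG to never receive the CONTINUE signal.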

Here are the files:

EC2 instance user data:

#cloud-config

bootcmd:
  - [cloud-init-per, once, "app-volume", mkfs, -t, "ext4", "/dev/nvme1n1"]

mounts:
   - ["/dev/nvme1n1", "/app-volume", "ext4", "defaults,nofail", "0", "0"]

merge_how:
  - name: list
    settings: [append]
  - name: dict
    settings: [no_replace, recurse_list]

50-cloudinit-tomcat.cfg

#cloud-config
merge_how:
 - name: list
   settings: [append]
 - name: dict
   settings: [no_replace, recurse_list]

runcmd:
  - "#!/bin/bash -e"
  - set +x
  - echo ' '
  - echo '# ===================================='
  - echo '#          Tomcat Cloud Init '
  - echo '#       /etc/cloud/cloud.cfg.d/'
  - echo '# ===================================='
  - echo ' '
  - echo '#===================================='
  - echo '#          Run Ansible'
  - echo '#===================================='
  - echo ' '
  - set -x
  - ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml


When I run ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml directly on the instance, it fails, and I know it returns exit code 2.
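
A possible explanation, stated as an assumption rather than confirmed behavior for this setup: cloud-init appears to write all merged runcmd items into a single script that is run by /bin/sh, so a "#!/bin/bash -e" item that is not the first line of that script is treated as an ordinary comment and the -e flag never takes effect. The sketch below reproduces that effect in isolation. The function names are hypothetical, and false stands in for the failing ansible-playbook call.

```shell
#!/bin/sh
# Hypothetical stand-in for the script built from merged runcmd items.
# The shebang is not on the first line, so sh reads it as a comment.
demo_without_set_e() {
  sh -c '
#!/bin/bash -e
false
echo "lifecycle hook reached"
'
}

# Same commands, but with an explicit "set -e" as the first command.
# The failing command now aborts the script before the final echo.
demo_with_set_e() {
  sh -c '
set -e
false
echo "lifecycle hook reached"
'
}
```

If the assumption holds, making set -e the first runcmd item in the merged list might be enough to stop the run before the lifecycle hook, but that depends on the merge order and should be checked against the generated script on the instance.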


00-cloudinit-lifecycle-hook.cfg

#cloud-config
merge_how:
 - name: list
   settings: [append]
 - name: dict
   settings: [no_replace, recurse_list]

runcmd:
  - "/opt/lifecycles/lifecycle-hook-continue.sh"


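Another option I can think of is to send an ABANDON signal instead of CONTINUE as soon as any cloud-init config hits an error. But I could not find in the documentation how to determine whether an error occurred.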
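
One way the ABANDON idea could be sketched, without relying on cloud-init to report the error, is to wrap the provisioning command and derive the lifecycle result from its exit status. This is only a sketch under assumptions: lifecycle_result and run_and_signal are hypothetical helper names, and HOOK_NAME, ASG_NAME and INSTANCE_ID are hypothetical variables that would have to be set for the actual hook. The aws autoscaling complete-lifecycle-action command and its options are part of the AWS CLI.

```shell
#!/bin/sh
# Map an exit status to a lifecycle action result.
lifecycle_result() {
  if [ "$1" -eq 0 ]; then
    echo CONTINUE
  else
    echo ABANDON
  fi
}

# Run the given command, then report the outcome to the Auto Scaling group.
# HOOK_NAME, ASG_NAME and INSTANCE_ID are assumed to be set by the caller.
run_and_signal() {
  status=0
  "$@" || status=$?
  result=$(lifecycle_result "$status")
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$HOOK_NAME" \
    --auto-scaling-group-name "$ASG_NAME" \
    --instance-id "$INSTANCE_ID" \
    --lifecycle-action-result "$result"
  # Propagate the original exit status to the caller.
  return "$status"
}

# Example call (commented out so that sourcing this file has no side effects):
# run_and_signal ansible-playbook /opt/init-config/tomcat/tomcat-config.yaml
```

With this shape, the separate CONTINUE step would no longer be needed for the wrapped command, since the wrapper sends exactly one result either way.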