Ada*_*dam 7 containers google-compute-engine
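I have a Docker container that performs a single large computation. The computation needs a lot of memory and takes about 12 hours to run.
I can create a Google Compute Engine VM of the appropriate size and use the "Deploy a container image to this VM instance" option to run this job perfectly. However, once the job finishes, the container exits but the VM keeps running (and keeps being billed).
How can I make the VM exit/stop/delete itself when the container exits?
When the VM is in this zombie state, only the Stackdriver containers are running: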
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bfa2feb03180 gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1 "/entrypoint.sh /u..." 17 hours ago Up 17 hours stackdriver-logging-agent
161439a487c2 gcr.io/stackdriver-agents/stackdriver-metadata-agent:0.2-0.0.17-2 "/bin/sh -c /opt/s..." 17 hours ago Up 17 hours 8000/tcp stackdriver-metadata-agent
I create the VM like this:
gcloud beta compute --project=abc instances create-with-container vm-name \
--zone=us-central1-c --machine-type=custom-1-65536-ext \
--network=default --network-tier=PREMIUM --metadata=google-logging-enabled=true \
--maintenance-policy=MIGRATE \
--service-account=xyz \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=cos-stable-69-10895-71-0 --image-project=cos-cloud --boot-disk-size=10GB \
--boot-disk-type=pd-standard --boot-disk-device-name=vm-name \
--container-image=gcr.io/abc/my-image --container-restart-policy=on-failure \
--container-command=python3 \
--container-arg="a" --container-arg="b" --container-arg="c" \
--labels=container-vm=cos-stable-69-10895-71-0
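When you create the VM, you need to give it write access to Compute so that the instance can be deleted from within it. You should also set container environment variables such as gce_zone and gce_project_id at this point. You will need them to delete the instance.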
gcloud beta compute instances create-with-container {NAME} \
--container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
--service-account={SERVICE_ACCOUNT} \
--scopes=https://www.googleapis.com/auth/compute,...
...
Then, inside the container, whenever you are sure the task is done, fetch an access token from the metadata server:
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
This responds with JSON that looks like:
{
"access_token": "foobarbaz...",
"expires_in": 1234,
"token_type": "Bearer"
}
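The token response above can be turned into a request header with a few lines of Python. This is a minimal sketch, not part of the original answer: the function name `bearer_header` is hypothetical, and it assumes the response has exactly the `access_token` and `token_type` fields shown above.

```python
import json


def bearer_header(token_response_text):
    """Build an Authorization header from the metadata server's token JSON.

    `bearer_header` is a hypothetical helper name used for illustration.
    """
    payload = json.loads(token_response_text)
    # token_type is "Bearer" in the sample response; reuse it rather than hard-coding.
    return {
        "Authorization": "{} {}".format(payload["token_type"], payload["access_token"])
    }
```

The returned dict can be passed as `headers=` to whatever HTTP client makes the delete call.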
Then hit the instances.delete API endpoint (note the environment variables):
curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME
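For a job written in Python, the same URL can be assembled from the container's environment. The sketch below only builds the string. It assumes the variables `gce_project_id` and `gce_zone` were set with `--container-env` as described above and that `HOSTNAME` holds the instance name, as the curl command assumes. The function name `instance_delete_url` is hypothetical.

```python
import os

API_ROOT = "https://www.googleapis.com/compute/v1"


def instance_delete_url(env=None):
    """Return the instances.delete URL for the current VM.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Raises KeyError if one of the expected variables is missing.
    """
    env = os.environ if env is None else env
    return "{root}/projects/{project}/zones/{zone}/instances/{name}".format(
        root=API_ROOT,
        project=env["gce_project_id"],
        zone=env["gce_zone"],
        name=env["HOSTNAME"],
    )
```

Failing with a `KeyError` when a variable is missing is deliberate here: a silently malformed URL would leave the VM running.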
I wrote a standalone Python function based on Vincent's answer.
def kill_vm():
    """
    If we are running inside a GCE VM, kill it.
    """
    # based on /sf/ask/3692383271/
    import json
    import logging
    import requests

    # get the token
    r = json.loads(
        requests.get("http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
                     headers={"Metadata-Flavor": "Google"})
        .text)
    token = r["access_token"]

    # get instance metadata
    # based on https://cloud.google.com/compute/docs/storing-retrieving-metadata
    project_id = requests.get("http://metadata.google.internal/computeMetadata/v1/project/project-id",
                              headers={"Metadata-Flavor": "Google"}).text
    name = requests.get("http://metadata.google.internal/computeMetadata/v1/instance/name",
                        headers={"Metadata-Flavor": "Google"}).text
    zone_long = requests.get("http://metadata.google.internal/computeMetadata/v1/instance/zone",
                             headers={"Metadata-Flavor": "Google"}).text
    zone = zone_long.split("/")[-1]

    # shut ourselves down
    logging.info("Calling API to delete this VM, {zone}/{name}".format(zone=zone, name=name))
    requests.delete("https://www.googleapis.com/compute/v1/projects/{project_id}/zones/{zone}/instances/{name}"
                    .format(project_id=project_id, zone=zone, name=name),
                    headers={"Authorization": "Bearer {token}".format(token=token)})
A simple atexit hook then gives me the behavior I want:
import atexit
atexit.register(kill_vm)
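After wrestling with this problem for a while, here is a complete solution that works well.
This solution does not use the "start machine with a container image" option. Instead, it uses a startup script, which is more flexible. You still use a Container-Optimized OS instance.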
#!/usr/bin/env bash
# get image name and container parameters from the metadata
IMAGE_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/image_name -H "Metadata-Flavor: Google")
CONTAINER_PARAM=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/container_param -H "Metadata-Flavor: Google")
# This is needed if you are using private images in GCP Container Registry
# (possibly also for the gcp log driver?)
sudo HOME=/home/root /usr/bin/docker-credential-gcr configure-docker
# Run! The logs will go to Stackdriver
sudo HOME=/home/root docker run --log-driver=gcplogs ${IMAGE_NAME} ${CONTAINER_PARAM}
# Get the zone
zoneMetadata=$(curl "http://metadata.google.internal/computeMetadata/v1/instance/zone" -H "Metadata-Flavor:Google")
# The value looks like projects/NUM/zones/ZONE; keep only the last path
# component. Parameter expansion avoids changing IFS for the rest of the script.
ZONE="${zoneMetadata##*/}"
# Run compute delete on the current instance. Need to run in a container
# because COS machines don't come with gcloud installed
docker run --entrypoint "gcloud" google/cloud-sdk:alpine compute instances delete ${HOSTNAME} --delete-disks=all --zone=${ZONE}
Put the script somewhere public. For example, put it on Cloud Storage and create a public URL. You cannot use a gs:// URI for a COS startup script.
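If the script already lives in a bucket, the public HTTPS form of its address follows the pattern `https://storage.googleapis.com/BUCKET/OBJECT`. The helper below is a small sketch for converting a gs:// URI into that form. The function name `gs_to_https` is hypothetical, and the resulting URL only works if the object has actually been made publicly readable.

```python
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Convert gs://BUCKET/OBJECT into its storage.googleapis.com HTTPS URL.

    Assumes the object is publicly readable; this function does not check that.
    """
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError("not a gs:// URI: {!r}".format(gs_uri))
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("expected gs://BUCKET/OBJECT, got {!r}".format(gs_uri))
    # Percent-encode the object name but keep "/" so nested paths stay readable.
    return "https://storage.googleapis.com/{}/{}".format(bucket, quote(obj, safe="/"))
```

The result is what would be passed as the startup-script-url metadata value.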
Start the instance using startup-script-url, passing the image name and parameters, for example:
gcloud compute --project=PROJECT_NAME instances create INSTANCE_NAME \
--zone=ZONE --machine-type=TYPE \
--metadata=image_name=IMAGE_NAME,\
container_param="PARAM1 PARAM2 PARAM3",\
startup-script-url=PUBLIC_SCRIPT_URL \
--maintenance-policy=MIGRATE --service-account=SERVICE_ACCOUNT \
--scopes=https://www.googleapis.com/auth/cloud-platform --image-family=cos-stable \
--image-project=cos-cloud --boot-disk-size=10GB --boot-disk-device-name=DISK_NAME
(You may want to restrict the scopes; the example uses full access for simplicity.)