Asked by ain*_*sti · Tags: amazon-s3, amazon-web-services, amazon-ecs, terraform
We want to build an ECS cluster with the following characteristics:
We've already read this post on Stack Overflow, which says that we need a private subnet whose route table points to a NAT gateway in a public subnet, and that the public subnet's route table should point to an internet gateway. We already have this configuration. We also have an S3 VPC endpoint configured in the route table.
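For reference, the routing described above might look roughly like this in Terraform. This is a minimal sketch, not the poster's actual code: `aws_vpc.main`, `aws_subnet.public`, `aws_subnet.private`, `aws_internet_gateway.igw` and `aws_nat_gateway.nat` are hypothetical names for resources assumed to exist elsewhere, and the S3 endpoint is assumed to be a gateway endpoint attached to the private subnet's route table.

```hcl
# Hypothetical resource names; the VPC, subnets, IGW and NAT gateway
# are assumed to be defined elsewhere.

# Public subnet: default route to the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# Private subnet: default route through the NAT gateway,
# which itself lives in the public subnet.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

# Gateway endpoint so S3 traffic from the private subnet skips the NAT.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```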
Below are the relevant parts of the cluster's Terraform configuration (for the sake of simplicity, I've left everything else out):
```hcl
# Launch template
resource "aws_launch_template" "train-launch-template" {
  name_prefix   = "${var.project_name}-launch-template-${var.env}"
  image_id      = "ami-01f62a207c1d180d2"
  instance_type = "m5.large"
  key_name      = "XXXXXX"

  iam_instance_profile {
    name = aws_iam_instance_profile.ecs-instance-profile.name
  }

  user_data = base64encode(data.template_file.user_data.rendered)

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.ecs_service.id]
  }
}

# Task definition
resource "aws_ecs_task_definition" "task" {
  family                   = "${var.project_name}-${var.env}-train-task"
  execution_role_arn       = data.aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_train_task_role.arn
  requires_compatibilities = ["EC2"]
  cpu                      = var.ecs_cpu
  # awsvpc: each task gets its own network interface in the subnet(s)
  # supplied when the task is launched
  network_mode             = "awsvpc"
  memory                   = var.ecs_memory
  container_definitions    = data.template_file.app_definition.rendered

  tags = {
    Stage   = var.env_tag
    Project = var.project_name_tag
  }
}

# Cluster
resource "aws_ecs_cluster" "cluster" {
  name               = "${var.project_name}-${var.env}-train-ecs-cluster"
  capacity_providers = [aws_ecs_capacity_provider.train-capacity-provider.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.train-capacity-provider.name
  }

  tags = {
    Project = var.project_name_tag
    Stage   = var.env_tag
  }
}
```
We have also configured all the roles that the instances and the task need to access the required resources (S3, ECR, ECS).
The AMI is the ECS-optimized AMI (the latest version published in eu-west-1 at the time of writing).
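As an aside, instead of hard-coding the AMI ID, the launch template could look up the recommended ECS-optimized AMI from the public SSM parameter that AWS maintains for it. A minimal sketch, assuming Amazon Linux 2 is the intended AMI family:

```hcl
# Public SSM parameter with the recommended ECS-optimized
# Amazon Linux 2 AMI ID for the provider's region.
data "aws_ssm_parameter" "ecs_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

# In the launch template, the hard-coded ID would then become:
#   image_id = data.aws_ssm_parameter.ecs_ami.value
```

The trade-off is that a newly published AMI changes the launch template on the next `terraform apply`; a pinned ID, as in the post, keeps instances stable until you update it deliberately.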
In the launch template we've removed the public IP from the instances, based on the explanation in this link.
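We arrived at this configuration while trying to make it work, but we keep hitting the same problem: when a task is triggered, the capacity provider launches an instance, but the task is never placed on the container instance and stays in the PROVISIONING state indefinitely.
With the same configuration but with the instances in a public subnet, the task is placed on the container instance; however, as the first link warns, the task then cannot access the internet.
We need some insight or a lead to follow. Thanks in advance.
Update: As requested, here is the rest of the auto scaling configuration: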
```hcl
resource "aws_autoscaling_group" "train-autoscaling" {
  availability_zones    = ["eu-west-1b"]
  desired_capacity      = 0
  max_size              = 10
  min_size              = 0
  protect_from_scale_in = true

  launch_template {
    id      = aws_launch_template.train-launch-template.id
    version = "$Latest"
  }

  tags = [
    {
      key                 = "Project"
      value               = var.project_name_tag
      propagate_at_launch = true
    },
    {
      key                 = "Stage"
      value               = var.env_tag
      propagate_at_launch = true
    }
  ]
}

resource "aws_ecs_capacity_provider" "train-capacity-provider" {
  name = "${var.project_name}-${var.env}-train-capacity-provider"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.train-autoscaling.arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      status                    = "ENABLED"
      target_capacity           = 100
      maximum_scaling_step_size = 1
      minimum_scaling_step_size = 1
    }
  }
}

data "template_file" "user_data" {
  template = file("${path.module}/user_data.sh")

  vars = {
    cluster_name = "${var.project_name}-${var.env}-train-ecs-cluster"
  }
}
```
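The post doesn't show `user_data.sh`; on the ECS-optimized AMI it typically just writes `ECS_CLUSTER` to `/etc/ecs/ecs.config` so the agent joins the right cluster (assumed below). The `template` provider behind `data "template_file"` has been deprecated in favor of Terraform's built-in `templatefile()` function (Terraform 0.12+). Here is a sketch that uses it with a hypothetical `local.cluster_name`, so the name is defined once rather than repeated in the cluster and the user data:

```hcl
# Single definition of the cluster name. Referencing
# aws_ecs_cluster.cluster.name from the user data instead would create
# a dependency cycle:
# cluster -> capacity provider -> ASG -> launch template -> cluster.
locals {
  cluster_name = "${var.project_name}-${var.env}-train-ecs-cluster"
}

# Assumed content of user_data.sh; templatefile() substitutes
# ${cluster_name}:
#
#   #!/bin/bash
#   echo "ECS_CLUSTER=${cluster_name}" >> /etc/ecs/ecs.config

resource "aws_launch_template" "train-launch-template" {
  # ... other arguments unchanged from the post ...

  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    cluster_name = local.cluster_name
  }))
}
```

With this in place, the `template_file` data source can be dropped and the cluster's `name` set to `local.cluster_name`, so the two values cannot drift apart.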
Update 2 (AWS console information):
Update 3:
Update 4:
Logs from the container instance, ecs-agent.log:
```text
level=info time=2020-08-28T11:09:21Z msg="Loading configuration" module=agent.go
level=info time=2020-08-28T11:09:21Z msg="Amazon ECS agent Version: 1.44.1, Commit: 1f05fbf0" module=agent.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2020-08-28T11:09:21Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=info time=2020-08-28T11:09:21Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2020-08-28T11:09:21Z msg="Loading state!" module=state_manager.go
level=info time=2020-08-28T11:09:23Z msg="Registering Instance with ECS" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Remaining mem: 7680" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Registered container instance with cluster!" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Registration completed successfully. I am running as 'arn:aws:ecs:eu-west-1:XXXXXXXXXXXXXXXX:container-instance/foqum-read-dev-train-ecs-cluster/95559f936f8d44de9373595009fcd588' in cluster 'foqum-read-dev-train-ecs-cluster'" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Beginning Polling for updates" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Initializing stats engine" module=engine.go
level=info time=2020-08-28T11:09:23Z msg="Event stream DeregisterContainerInstance start listening..." module=eventstream.go
level=info time=2020-08-28T11:09:23Z msg="Establishing a Websocket connection to https://ecs-t-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&cluster=XXXXXXXXX-cluster&containerInstance=arn%3Aaws%3Aecs%3Aeu-west-1%3AXXXXXXXX%3Acontainer-instance%2FXXXXXXXX-cluster%2F95559fXXXXXXde9373595009fcd588&dockerVersion=19.03.6-ce" module=client.go
level=info time=2020-08-28T11:09:23Z msg="NO_PROXY set:XXX.254.169.XXXX,XXXX.254.XXX.2,/var/run/docker.sock" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Establishing a Websocket connection to https://ecs-a-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&clusterArn=XXXXX-ecs-cluster&containerInstanceArn=arn%3Aaws%3Aecs%3Aeu-west-1%XXXXXX%3Acontainer-instance%2FXXXXX-ecs-cluster%2F9XXXXX6f8d44de9373595009fcd588&dockerVersion=DockerVersion%3A+19.03.6-ce&sendCredentials=true&seqNum=1" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Connected to TCS endpoint" module=handler.go
level=info time=2020-08-28T11:09:23Z msg="Connected to ACS endpoint" module=acs_handler.go
level=info time=2020-08-28T11:20:04Z msg="TCS Websocket connection closed for a valid reason" module=handler.go
level=info time=2020-08-28T11:20:04Z msg="Establishing a Websocket connection to https://ecs-t-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&cluster=XXXXXXXecs-cluster&containerInstance=arn%3Aaws%3Aecs%3Aeu-west-1%3AXXXXXX3Acontainer-instance%2FZZZXXXXX-ecs-cluster%2F95XXX936f8d44de9373595009fcd588&dockerVersion=19.03.6-ce" module=client.go
level=info time=2020-08-28T11:20:04Z msg="Connected to TCS endpoint" module=handler.go
```
ecs-init.log:
```text
2020-08-28T11:09:19Z [INFO] pre-start
2020-08-28T11:09:20Z [INFO] start
2020-08-28T11:09:20Z [INFO] No existing agent container to remove.
2020-08-28T11:09:20Z [INFO] Starting Amazon Elastic Container Service Agent
```
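Finally! Mystery solved!
The problem was not the cluster configuration. When you call run_task through the ECS API, you need to specify the subnets the task should run in.
Our code was setting this field to one of the public subnets. Because the task uses the awsvpc network mode, its network interface is created in that subnet, so the task can only be placed on a container instance in the same Availability Zone. That is why the task was placed as soon as we moved the container instance to the Availability Zone of that public subnet.
After changing this call in our code to pass the private subnet, the task is placed correctly and can access the internet.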
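On the Terraform side, one way to keep the two in line is to launch the instances into the same private subnets the `run_task` call uses, and to export those subnet IDs so the calling code doesn't hard-code a subnet. This is a sketch under assumptions: `aws_subnet.private` is a hypothetical list of private subnets created with `count` (one per Availability Zone), reusing `aws_security_group.ecs_service` for the task is an assumption, and the output names are made up. The caller would pass the outputs in `run_task`'s `networkConfiguration.awsvpcConfiguration` (`subnets` and `securityGroups`).

```hcl
# Instances are launched into the private subnets;
# vpc_zone_identifier takes the place of availability_zones.
resource "aws_autoscaling_group" "train-autoscaling" {
  vpc_zone_identifier   = aws_subnet.private[*].id
  desired_capacity      = 0
  max_size              = 10
  min_size              = 0
  protect_from_scale_in = true

  launch_template {
    id      = aws_launch_template.train-launch-template.id
    version = "$Latest"
  }

  # tags as in the post
}

# Hypothetical outputs consumed by the code that calls run_task.
output "train_task_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "train_task_security_group_ids" {
  value = [aws_security_group.ecs_service.id]
}
```

If the subnet list passed to `run_task` covers every Availability Zone the ASG launches into, ECS should always be able to find a container instance whose Availability Zone matches one of the task's subnets.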