How can I create a healthy VPC-Native GKE cluster with Terraform?

Har*_*dic 3 vpc kubernetes google-kubernetes-engine terraform

Using Terraform, I am trying to create a VPC-native GKE cluster in a single zone (europe-north1-b), with a separately managed node pool, where the GKE cluster and node pool live in their own VPC network.


My code looks like this:

resource "google_container_cluster" "gke_cluster" {\n  description              = "GKE Cluster for personal projects"\n  initial_node_count       = 1\n  location                 = "europe-north1-b"\n  name                     = "prod"\n  network                  = google_compute_network.gke.self_link\n  remove_default_node_pool = true\n  subnetwork               = google_compute_subnetwork.gke.self_link\n\n  ip_allocation_policy {\n    cluster_secondary_range_name  = local.cluster_secondary_range_name\n    services_secondary_range_name = local.services_secondary_range_name\n  }\n}\n\nresource "google_compute_network" "gke" {\n  auto_create_subnetworks         = false\n  delete_default_routes_on_create = false\n  description                     = "Compute Network for GKE nodes"\n  name                            = "${terraform.workspace}-gke"\n  routing_mode                    = "GLOBAL"\n}\n\nresource "google_compute_subnetwork" "gke" {\n  name          = "prod-gke-subnetwork"\n  ip_cidr_range = "10.255.0.0/16"\n  region        = "europe-north1"\n  network       = google_compute_network.gke.id\n\n  secondary_ip_range {\n    range_name    = local.cluster_secondary_range_name\n    ip_cidr_range = "10.0.0.0/10"\n  }\n\n  secondary_ip_range {\n    range_name    = local.services_secondary_range_name\n    ip_cidr_range = "10.64.0.0/10"\n  }\n}\n\nlocals {\n  cluster_secondary_range_name  = "cluster-secondary-range"\n  services_secondary_range_name = "services-secondary-range"\n}\n\nresource "google_container_node_pool" "gke_node_pool" {\n  cluster    = google_container_cluster.gke_cluster.name\n  location   = "europe-north1-b"\n  name       = terraform.workspace\n  node_count = 1\n  \n  node_locations = [\n    "europe-north1-b"\n  ]\n\n  node_config {\n    disk_size_gb    = 100\n    disk_type       = "pd-standard"\n    image_type      = "cos_containerd"\n    local_ssd_count = 0\n    machine_type    = "g1-small"\n    preemptible     = false\n    service_account = google_service_account.gke_node_pool.email\n  }\n}\n\nresource "google_service_account" "gke_node_pool" {\n  account_id   = "${terraform.workspace}-node-pool"\n  description  = "The default service account for pods to use in ${terraform.workspace}"\n  display_name = "GKE Node Pool ${terraform.workspace} Service Account"\n}\n\nresource "google_project_iam_member" "gke_node_pool" {\n  member = "serviceAccount:${google_service_account.gke_node_pool.email}"\n  role   = "roles/viewer"\n}\n

However, whenever I apply this Terraform code, I get the following error:

google_container_cluster.gke_cluster: Still creating... [24m30s elapsed]
google_container_cluster.gke_cluster: Still creating... [24m40s elapsed]
╷
│ Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but: component "kube-apiserver" from endpoint "gke-xxxxxxxxxxxxxxxxxxxx-yyyy" is unhealthy.
│
│   with google_container_cluster.gke_cluster,
│   on gke.tf line 1, in resource "google_container_cluster" "gke_cluster":
│    1: resource "google_container_cluster" "gke_cluster" {
│
╵

My cluster is then automatically deleted.


I can't find anything wrong with my Terraform code/syntax, and I have searched through Google Cloud Logging for a more detailed error message, without success.


So, how can I create a healthy VPC-native GKE cluster with Terraform?


Har*_*dic 6

It turns out the problem seems to have been the very large subnet secondary ranges.

As shown in the question, I had the ranges:

  • 10.0.0.0/10 for the cluster_secondary_range
  • 10.64.0.0/10 for the services_secondary_range

Each of these /10 CIDRs covers 4,194,304 IP addresses, which I suspected might be too large for Google/GKE to handle (?), especially since all of the GKE documentation uses CIDRs that cover much smaller cluster and services ranges.
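As a quick sanity check of that count (plain CIDR arithmetic, nothing GKE-specific): a /n prefix leaves 32 − n host bits, so

$$2^{32-10} = 2^{22} = 4\,194\,304 \quad \text{addresses per /10 range.}$$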

I decided to shrink these CIDR ranges to see if that would help:

  • 10.0.0.0/12 for the cluster_secondary_range
  • 10.16.0.0/12 for the services_secondary_range

Each of these /12 CIDRs covers 1,048,576 IP addresses.
After making this change, my cluster was created successfully:

google_container_cluster.gke_cluster: Creation complete after 5m40s

I'm not sure why Google / GKE can't handle larger CIDR ranges for the cluster and services, but /12 is more than enough for me and allows the cluster to be created successfully.
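For reference, a minimal sketch of the subnetwork resource from the question with the shrunk ranges dropped in (the names and the 10.255.0.0/16 primary range are unchanged from my original code; only the two secondary CIDRs differ):

resource "google_compute_subnetwork" "gke" {
  name          = "prod-gke-subnetwork"
  ip_cidr_range = "10.255.0.0/16" # primary range for node IPs, unchanged
  region        = "europe-north1"
  network       = google_compute_network.gke.id

  # Pod IP range: shrunk from 10.0.0.0/10 to a /12
  secondary_ip_range {
    range_name    = local.cluster_secondary_range_name
    ip_cidr_range = "10.0.0.0/12"
  }

  # Service (ClusterIP) range: shrunk from 10.64.0.0/10 to a /12
  secondary_ip_range {
    range_name    = local.services_secondary_range_name
    ip_cidr_range = "10.16.0.0/12"
  }
}

Even at /12, each secondary range still leaves over a million addresses for pods and services, far more than a small single-zone cluster needs.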