Cannot pull container images from a private Artifact Registry into GKE Autopilot, even though they are in the same project

akr_sum (score 2) · Tags: kubernetes, google-kubernetes-engine, google-artifact-registry, autopilot

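According to the articles below, it seems that we can pull container images from Artifact Registry into GKE without any additional authentication, as long as both are in the same project.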

https://cloud.google.com/artifact-registry/docs/integrate-gke

https://www.youtube.com/watch?v=BfS7mvPA-og

GKE - Error: ImagePullBackOff and Error: ErrImagePull errors

But when I tried it, I got an ImagePullBackOff error.
Did I make a mistake or misunderstand something? Or do I need some other kind of authentication?

Steps to reproduce

It is convenient to run these steps in Google Cloud Shell, opened from one of your projects at https://console.cloud.google.com.

Create an Artifact Registry repository

gcloud artifacts repositories create test \
    --repository-format=docker \
    --location=asia-northeast2

Push a sample image

gcloud auth configure-docker asia-northeast2-docker.pkg.dev
docker pull nginx
docker tag nginx asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
docker push asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
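The image path used in these commands follows the Artifact Registry Docker format LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE, and the second segment has to be the project ID rather than the display name. A small sketch of a helper that assembles the path; the function name ar_image_uri is hypothetical, and the usage comment assumes the current project is set in gcloud config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Artifact Registry Docker image URI.
# Format: LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
# The tag defaults to "latest", which is what docker push uses when none is given.
ar_image_uri() {
  local location="$1" project_id="$2" repo="$3" image="$4" tag="${5:-latest}"
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' \
    "$location" "$project_id" "$repo" "$image" "$tag"
}

# Usage (assumes gcloud has a default project configured):
#   IMAGE=$(ar_image_uri asia-northeast2 "$(gcloud config get-value project)" test sample-nginx-image)
#   docker tag nginx "$IMAGE" && docker push "$IMAGE"
```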

Create a GKE Autopilot cluster

Create the GKE Autopilot cluster with the GUI console.

Almost all options were left at their defaults, but I changed the following:

  • Set the cluster name to test.
  • Set the same region as the registry (asia-northeast2 in this case).
  • Enabled Anthos Service Mesh.
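For anyone reproducing this without the console, the cluster step can probably be approximated from Cloud Shell with gcloud container clusters create-auto. This is a sketch under assumptions: the function name create_autopilot_cluster and the DRY_RUN switch are hypothetical, and it does not enable Anthos Service Mesh, which the console steps above did.

```shell
#!/usr/bin/env bash
# Sketch: CLI approximation of creating the Autopilot cluster in the console.
# Autopilot clusters are regional, so the location is passed with --region.
# Set DRY_RUN=1 to print the command instead of executing it.
create_autopilot_cluster() {
  local name="$1" region="$2"
  local cmd=(gcloud container clusters create-auto "$name" "--region=$region")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   DRY_RUN=1 create_autopilot_cluster test asia-northeast2   # preview only
#   create_autopilot_cluster test asia-northeast2             # actually create
```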

Deploy the container image from Artifact Registry to GKE

gcloud container clusters get-credentials test --region asia-northeast2
kubectl run test --image asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image

Check the Pod status

kubectl describe po test
Name:             test
Namespace:        default
Priority:         0
Service Account:  default
Node:             xxxxxxxxxxxxxxxxxxx
Start Time:       Wed, 08 Feb 2023 12:38:08 +0000
Labels:           run=test
Annotations:      autopilot.gke.io/resource-adjustment:
                    {"input":{"containers":[{"name":"test"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"reque...
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Pending
IP:               10.73.0.25
IPs:
  IP:  10.73.0.25
Containers:
  test:
    Container ID:
    Image:          asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-szq85 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-szq85:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From                                   Message
  ----     ------     ----  ----                                   -------
  Normal   Scheduled  19s   gke.io/optimize-utilization-scheduler  Successfully assigned default/test to xxxxxxxxxxxxxxxxxxx
  Normal   Pulling    16s   kubelet                                Pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
  Warning  Failed     16s   kubelet                                Failed to pull image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image": rpc error: code = Unknown desc = failed to pull and unpack image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to resolve reference "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest": failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
  Warning  Failed     16s   kubelet                                Error: ErrImagePull
  Normal   BackOff    15s   kubelet                                Back-off pulling image "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image"
  Warning  Failed     15s   kubelet                                Error: ImagePullBackOff
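The informative line in that output is the kubelet warning ending in failed to fetch oauth token: unexpected status: 403 Forbidden. A 403 at the token step means the node reached the registry but its identity was refused, which points to missing read permission rather than a missing image. The sketch below pulls the Failed event messages for a Pod and labels each one; the function names and the message-to-cause mapping are hypothetical and deliberately rough, not an official diagnostic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an image-pull error message to a likely cause.
# Order matters: the 403 check runs before the generic "not found" check.
classify_pull_error() {
  local msg="$1"
  case "$msg" in
    *"403 Forbidden"*)                echo "permission: node service account cannot read the repository" ;;
    *"401 Unauthorized"*)             echo "auth: no valid credentials were presented" ;;
    *"404 Not Found"*|*"not found"*)  echo "missing: image or tag does not exist at that path" ;;
    *)                                echo "unknown: inspect the full event message" ;;
  esac
}

# Print one line per Failed event for the given Pod (needs kubectl access).
pod_pull_errors() {
  kubectl get events --field-selector "involvedObject.name=$1,reason=Failed" \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
}

# Usage:
#   pod_pull_errors test | while IFS= read -r line; do classify_pull_error "$line"; done
```

The short Error: ErrImagePull and Error: ImagePullBackOff events also carry reason Failed and will be labeled unknown; only the longer Failed to pull image message says why the pull failed.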

Then I got ImagePullBackOff.

Answer (score 5)

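This may be because the GKE Autopilot service account does not have the permissions required to access Artifact Registry. You can grant them by adding the roles/artifactregistry.reader role to the service account used by the GKE Autopilot node pool configuration. In addition, you may need to adjust that service account's IAM permissions so that it can access the private Artifact Registry repository.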

gcloud artifacts repositories add-iam-policy-binding <repository-name> \
  --location=<location> \
  --member=serviceAccount:<nnn>-compute@developer.gserviceaccount.com \
  --role="roles/artifactregistry.reader"
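In that command, <nnn> is the project number (not the project ID), because the Compute Engine default service account is named PROJECT_NUMBER-compute@developer.gserviceaccount.com. A sketch that looks the number up and applies the same binding, assuming the Autopilot nodes run as that default account (the usual default, but worth confirming for your cluster); compute_default_sa and grant_repo_reader are hypothetical names.

```shell
#!/usr/bin/env bash
# Email of the Compute Engine default service account for a project number.
compute_default_sa() {
  printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

# Grant read access on one repository to that account (needs gcloud credentials).
grant_repo_reader() {
  local project_id="$1" repo="$2" location="$3" number
  # Resolve the numeric project number from the project ID.
  number=$(gcloud projects describe "$project_id" --format='value(projectNumber)') || return 1
  gcloud artifacts repositories add-iam-policy-binding "$repo" \
    --project="$project_id" \
    --location="$location" \
    --member="serviceAccount:$(compute_default_sa "$number")" \
    --role="roles/artifactregistry.reader"
}

# Usage:
#   grant_repo_reader "$(gcloud config get-value project)" test asia-northeast2
```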

You could also try creating a new service account, granting it the permissions needed to pull images, and then trying the pull again.
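A sketch of that suggestion: create a dedicated service account, give it reader access on the repository, and then use it as the node service account of a new Autopilot cluster. The account name gke-puller, the cluster name test2, and the function names are examples only. In practice a node service account usually also needs logging and monitoring roles, which this sketch does not grant.

```shell
#!/usr/bin/env bash
# Email format of a user-managed service account.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Create a dedicated service account and let it read one repository
# (needs gcloud credentials with IAM admin rights on the project).
setup_puller_sa() {
  local name="$1" project_id="$2" repo="$3" location="$4"
  gcloud iam service-accounts create "$name" --project="$project_id" || return 1
  gcloud artifacts repositories add-iam-policy-binding "$repo" \
    --project="$project_id" \
    --location="$location" \
    --member="serviceAccount:$(sa_email "$name" "$project_id")" \
    --role="roles/artifactregistry.reader"
}

# Usage (names are examples only):
#   P=$(gcloud config get-value project)
#   setup_puller_sa gke-puller "$P" test asia-northeast2
#   gcloud container clusters create-auto test2 --region=asia-northeast2 \
#     --service-account="$(sa_email gke-puller "$P")"
```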


Some simple troubleshooting steps:

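  1. Make sure your GKE cluster is configured to allow access to Artifact Registry. You can do this by going to the GKE dashboard and making sure the "Allow access to ArtifactRegistry" option is enabled.
  2. The container image you are trying to pull may not exist in Artifact Registry. Check the registry to make sure the image was uploaded correctly and is accessible.
  3. Check the error logs for more information about what is causing the problem. You can also consult the GKE documentation for more on troubleshooting this issue.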
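For step 2, one way to confirm the image really exists at the path the Pod references is to list the repository contents. A sketch with hypothetical helper names: it takes everything before the last / as the repository path, derives the bare image name, and filters the output of gcloud artifacts docker images list for it.

```shell
#!/usr/bin/env bash
# Repository path: everything before the final '/', e.g.
# asia-northeast2-docker.pkg.dev/PROJECT_ID/test
repo_path() {
  printf '%s\n' "${1%/*}"
}

# Image name: the final path segment with any @digest or :tag removed.
image_name() {
  local last="${1##*/}"
  last="${last%%@*}"
  printf '%s\n' "${last%%:*}"
}

# List matching images (with tags); grep exits non-zero if nothing matches.
check_image_exists() {
  local ref="$1"
  gcloud artifacts docker images list "$(repo_path "$ref")" --include-tags \
    | grep -F "$(image_name "$ref")"
}

# Usage:
#   check_image_exists "asia-northeast2-docker.pkg.dev/${PROJECT_NAME}/test/sample-nginx-image:latest"
```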