Azure Container Registry image pulls are very slow for images of about 150 MB

HXK*_*HXK 5 azure azure-container-registry azure-aks

When deploying images to an AKS instance, pulls from ACR (Premium SKU) are very slow, even for "small" images of about 150 MB.

Both the AKS resource and the ACR resource are in the Canada East region.

Here is an example:

root@076fff2831b2:/tmp# kubectl describe pod application-service-59bcf96874-pvrmb
Name:           application-service-59bcf96874-pvrmb
Namespace:      default
Priority:       0
Node:           aks-41067869-1/10.255.13.163
Start Time:     Tue, 11 Feb 2020 18:15:53 -0500
Labels:         app.kubernetes.io/instance=application-service
                app.kubernetes.io/name=application-service
                pod-template-hash=59bcf96874
Annotations:    <none>
Status:         Running
IP:             10.255.13.175
IPs:            <none>
Controlled By:  ReplicaSet/application-service-59bcf96874
Containers:
  application-service:
    Container ID:   docker://0e86526a293d9055d482a09f043f0be68c594244fe4216f8fb190bc2caf6b65b
    Image:          myacr01.azurecr.io/microservices/application-service:0.0.6
    Image ID:       docker-pullable://myacr01.azurecr.io/microservices/application-service@sha256:cfbb3ffa7adc52da9cc0b8d7f78376076ea712025b59df8e406c559d369f4085
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 11 Feb 2020 18:35:00 -0500
      Finished:     Tue, 11 Feb 2020 18:35:00 -0500
    Ready:          False
    Restart Count:  5
    Liveness:       http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      PORT:                        3000
      undefined:                   undefined
    Mounts:
      /kvmnt from application-service-kv-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from application-service-token-9jk8j (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  application-service-kv-volume:
    Type:       FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
    Driver:     azure/kv
    FSType:
    SecretRef:  &LocalObjectReference{Name:kvcreds,}
    ReadOnly:   false
    Options:    map[keyvaultname:testIt2 keyvaultobjectnames:APPLICATION-SVC-SQLDB-CS;INGESTION-CONSUMER-EHB-CS;INGESTION-PRODUCER-EHB-CS keyvaultobjecttypes:secret;secret;secret tenantid:REMOVED usepodidentity:false usevmmanagedidentity:false]
  application-service-token-9jk8j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  application-service-token-9jk8j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                     Message
  ----     ------     ----                   ----                     -------
  Normal   Scheduled  20m                    default-scheduler        Successfully assigned default/application-service-59bcf96874-pvrmb to aks-41067869-1
  Normal   Pulling    20m                    kubelet, aks-41067869-1  Pulling image "myacr01.azurecr.io/microservices/application-service:0.0.6"
  Normal   Pulled     4m39s                  kubelet, aks-41067869-1  Successfully pulled image "myacr01.azurecr.io/microservices/application-service:0.0.6"
  Normal   Started    3m36s (x4 over 4m33s)  kubelet, aks-41067869-1  Started container application-service
  Warning  BackOff    3m4s (x11 over 4m30s)  kubelet, aks-41067869-1  Back-off restarting failed container
  Normal   Pulled     2m52s (x4 over 4m32s)  kubelet, aks-41067869-1  Container image "myacr01.azurecr.io/microservices/application-service:0.0.6" already present on machine
  Normal   Created    2m51s (x5 over 4m33s)  kubelet, aks-41067869-1  Created container application-service

Some details have been modified or removed for privacy reasons.

Note, however, that for the image from ACR it took about 15 minutes to go from the "Pulling" event to the "Pulled" event.
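As a rough check of that figure, the event ages in the output above can be subtracted directly. The following is a minimal sketch, not part of the original setup: `parse_age` is a hypothetical helper written for this illustration, and the 150 MB size is the approximate value stated in the question. Since kubectl rounds the "20m" age, the result is only approximate.

```python
import re

def parse_age(age: str) -> int:
    """Convert a kubectl age string such as '4m39s' or '20m' to seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject anything that is not purely a sequence of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * units[u] for n, u in parts)

pulling_age = parse_age("20m")    # age of the "Pulling" event
pulled_age = parse_age("4m39s")   # age of the "Pulled" event
pull_seconds = pulling_age - pulled_age

image_mb = 150  # assumption: approximate image size from the question
throughput = image_mb / pull_seconds

print(f"pull took about {pull_seconds // 60}m{pull_seconds % 60}s")
print(f"effective throughput about {throughput:.2f} MB/s")
```

Under these assumptions the pull took a little over 15 minutes, which works out to well under 1 MB/s of effective throughput for a registry and cluster in the same region.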

This happens every day. The Azure Insights blade for the AKS instance shows that over the past 7 days node CPU utilization peaked at 26% and node memory utilization peaked at 14.32%.

How can we troubleshoot this further to identify the likely cause of the delay?

Any help is greatly appreciated.

Thanks!