在过去的几个月里,我们一直遇到一个让我们痛苦的问题。
问题似乎是,有时当我们通过 Kubernetes 执行器请求 pod 时,它无法创建。
例如,spark pod 可能会失败并出现以下错误:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 20m (x3 over 32m) kubelet, k8s-agentpool1-123456789-vmss00000q (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "spark-worker-cc1d28bf3de8428a826c04471e58487c-8577d5d654-2jg89": operation timeout: context deadline exceeded
Normal SandboxChanged 16m (x150 over 159m) kubelet, k8s-agentpool1-123456789-vmss00000q Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 5m7s (x14 over 161m) kubelet, …Run Code Online (Sandbox Code Playgroud)