yee*_*379 5 gpu nvidia kubernetes
I have been able to get Kubernetes to recognize the GPUs on my node:
$ kubectl get node MY_NODE -o yaml
...
allocatable:
  cpu: "48"
  ephemeral-storage: "15098429006"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263756344Ki
  nvidia.com/gpu: "8"
  pods: "110"
capacity:
  cpu: "48"
  ephemeral-storage: 16382844Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263858744Ki
  nvidia.com/gpu: "8"
  pods: "110"
...
I spin up a pod with the following resources:
Limits:
  cpu: 2
  memory: 2147483648
  nvidia.com/gpu: 1
Requests:
  cpu: 500m
  memory: 536870912
  nvidia.com/gpu: 1
However, the pod stays in the Pending state:
Insufficient nvidia.com/gpu.
Did I specify the resources correctly?
Have you installed the NVIDIA device plugin in Kubernetes?
kubectl create -f nvidia.io/device-plugin.yml
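If the plugin is installed and the node still reports "Insufficient nvidia.com/gpu", it may help to check how many GPUs are actually left once the pods already bound to the node are counted. Here is a minimal sketch of that arithmetic in Python. The helper names (gpu_requests, free_gpus, kubectl_json) are my own and not part of any Kubernetes tooling, and the count ignores init containers, so treat the result as a rough estimate rather than what the scheduler computes.

```python
import json
import subprocess

GPU = "nvidia.com/gpu"  # extended resource name, as shown in the node output


def gpu_requests(pod: dict) -> int:
    """Sum the GPU requests of a pod's regular containers (init containers ignored)."""
    total = 0
    for container in pod["spec"].get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        total += int(requests.get(GPU, 0))
    return total


def free_gpus(node: dict, pods: list) -> int:
    """Allocatable GPUs minus GPUs requested by unfinished pods bound to the node."""
    allocatable = int(node["status"]["allocatable"].get(GPU, 0))
    name = node["metadata"]["name"]
    used = sum(
        gpu_requests(pod)
        for pod in pods
        if pod["spec"].get("nodeName") == name
        and pod.get("status", {}).get("phase") not in ("Succeeded", "Failed")
    )
    return allocatable - used


def kubectl_json(*args: str) -> dict:
    """Hypothetical helper: run kubectl with -o json and parse the output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example (requires kubectl access to the cluster):
#   node = kubectl_json("get", "node", "MY_NODE")
#   pods = kubectl_json("get", "pods", "--all-namespaces")["items"]
#   print(free_gpus(node, pods))
```

A result of 0 would suggest that other pods already hold all the GPUs. A positive result while the pod is still Pending points elsewhere, for example at node selectors or taints.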
Some devices are too old to support health checks, so this option must be disabled:
containers:
- image: nvidia/k8s-device-plugin:1.9
  name: nvidia-device-plugin-ctr
  env:
  - name: DP_DISABLE_HEALTHCHECKS
    value: "xids"
Take a look: