I have a node in my K8s cluster that I use for monitoring tools.
Pods running there: Grafana, PGAdmin, Prometheus, and kube-state-metrics.
My problem is that a lot of pods get evicted.
The evicted pods: kube-state-metrics, grafana-core, pgadmin.
Most are evicted with the reason "The node was low on resource: [DiskPressure].": kube-state-metrics (90% of evicted pods), pgadmin (20% of evicted pods).
When I check any of the pods, there is free space on disk:
bash-5.0$ df -h
Filesystem Size Used Available Use% Mounted on
overlay 7.4G 3.3G 3.7G 47% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 484.2M 0 484.2M 0% /sys/fs/cgroup
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /dev/termination-log
shm 64.0M 0 64.0M 0% /dev/shm
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/resolv.conf
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/hostname
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/hosts
/dev/nvme2n1 975.9M 8.8M 951.1M 1% /var/lib/grafana
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/grafana/provisioning/datasources
tmpfs 484.2M 12.0K 484.2M 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 484.2M 0 484.2M 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 484.2M 0 484.2M 0% /sys/firmware
Only one or two pods show another message:
The node was low on resource: ephemeral-storage. Container addon-resizer was using 48Ki, which exceeds its request of 0. Container kube-state-metrics was using 44Ki, which exceeds its request of 0.
The node was low on resource: ephemeral-storage. Container pgadmin was using 3432Ki, which exceeds its request of 0.
I also have kubelet saying:
(combined from similar events): failed to garbage collect required amount of images. Wanted to free 753073356 bytes, but freed 0 bytes
Those pods are running on an AWS t3.micro.
It does not appear to affect my services in production.
Why is this happening, and how should I fix it?
Edit: this is the result of running df -h on the node:
admin@ip-172-20-41-112:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 789M 3.0M 786M 1% /run
/dev/nvme0n1p2 7.5G 6.3G 804M 89% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
I can see /dev/nvme0n1p2, but how can I see its contents? When I run ncdu in /, I only see 3 GB of data...
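Apparently, your node is about to run out of available disk space. Keep in mind, though, that according to the documentation the DiskPressure condition means:
"Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold"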
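To make the threshold idea concrete, here is a minimal sketch that compares the available share of the root filesystem with a percentage. It assumes the kubelet's default hard-eviction thresholds (nodefs.available<10%, imagefs.available<15%) are in effect; a cluster may override them in the kubelet configuration. The helper name below_threshold is hypothetical, and the script is meant to be run on the node, not inside a pod.

```shell
# Hypothetical helper: succeeds when available space is below a given
# percentage of the filesystem size.
# Arguments: available KiB, total KiB, threshold percent.
below_threshold() {
  [ $(( $1 * 100 )) -lt $(( $2 * $3 )) ]
}

# Read total and available KiB for the root filesystem.
# -P gives one line per filesystem, -k reports 1K blocks.
set -- $(df -Pk / | awk 'NR==2 {print $2, $4}')
size_kb=${1:-0}
avail_kb=${2:-0}

# 10 is the ASSUMED default hard-eviction threshold for nodefs.available.
if below_threshold "$avail_kb" "$size_kb" 10; then
  echo "root filesystem is below the assumed 10% eviction threshold"
else
  echo "root filesystem is above the assumed 10% eviction threshold"
fi
```

With the node figures from the question (804M available out of 7.5G), the available share is roughly 10 to 11 percent: still just above the assumed 10% nodefs threshold, but below the assumed 15% imagefs threshold. Usage of 89% is also above the kubelet's default image garbage-collection high threshold of 85%, which would be consistent with the "failed to garbage collect required amount of images" event.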
Try running df -h on your worker node, not in a Pod. What is the disk usage? You can also check the kubelet logs for more details:
journalctl -xeu kubelet.service
Let me know if that helps.
Here you can find an answer that explains the same topic well.
This line clearly shows that the default threshold is close to being exceeded:
/dev/nvme0n1p2 7.5G 6.3G 804M 89% /
Switch to the root user (su -) and run:
du -hd1 /
to see which directories take up most of the disk space.
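The du output is easier to read when it is sorted by size. A minimal sketch, assuming a du that supports -x and -d (GNU coreutils and BusyBox do); the function name largest_dirs is hypothetical:

```shell
# Hypothetical helper: print the largest first-level directories under a path.
# -x stays on one filesystem, -k reports sizes in KiB so sort -n can order them,
# -d1 limits the depth to direct children (plus the total for the path itself).
# Arguments: path, number of lines to show (default 10).
largest_dirs() {
  du -xkd1 "$1" 2>/dev/null | sort -n | tail -n "${2:-10}"
}

# Example, run as root on the node:
#   largest_dirs / 10
```

Running it as root matters: without root, du silently skips directories it cannot read, which is one possible reason a tool such as ncdu reports less data than df does. The last line of the output is the total for the path itself.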