The node was low on resource: [DiskPressure], but df -h shows only 47% usage

Jul*_*oro 2 kubernetes

I have a node in my K8S cluster that I use for monitoring tools.

Pods running here: Grafana, PGAdmin, Prometheus, and kube-state-metrics

My problem is that many of these pods keep getting evicted.

The evicted pods: kube-state-metrics, grafana-core, pgadmin

Most evictions carry the reason The node was low on resource: [DiskPressure].: kube-state-metrics (90% of the evicted pods) and pgadmin (20% of the evicted pods)

When I check disk usage from inside any of the pods, there is plenty of free space:

bash-5.0$ df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                   7.4G      3.3G      3.7G  47% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                   484.2M         0    484.2M   0% /sys/fs/cgroup
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /dev/termination-log
shm                      64.0M         0     64.0M   0% /dev/shm
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/resolv.conf
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hostname
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hosts
/dev/nvme2n1            975.9M      8.8M    951.1M   1% /var/lib/grafana
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/grafana/provisioning/datasources
tmpfs                   484.2M     12.0K    484.2M   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   484.2M         0    484.2M   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                   484.2M         0    484.2M   0% /sys/firmware

Only one or two evicted pods show a different message:

The node was low on resource: ephemeral-storage. Container addon-resizer was using 48Ki, which exceeds its request of 0. Container kube-state-metrics was using 44Ki, which exceeds its request of 0.

The node was low on resource: ephemeral-storage. Container pgadmin was using 3432Ki, which exceeds its request of 0.
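For context: "exceeds its request of 0" means these containers declare no ephemeral-storage request, so any usage at all makes them preferred eviction targets. A minimal sketch of adding an explicit request (the namespace, deployment and container names, and the sizes below are assumptions to adapt):

```shell
# Sketch only: give the container an explicit ephemeral-storage request so
# its usage no longer "exceeds its request of 0". Names and sizes are
# illustrative assumptions.
cat <<'EOF' > /tmp/ksm-storage-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: kube-state-metrics
        resources:
          requests:
            ephemeral-storage: "50Mi"
          limits:
            ephemeral-storage: "200Mi"
EOF
# Apply against a real cluster (commented out here):
# kubectl -n monitoring patch deployment kube-state-metrics --patch-file /tmp/ksm-storage-patch.yaml
grep -c 'ephemeral-storage' /tmp/ksm-storage-patch.yaml
```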

The kubelet also reports:

(combined from similar events): failed to garbage collect required amount of images. Wanted to free 753073356 bytes, but freed 0 bytes
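For scale, the logged figure converts to roughly 718 MiB that the kubelet's image garbage collector wanted back; it freed nothing, which typically means every image on the node is still used by a running container. A quick sketch of the conversion, plus a manual prune command (the runtime and tooling are an assumption; crictl applies to containerd/CRI-O nodes):

```shell
# The kubelet's image GC wanted to free this many bytes (from the log line):
bytes=753073356
echo "$((bytes / 1024 / 1024)) MiB"
# On the node itself, images unused by any running container could be pruned
# manually (assumption: containerd/CRI-O with crictl installed):
# crictl rmi --prune
```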

These pods run on an AWS t3.micro instance.

It does not appear to affect my production services.

Why is this happening, and how should I fix it?

Edit: this is the result of running df -h on the node itself:

admin@ip-172-20-41-112:~$ df -h 
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           789M  3.0M  786M   1% /run
/dev/nvme0n1p2  7.5G  6.3G  804M  89% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup

I can see /dev/nvme0n1p2, but how can I see what it contains? When I run ncdu in /, I only see 3 GB of data...

mar*_*rio 6

Apparently the free disk space on your node is about to run out. But keep in mind that, according to the documentation, the DiskPressure condition means:

Available disk space and inodes on either the node's root filesystem or image filesystem have satisfied an eviction threshold

Try running df -h on your worker node rather than inside a Pod. What is the disk usage there? Additionally, you can check the kubelet logs for more details:

journalctl -xeu kubelet.service
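A useful refinement (my addition, not part of the original answer): filter the journal for eviction and garbage-collection activity. The demo below runs the filter against canned sample lines so the pattern itself can be verified anywhere; on the node you would pipe the journalctl output into the same grep:

```shell
# Demo of the filter on sample lines; on the node, replace the heredoc with:
#   journalctl -u kubelet.service --no-pager | grep -Ei 'evict|garbage collect'
grep -Ei 'evict|garbage collect' <<'EOF'
kubelet: eviction manager: must evict pod(s) to reclaim ephemeral-storage
kubelet: failed to garbage collect required amount of images
kubelet: Successfully pulled image "grafana/grafana"
EOF
```

Only the first two lines match; the unrelated image-pull line is filtered out.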

See also this article and its comments.

Let me know if it helps.

Here you can find an answer that explains the same topic well.

Update:

This line clearly shows that the default threshold is about to be exceeded:

/dev/nvme0n1p2  7.5G  6.3G  804M  89% /
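The arithmetic behind that claim can be checked directly (a sketch using the numbers from the df output above; 10% is the kubelet's default nodefs.available hard-eviction threshold):

```shell
# Default hard eviction threshold: evict pods when nodefs.available < 10%.
size_kb=$((7500 * 1024))   # ~7.5G root filesystem, from df above
avail_kb=$((804 * 1024))   # ~804M available, from df above
pct=$((100 * avail_kb / size_kb))
echo "nodefs.available: ${pct}%"
```

Integer division rounds 10.7% down to 10%, i.e. the node is sitting right at the default threshold, which is why DiskPressure evictions fire even though the pods' overlay mounts look half empty.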

Switch to the root user (su -) and run:

du -hd1 /

to see which directories take up most of the disk space.
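To make the technique concrete, here is the same idea demonstrated on a throwaway directory (the paths and sizes are fabricated for the demo; on the node you would point du at /):

```shell
# Demo: build a scratch tree, then use du + sort to surface the biggest dirs.
tmp=$(mktemp -d)
mkdir -p "$tmp/big" "$tmp/small"
dd if=/dev/zero of="$tmp/big/blob" bs=1024 count=2048 status=none   # ~2 MiB
dd if=/dev/zero of="$tmp/small/blob" bs=1024 count=16 status=none   # ~16 KiB
du -hd1 "$tmp" | sort -h    # human-readable sort: largest entries come last
rm -rf "$tmp"
```

Adding -x (du -xhd1 /) keeps du on one filesystem, which avoids double-counting the bind mounts visible in the in-pod df output.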
