The node was low on resource: [DiskPressure], but df -h shows only 47% usage

Jul*_*oro 2 kubernetes

I have a node in my K8S cluster that I use for monitoring tools.

Pods running here: Grafana, PGAdmin, Prometheus, and kube-state-metrics

My problem is that many of these pods keep getting evicted.

The evicted pods: kube-state-metrics, grafana-core, pgadmin

Most evictions carry the reason The node was low on resource: [DiskPressure].: kube-state-metrics (90% of the evicted pods) and pgadmin (20% of the evicted pods)

When I check disk usage from inside any of the pods, there is plenty of free space:

bash-5.0$ df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                   7.4G      3.3G      3.7G  47% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                   484.2M         0    484.2M   0% /sys/fs/cgroup
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /dev/termination-log
shm                      64.0M         0     64.0M   0% /dev/shm
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/resolv.conf
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hostname
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hosts
/dev/nvme2n1            975.9M      8.8M    951.1M   1% /var/lib/grafana
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/grafana/provisioning/datasources
tmpfs                   484.2M     12.0K    484.2M   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   484.2M         0    484.2M   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                   484.2M         0    484.2M   0% /sys/firmware

Only one or two evicted pods show a different message:

The node was low on resource: ephemeral-storage. Container addon-resizer was using 48Ki, which exceeds its request of 0. Container kube-state-metrics was using 44Ki, which exceeds its request of 0.

The node was low on resource: ephemeral-storage. Container pgadmin was using 3432Ki, which exceeds its request of 0.
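For context: "exceeds its request of 0" means these containers declare no ephemeral-storage request, so any usage at all makes them preferred eviction targets. A minimal sketch of adding an explicit request (the namespace, deployment and container names, and the sizes below are assumptions to adapt):

```shell
# Sketch only: give the container an explicit ephemeral-storage request so
# its usage no longer "exceeds its request of 0". Names and sizes are
# illustrative assumptions.
cat <<'EOF' > /tmp/ksm-storage-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: kube-state-metrics
        resources:
          requests:
            ephemeral-storage: "50Mi"
          limits:
            ephemeral-storage: "200Mi"
EOF
# Apply against a real cluster (commented out here):
# kubectl -n monitoring patch deployment kube-state-metrics --patch-file /tmp/ksm-storage-patch.yaml
grep -c 'ephemeral-storage' /tmp/ksm-storage-patch.yaml
```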

The kubelet also reports:

(combined from similar events): failed to garbage collect required amount of images. Wanted to free 753073356 bytes, but freed 0 bytes
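For scale, the logged figure converts to roughly 718 MiB that the kubelet's image garbage collector wanted back; it freed nothing, which typically means every image on the node is still used by a running container. A quick sketch of the conversion, plus a manual prune command (the runtime and tooling are an assumption; crictl applies to containerd/CRI-O nodes):

```shell
# The kubelet's image GC wanted to free this many bytes (from the log line):
bytes=753073356
echo "$((bytes / 1024 / 1024)) MiB"
# On the node itself, images unused by any running container could be pruned
# manually (assumption: containerd/CRI-O with crictl installed):
# crictl rmi --prune
```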

These pods run on an AWS t3.micro instance.

It does not appear to affect my production services.

Why is this happening, and how should I fix it?

Edit: this is the result of running df -h on the node itself:

admin@ip-172-20-41-112:~$ df -h 
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           789M  3.0M  786M   1% /run
/dev/nvme0n1p2  7.5G  6.3G  804M  89% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup

I can see /dev/nvme0n1p2, but how can I see what it contains? When I run ncdu in /, I only see 3 GB of data...

mar*_*rio 6

Apparently the free disk space on your node is about to run out. But keep in mind that, according to the documentation, the DiskPressure condition means:

Available disk space and inodes on either the node's root filesystem or image filesystem have satisfied an eviction threshold

Try running df -h on your worker node rather than inside a Pod. What is the disk usage there? Additionally, you can check the kubelet logs for more details:

journalctl -xeu kubelet.service
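A useful refinement (my addition, not part of the original answer): filter the journal for eviction and garbage-collection activity. The demo below runs the filter against canned sample lines so the pattern itself can be verified anywhere; on the node you would pipe the journalctl output into the same grep:

```shell
# Demo of the filter on sample lines; on the node, replace the heredoc with:
#   journalctl -u kubelet.service --no-pager | grep -Ei 'evict|garbage collect'
grep -Ei 'evict|garbage collect' <<'EOF'
kubelet: eviction manager: must evict pod(s) to reclaim ephemeral-storage
kubelet: failed to garbage collect required amount of images
kubelet: Successfully pulled image "grafana/grafana"
EOF
```

Only the first two lines match; the unrelated image-pull line is filtered out.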

See also this article and its comments.

Let me know if it helps.

Here you can find an answer that explains the same topic well.

Update:

This line clearly shows that the default threshold is about to be exceeded:

/dev/nvme0n1p2  7.5G  6.3G  804M  89% /
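The arithmetic behind that claim can be checked directly (a sketch using the numbers from the df output above; 10% is the kubelet's default nodefs.available hard-eviction threshold):

```shell
# Default hard eviction threshold: evict pods when nodefs.available < 10%.
size_kb=$((7500 * 1024))   # ~7.5G root filesystem, from df above
avail_kb=$((804 * 1024))   # ~804M available, from df above
pct=$((100 * avail_kb / size_kb))
echo "nodefs.available: ${pct}%"
```

Integer division rounds 10.7% down to 10%, i.e. the node is sitting right at the default threshold, which is why DiskPressure evictions fire even though the pods' overlay mounts look half empty.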

Switch to the root user (su -) and run:

du -hd1 /

to see which directories take up most of the disk space.
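To make the technique concrete, here is the same idea demonstrated on a throwaway directory (the paths and sizes are fabricated for the demo; on the node you would point du at /):

```shell
# Demo: build a scratch tree, then use du + sort to surface the biggest dirs.
tmp=$(mktemp -d)
mkdir -p "$tmp/big" "$tmp/small"
dd if=/dev/zero of="$tmp/big/blob" bs=1024 count=2048 status=none   # ~2 MiB
dd if=/dev/zero of="$tmp/small/blob" bs=1024 count=16 status=none   # ~16 KiB
du -hd1 "$tmp" | sort -h    # human-readable sort: largest entries come last
rm -rf "$tmp"
```

Adding -x (du -xhd1 /) keeps du on one filesystem, which avoids double-counting the bind mounts visible in the in-pod df output.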
