How to solve a cgroup error when running a docker container inside a docker container?

J. *_*lka 5 cgroups docker docker-in-docker

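I am trying to run some multi-container build tests inside a running ubuntu docker container that I use to build my application (in general, I have a Gitlab CI setup).

I have found that when I try to run a container with a memory limit specified, I get an error like the following: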

ERROR: for <service-name>  Cannot start service <service-name>: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in threaded mode: unknown

Minimal working example

Here is an (almost) minimal working example:

# start from ubuntu base image
docker run -it --privileged ubuntu:18.04 /bin/bash

# once inside the container, install docker
apt-get update
apt-get remove docker docker-engine docker.io containerd runc
apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io

# start docker daemon
/etc/init.d/docker stop # should already be stopped
dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 &

# run some container -- fails
docker run --memory=1gb eclipse-mosquitto:1.6

# run some container -- works
docker run eclipse-mosquitto:1.6

The output I get (after the image has been pulled) is:

time="2022-01-27T01:23:20.018095900Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/424ce744b789f06b7f5ff94331df19b995e5de3ace50d4307b35886c9052f2a6 pid=4697
INFO[2022-01-27T01:23:20.064529100Z] shim disconnected                             id=424ce744b789f06b7f5ff94331df19b995e5de3ace50d4307b35886c9052f2a6
ERRO[2022-01-27T01:23:20.064613000Z] copy shim log                                 error="read /proc/self/fd/13: file already closed"
ERRO[2022-01-27T01:23:20.069022100Z] stream copy error: reading from a closed fifo 
ERRO[2022-01-27T01:23:20.072130600Z] stream copy error: reading from a closed fifo 
ERRO[2022-01-27T01:23:20.122636800Z] 424ce744b789f06b7f5ff94331df19b995e5de3ace50d4307b35886c9052f2a6 cleanup: failed to delete container from containerd: no such container 
ERRO[2022-01-27T01:23:20.123051000Z] Handler for POST /v1.41/containers/424ce744b789f06b7f5ff94331df19b995e5de3ace50d4307b35886c9052f2a6/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in an invalid state: unknown 
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in an invalid state: unknown.
ERRO[0004] error waiting for container: context canceled 

Possible solution

One option I came across is to mount /var/run/docker.sock into the base container as a volume when running it (/var/run/docker.sock:/var/run/docker.sock).

I think this basically latches onto the host's docker daemon (my understanding may not be quite right). However, as mentioned above, I use a Gitlab CI setup, and mounting this volume into the runner's container is not a practical solution for me (because it requires runner-specific configuration).

Another option

I also came across the more "standard" docker-in-docker (dind) approach, which again works fine if I mount the docker.sock volume into the container.

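My request

Is there any solution that lets me get this multi-container setup working under the following constraints?

  1. I cannot mount /var/run/docker.sock:/var/run/docker.sock into the base container.
  2. I cannot remove the memory limits on the inner containers.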

tia*_*non 0

See https://github.com/containerd/containerd/issues/6659, and in particular https://github.com/moby/moby/blob/38805f20f9bcc5e87869d6c79d432b166e1c88b4/hack/dind#L28-L38

# cgroup v2: enable nesting
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    # move the processes from the root group to the /init group,
    # otherwise writing subtree_control fails with EBUSY.
    # An error during moving non-existent process (i.e., "cat") is ignored.
    mkdir -p /sys/fs/cgroup/init
    xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
    # enable controllers
    sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
        > /sys/fs/cgroup/cgroup.subtree_control
fi
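For anyone who wants to confirm this is the problem they are hitting: as far as I can tell, the two wordings of the error in the question ("threaded mode" and "invalid state") reflect what the kernel reports in the cgroup.type file of the target cgroup, whereas a usable cgroup reads "domain". Below is a small sketch for checking that. The function name cgroup_mode is made up for illustration, and the default path is simply the one from the error message.

```shell
#!/bin/sh
# Hypothetical helper: print the type of a cgroup v2 directory.
# $1 is the cgroup directory; it defaults to the path named in the error.
cgroup_mode() {
    dir="${1:-/sys/fs/cgroup/docker}"
    if [ -f "$dir/cgroup.type" ]; then
        # one of: domain, domain threaded, domain invalid, threaded
        cat "$dir/cgroup.type"
    else
        # the root of the hierarchy has no cgroup.type file
        echo "no cgroup.type in $dir"
    fi
}
```

Calling cgroup_mode with no argument inside the outer container, after a failed docker run, should show which state /sys/fs/cgroup/docker ended up in.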

(For modern v2 cgroups, you have to enable nesting for this to work.)
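To check whether nesting actually took effect, one approach is to compare the controllers that are available at the cgroup root with the ones enabled for its children. This is only a sketch: the function name undelegated_controllers and its optional path argument are my own, not part of the dind script. My assumption is that a --memory limit on an inner container needs the memory controller to be listed in cgroup.subtree_control.

```shell
#!/bin/sh
# Hypothetical helper: list controllers that are available in a cgroup
# but not yet enabled for its children. $1 defaults to the cgroup root.
undelegated_controllers() {
    root="${1:-/sys/fs/cgroup}"
    # pad with spaces so whole controller names can be matched
    enabled=" $(cat "$root/cgroup.subtree_control") "
    for c in $(cat "$root/cgroup.controllers"); do
        case "$enabled" in
            *" $c "*) ;;        # already enabled for child cgroups
            *) echo "$c" ;;     # available here but not passed down
        esac
    done
}
```

Run it with no argument inside the outer container after the nesting snippet and before starting dockerd. Empty output means every available controller is enabled for child cgroups.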