Nested unprivileged LXC container started from Upstart that its owner can stop

Iva*_*gai 5 unprivileged lxc cgroup 14.04

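On a host running Ubuntu 14.04.5 LTS, I have a user named ci who can create and boot an unprivileged LXC container that also runs Ubuntu 14.04.5 LTS. The user's subid range is 200000-231071. The configuration file of this container is: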

# Distribution configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
lxc.include = /usr/share/lxc/config/ubuntu.userns.conf
lxc.arch = x86_64

# Nested
lxc.mount.auto = cgroup
lxc.aa_profile = lxc-container-default-with-nesting

# Container specific configuration
lxc.id_map = u 0 200000 65536
lxc.id_map = u 100000 265536 65536
lxc.id_map = g 0 200000 65536
lxc.id_map = g 100000 265536 65536
lxc.rootfs = /home/ci/.local/share/lxc/ci/rootfs
lxc.utsname = ci

# Network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:dd:f1:99
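To make the mapping easier to check, here is a minimal sketch that prints the container and host ID ranges covered by each lxc.id_map line. The function name idmap_ranges is hypothetical, and it only handles the "type container-start host-start count" syntax shown in the config above. For the four lines above it reports host IDs 200000-265535 and 265536-331071, which can then be compared with the entries in /etc/subuid and /etc/subgid.

```shell
#!/bin/sh
# Hypothetical helper: print the ID ranges covered by each lxc.id_map line.
# Reads an LXC config on stdin; all other lines are ignored.
idmap_ranges() {
    awk -F'=' '
        /^[ \t]*lxc\.id_map[ \t]*=/ {
            # Value fields: type (u or g), first container ID,
            # first host ID, number of IDs in the range.
            split($2, f, " ")
            printf "%s container %d-%d -> host %d-%d\n",
                f[1], f[2], f[2] + f[4] - 1, f[3], f[3] + f[4] - 1
        }'
}
```

Usage, assuming the config sits next to the rootfs given in lxc.rootfs: `idmap_ranges < /home/ci/.local/share/lxc/ci/config`.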

The user can create and start the unprivileged container without any problems:

ci@host:~$ lxc-create -t download -n ci -- -d ubuntu -r trusty -a amd64
ci@host:~$ lxc-start -n ci -d
ci@host:~$ lxc-ls --fancy
    NAME  STATE    IPV4                 IPV6  AUTOSTART
    ---------------------------------------------------
    ci    RUNNING  10.0.3.75, 10.0.4.1  -     NO

On the host, cgmanager is running:

root@host ~ # ps ax | grep cgmanager
    382 ?        Ss     0:01 /sbin/cgmanager --sigstop -m name=systemd

In the unprivileged container ci, cgproxy is running:

root@ci:~# ps ax | grep cgproxy
    288 ?        Ss     0:00 /sbin/cgproxy --sigstop

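In the unprivileged container ci, a user named jenkins with subid range 100000-165535 can create and start unprivileged containers inside it, i.e. nested unprivileged containers, but only with some tricks. They are: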

  1. After logging in over ssh as user jenkins in the unprivileged container ci, the output of cat /proc/self/cgroup is:

    jenkins@ci:~$ cat /proc/self/cgroup
        12:hugetlb:/user/1012.user/11.session/lxc/ci
        11:net_prio:/user/1012.user/11.session/lxc/ci
        10:perf_event:/user/1012.user/11.session/lxc/ci
        9:net_cls:/user/1012.user/11.session/lxc/ci
        8:freezer:/user/1012.user/11.session/lxc/ci
        7:devices:/user/1012.user/11.session/lxc/ci
        6:memory:/user/1012.user/11.session/lxc/ci
        5:blkio:/user/1012.user/11.session/lxc/ci
        4:name=systemd:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        3:cpuacct:/user/1012.user/11.session/lxc/ci
        2:cpu:/user/1012.user/11.session/lxc/ci
        1:cpuset:/user/1012.user/11.session/lxc/ci
    
  2. At this point jenkins can create a container but cannot start it:

    jenkins@ci:~$ lxc-create -t download -n test -- -d ubuntu -r trusty -a amd64
    jenkins@ci:~$ lxc-start -n test
        lxc_container: cgmanager.c: lxc_cgmanager_create: 301 call to cgmanager_create_sync failed: invalid request
        lxc_container: cgmanager.c: lxc_cgmanager_create: 303 Failed to create hugetlb:lxc/test
        lxc_container: cgmanager.c: cgm_create: 650 Error creating cgroup hugetlb:lxc/test
        lxc_container: start.c: lxc_spawn: 891 failed creating cgroups
        lxc_container: start.c: __lxc_start: 1121 failed to spawn 'test'
        lxc_container: lxc_start.c: main: 341 The container failed to start.
        lxc_container: lxc_start.c: main: 345 Additional information can be obtained by setting the --logfile and --logpriority options.
    
  3. As root in the container, I run:

    restart systemd-logind
    
  4. Now, as the jenkins user in the container, I log out and log in again over ssh. The cgroup has changed, and now I can create and run containers:

    jenkins@ci:~$ cat /proc/self/cgroup
        12:hugetlb:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        11:net_prio:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        10:perf_event:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        9:net_cls:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        8:freezer:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        7:devices:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        6:memory:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        5:blkio:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        4:name=systemd:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        3:cpuacct:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        2:cpu:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
        1:cpuset:/user/1012.user/11.session/lxc/ci/user/1012.user/11.session/lxc/ci/user/107.user/c1.session
    jenkins@ci:~$ lxc-create -t download -n test -- -d ubuntu -r trusty -a amd64
    jenkins@ci:~$ lxc-start -n test -d
    jenkins@ci:~$ lxc-ls --fancy
        NAME     STATE    IPV4       IPV6  AUTOSTART
        --------------------------------------------
        test     RUNNING  10.0.4.64  -     NO
    
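The visible difference between steps 1 and 4 is that before the restart only the name=systemd line carries the nested session path, while afterwards every controller does. A minimal sketch for spotting that condition is below. The function name cgroup_mismatches is hypothetical; it only reads a file in /proc/<pid>/cgroup format and lists the controllers whose path differs from the name=systemd path. If no name=systemd line exists, every controller is listed.

```shell
#!/bin/sh
# Hypothetical helper: list controllers whose cgroup path differs from
# the name=systemd path. Argument: a file in /proc/<pid>/cgroup format
# (defaults to the calling shell's own cgroup file).
cgroup_mismatches() {
    awk -F':' '
        # Each line is hierarchy-ID:controller:path.
        { path[$2] = $3; order[NR] = $2 }
        $2 == "name=systemd" { ref = $3 }
        END {
            # Print mismatching controllers in the order they appeared.
            for (i = 1; i <= NR; i++)
                if (path[order[i]] != ref)
                    print order[i]
        }' "${1:-/proc/self/cgroup}"
}
```

Run as `cgroup_mismatches` in a fresh ssh session: with the output of step 1 it would list every controller except name=systemd, and with the output of step 4 it would print nothing.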

First question: why do I need to restart systemd-logind, and how can I avoid having to run it as root before I am able to create nested unprivileged containers?

In the container ci, I created an init configuration file (an Upstart conf file at /etc/init/jenkins.conf) to run the Jenkins software as the jenkins user:

description "jenkins"

start on filesystem and static-network-up
stop on runlevel [016]

env USER="jenkins"
env GROUP="jenkins"
env HOME="/var/lib/jenkins"
env JENKINS_LOG="/var/log/jenkins"
env JENKINS_ROOT="/usr/share/jenkins"
env JENKINS_RUN="/var/run/jenkins"
env JENKINS_PIDFILE="jenkins.pid"

pre-start script
    test -f $JENKINS_ROOT/jenkins.war || { stop ; exit 0; }
    mkdir $JENKINS_RUN > /dev/null 2>&1  || true
    chown -R $USER:$GROUP $JENKINS_RUN || true
    mkdir $JENKINS_LOG > /dev/null 2>&1  || true
    chown -R $USER:$GROUP $JENKINS_LOG || true
end script

script
    . /etc/default/jenkins
    # export XDG_SESSION_ID="/run/user/`id -u $USER`"
    export HOME
    export USER
    export GROUP
    exec daemon --name=jenkins --foreground --inherit --user=$USER:$GROUP --pidfile=$JENKINS_RUN/$JENKINS_PIDFILE --output=$JENKINS_LOG -- $JAVA $JAVA_ARGS -jar $JENKINS_WAR $JENKINS_ARGS
end script

post-start script
    while [ ! -f $JENKINS_RUN/$JENKINS_PIDFILE ]; do sleep 1; done
    PID=$(cat $JENKINS_RUN/$JENKINS_PIDFILE)
    cgm create all $USER
    cgm chown all $USER $(id -u $USER) $(id -g $USER)
    # this needs to be run in the jenkins job script:
    # cgm movepid all $USER $$
end script

# vim: ft=upstart

In the script that the Jenkins process runs to start a so-called Jenkins build, if I add the following line:

cgm movepid all $USER $$

the script can create and start unprivileged nested containers. Its cgroup is then:

+ cat /proc/self/cgroup
12:hugetlb:/user/1012.user/11.session/lxc/ci/jenkins
11:net_prio:/user/1012.user/11.session/lxc/ci/jenkins
10:perf_event:/user/1012.user/11.session/lxc/ci/jenkins
9:net_cls:/user/1012.user/11.session/lxc/ci/jenkins
8:freezer:/user/1012.user/11.session/lxc/ci/jenkins
7:devices:/user/1012.user/11.session/lxc/ci/jenkins
6:memory:/user/1012.user/11.session/lxc/ci/jenkins
5:blkio:/user/1012.user/11.session/lxc/ci/jenkins
4:name=systemd:/user/1012.user/11.session/lxc/ci/jenkins
3:cpuacct:/user/1012.user/11.session/lxc/ci/jenkins
2:cpu:/user/1012.user/11.session/lxc/ci/jenkins
1:cpuset:/user/1012.user/11.session/lxc/ci/jenkins

But the user jenkins logged in over ssh cannot stop the containers created by the script. The following never completes:

jenkins@ci:~$ lxc-stop -n test
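One thing that can be checked here, as a diagnostic sketch only, is whether the ssh session and the processes started by the build script sit in different cgroups. The function name cgroup_compare is hypothetical; it takes two files in /proc/<pid>/cgroup format and prints, per controller, whether the paths match. Whether such a difference is what makes lxc-stop hang is an assumption, not something established here.

```shell
#!/bin/sh
# Hypothetical helper: compare the cgroup paths of two processes.
# Arguments: two files in /proc/<pid>/cgroup format.
cgroup_compare() {
    awk -F':' '
        # First file: remember the path of each controller.
        NR == FNR { first[$2] = $3; next }
        # Second file: print controller, verdict, first path, second path.
        {
            mark = (first[$2] == $3) ? "same" : "differs"
            printf "%s %s %s %s\n", $2, mark, first[$2], $3
        }' "$1" "$2"
}
```

For example, `cgroup_compare /proc/self/cgroup /proc/PID/cgroup` from the ssh session, where PID is a process belonging to the build script or to the nested container (the value reported by `lxc-info -n test -p` is one candidate).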

Second question: how can I make it possible for the user jenkins to stop any container that the user jenkins created from the init script above?