lxc-attach: Remodel cgroup attach logic and attach to namespaces again in parent process
With the introduction of lxc-attach's functionality to attach to cgroups,
the setns() calls were put in the child process after the fork() and not the
parent process before the fork() so the parent process remained outside the
namespaces and could add the child to the correct cgroup.
Unfortunately, the pid namespace really affects only children of the current
process and not the process itself, which has several drawbacks: The
attached program does not have a pid inside the container and the context
that is used when remounting /proc from that process is wrong. Thus, the
previous logic of first setting the namespaces and then forking so the child
process (which then exec()s to the desired program) is a real member of the
container.
However, inside the container, there is no guarantee that the cgroup
filesystem is still be mounted and that we are allowed to write to it (which
is why the setns() was moved in the first place).
To work around both problems, we separate the cgroup attach functionality
into two parts: Preparing the attach process, which just opens the tasks
files of all cgroups and keeps the file descriptors open and the writing to
those fds part. This allows us to open all the tasks files in lxc_attach,
then call setns(), then fork, in the child process close them completely and
in the parent process just write the pid of the child process to all those
fds.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>