This commit deals with different kernel and userspace layouts and nesting. Here
are three examples:
1. 64bit kernel and 64bit userspace running 32bit containers
2. 64bit kernel and 32bit userspace running 64bit containers
3. 64bit kernel and 64bit userspace running 32bit containers running 64bit containers
Two things to lookout for:
1. The compat arch that is detected might have already been present in the main
context. So check that it actually hasn't been and only then add it.
2. The contexts don't need merging if the architectures are the same and also can't be.
With these changes I can run all crazy/weird combinations with proper seccomp
isolation.
When starting application containers without a mapping for container root are
started, a dummy bind-mount target for lxc-init needs to be created. This will
not always work directly under "/" when e.g. permissions are missing due to the
ownership and/or mode of "/". We can try to work around this by using the
P_tmpdir as defined in POSIX which should usually land us in /tmp where
basically everyone can create files.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
We should always default to mounting devpts with gid=5 but we should fallback
to mounting without gid=5. This let's us cover use-cases such as container
started with only a single mapping e.g.:
lxc.idmap = u 1000 1000 1
lxc.idmap = g 1000 1000 1
Closes #2257.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fix compilation with static libcap and shared gnutls
Commit c06ed219c47098f34485d408410b6ecc94a40877 has broken
compilation with a static libcap and a shared gnutls.
This results in a build failure on init_lxc_static if gnutls is
a shared library as init_lxc_static is built with -all-static option
(see src/lxc/Makefile.am) and AC_CHECK_LIB adds gnutls to LIBS.
This commit fix the issue by removing default behavior of AC_CHECK_LIB
and handling manually GNUTLS_LIBS and HAVE_LIBGNUTLS
The problem here is that these two clauses were ordered backwards: we first
check if the signal came from not the init pid, and if it did, then we give
a notice and return. The comment notes that this is intended to protect
against SIGCHLD, but we don't in fact know if the signal is a SIGCHLD yet,
because that's tested in the next hunk.
The symptom is that if I e.g. send SIGTERM from the outside world to the
container init, it ignores it and gives this notice. If we re-order these
clauses, it forwards non SIGCHLD signals, and ignores SIGCHLD signals from
things that aren't the real container process.
Tycho Andersen [Thu, 22 Mar 2018 15:49:08 +0000 (09:49 -0600)]
remove leading whitespace from log files
This has annoyed me for a long time, 3.0 seems like the time to fix it :).
I think the way that the log prefix was intended to be used was perhaps a
dynamic prefix per file, but we don't do that today; we include the
filename later in the log message. Instead, we use it as the tool name,
which for liblxc is always "lxc", but could also be things like
"lxc-cgroup" or whatever. There is absolutely no reason to pad this, since
it is always the same for every log file (in fact, we could probably get
rid of the prefix all together, but that seems slightly more drastic).
Instead, let's just drop this padding. Hopefully this will save thousands
of hours of slight annoyance and right scrolling in various pastebins.
The "--ldcache" argument allows overriding the location of the DSO cache:
https://github.com/NVIDIA/libnvidia-container/commit/41656bf9ed71448972f3254a10ceb3c53225a4e6
The "--root" argument allows nvidia-container-cli to execute in a different rootfs:
https://github.com/NVIDIA/libnvidia-container/commit/019fdc14e325eea55fbe0397a581bda9d0c4c5b1
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Felix Abecassis [Mon, 19 Mar 2018 18:38:06 +0000 (11:38 -0700)]
hooks: fix nvidia hook when running under the lxc-start AppArmor profile
For a reason that I don't understand, the profile transition needs to
be done on the current process. Changing the attributes for a
subsequent execve(2) (with /proc/self/attr/exec) will cause the kernel
to set AT_SECURE in the auxiliary vector and thus secure_getenv(3)
inside libnvidia-container will return NULL.
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Tycho Andersen [Thu, 15 Mar 2018 15:29:27 +0000 (15:29 +0000)]
fix handler use-after-free
The problem here is that __lxc_start frees the handler, so any use
afterwards is invalid. Since we don't have access to the actual struct
lxc_container object in __lxc_start, let's pass a pointer to error_num in
so it can be returned.
Unfortunately, I'm a little too paranoid to change the return type of
lxc_start, since it returns failure if some of the cleanup fails, which
may be useful in some cases. So let's keep this out of band.
Closes #2218
Closes #2219
Reported-by: Felix Abecassis <fabecassis@nvidia.com> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Igor Galić [Wed, 14 Mar 2018 15:53:24 +0000 (16:53 +0100)]
conf: fix clang warning when building w/o libcap
when compiling lxc with clang-5.0 parse_cap()'s main loop will produce a
warning about a tautological comparision (#2215).
By moving the result of computation into a variable (end) this is no
longer a constant expression. clang-5.0 does not do dataflow analysis at
this point, so it is, to quote someone from #llvm, "morally equivalent"
to casting `(int)i`.
in addition, we also clean up the #if HAVE_LIBCAP to no longer need
its #else branch!
Signed-off-by: Igor Galić <igor.galic@automatic-server.com>
Tycho Andersen [Mon, 12 Mar 2018 15:39:37 +0000 (09:39 -0600)]
usernsexec: init log fd
lxc-usernsexec uses some functions (e.g. lxc_map_ids()), which are part of
the lxc library and thus use the WARN etc. macros to emit log messages.
However, it doesn't initialize the log in any way, so these messages go
into the ether.
lxc-usernsexec currently has no log parameters, so let's just log these to
stderr. Someone can do something fancier later if they want.