lxc_execute() and lxc-execute were broken when a user tried to switch to a
non-root uid/gid. This prevented necessary setup operations like mounting the
rootfs which require root in the user namespace. This commit separates
switching to root in the user namespace from switching to the requested uid/gid
by lxc_execute().
This should be safe: once we have switched to root in the user namespace via
setuid() and then switched to a non-root uid/gid in the user namespace for
lxc_execute() via setuid(), we cannot regain root privileges. So this can
only make us safer (unless I am forgetting about some very intricate user
namespace nonsense, which is not as unlikely as I try to make it sound).
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This commit adds lxc_switch_uid_gid(), which switches the uid and gid of a
process via setuid() and setgid(), and lxc_setgroups(), which sets groups via
setgroups(). The main advantage is that they log the switches they perform.
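A minimal sketch of what such logging wrappers could look like. The names
sketch_setgroups() and sketch_switch_uid_gid() are hypothetical and the bodies
are an assumption, not the code added by this commit. The gid is switched
before the uid because a process that has already dropped to a non-root uid
may no longer be allowed to change its gid.

```c
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: set supplementary groups and log the result. */
int sketch_setgroups(size_t size, const gid_t *list)
{
	if (setgroups(size, list) < 0) {
		fprintf(stderr, "failed to set groups (errno %d)\n", errno);
		return -1;
	}
	fprintf(stderr, "set %zu supplementary group(s)\n", size);
	return 0;
}

/* Hypothetical wrapper: switch gid, then uid, logging each step. */
int sketch_switch_uid_gid(uid_t uid, gid_t gid)
{
	if (setgid(gid) < 0) {
		fprintf(stderr, "failed to switch to gid %d (errno %d)\n",
			(int)gid, errno);
		return -1;
	}
	fprintf(stderr, "switched to gid %d\n", (int)gid);

	if (setuid(uid) < 0) {
		fprintf(stderr, "failed to switch to uid %d (errno %d)\n",
			(int)uid, errno);
		return -1;
	}
	fprintf(stderr, "switched to uid %d\n", (int)uid);
	return 0;
}
```

A caller that is root in the user namespace would typically call
sketch_setgroups(0, NULL) first and then sketch_switch_uid_gid() with the
requested ids.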
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
On some Android systems the lxc folders where containers are stored might be
read-only, so checking for O_RDWR effectively makes the tools useless on
these systems. Let's relax the check to O_RDONLY.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This mainly affects Trusty. The 3.13 kernel has a broken overlay module which
does not handle symlinks correctly. This is a problem for containers that use
an overlay based rootfs since safe_mount() uses /proc/<pid>/fd/<fd-number> in
its calls to mount().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fabrice Fontaine [Sun, 18 Dec 2016 20:39:24 +0000 (21:39 +0100)]
Add --enable-gnutls option
Previously HAVE_LIBGNUTLS was never set in config.h even if gnutls was
detected, because the AC_CHECK_LIB default action-if-found was overridden by
enable_gnutls=yes.
This patch adds an --enable-gnutls option and calls AC_CHECK_LIB with the
default action so that HAVE_LIBGNUTLS is written to config.h.
fli [Tue, 6 Dec 2016 08:59:52 +0000 (00:59 -0800)]
confile: support the network link string pattern matching
Enable the lxc network config to support the following type and link:
lxc.network.type = phys
lxc.network.link = eth+
Here, the suffix '+' triggers string pattern matching: when lxc finds any
network interface whose name is prefixed with "eth", such as "eth0", "eth1",
"ethxxxx" and so on, it will try to move it into the container's namespace.
If it does not find any match, it does nothing for this configuration line.
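The matching rule described above can be sketched as follows. The function
name link_matches() is hypothetical and this is not the parser code from the
patch. Note that with this sketch the bare name "eth" also matches "eth+",
which is a choice made here, not something this message specifies.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical matcher: a trailing '+' turns the pattern into a prefix
 * match, anything else must match the interface name exactly. */
bool link_matches(const char *pattern, const char *ifname)
{
	size_t len = strlen(pattern);

	if (len > 0 && pattern[len - 1] == '+')
		return strncmp(pattern, ifname, len - 1) == 0;

	return strcmp(pattern, ifname) == 0;
}
```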
Aside from adding a 42.2 option, $DISTRO comparisons for Leap have been
changed from [ exp ] to [[ exp ]] to accommodate pattern matching for future
releases.
Signed-off-by: Terzeus S. Dominguez <tsdmgz@gmail.com>
FooDeas [Thu, 8 Dec 2016 13:03:10 +0000 (14:03 +0100)]
templates: fix getty service startup
Commit bf39edb39ecaea25801d716aebef798885277992 broke the handling of the getty service file with an '@' character in its filename, so the startup condition was not fixed.
Because that commit quoted the parameter, the escaping has to be removed.
Signed-off-by: Andreas Eberlein foodeas@aeberlein.de
Converts a Unix time since the Epoch, given by a struct timespec, to a UTC
string usable in our logging functions. May be expanded to allow for more
generic formatting.
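A minimal sketch of such a conversion that avoids gmtime() and relatives by
doing the calendar arithmetic by hand. The function name timespec_to_utc() is
hypothetical, the day-to-date step uses the well-known "civil from days"
algorithm, and the yyyymmddHHMMSS.mmm layout is assumed from the log format
used by liblxc. Negative timestamps are rejected to keep the arithmetic simple.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical converter: write ts as "yyyymmddHHMMSS.mmm" (UTC) to buf.
 * Returns 0 on success, -1 on bad input or if buf is too small. */
int timespec_to_utc(char *buf, size_t len, const struct timespec *ts)
{
	int64_t secs, days, rem, z, era, doe, yoe, y, doy, mp, d, m;
	int n;

	if (ts->tv_sec < 0)
		return -1;

	secs = (int64_t)ts->tv_sec;
	days = secs / 86400; /* whole days since 1970-01-01 */
	rem = secs % 86400;  /* seconds into the current day */

	/* Shift the origin to 0000-03-01 so leap days fall at year end. */
	z = days + 719468;
	era = z / 146097;                 /* 400-year cycles */
	doe = z - era * 146097;           /* day of era, [0, 146096] */
	yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	y = yoe + era * 400;
	doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* from March 1 */
	mp = (5 * doy + 2) / 153;         /* month index, March = 0 */
	d = doy - (153 * mp + 2) / 5 + 1;
	m = mp < 10 ? mp + 3 : mp - 9;
	if (m <= 2)
		y++;

	n = snprintf(buf, len, "%04lld%02lld%02lld%02lld%02lld%02lld.%03ld",
		     (long long)y, (long long)m, (long long)d,
		     (long long)(rem / 3600), (long long)((rem % 3600) / 60),
		     (long long)(rem % 60), (long)(ts->tv_nsec / 1000000));
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```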
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Our log functions need to make extra sure that they are thread-safe. We had
some problems with that before. This especially involves time-conversion
functions. I don't want to find any localtime() or gmtime() functions or
relatives in here. Not even localtime_r() or gmtime_r() or relatives. They all
fiddle with global variables and locking in various libcs. They cause deadlocks
when liblxc is used multi-threaded and no matter how smart you think you are,
you __will__ cause trouble using them.
(As a short example how this can cause trouble: LXD uses forkstart to fork off
a new process that runs the container. At the same time the go runtime LXD
relies on does its own multi-threading thing which we can't control. The
fork()ing + threading then seems to mess with the locking states in these time
functions causing deadlocks.)
The current solution is to be good old unix people and use the Epoch as our
reference point and simply use the seconds and nanoseconds that have passed since
then. This relies on clock_gettime() which is explicitly marked MT-Safe with no
restrictions! This way, anyone who is really strongly invested in getting the
actual time the log entry was created, can just convert it for themselves. Our
logging is mostly done for debugging purposes so don't try to make it pretty.
Pretty might cost you thread-safety.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The thread-unsafe function strsignal() is called in run_buffer() which in turn
is called in run_buffer_argv() which is responsible for running __all__ lxc
hooks. This is pretty dangerous for multi-threaded users like LXD.
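One way to avoid strsignal() is to report the raw signal number instead of its
description. The sketch below shows that idea. The helper name
describe_status() is hypothetical and this is not necessarily the replacement
chosen in run_buffer().

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: describe a wait status without strsignal().
 * Returns what snprintf() returns. */
int describe_status(int status, char *buf, size_t len)
{
	if (WIFEXITED(status))
		return snprintf(buf, len, "exited with status %d",
				WEXITSTATUS(status));

	if (WIFSIGNALED(status))
		return snprintf(buf, len, "killed by signal %d",
				WTERMSIG(status));

	return snprintf(buf, len, "unknown wait status %d", status);
}
```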
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Before, lxc_monitord called lxc_monitord_cleanup() from a signal handler. This
function calls a bunch of async signal unsafe functions and basically begs for
deadlocks. This commit switches lxc-monitord to using sigsetjmp() and
siglongjmp() in the signal handler to jump to a cleanup label that calls
lxc_monitord_cleanup(). In this way, we avoid using async signal unsafe
functions in the handler itself.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
lxc_monitord: improve log + set log level to DEBUG
Setting loglevel to DEBUG will allow us to retrieve more useful information in
case something goes wrong. The total size of the log will not increase
significantly.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Before, we used tmpfile() to write out mount entries for the container. This
requires a writeable /tmp file system, which can be a problem on systems where
this filesystem is not present. This commit switches from tmpfile() to using
the memfd_create() syscall. It allows us to create an anonymous tmpfs file
(somewhat similar to mmap()) which is automatically deleted as soon as all
references to it are dropped. In case we detect that the syscall is not
implemented, we fall back to using tmpfile().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
localtime_r() can lead to deadlocks because it calls __tzset() and
__tzconvert() internally. The deadlock stems from an interaction between these
functions and the functions in monitor.c and commands.{c,h}. The latter
functions will write to the log independent of the container thread that is
currently running. Since the monitor fork()ed, it seems to duplicate the mutex
states of the time functions mentioned above, causing the deadlock.
As a short-term fix, I suggest to simply disable receiving the time when
monitor.c or commands.{c,h} functions are called. This should be ok, since the
[lxc monitor] will only emit a few messages and thread-safety is currently more
important than beautiful logs. The rest of the log stays the same as it was
before.
Here is an example output from logs where I printed the pid and tid of the
process that is currently writing to the log:
lxc 20161125170200.619 INFO lxc_start: 18695-18695: - start.c:lxc_check_inherited:243 - Closed inherited fd: 23.
lxc 20161125170200.640 DEBUG lxc_start: 18677-18677: - start.c:__lxc_start:1334 - Not dropping CAP_SYS_BOOT or watching utmp.
lxc 20161125170200.640 INFO lxc_cgroup: 18677-18677: - cgroups/cgroup.c:cgroup_init:68 - cgroup driver cgroupfs-ng initing for lxc-test-concurrent-0
----------> lxc 20150427012246.000 INFO lxc_monitor: 13017-18622: - monitor.c:lxc_monitor_sock_name:178 - using monitor sock name lxc/ad055575fe28ddd5//var/lib/lxc
lxc 20161125170200.662 DEBUG lxc_cgfsng: 18677-18677: - cgroups/cgfsng.c:filter_and_set_cpus:478 - No isolated cpus detected.
lxc 20161125170200.662 DEBUG lxc_cgfsng: 18677-18677: - cgroups/cgfsng.c:handle_cpuset_hierarchy:648 - "cgroup.clone_children" was already set to "1".
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This fixes a race in liblxc logging which can lead to deadlocks. The reproducer
for this issue before this is to simply compile with --enable-tests and then
run:
So far, we opened a file descriptor referring to proc on the host inside the
host namespace and handed that fd to the attached process in
attach_child_main(). This was done to ensure that LSM labels were correctly
setup. However, by exploiting a potential kernel bug, ptrace could be used to
prevent the file descriptor from being closed which in turn could be used by an
unprivileged container to gain access to the host namespace. Aside from this
needing an upstream kernel fix, we should make sure that we don't pass the fd
for proc itself to the attached process. However, we cannot completely prevent
this, as the attached process needs to be able to change its apparmor profile
by writing to /proc/self/attr/exec or /proc/self/attr/current. To minimize the
attack surface, we only send the fd for /proc/self/attr/exec or
/proc/self/attr/current to the attached process. To do this we introduce a
little more IPC between the child and parent:
* IPC mechanism: (X is receiver)
*   initial process        intermediate          attached
*        X           <---  send pid of
*                          attached proc,
*                          then exit
*    send 0 ------------------------------------>    X
*                                              [do initialization]
*        X  <------------------------------------  send 1
*   [add to cgroup, ...]
*    send 2 ------------------------------------>    X
*                                              [set LXC_ATTACH_NO_NEW_PRIVS]
*        X  <------------------------------------  send 3
*   [open LSM label fd]
*    send 4 ------------------------------------>    X
*                                              [set LSM label]
*   close socket                                 close socket
*                                                run program
The attached child tells the parent when it is ready to have its LSM labels set
up. The parent then opens an appropriate fd for the child PID to
/proc/<pid>/attr/exec or /proc/<pid>/attr/current and sends it via SCM_RIGHTS
to the child. The child can then set its LSM label. Both sides then close the
socket fds and the child execs the requested process.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>