Tycho Andersen [Fri, 19 Jan 2018 03:31:33 +0000 (03:31 +0000)]
lxc-execute: actually exit with the status of the spawned task
Now that we have things propagated through init and liblxc correctly, at
least in non-daemon mode, we can exit with the actual exit status of the
task, instead of always succeeding, which is not so helpful.
Tycho Andersen [Fri, 19 Jan 2018 03:29:05 +0000 (03:29 +0000)]
start: don't return false when the container's init exits nonzero
This seems slightly counter-intuitive, but IMO it's what we want.
Basically, ->start() should succeed if the container is spawned correctly
(similar to how golang's exec.Cmd.Start() returns nil if the thing spawns
correctly), and users can check error_num (i.e. golang's exec.Cmd.Wait())
to see how it exited.
This preserves the previous behavior, which was basically that start always
succeeded if the thing actually launched: since we never kept track of exit
codes, a nonzero exit could not make it fail. Now that we do track them, start
would fail whenever init exits nonzero, so this change is required.
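To illustrate the Start()/Wait() split described above, here is a minimal POSIX sketch. `struct task`, `task_start()` and `task_wait()` are hypothetical names, not liblxc API; `error_num` only mirrors the field name in struct lxc_container. `task_start()` reports success once exec has succeeded, which it detects through a close-on-exec pipe: on a failed exec the child writes errno into the pipe, on a successful one the kernel closes the write end and the parent reads EOF. The exit status only becomes available after `task_wait()`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical task handle; error_num mirrors the lxc_container field name. */
struct task {
	pid_t pid;
	int error_num;
};

/* Succeeds iff the child managed to exec argv[0]; says nothing about how it exits. */
bool task_start(struct task *t, char *const argv[])
{
	int fds[2], err = 0;
	ssize_t n;

	/* The write end is closed automatically by a successful exec. */
	if (pipe2(fds, O_CLOEXEC) < 0)
		return false;

	t->pid = fork();
	if (t->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return false;
	}

	if (t->pid == 0) {
		close(fds[0]);
		execvp(argv[0], argv);
		/* Only reached if exec failed: report errno to the parent. */
		err = errno;
		ssize_t unused = write(fds[1], &err, sizeof(err));
		(void)unused;
		_exit(127);
	}

	close(fds[1]);
	do {
		n = read(fds[0], &err, sizeof(err));
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	if (n == 0)
		return true; /* EOF: exec succeeded */

	/* exec failed (or the read failed): reap the child so it does not linger. */
	waitpid(t->pid, NULL, 0);
	return false;
}

/* Blocks until the task exits and records its exit status in error_num. */
bool task_wait(struct task *t)
{
	int status;

	while (waitpid(t->pid, &status, 0) < 0)
		if (errno != EINTR)
			return false;

	if (WIFEXITED(status))
		t->error_num = WEXITSTATUS(status);
	else if (WIFSIGNALED(status))
		t->error_num = 128 + WTERMSIG(status);
	return true;
}
```

Starting `sh -c 'exit 7'` makes `task_start()` return true even though the task will fail; `task_wait()` then sets `error_num` to 7. Only a task that cannot be spawned at all, such as a non-existent binary, makes `task_start()` itself return false.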
Tycho Andersen [Fri, 19 Jan 2018 03:24:59 +0000 (03:24 +0000)]
remember the exit code from the init process
error_num seems to be trying to remember the exit code of the init process,
except that nothing actually keeps track of it anywhere. So, let's add a
field to the handler, so that we can keep track of the process' exit
status, and then propagate it to error_num in struct lxc_container so that
people can use it.
Note that this is a slight behavior change: instead of error_num always
being == the return code from start, it now contains slightly more useful
information (the actual exit status). But there is only one internal user
of error_num, which I'll fix later in the series, so IMO this is ok.
Tycho Andersen [Fri, 19 Jan 2018 03:21:10 +0000 (03:21 +0000)]
lxc.init: correctly exit with the app's error code
Based on the comments in the code (and the have_status flag), the intent
here (and IMO, the desired behavior) should be for init.lxc to propagate
the actual exit code of the real application process by exiting with it.
Otherwise, it is swallowed and nobody can access it.
The bug being fixed here is that ret held the correct exit code, but when
the loop went around again (to wait for other children), ret was clobbered.
Let's save the desired exit status somewhere else, so that it can't get
clobbered and is propagated correctly.
Tycho Andersen [Fri, 19 Jan 2018 03:20:08 +0000 (03:20 +0000)]
fix lxc_error_set_and_log to match the docs
The documentation for this function says if the task was killed by a
signal, the return code will be 128+n, where n is the signal number. Let's
make that actually true.
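The documented rule can be sketched as a small helper that turns a raw `waitpid()` status into an exit code. `status_to_exit_code()` is a hypothetical name; this illustrates the convention rather than reproducing the body of lxc_error_set_and_log().

```c
#define _GNU_SOURCE
#include <sys/wait.h>

/* Exit status if the task exited normally, 128 + n if it was killed by signal n. */
int status_to_exit_code(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return -1; /* stopped/continued: no exit code yet */
}
```

A task killed by SIGKILL (signal 9) thus maps to 137, the same value bash reports in `$?`.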
Tycho Andersen [Fri, 19 Jan 2018 00:50:39 +0000 (00:50 +0000)]
start: don't log stop/continue for non-init processes
This non-init forwarding check should really be before all the log messages
about "init continued" or "init stopped", since they will otherwise lie
about some process that wasn't init being stopped or continued.
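The ordering fix amounts to filtering on the sender's PID before any init-specific logging. A minimal sketch, assuming the handler reads a `struct signalfd_siginfo` (whose `ssi_pid` and `ssi_code` fields are real); `describe_sigchld()` is a hypothetical helper that returns the message to log, or NULL when nothing should be logged about init.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>

/* Returns a log message for a SIGCHLD about init, or NULL if it should not be logged. */
const char *describe_sigchld(const struct signalfd_siginfo *si, pid_t init_pid)
{
	/* Check the sender first: a non-init child must never be reported as init. */
	if ((pid_t)si->ssi_pid != init_pid)
		return NULL; /* handle non-init children here, but never log "init ..." */

	switch (si->ssi_code) {
	case CLD_STOPPED:
		return "init stopped";
	case CLD_CONTINUED:
		return "init continued";
	case CLD_EXITED:
	case CLD_KILLED:
	case CLD_DUMPED:
		return "init exited";
	default:
		return NULL;
	}
}
```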
When deleting cgroups for unprivileged containers, we used to allocate a new
mapping and clone a new user namespace for each cgroup we deleted. On a cgroup
v1 system this meant doing so 10 or more times when all controllers were in
use. Let's not do this, and instead allocate and establish the mapping only
once.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When fully unprivileged users run a container that only maps their own {g,u}id,
and they do not have access to the setuid new{g,u}idmap binaries, we write the
idmapping directly. However, this requires us to write "deny" to
/proc/[pid]/setgroups first; otherwise any write to /proc/[pid]/gid_map will be
denied.
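The required write order can be sketched as follows, assuming a single-ID mapping of container root to the caller's own uid and gid. `write_single_idmap()` is a hypothetical helper, not the liblxc function; it takes the proc directory as a parameter (normally `/proc/<pid>` of the child) so the sequence can also be exercised against an ordinary directory. For an unprivileged writer, the kernel rejects the `gid_map` write unless `setgroups` already contains `deny`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write buf to procdir/name; returns 0 on success, -1 on error. */
static int write_file(const char *procdir, const char *name, const char *buf)
{
	char path[256];
	ssize_t len = (ssize_t)strlen(buf);
	int fd, ret = 0;

	if (snprintf(path, sizeof(path), "%s/%s", procdir, name) >= (int)sizeof(path))
		return -1;

	fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	/* id_map files must be written in a single write() call. */
	if (write(fd, buf, len) != len)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}

/* Map container id 0 to host_uid/host_gid. procdir is normally "/proc/<pid>". */
int write_single_idmap(const char *procdir, uid_t host_uid, gid_t host_gid)
{
	char map[64];

	/* Must happen before gid_map, or the kernel refuses the unprivileged write. */
	if (write_file(procdir, "setgroups", "deny") < 0)
		return -1;

	snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)host_uid);
	if (write_file(procdir, "uid_map", map) < 0)
		return -1;

	snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)host_gid);
	return write_file(procdir, "gid_map", map);
}
```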
As a side note, this patch enables fully unprivileged containers: if you now
set lxc.net.[i].type = empty, no privilege whatsoever is required to run a
container.
Enhances #2033.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Felix Abecassis <fabecassis@nvidia.com>
Cc: Jonathan Calmels <jcalmels@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Serge Hallyn [Thu, 4 Jan 2018 03:02:53 +0000 (21:02 -0600)]
configure.ac: fix the check for static libcap
The existing check doesn't work, because when you statically
link a program against libc, any functions not called are not
included. So cap_init(), which we check for, is not there in
the built binary.
So instead just check whether "gcc -lcap -static" works.
If libcap.a is not available the link will fail; if it is, it
will succeed.
- As discussed we will have a proper API extension that will allow updating
various parts of a running container. The prior approach wasn't a good idea.
- Reverting this is not a problem since we haven't released any version with
the set_running_config_item() API extension.
- I'm not simply reverting, so that master users can still call into a newer
liblxc without crashing the container. This is achieved by keeping the
number of members in the commands callback struct identical.
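The idea of keeping the command numbering stable can be sketched like this. All names here (`enum cmd`, `CMD_SET_RUNNING_CONFIG_ITEM_UNUSED`, `dispatch()`) are hypothetical, not the actual liblxc command identifiers: the reverted command keeps its slot, so every later command keeps its number, and a client that still sends the retired command gets an error instead of being routed to the wrong handler.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers; the values travel over the command socket. */
enum cmd {
	CMD_GET_STATE,
	CMD_SET_RUNNING_CONFIG_ITEM_UNUSED, /* reverted, slot kept as a placeholder */
	CMD_GET_CONFIG_ITEM,
	CMD_MAX,
};

typedef int (*cmd_cb)(void);

static int get_state(void) { return 1; }
static int get_config_item(void) { return 2; }

/* The NULL entry keeps the slot, so later commands keep their numbers. */
static const cmd_cb callbacks[CMD_MAX] = {
	[CMD_GET_STATE] = get_state,
	[CMD_SET_RUNNING_CONFIG_ITEM_UNUSED] = NULL,
	[CMD_GET_CONFIG_ITEM] = get_config_item,
};

int dispatch(int cmd)
{
	if (cmd < 0 || cmd >= CMD_MAX || !callbacks[cmd])
		return -ENOSYS; /* unknown or retired command: fail, do not misroute */
	return callbacks[cmd]();
}
```

Removing the placeholder instead would renumber CMD_GET_CONFIG_ITEM, so a peer built against the old numbering would end up calling the wrong callback.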
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
mainloop: capture output of short-lived init procs
The handler for the signal fd will detect when the init process of a container
has exited and cause the mainloop to close. However, this can happen before the
console handlers (or any other events, for that matter) are handled. So when
init exits, we still need to let all input buffered on the console be handled
before exiting. This allows us to capture output from short-lived init
processes.
This is conceptually equivalent to my implementation of ExecReaderToChannel()
https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L527
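The draining step can be sketched with plain `poll(2)`: once init's exit has been seen, keep reading the console fd until it has nothing left, instead of returning immediately. `drain_fd()` is a hypothetical helper, not the liblxc mainloop API; it polls with a zero timeout so it never blocks waiting for output that will not come.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Copy whatever is already buffered on fd into out (at most cap bytes).
 * Stops on EOF, on error, or when poll() reports no more pending data.
 * Returns the number of bytes copied.
 */
size_t drain_fd(int fd, char *out, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		ssize_t n;
		int ret = poll(&pfd, 1, 0); /* timeout 0: only take what is already there */

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0 || !(pfd.revents & (POLLIN | POLLHUP)))
			break;

		n = read(fd, out + total, cap - total);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			break; /* EOF or error (a pty master returns EIO once the slave is gone) */
		total += (size_t)n;
	}

	return total;
}
```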
Closes #1694.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When we set{u,g}id(), the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users use
debuggers on the child process before the exec, and also allows us to open
/proc/<child-pid> files in lieu of the child.
Note that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If it is 0, we keep all capabilities in
the child user namespace across set{g,u}id().
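A minimal sketch of the sequence, assuming the caller already knows whether its effective uid on the host is 0 (for example, determined before entering the user namespace) and passes that in. `switch_ids_keep_dumpable()` is a hypothetical helper; `prctl(PR_SET_DUMPABLE, 1, ...)` is the real Linux call that re-enables the flag after a credential change has cleared it.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch to gid/uid, then make the process dumpable again unless we are
 * root on the host. host_root must be determined by the caller, since
 * geteuid() inside a user namespace does not reflect the host uid.
 */
int switch_ids_keep_dumpable(uid_t uid, gid_t gid, bool host_root)
{
	/* gid first: after dropping the uid we may no longer be allowed to change it. */
	if (setgid(gid) < 0)
		return -1;
	if (setuid(uid) < 0)
		return -1;

	/* The credential change may have cleared the dumpable flag; set it again. */
	if (!host_root && prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
		return -1;

	return 0;
}
```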
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Receive the fd for the LSM security module before we set{g,u}id(). The reason
is that on set{g,u}id() a) the kernel will make us undumpable and b) our
effective uid will change. Our effective uid will then differ from the
effective uid of the process that created us, which means that this process no
longer has capabilities in our namespace, including CAP_SYS_PTRACE. As a
result, /proc/<pid> files for our process can no longer be read when /proc is
mounted with hidepid={1,2}. So let's get the LSM label fd before the
set{g,u}id().
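Passing a file descriptor between processes is typically done with `SCM_RIGHTS` over a Unix socket. A minimal sketch, assuming the parent opens the LSM attribute file while it still has access to the child's /proc entries and sends the fd over a socketpair; `send_fd()` and `recv_fd()` are hypothetical helpers, not the liblxc functions. The ordering point from the message is that the child calls `recv_fd()` before `setgid()`/`setuid()`.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd over a connected AF_UNIX socket, along with one dummy data byte. */
int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* ensures correct alignment of the buffer */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the child, the sequence would be `fd = recv_fd(sock);` followed by `setgid()` and `setuid()`, never the other way round.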
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>