Remove dead state clients from state client list. Consider the following
scenario:
01 start container
02 issue shutdown request
03 state_client_fd is added to lxc_handler
03 container doesn't respond to shutdown request
04 user aborts shutdown request
05 lxc_cmd_fd_cleanup() removes state_client_fd from lxc_mainloop
06 invalid state_client_fd is still recorded in the lxc_handler
07 user issues lxc_cmd_stop() request via SIGKILL
08 container reaches STOPPED state and sends message to state_client_fd
09 state_client_fd number has been reused by lxc_cmd_stop_callback()
10 invalid data gets dumped to lxc_cmd_stop()
Reproducer:
Set an invalid shutdown signal to which the init system does not respond with a
shutdown via lxc.signal.halt e.g. "lxc.signal.halt = SIGUSR1". Then do:
2. try to shutdown container
root@conventiont|~
> lxc-stop -n a1
3. abort shutdown
^C
4. SIGKILL the container (lxc.signal.stop = SIGKILL)
root@conventiont|~
> lxc-stop -n a1 -k
lxc-stop: a1: commands.c: lxc_cmd_rsp_recv: 165 File too large - Response data for command "stop" is too long: 12641 bytes > 8192
To not let this happen we remove the state_client_fd from the lxc_handler when
we detect a cleanup event in lxc_cmd_fd_cleanup().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The lxc_console_create() function used to munge the ringbuffer setup and the
log file setup already. This made somewhat sense when we didn't have a separate
ringbuffer log file. Now it's just plain confusing. So split this into logical
helpers that future maintainers can understand:
This allows cleanly exiting a console session without control sequences.
Relates to https://github.com/lxc/lxd/pull/4001 .
Note that the existence of a signal handler now doesn't guarantee that ts->node
is allocated. Instead, ts->node will now only be added to if stdinfd is a tty.
New checks need to take that into account.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
In order to enable proper unprivileged cgroup delegation on newer kernels we not
just need to delegate the "cgroup.procs" file but also "cgroup.threads". But
don't report an error in case it doesn't exist. Also delegate
"cgroup.subtree_control" to enable delegation of controllers to descendant
cgroups.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
console: add "write_logfile" to console_log struct
If a console log file was specified this flag indicates whether the contents of
the ringbuffer should be written to the logfile when a request is sent to the
ringbuffer.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When users request that the container keep a console ringbuffer we will not
continously write to the on-disk logfile as mirroring the contents of the
in-memory ringbuffer on-disk is costly and complicated. Instead, we dump the
ringbuffer contents on-disk when the container stops or fails to start. This
way users can still diagnose problems or retrieve the last contents of the
ringbuffer on-disk.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
We need to have lxc_attach() distinguish between a caller specifying specific
namespaces to attach to and a caller not requesting specific namespaces. The
latter is taken by lxc_attach() to mean that all namespaces will be attached.
This also needs to include all inherited namespaces.
Closes #1890.
Closes #1897.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The configure checks for these use AC_CHECK_DECLS, which define the symbol
to 0 if not available - So adjust the code to match. From the autoconf
manual:
- Implement inheriting user namespaces.
- When inheriting user namespaces make sure to not try and map ids again. The
kernel will not allow you to do this.
- Change clone() logic:
1. If we inherit no namespaces simply call lxc_clone().
2. If we inherit any namespaces call lxc_fork_attach_clone(). Here's why:
- Causes one syscall (fork()) instead of two syscalls (setns() to
inherited namespace and setns() back to parent namespace) to be
performed.
- Allows us to get rid of a bunch of variables and helper functions/code.
- Sharing a user namespaces requires us to setns() to the inherited user
namespace but the kernel does not allow reattaching to a parent user
namespace. So the old logic made user namespace inheritance impossible.
By using the lxc_fork_attach_clone() model we can simply setns() to the
inherited user namespace in the fork()ed child and be done with it.
The only thing we need to do is to specify CLONE_PARENT when calling
clone() in lxc_fork_attach_clone() so that we can wait on the child.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This commit also gets rid of ~10 unnecessarily file descriptors that were kept
open. Before we kept open:
- A set of file descriptors that refer to the monitor's namespaces. These were
only used to reattach to the monitor's namespace in lxc_spawn() and were
never used anywhere else. So close them and don't keep them around.
- A list of inherited file descriptors.
- A list of file descriptors referring to the containers's namespaces to pass
to lxc.hook.stop. This list duplicated inherited file descriptors.
Let's simply use a single list in the handler that has all file descriptors we
need and get rid of all other ones. As an illustration. Starting a container
1. Without this patch and looking at the fds that the monitor keeps open (26):
There's no obvious need to strdup() the name of the container in the handler.
We can simply make this a pointer to the memory allocated in
lxc_container_new().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Some toolchains which are not bionic like uclibc does not support
prlimit or prlimit64. In this case, return an error.
Moreover, if prlimit64 is available, use lxc implementation of prlimit.
In case cgroup namespaces are supported but we do not have CAP_SYS_ADMIN we
need to mount cgroups for the container. This patch enables both privileged and
unprivileged containers without CAP_SYS_ADMIN.
Closes #1737.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When attaching to a container's namespaces we did not handle the case where we
inherited namespaces correctly. In essence, liblxc on start records the
namespaces the container was created with in the handler. But it only records
the clone flags that were passed to clone() and doesn't record the namespaces
we e.g. inherited from other containers. This means that attach only ever
attached to the clone flags. But this is only correct if all other namespaces
not recorded in the handler refer to the namespaces of the caller. However,
this need not be the case if the container has inherited namespaces from
another container. To handle this case we need to check whether caller and
container are in the same namespace. If they are, we know that things are all
good. If they aren't then we need to attach to these namespaces as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Antonio Terceiro [Sat, 28 Oct 2017 11:20:35 +0000 (09:20 -0200)]
lxc-debian: don't hardcode valid releases
This avoids the dance of updating the list of valid releases every time
Debian makes a new release.
It also fixes the following bug: even though lxc-debian will default to
creating containers of the latest stable by querying the archive, it
won't allow you to explicitly request `stable` because the current list
of valid releases don't include it.
Last, but not least, avoid hitting the mirror in the case the desired
release is one of the ones we know will always be there, i.e. stable,
testing, sid, and unstable.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>
Antonio Terceiro [Thu, 26 Oct 2017 22:42:49 +0000 (20:42 -0200)]
lxc-debian: allow creating `testing` and `unstable`
Being able to create `testing` containers, regardless of what's the name
of the next stable, is useful in several contexts, included but not
limited to testing purposes. i.e. one won't need to explicitly switch to
`bullseye` once `buster` is released to be able to continue tracking
`testing`. While we are at it, let's also enable `unstable`, which is
exactly the same as `sid`, but there is no reason for not being able to.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>