Serge Hallyn [Thu, 25 Feb 2016 19:01:12 +0000 (11:01 -0800)]
cgfs: make sure we use valid cgroup mountpoints
If lxcfs starts before cgroup-lite, then the first cgroup mountpoints in
/proc/self/mountinfo are /run/lxcfs/*. Unprivileged users cannot access
these. So privileged containers are ok, and unprivileged containers are ok
since they won't cache those to begin with. But unprivileged root-owned
containers cache /run/lxcfs/* and then try to use them.
So when doing cgroup automounting, check whether the mountpoints we have
stored are accessible, and if not, look for a new one to use.
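
A minimal sketch of the accessibility check described above, not the actual
cgfs code. The helper name first_usable_mountpoint() and the use of
access(2) with R_OK | X_OK are assumptions made for illustration; the real
implementation may test accessibility differently.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk a list of cached cgroup mountpoints and return
 * the first one the calling user can read and traverse, or NULL if none
 * of them is usable.
 */
static const char *first_usable_mountpoint(const char *const *candidates,
					   size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* access() checks permissions against the real uid/gid */
		if (candidates[i] && access(candidates[i], R_OK | X_OK) == 0)
			return candidates[i];
	}

	return NULL;
}
```

With a candidate list such as { "/run/lxcfs/controllers/cpu",
"/sys/fs/cgroup/cpu" } (example paths only), an unprivileged caller would
skip the first entry if it is not accessible and fall through to the second.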
Serge Hallyn [Thu, 25 Feb 2016 01:00:35 +0000 (17:00 -0800)]
cgfs: do not automount if cgroup namespaces are supported
In that case containers will be able to mount cgroup filesystems
for themselves as they do on a host.
This fixes the inability to start systemd-based containers on cgns-enabled
kernels with cgmanager not running.
I've tested debian jessie, busybox, ubuntu trusty and xenial, all of
which booted ok. However if there are some setups which require
premounted cgroupfs (i.e. they don't mount if they detect being in
a container), this may cause trouble.
- lxc-clone and lxc-start-ephemeral are marked deprecated. We add a
--enable-deprecated flag to configure.ac allowing us to enable these
deprecated executables
- update tests to use lxc-copy instead of lxc-clone
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
- add note to lxc-clone manpage that it is superseded by lxc-copy
- add note to lxc-start-ephemeral manpage that it is superseded by lxc-copy
- fix typo in lxc-attach manpage
- fix some of my comments in lxc_ls.c
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
- explain rationale behind allocation of pty
- briefly explain how a pty is allocated
- add a short note that describes the changed behavior for lxc-attach when the
user is not placed in a writeable cgroup at login
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
- The code required to prepare an fd to act as a login tty is shared among
pty_on_host_callback() and fork_pty(). This implements login_pty(), a
minimalistic login_tty() clone, to avoid code redundancy.
- Give pty_in_container() a slightly extended comment.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Serge Hallyn [Sun, 21 Feb 2016 23:38:11 +0000 (15:38 -0800)]
add lxc-default-cgns profile
This isn't safe for privileged containers which do not use cgroup
namespaces, but is required for systemd containers with cgroup
namespaces. So create a new profile for it which lxc will use as
the default when it knows it can.
So far lxc-attach did not use a pty when attaching to a container. This made it
vulnerable to tty input faking via TIOCSTI when switching to a different user.
This patch makes lxc-attach use a pty in most cases. The only current exception
is when stdin, stdout, and stderr do not refer to a pty.
There are two ways lxc-attach can obtain a pty:
1. get a pty in the container
2. get a pty on the host
This patch makes 1. the default and only opts for 2. when 1. fails before
giving up. The rationale behind this is as follows: If we create a pty on the
host (2.) and pass the fds to the container the container may report "no tty"
when the "tty" command is used. This could be irritating for users when they
expect that lxc-attach now always tries to use a pty. Hence, option 1. is the
default.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
lxc_console_cb_tty_masterfd() unnecessarily reported a read/write error when
the fd was closed. This happens e.g. when we have allocated a tty in the
container with lxc-console and we shut the container down. lxc-console will
then exit with an error message. This patch introduces a test whether the
EPOLLHUP bit is set in the events mask. If so, we report no error.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
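
A minimal sketch of the EPOLLHUP test described above. The function name
master_hung_up() is hypothetical; the actual callback has a different
signature and does more work. Only the bit test on the events mask is
shown.

```c
#include <stdint.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper: return 1 if the events mask reported by epoll for
 * the tty master fd has EPOLLHUP set, i.e. the other end was closed and
 * the caller should stop quietly instead of reporting a read/write error.
 * Return 0 otherwise.
 */
static int master_hung_up(uint32_t events)
{
	return (events & EPOLLHUP) != 0;
}
```

A callback could call this first and return without logging an error when it
yields 1, falling through to the normal read/write path otherwise.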
Nikolay Martynov [Sun, 21 Feb 2016 06:16:15 +0000 (01:16 -0500)]
Fix sshd template on systems with systemd
Systems with systemd have /sbin/init as a symlink pointing to the real init.
The sshd template tries to bind-mount a special init implementation.
The problem is that one cannot bind-mount to a location that is a symlink.
Fix this by dereferencing the /sbin/init symlink and using the result as the bind-mount location.
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
Ubuntu [Sat, 20 Feb 2016 02:25:55 +0000 (02:25 +0000)]
lxc: cgfs: handle lxcfs
When containers have lxcfs mounted instead of cgroupfs, we have to
process /proc/self/mountinfo a bit differently. In particular, we
should look for fuse.lxcfs fstype, we need to look elsewhere for the
list of comounted controllers, and the mount_prefix is not a cgroup path
which was bind mounted, so we should ignore it, and named subsystems
show up without the 'name=' prefix.
With this patchset I can start containers inside a privileged lxd
container with lxcfs mounted (i.e. without cgroup namespaces).
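
A minimal sketch of the fstype detection step, assuming the
/proc/self/mountinfo layout in which the filesystem type is the first field
after the " - " separator. The helper name is_lxcfs_line() is hypothetical,
and the handling of comounted controllers, mount_prefix and the 'name='
prefix mentioned above is not shown.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if a /proc/self/mountinfo line describes
 * a mount whose filesystem type is exactly "fuse.lxcfs", 0 otherwise.
 */
static int is_lxcfs_line(const char *line)
{
	static const char fstype[] = "fuse.lxcfs";
	const char *p;
	size_t len;

	/* Optional fields end at " - "; the fstype follows it */
	p = strstr(line, " - ");
	if (!p)
		return 0;
	p += 3;

	/* Compare the whole field so that longer names do not match */
	len = strcspn(p, " \n");
	return len == sizeof(fstype) - 1 && strncmp(p, fstype, len) == 0;
}
```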
Serge Hallyn [Fri, 19 Feb 2016 22:12:47 +0000 (14:12 -0800)]
cgroups: do not fail if setting devices cgroup fails due to EPERM
If we're trying to allow a device which was denied to our parent
container, just continue.
Cgmanager does not help us to distinguish between EPERM and other
errors, so just always continue.
We may want to consider actually computing the range of devices
to which the container monitor has access, but OTOH that introduces
a whole new set of complexity to compute access sets.
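
A minimal sketch of the "continue on EPERM" policy, under the assumption that
the caller has a return code and an errno value from the attempt to write a
devices cgroup rule. The function name devices_rule_status() is
hypothetical. For the cgmanager case described above, where the error kind
is not available, the caller would simply ignore the failure.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the result of writing a devices cgroup rule to
 * a status for the caller. `ret` is the return code of the write attempt
 * and `err` is the errno value saved right after it.
 * Returns 0 if setup should continue, -1 if it should fail.
 */
static int devices_rule_status(int ret, int err)
{
	if (ret == 0)
		return 0;

	/* The device was denied to our parent container: not fatal */
	if (err == EPERM)
		return 0;

	return -1;
}
```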
Serge Hallyn [Mon, 15 Feb 2016 20:15:10 +0000 (12:15 -0800)]
log.c:__lxc_log_set_file: fname cannot be null
fname cannot be passed in as NULL by any of its current callers. If it
could, then build_dir() would crash as it doesn't check for it. So make
sure we are warned if in the future we pass in NULL.
- Ephemeral containers are destroyed on shutdown so we do not destroy them.
- Destroy ephemeral containers with clones: first destroy all the clones, then
destroy the container.
- Ephemeral containers with snapshots cannot be easily handled but we can
probably trust that no one will try to make snapshots of an ephemeral
container.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Serge Hallyn [Mon, 8 Feb 2016 07:06:10 +0000 (23:06 -0800)]
apparmor: don't fail if current aa label is given
Ideally a container configuration will specify 'unchanged' if
it wants the container to use the current (parent) profile. But
lxd passes its current label. Support that too.
Note that if/when stackable profiles exist, this behavior may
or may not be what we want. But the code to deal with aa
stacking will need some changes anyway so this is ok.
With this patch, I can create nested containers inside a
lxd xenial container both using
Serge Hallyn [Wed, 3 Feb 2016 03:20:05 +0000 (19:20 -0800)]
Comment the lxc_rootfs structure
Comment rootfs.path and rootfs.mount so people can better figure
out which to use.
Remove the unused pivotdir argument from setup_rootfs_pivot_root().
Remove the unused pivot member of the lxc_rootfs struct. And just
return 0 (success) when someone passes a lxc.pivotdir entry. One
day we'll turn that into an error, but not yet...
- The function mount_entry_create_aufs_dirs() moves from conf.c to
lxcaufs.{c,h} where it belongs.
- In accordance with the "aufs_" prefix naming scheme for functions associated
with lxcaufs.{c,h} mount_entry_create_aufs_dirs() becomes aufs_mkdir().
- Add aufs_get_rootfs() which returns the rootfs for an aufs lxc.rootfs.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
In mount_entry_on_generic() we dereferenced a NULL pointer whenever a container
without a rootfs was created (since mount_entry_on_systemfs() passes NULL as
the rootfs). We have mount_entry_on_generic() check whether rootfs != NULL.
We also check whether rootfs != NULL in the functions ovl_mkdir() and
mount_entry_create_aufs_dirs() and bail immediately. Rationale: For overlay and
aufs lxc.mount.entry entries users give us absolute paths to e.g. workdir and
upperdir which we create for them. We currently use rootfs->path and the
lxcpath for the container to check that users give us a sane path to create
those directories under and refuse if they do not. If we want to allow overlay
mounts for containers without a rootfs they can easily be reworked.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
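
A minimal sketch of the difference between the two checks. The struct
rootfs_sketch below is a stand-in with a single member and is not the real
lxc_rootfs definition; the helper name is hypothetical.

```c
#include <stddef.h>

/* Stand-in for the rootfs structure; only the member needed here */
struct rootfs_sketch {
	const char *path;
};

/*
 * Hypothetical helper: return 1 only if there is a rootfs and it has a
 * path. Testing rootfs->path alone would dereference NULL for containers
 * that were created without a rootfs.
 */
static int rootfs_has_path(const struct rootfs_sketch *rootfs)
{
	return rootfs != NULL && rootfs->path != NULL;
}
```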
Since we allow containers to be created without a rootfs most checks in conf.c
are not sane anymore. Instead of just checking if rootfs->path != NULL we need
to check whether rootfs != NULL.
Minor fixes:
- Have mount_autodev() always return -1 on failure: mount_autodev() returns 0
on success and -1 on failure. But when the return value of safe_mount() was
checked in mount_autodev() we returned false (instead of -1) which caused
mount_autodev() to return 0 (success) instead of the correct -1 (failure).
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
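
A minimal sketch of why returning false from a function that reports
failure as -1 is a bug. Both function names are hypothetical and the mount
call is replaced by a plain status argument; this is not the mount_autodev()
code itself.

```c
#include <stdbool.h>

/*
 * Buggy pattern: the function returns int (0 on success, -1 on failure),
 * but the error path returns false, which converts to 0 and is therefore
 * read as success by the caller.
 */
static int status_buggy(int mount_ret)
{
	if (mount_ret != 0)
		return false;

	return 0;
}

/* Fixed pattern: the error path returns -1 as the contract requires */
static int status_fixed(int mount_ret)
{
	if (mount_ret != 0)
		return -1;

	return 0;
}
```

With a failing mount (mount_ret == -1), status_buggy() yields 0 while
status_fixed() yields -1.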
Stéphane Graber [Mon, 1 Feb 2016 16:37:24 +0000 (17:37 +0100)]
Remove legacy versions of lxc-ls
lxc-ls is now a C binary, so there is no need to keep the python and
shell versions around anymore. Remove them from the branch and clean up
the documentation and Makefiles.
Some systems need to be able to bind-mount /run to /var/run
and /run/lock to /var/run/lock. (Tested with opensuse 13.1
containers migrated from openvz.)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>