Ubuntu [Sat, 20 Feb 2016 02:25:55 +0000 (02:25 +0000)]
lxc: cgfs: handle lxcfs
When containers have lxcfs mounted instead of cgroupfs, we have to
process /proc/self/mountinfo a bit differently. In particular:
- we should look for the fuse.lxcfs fstype,
- we need to look elsewhere for the list of comounted controllers,
- the mount_prefix is not a cgroup path which was bind mounted, so we
should ignore it, and
- named subsystems show up without the 'name=' prefix.
With this patchset I can start containers inside a privileged lxd
container with lxcfs mounted (i.e. without cgroup namespaces).
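As a rough illustration of the fstype check described above, here is a minimal sketch. It assumes the documented /proc/self/mountinfo layout, where the filesystem type is the first field after the " - " separator. The helper name mountinfo_fstype_is() is hypothetical and is not the function used in cgfs.c.

```c
#include <string.h>

/*
 * Hypothetical helper, not taken from cgfs.c.
 * Return 1 if the filesystem type field of one /proc/self/mountinfo line
 * equals fstype, 0 otherwise. The fstype is the first field after the
 * " - " separator that ends the optional fields.
 */
static int mountinfo_fstype_is(const char *line, const char *fstype)
{
	const char *field = strstr(line, " - ");
	size_t len;

	if (!field)
		return 0;

	field += 3;                      /* skip " - " */
	len = strcspn(field, " \n");     /* length of the fstype token */

	return len == strlen(fstype) && strncmp(field, fstype, len) == 0;
}
```

A caller could test each mountinfo line with mountinfo_fstype_is(line, "fuse.lxcfs") and fall back to "cgroup" otherwise.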
Serge Hallyn [Fri, 19 Feb 2016 22:12:47 +0000 (14:12 -0800)]
cgroups: do not fail if setting devices cgroup fails due to EPERM
If we're trying to allow a device which was denied to our parent
container, just continue.
Cgmanager does not help us to distinguish between EPERM and other
errors, so just always continue.
We may want to consider actually computing the range of devices
to which the container monitor has access, but OTOH that introduces
a whole new set of complexity to compute access sets.
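The policy for the cgfs case can be sketched roughly as follows. This is only an illustration of "EPERM is not fatal"; the function name devices_write_is_fatal() and its parameters are hypothetical and do not come from the LXC sources.

```c
#include <errno.h>

/*
 * Hypothetical helper, not taken from the LXC sources.
 * ret is the return value of the attempt to write a devices cgroup rule,
 * saved_errno is errno captured right after the failed write.
 * Return 1 if container setup should abort, 0 if it may continue.
 */
static int devices_write_is_fatal(int ret, int saved_errno)
{
	if (ret >= 0)
		return 0;                /* the write succeeded */

	/* The device was probably denied to our parent container: continue. */
	return saved_errno != EPERM;
}
```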
Serge Hallyn [Mon, 15 Feb 2016 20:15:10 +0000 (12:15 -0800)]
log.c:__lxc_log_set_file: fname cannot be null
fname cannot be passed in as NULL by any of its current callers. If it
could, then build_dir() would crash as it doesn't check for it. So make
sure we are warned if in the future we pass in NULL.
- Ephemeral containers are destroyed on shutdown so we do not destroy them.
- Destroy ephemeral containers with clones: first destroy all the clones, then
destroy the container.
- Ephemeral containers with snapshots cannot be easily handled but we can
probably trust that no one will try to make snapshots of an ephemeral
container.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Serge Hallyn [Mon, 8 Feb 2016 07:06:10 +0000 (23:06 -0800)]
apparmor: don't fail if current aa label is given
Ideally a container configuration will specify 'unchanged' if
it wants the container to use the current (parent) profile. But
lxd passes its current label. Support that too.
Note that if/when stackable profiles exist, this behavior may
or may not be what we want. But the code to deal with aa
stacking will need some changes anyway so this is ok.
With this patch, I can create nested containers inside a
lxd xenial container both using
Serge Hallyn [Wed, 3 Feb 2016 03:20:05 +0000 (19:20 -0800)]
Comment the lxc_rootfs structure
Comment rootfs.path and rootfs.mount so people can better figure
out which to use.
Remove the unused pivotdir argument from setup_rootfs_pivot_root().
Remove the unused pivot member of the lxc_rootfs struct. And just
return 0 (success) when someone passes a lxc.pivotdir entry. One
day we'll turn that into an error, but not yet...
- The function mount_entry_create_aufs_dirs() moves from conf.c to
lxcaufs.{c,h} where it belongs.
- In accordance with the "aufs_" prefix naming scheme for functions associated
with lxcaufs.{c,h} mount_entry_create_aufs_dirs() becomes aufs_mkdir().
- Add aufs_get_rootfs() which returns the rootfs for an aufs lxc.rootfs.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
In mount_entry_on_generic() we dereferenced a NULL pointer whenever a container
without a rootfs was created. (Since mount_entry_on_systemfs() passes them with
NULL.) We have mount_entry_on_generic() check whether rootfs != NULL.
We also check whether rootfs != NULL in the functions ovl_mkdir() and
mount_entry_create_aufs_dirs() and bail immediately. Rationale: For overlay and
aufs lxc.mount.entry entries users give us absolute paths to e.g. workdir and
upperdir which we create for them. We currently use rootfs->path and the
lxcpath for the container to check that users give us a sane path to create
those directories under and refuse if they do not. If we want to allow overlay
mounts for containers without a rootfs they can easily be reworked.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Since we allow containers to be created without a rootfs most checks in conf.c
are not sane anymore. Instead of just checking if rootfs->path != NULL we need
to check whether rootfs != NULL.
Minor fixes:
- Have mount_autodev() always return -1 on failure: mount_autodev() returns 0
on success and -1 on failure. But when the return value of safe_mount() was
checked in mount_autodev() we returned false (instead of -1) which caused
mount_autodev() to return 0 (success) instead of the correct -1 (failure).
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
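The mount_autodev() return value problem described above can be reproduced in isolation. The following sketch uses two hypothetical stand-in functions (they are not the LXC code) to show why returning false from a function that reports failure as -1 ends up signalling success.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the pattern, not the LXC code.
 * mount_ret plays the role of the value returned by safe_mount().
 */

/* Buggy variant: false converts to 0, which the caller reads as success. */
static int autodev_buggy(int mount_ret)
{
	if (mount_ret != 0)
		return false;
	return 0;
}

/* Fixed variant: failure is reported as -1, matching the 0/-1 convention. */
static int autodev_fixed(int mount_ret)
{
	if (mount_ret != 0)
		return -1;
	return 0;
}
```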
Stéphane Graber [Mon, 1 Feb 2016 16:37:24 +0000 (17:37 +0100)]
Remove legacy versions of lxc-ls
lxc-ls nowadays is a C binary so there's no need to keep the python and
shell versions around anymore, remove them from the branch and cleanup
documentation and Makefiles.
Some systems need to be able to bind-mount /run to /var/run
and /run/lock to /var/run/lock. (Tested with opensuse 13.1
containers migrated from openvz.)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
- make free_mnts() work directly on the globals mnt_table and mnt_table_size
- have free_mnts() set mnt_table = NULL and mnt_table_size = 0 when it is done
to avoid double frees
- simplify error-handling in do_clone_ephemeral()
- do_clone_ephemeral(): when chmod() fails to set permissions on the temporary
folder we created with mkdtemp(), remove the folder
- simplify error handling in main()
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
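A minimal sketch of the free_mnts() idea follows, assuming a simplified struct mnts with a single heap-allocated member. The real structure has a different layout; only the names free_mnts, mnt_table and mnt_table_size come from the commit message.

```c
#include <stdlib.h>

/* Simplified, hypothetical layout: the real struct mnts differs. */
struct mnts {
	char *path;
};

static struct mnts *mnt_table;
static unsigned int mnt_table_size;

/*
 * Free every entry of the global table, then reset the globals so that
 * a second call finds an empty table and frees nothing twice.
 */
static void free_mnts(void)
{
	unsigned int i;

	for (i = 0; i < mnt_table_size; i++)
		free(mnt_table[i].path);

	free(mnt_table);
	mnt_table = NULL;
	mnt_table_size = 0;
}
```

Because the globals are reset, error paths can call free_mnts() unconditionally without tracking whether it already ran.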
Serge Hallyn [Sun, 31 Jan 2016 15:33:30 +0000 (16:33 +0100)]
cgfs: always handle named subsystems by default
Previously, name= controllers would be handled if lxc.cgroup.use=@all,
but not if lxc.cgroup.use was unspecified. Change that, since you cannot
run systemd in a container without it.
Fix message after {fedora|centos} container creation
If the backingstore is not 'dir', then lxc shouldn't ask the user
to change the password by performing a 'chroot'. Rather, the user
should start, attach, use the passwd command, and then stop the
container.
Serge Hallyn [Thu, 14 Jan 2016 07:48:57 +0000 (07:48 +0000)]
fork off a task to delete ovs ports when done
The new task waits until the container is STOPPED, then asks
openvswitch to delete the port.
This requires two new arguments to be sent to lxc-user-nic.
Since lxc-user-nic ships with lxc, this shouldn't be a problem.
Finally when calling lxc-user-nic, use execlp instead of execvp
to preserve lxcpath's const-ness. Technically we are
guaranteed that execvp won't change the args, but it's worth
it to silence the warnings (and not hide real errors).
With this patch, container nics are cleaned up from openvswitch
bridges on shutdown.
- With the -g/--groups argument the user can give a comma-separated list of
groups, MUST, that a container must have in order to be displayed. We receive
this list as a single string. ls_has_all_grps() is called to check if a
container has all the groups of MUST in its current list of groups HAS. I.e.
we determine whether MUST ⊆ HAS and only then do we record the container.
The original implementation was dumb in that it split the string MUST
every time it needed to check whether MUST ⊆ HAS for a given container. That's
pointless work. Instead we split the string MUST only once in main() and pass
it to ls_get() which passes it along to ls_has_all_grps().
- Before doing any costly checking make sure that #MUST <= #HAS. If not bail
immediately.
- The linear search algorithm ls_has_all_grps() currently uses stays for now.
Binary search et al. do not seem to make sense since sorting the array HAS
for each container is probably too costly, especially since it seems
unlikely that a user specifies 50 or more groups on the command line that a
container must have to be displayed. If however there are a lot of use-cases
where users have a lot of containers each with 50-100 groups and regularly use
lxc-ls with -g/--groups to only show containers that have 50 specified groups
among their 50-100 groups we can revisit this issue and implement e.g. binary
search or a ternary search tree.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
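The subset check with the early size comparison can be sketched as below. This is not the ls_has_all_grps() implementation; the function name and signature are hypothetical, both arrays are assumed to be already split, and MUST is assumed to contain no duplicate group names (otherwise the size shortcut could reject a valid subset).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the lxc-ls code.
 * Return true if every group in must[] also appears in has[].
 */
static bool has_all_groups(const char *const *must, size_t nmust,
			   const char *const *has, size_t nhas)
{
	size_t i, j;

	/* Cheap rejection: MUST cannot be a subset if it is larger than HAS. */
	if (nmust > nhas)
		return false;

	for (i = 0; i < nmust; i++) {
		bool found = false;

		/* Linear search, as in the commit message. */
		for (j = 0; j < nhas; j++) {
			if (strcmp(must[i], has[j]) == 0) {
				found = true;
				break;
			}
		}

		if (!found)
			return false;
	}

	return true;
}
```

Splitting the MUST string once and passing the resulting array down keeps the per-container cost at the nested loop only.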
In the Python implementation users could pass a regex without a parameter flag
as additional argument on the command line. The C implementation gained the
flag -r/--regex for this. To not irritate users we restore the old behaviour
and additionally rename -r/--regex to --filter to allow explicitly passing the
regex.
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
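Matching a container name against such a filter regex could look roughly like the sketch below, which uses the POSIX regcomp()/regexec() interface. The helper name name_matches_filter() is hypothetical, and the choice of REG_EXTENDED | REG_NOSUB is an assumption, not something stated in the commit message.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the lxc-ls code.
 * Return 1 if name matches the filter regex, 0 if it does not,
 * and -1 if the pattern could not be compiled.
 */
static int name_matches_filter(const char *filter, const char *name)
{
	regex_t preg;
	int ret;

	/* Assumption: extended syntax, no subexpression reporting needed. */
	if (regcomp(&preg, filter, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	ret = regexec(&preg, name, 0, NULL, 0);
	regfree(&preg);

	return ret == 0 ? 1 : 0;
}
```

A real tool would compile the pattern once and reuse it for every container rather than recompiling per name.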
- If lxc_container_new() fails we check for ENOMEM and if so goto out. If
ENOMEM is not set we will simply continue. The same goes for the call to
regcomp() but instead of checking for ENOMEM we need to check for REG_ESPACE.
- Tweaking: Since lxc-ls might have to gather a lot of containers and I don't
know if compilers will always optimize this let's move *some* variable
declarations outside of the loop when it does not hinder readability
- Set ls_nesting to 0 initially. Otherwise users will always see nested
containers printed.
- ls_get() gains an argument char **lockpath which is a string pointing us to
the lock we put under /run/lxc/lock/.../... so that we can remove the lock
when we no longer need it. To avoid pointless memory allocation in each new
recursion level we share lockpath amongst all non-fork()ing recursive calls to
ls_get(). As it is not guaranteed that realloc() does not do any memory
moving when newlen == len_lockpath, we give ls_get() an additional argument
size_t len_lockpath. Every time we have a non-fork()ing recursive call to
ls_get() we check if newlen > len_lockpath and only then do we
realloc(*lockpath, newlen * 2) a reasonable chunk of memory (as the path will
keep growing) and set len_lockpath = newlen * 2 to pass to the next
non-fork()ing recursive call to ls_get().
To avoid keeping a variable char *lockpath in main() which serves no purpose
whatsoever and might be abused later we use a compound literal
&(char *){NULL} which gives us an anonymous pointer which we can use for
memory allocation in ls_get() for lockpath. We can conveniently free() it in
ls_get() when the nesting level parameter lvl == 0 after exiting the loop.
The advantage is that the variable is only accessible within ls_get() and not
in main() while at the same time giving us an easy way to share lockpath
amongst all non-fork()ing recursive calls to ls_get().
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
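The shared lockpath buffer and the compound literal can be illustrated with the following sketch. The function walk() is a hypothetical stand-in for ls_get(): it only models the buffer handling, and the "path grows by 8 bytes per level" rule is invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for ls_get(), modelling only the buffer handling.
 * lockpath is shared by all recursion levels; len_lockpath is the capacity
 * known to the current level and is passed down by value.
 * Return 0 on success, -1 if memory could not be allocated.
 */
static int walk(char **lockpath, size_t len_lockpath, int lvl, int max_lvl)
{
	/* Invented rule for the example: 8 more bytes per nesting level. */
	size_t newlen = (size_t)(lvl + 1) * 8;
	int ret = 0;

	/* Only reallocate when the buffer is really too small. */
	if (newlen > len_lockpath) {
		char *tmp = realloc(*lockpath, newlen * 2);
		if (!tmp) {
			ret = -1;
			goto out;
		}
		*lockpath = tmp;
		len_lockpath = newlen * 2;
	}

	/* Pretend to build the lock path for this level. */
	memset(*lockpath, 'x', newlen - 1);
	(*lockpath)[newlen - 1] = '\0';

	if (lvl < max_lvl)
		ret = walk(lockpath, len_lockpath, lvl + 1, max_lvl);

out:
	/* The outermost level owns the buffer and releases it. */
	if (lvl == 0) {
		free(*lockpath);
		*lockpath = NULL;
	}
	return ret;
}
```

The top-level caller can then write walk(&(char *){NULL}, 0, 0, 3) so that no named pointer to the buffer exists outside the recursive function.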