Tycho Andersen [Mon, 30 Nov 2015 22:14:22 +0000 (15:14 -0700)]
c/r: add a new ->migrate API call
This patch adds a new ->migrate API call with three commands:
MIGRATE_DUMP: this is basically just ->checkpoint()
MIGRATE_RESTORE: this is just ->restore()
MIGRATE_PRE_DUMP: this can be used to invoke criu's pre-dump command on the
container.
A small addition to the (pre-)dump commands is the ability to specify a
previous partial dump directory, so that one can use a pre-dump of a
container.
Finally, this new API call uses a structure to pass options so that it can
be easily extended in the future (e.g. to support CRIU's --leave-frozen
option, for potentially smarter failure handling on restore).
v2: remember to flip the return code for legacy ->checkpoint and ->restore
calls
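A minimal sketch of the two ideas above: an options struct passed together
with its size, and a legacy wrapper that flips the return code. All names
ending in _sketch are hypothetical stand-ins and not the real liblxc
definitions; the assumption is that the new call returns 0 on success while
the legacy calls return true on success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real liblxc definitions. */
enum { MIGRATE_PRE_DUMP, MIGRATE_DUMP, MIGRATE_RESTORE };

struct migrate_opts_sketch {
	const char *directory;   /* where the dump is written or read */
	const char *predump_dir; /* optional previous partial dump, may be NULL */
	bool verbose;
};

/* New-style call: 0 on success, negative on error. */
static int migrate_sketch(unsigned int cmd,
			  const struct migrate_opts_sketch *opts, size_t size)
{
	/* The caller passes sizeof(*opts) so the struct can grow later. */
	if (!opts || size != sizeof(*opts) || !opts->directory)
		return -1;
	if (cmd > (unsigned int)MIGRATE_RESTORE)
		return -1;
	/* A real implementation would invoke criu here. */
	return 0;
}

/* Legacy-style call: true on success, so the return code is flipped. */
static bool checkpoint_sketch(const char *directory, bool verbose)
{
	struct migrate_opts_sketch opts = {
		.directory = directory,
		.verbose = verbose,
	};

	return !migrate_sketch(MIGRATE_DUMP, &opts, sizeof(opts));
}
```

Here checkpoint_sketch("/tmp/dump", false) yields true, while passing a NULL
directory yields false.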
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Tycho Andersen [Wed, 2 Dec 2015 21:30:52 +0000 (14:30 -0700)]
api wrapper: only reset the current config if this call set it
Instead of *always* resetting the current_config to null, we should only
reset it if this API call set it.
This allows nesting of API calls, e.g. c->checkpoint() can pass stuff into
criu.c, which can call c->init_pid() and not lose the ability to log stuff
afterwards.
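The nesting rule can be sketched as follows. The names api_enter() and
api_leave() and the plain global are assumptions made for illustration; the
real wrapper may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "current config" consulted by the logging code. */
static const void *current_config;

/* Returns true only if this call installed the config. */
static bool api_enter(const void *conf)
{
	if (current_config)
		return false; /* an outer API call already set it */
	current_config = conf;
	return true;
}

/* Reset only when the matching api_enter() did the setting. */
static void api_leave(bool did_set)
{
	if (did_set)
		current_config = NULL;
}
```

An inner call that finds current_config already set gets false back from
api_enter(), so its api_leave() leaves the outer call's config in place.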
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 2 Dec 2015 22:42:36 +0000 (22:42 +0000)]
seccomp: support 32-bit arm on arm64, and 32-bit ppc on ppc64
Generally we enforce that an [arch] seccomp section can only be used on [arch].
However, on amd64 we allow [i386] sections for i386 containers, and there we
also take [all] sections and apply them for both 32- and 64-bit.
Do that also for ppc64 and arm64. This allows seccomp-protected armhf
containers to run on arm64.
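A sketch of the resulting compatibility rule. The function name and the
architecture strings are illustrative assumptions, not the identifiers used
in the seccomp code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: may a seccomp section tagged `section` be used on
 * a host whose native architecture is `native`?
 */
static bool section_applies(const char *native, const char *section)
{
	/* 64-bit hosts that also accept the matching 32-bit section. */
	static const char *compat[][2] = {
		{ "amd64", "i386" },
		{ "arm64", "arm" },
		{ "ppc64", "ppc" },
	};
	size_t i;

	if (!strcmp(section, "all") || !strcmp(section, native))
		return true;
	for (i = 0; i < sizeof(compat) / sizeof(compat[0]); i++)
		if (!strcmp(native, compat[i][0]) &&
		    !strcmp(section, compat[i][1]))
			return true;
	return false;
}
```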
fli [Tue, 1 Dec 2015 11:17:29 +0000 (19:17 +0800)]
lxc: let lxc-start support wlan phys
Commit e5848d395cb <netdev_move_by_index: support wlan> only made
netdev_move_by_name support wlan, not netdev_move_by_index.
Since netdev_move_by_name is a wrapper around netdev_move_by_index,
replace all calls to lxc_netdev_move_by_index with lxc_netdev_move_by_name
so that lxc-start supports wlan phys.
Signed-off-by: fupan li <fupan.li@windriver.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
If manual mounting with elevated permissions is required
this can currently only be done in pre-start hooks or before
starting LXC. In both cases the mounts would appear in the
host's namespace.
With this flag the namespace is unshared before the startup
sequence, so that mounts performed in the pre-start hook
don't show up on the host.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Fri, 20 Nov 2015 05:34:09 +0000 (00:34 -0500)]
debian: Fix container creation on missing cache
This is currently breaking our daily image builds which happen in a
perfectly clean environment without a Debian keyring and without
anything in /var/cache/lxc
Serge Hallyn [Tue, 17 Nov 2015 21:05:05 +0000 (15:05 -0600)]
lxcapi_clone: restore the unexpanded config len
Otherwise it gets shortened with the temporary len but never
restored - which will only break API users which do a clone
then continue to use the original container, meaning this is
a hard one to detect.
Serge Hallyn [Tue, 17 Nov 2015 18:59:05 +0000 (12:59 -0600)]
Better handle preserve_ns behavior
Commit b6b2b194a8 preserves the container's namespaces for
possible later use in stop hook. But some kernels don't have
/proc/pid/ns/ns for all the namespaces we may be interested in.
So warn but continue if this is the case.
Implement stgraber's suggested semantics.
- User requests some namespaces be preserved:
  - If /proc/self/ns is missing => fail (saying kernel misses setns)
  - If /proc/self/ns/<namespace> entry is missing => fail (saying kernel misses setns for <namespace>)
- User doesn't request some namespaces be preserved:
  - If /proc/self/ns is missing => log an INFO message (kernel misses setns) and continue
  - If /proc/self/ns/<namespace> entry is missing => log an INFO message (kernel misses setns for <namespace>) and continue
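The semantics above, for a single namespace entry, can be sketched as a
small decision function. The function name, its parameters and the use of
stderr for logging are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns 0 if startup may continue, -1 if it must fail.
 * requested: the user asked for this namespace to be preserved
 * available: the /proc/self/ns/<name> entry exists
 */
static int check_preserve_ns(const char *name, bool requested, bool available)
{
	if (available)
		return 0;
	if (requested) {
		fprintf(stderr, "ERROR: kernel misses setns for %s\n", name);
		return -1;
	}
	fprintf(stderr, "INFO: kernel misses setns for %s\n", name);
	return 0;
}
```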
Serge Hallyn [Wed, 11 Nov 2015 17:13:25 +0000 (17:13 +0000)]
clone: clear the rootfs out of unexpanded config
Closes #694
When we start cloning container c1 to c2, we first save c1's
configuration in c2's as a starting point. We long ago cleared
out the lxc.rootfs entry before saving it, so that if we are
killed before we update the rootfs, c2's rootfs doesn't point
to c1's. Because then lxc-destroy -n c2 would delete c1's rootfs.
But when we introduced the unexpanded_config, we didn't update
this code to clear the rootfs out of the unexpanded_config, which
is what now actually gets saved in write_config().
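As an illustration only, clearing a key out of a raw config buffer could
look like the sketch below. The helper names are hypothetical, and it
assumes one "key = value" entry per line with no leading whitespace.

```c
#include <stdbool.h>
#include <string.h>

/* True if the line at `line` assigns exactly `key` ("key=" or "key ="). */
static bool line_sets_key(const char *line, const char *key)
{
	size_t n = strlen(key);

	if (strncmp(line, key, n))
		return false;
	line += n;
	while (*line == ' ' || *line == '\t')
		line++;
	return *line == '=';
}

/*
 * Remove every line that sets `key` from the NUL-terminated buffer,
 * in place. Returns the new length of the buffer contents.
 */
static size_t clear_config_key(char *buf, const char *key)
{
	char *src = buf, *dst = buf;

	while (*src) {
		char *eol = strchr(src, '\n');
		size_t len = eol ? (size_t)(eol - src) + 1 : strlen(src);

		if (!line_sets_key(src, key)) {
			memmove(dst, src, len);
			dst += len;
		}
		src += len;
	}
	*dst = '\0';
	return (size_t)(dst - buf);
}
```

Called with the key "lxc.rootfs", this drops the lxc.rootfs line but keeps
longer keys such as lxc.rootfs.mount, because the key must be followed by
"=".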
When we create a random container directory with mkdtemp() we set the mode to
0770; otherwise do_lxcapi_clone() will complain about not being able to create
the config.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When the clone failed we tried to destroy the container. This will lead to a
segfault. Instead simply return -1. Also move the call to free_mnts() after the
put label to free the user specified mounts even when we just goto put.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This is a complete reimplementation of lxc-clone and lxc-start-ephemeral.
lxc-copy merges the functionalities of lxc-clone + lxc-start-ephemeral.
(1) Cloning containers:
(a) as copy:
lxc-copy -n aa -N bb
(b) as snapshot:
lxc-copy -n aa -N bb -s
(2) Renaming containers:
lxc-copy -n aa -N bb -R
(3) Starting ephemeral containers:
Ephemeral containers are created and started by passing the flag -e /
--ephemeral. Whenever this flag is missing a copy of the container is created.
The flag -e / --ephemeral implies -s / --snapshot.
(a) start ephemeral container daemonized with random name:
lxc-copy -n aa -e
(b) start ephemeral container in foreground mode with random name:
lxc-copy -n aa -e -F
(c) start ephemeral container with specified name in daemonized mode:
Analogous to lxc-start, ephemeral containers start in daemonized
mode by default:
lxc-copy -n aa -N bb -e
One can however also explicitly pass -d / --daemon:
lxc-copy -n aa -N bb -e -d
but both commands are equivalent.
(d) start non-ephemeral container in daemonized mode:
lxc-copy -n aa -D -e
(e) start ephemeral container in daemonized mode and keep the original
hostname:
lxc-copy -n aa -K -e
(f) start ephemeral container in daemonized mode and keep the
MAC-address of the original container:
lxc-copy -n aa -M -e
(g) start ephemeral container with custom mounts (additional mounts can
be of type {bind,aufs,overlay}) in daemonized mode:
lxc-copy -n aa -e -m bind=/src:/dest:ro,aufs=/src:/dest,overlay=/src:/dest
(4) Other options:
lxc-copy --help
In order to create a random containername and random upper- and workdirs for
custom mounts we use mkdtemp() to not just create the names but also directly
create the corresponding directories. This will be safer and make the code
considerably shorter.
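A minimal sketch of the mkdtemp() approach. The function name and the
"sketch_" template prefix are made up; the chmod() to 0770 mirrors the mode
mentioned for random container directories elsewhere in this log.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for mkdtemp() under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Create a uniquely named directory below `parent` and store its path
 * in `out`. Returns 0 on success, -1 on error.
 */
static int make_random_dir(const char *parent, char *out, size_t outlen)
{
	int ret = snprintf(out, outlen, "%s/sketch_XXXXXX", parent);

	if (ret < 0 || (size_t)ret >= outlen)
		return -1;
	/* mkdtemp() picks the name and creates the directory in one step. */
	if (!mkdtemp(out))
		return -1;
	/* mkdtemp() creates the directory with mode 0700; widen it. */
	if (chmod(out, 0770) < 0)
		return -1;
	return 0;
}
```

Because the name and the directory come from the same call, there is no
window between choosing a name and creating it.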
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 12 Nov 2015 17:44:38 +0000 (12:44 -0500)]
ubuntu-cloud: Various fixes
- Update list of supported releases
- Make the fallback release trusty
- Don't specify the compression algorithm (use auto-detection) so that
people passing tarballs to the template don't see regressions.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Virgil Dupras [Tue, 10 Nov 2015 02:23:51 +0000 (21:23 -0500)]
Fetch Debian archive GPG keyrings when they're not available
When running the debian template on a non-Debian host, it's common not to
have debian-archive-keyring.gpg. When that happens, we skip the
signature check of the release, which is dangerous because the download
is made over HTTP.
This commit adds automatic fetching of Debian release keys.
Tycho Andersen [Sat, 7 Nov 2015 00:26:43 +0000 (17:26 -0700)]
c/r: use freezer to seize tasks
Instead of relying on the old ptrace loop, we should put all the
tasks in the container into the freezer. This will stop them all at the
same time, preventing fork bombs from causing criu to loop forever (and is
also simply a lot faster).
Note that this uses --freeze-cgroup which isn't in criu 1.7, so it should
only go into master.
Tycho Andersen [Fri, 6 Nov 2015 20:50:33 +0000 (13:50 -0700)]
define PR_SET_MM_MAP & friends if necessary
PR_SET_MM_MAP only went into the kernel at 3.18 (or 3.19), so we need to
define these for kernels before then. If there was an error, the code
simply logs the failure and continues on.
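A sketch of such fallback definitions, limited to the two constants. The
values 14 and 15 are the ones in the kernel's uapi linux/prctl.h; whether
the actual patch also defines the accompanying struct is not shown here.

```c
#include <sys/prctl.h>

/* Only define these when the installed headers predate them. */
#ifndef PR_SET_MM_MAP
#define PR_SET_MM_MAP 14
#endif

#ifndef PR_SET_MM_MAP_SIZE
#define PR_SET_MM_MAP_SIZE 15
#endif
```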
Tycho Andersen [Fri, 6 Nov 2015 19:34:47 +0000 (12:34 -0700)]
use PR_SET_MM_MAP instead of PR_SET_MM
PR_SET_MM_MAP can be called as non-root, which we are in the unprivileged
(or nested) case.
Also, let's not do the strcpy() for the new cmdline until after we're sure
the prctl succeeded. This means that even if it does fail, we won't
mutilate the command line like we did before, it just won't be as pretty.
v2: remember to chop off bits of the string that are too long
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
(1) This commit fixes the calculations when updating paths in lxc.hooks.*
entries. We now also update conf->unexpanded_alloced, which was not
done prior to this commit.
(2) Also we use the stricter check:
    if (p >= lend)
        continue;
This should deal better with invalid config files.
(3) Insert some spaces between operators to increase readability.
(4) Use gotos to simplify function and increase readability.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When using overlay and aufs mounts with lxc.mount.entry users have to specify
absolute paths for upperdir and workdir which will then get created
automatically by mount_entry_create_overlay_dirs() and
mount_entry_create_aufs_dirs() in conf.c. When we clone a container with
overlay or aufs lxc.mount.entry entries we need to update these absolute paths.
In order to do this we add the function update_ovl_paths() in
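lxccontainer.c. The function updates the mounts in two locations:
1) the upperdir and workdir mounts themselves
2) the upperdir and workdir lxc.mount.entry entries in the clone's config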
If we were to only update 2) we would end up with wrong upperdir and workdir
mounts as the absolute paths would still point to the container that serves as
the base for the clone. If we were to only update 1) we would end up with wrong
upperdir and workdir lxc.mount.entry entries in the clone's config as the
absolute paths in upperdir and workdir would still point to the container that
serves as the base for the clone. Updating both will get the job done.
NOTE: This function does not sanitize paths apart from removing trailing
slashes. (So when a user specifies //home//someone/// it will be cleaned to
//home//someone. This is the minimal path cleansing which is also done by
lxc_container_new().) But the mount_entry_create_overlay_dirs() and
mount_entry_create_aufs_dirs() functions both try to be extremely strict about
when to create upperdirs and workdirs. They will only accept sanitized paths,
i.e. they require /home/someone. I think this is a (safety) virtue and we
should consider sanitizing paths in general. In short: update_ovl_paths() does
update all absolute paths to the new container but
mount_entry_create_overlay_dirs() and mount_entry_create_aufs_dirs() will still
refuse to create upperdir and workdir when the updated path is unclean. This
happens easily when e.g. a user calls lxc-clone -o OLD -n NEW -P
//home//chb///.
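To make the difference concrete, here is a sketch of the minimal cleansing
(trailing slashes only) next to a stricter one that also collapses repeated
slashes. Both helpers are hypothetical and neither is the code used by
lxc_container_new().

```c
#include <string.h>

/* Strip trailing slashes only; a lone "/" is kept. */
static void strip_trailing_slashes(char *path)
{
	size_t len = strlen(path);

	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';
}

/* Additionally collapse every run of '/' into a single one, in place. */
static void collapse_slashes(char *path)
{
	char *src = path, *dst = path;

	while (*src) {
		*dst++ = *src;
		if (*src == '/') {
			while (*src == '/')
				src++;
		} else {
			src++;
		}
	}
	*dst = '\0';
	strip_trailing_slashes(path);
}
```

With the first helper //home//someone/// becomes //home//someone; with the
second it becomes /home/someone, the form the directory-creating functions
accept.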
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This function updates absolute paths for overlay upper- and workdirs so users
can simply clone and start new containers without worrying about absolute paths
in lxc.mount.entry overlay entries.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Jakub Sztandera [Fri, 30 Oct 2015 11:05:44 +0000 (12:05 +0100)]
arch template: Fix systemd-sysctl service
The systemd-sysctl service includes a condition that /proc/sys/ has to be read-write.
In lxc only /proc/sys/net/ is read-write, which causes the condition to fail and the service not to run.
This patch changes the check to /proc/sys/net/ and makes the service apply only the rules that are in the net tree.
Instead of duplicating the cleanup-code, once for success and once for failure,
simply keep a variable fret which is -1 in the beginning and gets set to 0 on
success or stays -1 on failure.
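The pattern, sketched on a made-up task (counting the bytes in a file).
Only the fret variable and the single cleanup path are taken from the
description above; everything else is illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical task: count the bytes in `path`. 0 on success, -1 on error. */
static int count_bytes(const char *path, long *count)
{
	int fret = -1; /* stays -1 unless every step succeeds */
	FILE *f = NULL;
	char *buf = NULL;
	size_t n;
	long total = 0;

	f = fopen(path, "rb");
	if (!f)
		goto out;

	buf = malloc(4096);
	if (!buf)
		goto out;

	while ((n = fread(buf, 1, 4096, f)) > 0)
		total += (long)n;
	if (ferror(f))
		goto out;

	*count = total;
	fret = 0; /* success is recorded in exactly one place */

out:
	/* Single cleanup path shared by success and failure. */
	free(buf);
	if (f)
		fclose(f);
	return fret;
}
```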
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
The mount_entry_create_overlay_dirs() and mount_entry_create_aufs_dirs()
functions create workdirs and upperdirs for overlay and aufs lxc.mount.entry
entries. They try
to make sure that the workdirs and upperdirs can only be created under the
containerdir (e.g. /path/to/the/container/CONTAINERNAME). In order to do this
the right hand side of
was thought to check if the rootfs->path is not present in the workdir and
upperdir mount options. But the current check is bogus since it will be
trivially true whenever the container is a block-dev or overlay or aufs backed
since the rootfs->path will then have a form like e.g.
overlayfs:/some/path:/some/other/path
This patch adds the function ovl_get_rootfs_dir() which parses rootfs->path by
searching backwards for the first occurrence of the delimiter pair ":/". We do
not simply search for ":" since it might be used in path names. If ":/" is not
found we assume the container is directory backed and simply return
strdup(rootfs->path).
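A sketch of the parsing step. This is not the actual ovl_get_rootfs_dir();
in particular, returning the path that follows the last ":/" is an
assumption made for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for strdup() under strict -std= modes */
#endif
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc()ed copy of the path that follows the last ":/" in
 * rootfs_path, or of the whole string if there is no ":/".
 * The caller frees the result.
 */
static char *get_rootfs_dir_sketch(const char *rootfs_path)
{
	const char *p, *last = NULL;

	/* A forward scan that remembers the last match finds the same
	 * ":/" as a backwards search would. */
	for (p = strstr(rootfs_path, ":/"); p; p = strstr(p + 1, ":/"))
		last = p;

	if (!last)
		return strdup(rootfs_path); /* directory-backed container */
	return strdup(last + 1); /* skip ':' but keep the '/' */
}
```

For overlayfs:/some/path:/some/other/path this yields /some/other/path. A
plain directory path, even one containing a ":" inside a name, is returned
unchanged.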
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Thu, 15 Oct 2015 18:56:17 +0000 (18:56 +0000)]
Ignore trailing /init.scope in init cgroups
The lxc monitor does not store the container's cgroups, rather it
recalculates them whenever needed.
Systemd moves itself into a /init.scope cgroup for the systemd
controller.
It might be worth changing that (by storing all cgroup info in the
lxc_handler), but for now go the hacky route and chop off any
trailing /init.scope.
I definitely think we want to switch to storing as that will be
more bullet-proof, but for now we need a quick backportable fix
for systemd 226 guests.
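The "chop off" step can be sketched as below. The function name is
hypothetical, and mapping a bare "/init.scope" to "/" is an assumption about
the desired result for the root cgroup.

```c
#include <string.h>

/* Chop a trailing "/init.scope" off a cgroup path, in place. */
static void prune_init_scope(char *cg)
{
	static const char suffix[] = "/init.scope";
	size_t len = strlen(cg);
	size_t slen = sizeof(suffix) - 1;

	if (len < slen)
		return;
	if (strcmp(cg + len - slen, suffix))
		return; /* suffix is not at the very end */
	if (len == slen)
		cg[1] = '\0'; /* "/init.scope" becomes "/" */
	else
		cg[len - slen] = '\0';
}
```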
The mount_entry_create_*_dirs() functions currently assume that the rootfs of
the container is actually named "rootfs". This has the consequence that
    del = strstr(lxcpath, "/rootfs");
    if (!del) {
        free(lxcpath);
        lxc_free_array((void **)opts, free);
        return -1;
    }
    *del = '\0';
finds no match (strstr() returns NULL) when the rootfs of a container is not
actually named "rootfs". This means that we return -1 and do not create the
necessary upperdir/workdir
directories required for the overlay/aufs mount to work. Hence, let's not make
that assumption. We now pass lxc_path and lxc_name to
mount_entry_create_*_dirs() and create the path directly. To prevent failure we
also have mount_entry_create_*_dirs() check that lxc_name and lxc_path are not
empty when they are passed in.
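Building the container directory directly from the two arguments could look
like this sketch. The helper name is hypothetical, and the real functions
may use a caller-provided buffer instead of malloc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<lxc_path>/<lxc_name>" in a malloc()ed buffer.
 * Returns NULL if either argument is NULL or empty; the caller frees.
 */
static char *container_dir(const char *lxc_path, const char *lxc_name)
{
	size_t len;
	char *dir;

	if (!lxc_path || !*lxc_path || !lxc_name || !*lxc_name)
		return NULL;

	/* path + '/' + name + terminating NUL */
	len = strlen(lxc_path) + 1 + strlen(lxc_name) + 1;
	dir = malloc(len);
	if (!dir)
		return NULL;
	snprintf(dir, len, "%s/%s", lxc_path, lxc_name);
	return dir;
}
```

Unlike the strstr() approach, this does not depend on what the rootfs
directory is called.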
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>