Major Hayden [Wed, 2 Sep 2015 21:21:11 +0000 (16:21 -0500)]
Tear down network devices during container halt
On very busy systems, some virtual network devices won't be destroyed after a
container halts. This patch uses the lxc_delete_network() method to ensure
that network devices attached to the container are destroyed when the
container halts.
Without the patch, some virtual network devices are left over on the system
and must be removed with `ip link del <device>`. This caused containers
with lxc.network.veth.pair to not be able to start. For containers using
randomly generated virtual network device names, the old devices will hang
around on the bridge with their original MAC address.
David Ward [Tue, 23 Jun 2015 14:57:19 +0000 (10:57 -0400)]
Only mount /proc if needed, even without a rootfs
Use the same code with and without a rootfs to check if mounting
/proc is necessary before doing so. If mounting it is unsuccessful
and there is no rootfs, continue as before.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
David Ward [Tue, 23 Jun 2015 14:57:23 +0000 (10:57 -0400)]
Allow autodev without a rootfs
A container without a rootfs is useful for running a collection of
processes in separate namespaces (to provide separate networking as
an example), while sharing the host filesystem (except for specific
paths that are re-mounted as needed). For multiple processes to run
automatically when such a container is started, it can be launched
using lxc-start, and a separate instance of systemd can manage just
the processes inside the container. (This assumes that the path to
the systemd unit files is re-mounted and only contains the services
that should run inside the container.) For this use case, autodev
should be permitted for a container that does not have a rootfs.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Natanael Copa [Fri, 21 Aug 2015 09:48:10 +0000 (11:48 +0200)]
Clone bridge interface MTU setting
Instead of require static mtu setting in config we simply clone the
existing MTU setting of the bridge interface.
This fixes issue when bridge interface has bigger MTU (like 9000 for
jumbo frame support) than the default 1500. When veth interface is
created it has by default MTU set to 1500 and when this is added to the
bridge, the kernel wee reduce the MTU for the bridge to 1500. We solve
this by cloning the MTU value from bridge interface.
This simplifies managing containers with bridge interface who supports
jumbo frames (mtu 9000) and makes it easier to move containers between
hosts with different MTU settings.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
If we currently create clone-snapshots via lxc-clone only the plain total
number of the containers it serves as a base-container is written to the file
"lxc-snapshots". This commit modifies mod_rdep() so it will store the paths and
names to the containers that are clone-snapshots (similar to the "lxc_rdepends"
file for the clones). **Users which still have containers that have a non-empty
(with a number > 0 as an entry) "lxc-snapshots" file in the old format are not
affected by this change. It will be used until all old clones have been
deleted!** For all others, the "lxc_snapshots" file placed under the original
container now looks like this:
/var/lib/lxc
bb
/var/lib/lxc
cc
/opt
dd
This is an example of a container that provides the base for three
clone-snapshots bb, cc, and dd. Where bb and cc both are placed in the usual
path for privileged containers and dd is placed in a custom path.
- Add additional argument to function that takes in the clone-snapshotted
lxc_container.
- Have mod_rdep() write the path and name of the clone-snapshotted container the
file lxc_snapshots of the original container.
- If a clone-snapshot gets deleted the corresponding line in the file
lxc_snapshot of the original container will be deleted and the file updated
via mmap() + memmove() + munmap().
- Adapt has_fs_snapshots().
- **If an lxc-snapshot file in the old format is found we'll keep using it.**
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
- Passing the LXC_CLONE_KEEPNAME flag to do_lxcapi_clone() was not respected and
let to unexpected behaviour for e.g. lxc-clone. We wrap
clear_unexp_config_line() and set_config_item_line() in an appropriate
if-condition.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
- This enables the user to destroy a container with all its snapshots without
having to use lxc-snapshot first to destroy all snapshots. (The enum values
DESTROY and SNAP from the previous commit are reused here again.)
- Some unification regarding the usage of exit() and return has been done.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
- lxc_snapshot.c lacked necessary members in the associated lxc_arguments struct
in arguments.h. This commit extends the lxc_arguments struct to include
several parameters used by lxc-snapshot which allows a rewrite that is more
consistent with the rest of the lxc-* executables.
- All tests have been moved beyond the call to lxc_log_init() to allow for the
messages to be printed or saved.
- Some small changes to the my_args struct. (The enum task is set to SNAP (for
snapshot) per default and variables illustrating the usage of the command line
flags are written in all caps.)
- arguments.h has been extended to accommodate a future rewrite of lxc-clone
- Traditional behaviour of the executable has been retained in this commit.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
KATOH Yasufumi [Wed, 19 Aug 2015 11:35:36 +0000 (20:35 +0900)]
doc: Update lxc.cgroup.use in lxc.system.conf(5)
LXC now uses lxc.cgroup.use even when cgmanager is used.
So remove the description for the case of using cgmanager.
And add the case of not specifying it.
This commit only updates en and ja man pages.
Robert Schiele [Fri, 21 Aug 2015 05:35:34 +0000 (07:35 +0200)]
check for NULL pointers before calling setenv()
Latest glibc release actually honours calling setenv with a NULL
pointer by causing SIGSEGV but checking pointers before submitting
to any system function is a good idea anyway.
Signed-off-by: Robert Schiele <rschiele@gmail.com>
Tycho Andersen [Fri, 14 Aug 2015 16:24:47 +0000 (10:24 -0600)]
c/r: enable tracefs
tracefs is a new filesystem that can be mounted by users. Only the options
and fs name need to be passed to restore the state, so we can use criu's
auto fs feature.
Tycho Andersen [Mon, 10 Aug 2015 17:12:18 +0000 (11:12 -0600)]
c/r: get rid of dump_net_info()
This was originally used to propagate the bridge and veth names across
hosts, but now we extract both from the container's config file, and
nothing reads the files that dump_net_info() writes, so let's just get rid
of them.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
reuse label cleanup since free(NULL) is a no-op Signed-off-by: Arjun Sreedharan <arjun024@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When setting lxc.network.veth.pair to get a fixed interface
name the recreation of it after a reboot caused an EEXIST.
-) The reboot flag is now a three-state value. It's set to
1 to request a reboot, and 2 during a reboot until after
lxc_spawn where it is reset to 0.
-) If the reboot is set (!= 0) within instantiate_veth and
a fixed name is used, the interface is now deleted before
being recreated.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Robert LeBlanc [Thu, 13 Aug 2015 19:36:55 +0000 (13:36 -0600)]
Caps are getting lost when cloning an LXC. Adding the -X parameter copies the extended attributes. This allows things like ping to continue to be used by a non-privilged user in Debian at least.
- This enables the user to destroy a container with all its snapshots without
having to use lxc-snapshot first to destroy all snapshots. (The enum values
DESTROY and SNAP from the previous commit are reused here again.)
- Some unification regarding the usage of exit() and return has been done.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Passing the LXC_CLONE_KEEPNAME flag to do_lxcapi_clone() was not respected. We
wrap clear_unexp_config_line() and set_config_item_line() in an appropriate
if-condition.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
- This commit adapts lxc-clone to be similiar in usage and feel to the other
lxc-* executables. It builds on the previous extension of the lxc_argument
struct and now uses the default lxc_arguments_parse() function.
- Options which were not used have been removed.
- The LXC_CLONE_KEEPNAME flag was not respected in the previous version of
lxc-clone. The culprit is a missing if-condition in lxccontainer.c. As this
requires a change in one of the API functions in lxccontainer.c it will be
addressed in a follow-up commit.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
- lxc_snapshot.c lacked necessary members in the associated lxc_arguments struct
in arguments.h. This commit extends the lxc_arguments struct to include
several parameters used by lxc-snapshot which allows a rewrite that is more
consistent with the rest of the lxc-* executables.
- All tests have been moved beyond the call to lxc_log_init() to allow for the
messages to be printed or saved.
- Some small changes to the my_args struct. (The enum task is set to
SNAP (for snapshot) per default and variables illustrating the usage of the
command line flags are written in all caps.)
- arguments.h has been extended to accommodate a rewrite of lxc-clone
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Jiri Slaby [Wed, 5 Aug 2015 08:32:54 +0000 (10:32 +0200)]
templates: lxc-opensuse, use rpm to determine build version
zypper info's output is not usable for several reasons:
* it is localized -- there is no "Version: " in my output
* it shows results both from the repo and local system
So use plain rpm to determine whether build is installed and if proper
version is in place.
This commit adds an -R, --rename option to lxc-clone to rename a container. As
c->rename calls do_lxcapi_rename() which in turn calls do_lxcapi_clone() it
seemed best to implement it in lxc-clone rather than lxc-snapshot which also
calls do_lxcapi_clone(). Some additional unification regarding the usage of
return vs exit() in main() was done.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
1) Two checks on amd64 for whether compat_ctx has already
been generated were redundant, as compat_ctx is generally
generated before entering the parsing loop.
2) With introduction of reject_force_umount the check for
whether the syscall has the same id on both native and
compat archs results in false behavior as this is an
internal keyword and thus produces a -1 on
seccomp_syscall_resolve_name_arch().
The result was that it was added to the native architecture
twice and never to the 32 bit architecture, causing it to
have no effect on 32 bit containers on 64 bit hosts.
3) I do not see a reason to care about whether the syscalls
have the same number on the two architectures. On the one
hand this check was there to avoid adding it to two archs
(and effectively leaving one arch unprotected), while on
the other hand it seemed to be okay to add it to the
same arch *twice*.
The entire architecture checking branches are now reduced to
three simple cases: 'native', 'non-native' and 'all'. With
'all' adding to both architectures regardless of the syscall
ID.
Also note that libseccomp had a bug in its architecture
checking, so architecture related filters weren't working as
expected before version 2.2.2, which may have contributed to
the confusion in the original architecture-related code.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
CVE-2015-1334: Don't use the container's /proc during attach
A user could otherwise over-mount /proc and prevent the apparmor profile
or selinux label from being written which combined with a modified
/bin/sh or other commonly used binary would lead to unconfined code
execution.
Reported-by: Roman Fiedler Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>