c/r: factor out network dump/restore code

Break the monolithic ->checkpoint and ->restore functions into smaller ones.
This is in preparation for the checkpoint/restore tty work, which has a similar
need to dump information outside of criu.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

netdev_move_by_index: support wlan

The python lxc-device supported adding wlan devices, so add that
support as well. Since the python one did not support 'del',
I didn't try adding that support, though it should be trivial to
add.

We should be able to do the wlan adding using netlink, but I
went ahead and used 'iw' as the netlink path looked more
complicated than it does for other nics. Patches to switch that
over would be very welcome.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxccontainer.c: rename enter_to_ns to enter_net_ns

because that's what it does

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxc-device: rewrite lxc-device.

As there is now a function named attach_interface to pass
an interface to a container, we no longer need to rely on the
Python implementation of lxc-device.

changelog: 10/15/2014: serge: fail immediately if run as non-root.
changelog: 10/15/2014: serge: add explicit error message on bad usage (fix build failure)

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

python-lxc: Add [at|de]tach_interface() to python binding.

Changelog: 10/15/2014: serge: make ifname mandatory for detach_interface.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

container: introduce two functions named as {at/de}tach_interface().

Currently, we depend on the ip command to attach an interface to a
container, which means this is only implemented in Python.

This patch implements adding and removing an interface in C and adds
both operations to struct container.

Changelog: 10/15/2014 (serge): return error if ifname is NULL.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

utils: move useful helper functions from lxccontainer to utils.

The enter_to_ns() function is useful, but it is currently static in
lxccontainer.c.

This patch splits it into two parts, switch_to_newuser() and
switch_to_newnet(), and moves them into utils.c.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

network: introduce a interface named lxc_netdev_isup().

When we need to know some information about a netdev, such as whether
it is up, we need to read the flags for the netdev.

This patch introduces an interface function named lxc_netdev_isup()
to check whether a netdev is up or down.

It also introduces a network-private function named netdev_get_flag()
to get the flags for a netdev via netlink.

Changelog: 10/15/2015: Return failure if name==NULL to avoid later strlen fun

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

network: allow lxc_network_move_by_index() rename netdev in moving.

In netlink, we can set the dest_name of a netdev when moving it
between namespaces in a single netlink request. Moving a netdev with
a src_name to a netdev with a dest_name is a common use case.

So this patch adds a parameter to lxc_network_move_by_index() to
indicate the dest_name for the move. NULL means the same as
the src_name.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc_start: ERROR if container is already running.

We should exit with an error when starting a running container.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

network: check result of if_nametoindex().

When we want to get the index of an ifname which does not
exist, we should return -EINVAL.
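
A minimal sketch of the check described above, assuming a hypothetical
wrapper named ifname_to_index() (not a function from the patch). It relies
only on the documented behaviour that if_nametoindex() returns 0 for an
unknown interface.

```c
#include <errno.h>
#include <net/if.h>
#include <stddef.h>

/* Hypothetical helper: resolve an interface name to its index.
 * Returns the index (> 0) on success, or -EINVAL if the name is NULL
 * or no such interface exists. */
int ifname_to_index(const char *ifname)
{
	unsigned int index;

	if (!ifname)
		return -EINVAL;

	/* if_nametoindex() returns 0 when the interface is unknown. */
	index = if_nametoindex(ifname);
	if (index == 0)
		return -EINVAL;

	return (int)index;
}
```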

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

network: convert param ifname to const.

We should not modify ifname in lxc_netdev_move_by_name();
making it const in the parameter list will make our code more
robust.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

conf.c: Define MS_PRIVATE for Android

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

fix lxc.mount.auto clearing

the way config_mount was structured, sending 'lxc.mount.auto = '
ended up actually clearing all lxc.mount.entrys. Fix that by
moving the check for an empty value to after the subkey checks.
Then, actually do the clearing of auto_mounts in config_mount_auto.

The 'strlen(subkey)' check being removed was bogus - the subkey
is either known to be 'lxc.mount.entry', or else subkey would have been
NULL (and forced a return in the block above).

This would have been clearer if the config_mount() and helper
fns were structured like the rest of confile.c. It's tempting
to switch it over, but there are subtleties in there so it's
not something to do without a lot of thought and testing.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxc-test-apparmor-mount: don't clear out /etc/lxc/lxc-usernet

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxc-test-unpriv: test for different cgroups per subsystem

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

systemd/selinux init scripts fixups

- RHEL/OL 7 doesn't have the ifconfig command by default so have the
  lxc-net script check for its existence before use, and fall back
  to using the ip command if ifconfig is not available

- When lxc-net is run from systemd on a system with selinux enabled,
  the mkdir -p ${varrun} will create /run/lxc as init_var_run_t which
  dnsmasq can't write its pid into, so we restorecon it
  after creation (to var_run_t)

- The lxc-net systemd .service file needs an [Install] section so that
  "systemctl enable lxc-net" will work

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>

lxc-checkpoint: close stdout/stdin when daemonizing

If we don't close these, running lxc-checkpoint via:

ssh host "sudo lxc-checkpoint ..."

just hangs. We leave stderr open so that subsequent errors will print correctly
(and also because for whatever reason it doesn't break ssh :).

Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

restore: create cgroups for criu

Previously, we let criu create the cgroups for a container as it was restoring
things. In some cases (e.g. migration across hosts), if the container being
migrated was in /lxc/u1-3, it would be migrated to the target host in
/lxc/u1-3, even if there was no /lxc/u1-2 (or worse, if there was already a
live container in u1-3).

Instead, we use lxc's cgroup_create, and then tell criu where to restore to.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

restore: Hoist handler to function level

From 941623498a49551411ccf185146061f3f37d3a67 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Tue, 7 Oct 2014 19:13:51 +0000
Subject: [PATCH 1/2] restore: Hoist handler to function level

This commit is in preparation for the cgroups create work, since we will need
the handler in both the parent and the child. This commit also re-works how
errors are propagated to be less verbose.

v2: rename error to has_error, handle it correctly, and remove some diff noise

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

criu: DECLARE_ARG should check for null arguments

This is in preparation for the cgroups creation work, but also probably just a
good idea in general. The ERROR message is handy: since we print line numbers,
it will give people an indication of which arg was null.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-test-unpriv: don't clear out /etc/lxc/lxc-usernet

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxc: don't call pivot_root if / is on a ramfs

pivot_root can't be called if / is on a ramfs. Currently chroot is
called before pivot_root. In this case the standard, well-known
'chroot escape' technique allows escaping from a container.

I think the best way to handle this situation is to take the following actions:
* clean all mounts which should not be visible in the CT
* move the CT's rootfs into /
* chroot into /

I don't have a host where / is on a ramfs, so I can't test this patch.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

cgmanager: several fixes

These all fix various ways that cgroup actions could fail if an
unprivileged user's cgroup paths were not all the same for all
controllers.

1. in cgm_{g,s}et, use the right controller, not the first in the list,
   to get the cgroup path.

2. when we pass 'all' to cgmanager for a ${METHOD}_abs, make sure that all
   cgroup paths are the same.  That isn't necessary for methods not
   taking an absolute path, so split up the former
   cgm_supports_multiple_controllers() function into two booleans, one
   telling whether cgm supports it, and another telling us whether
   cgm supports it AND all controller cgroup paths are the same.

3. separately, do_cgm_enter with abs=true couldn't work if all
   cgroup paths were not the same.  So just ditch that helper and
   call lxc_cgmanager_enter() where needed, because the special
   cases would be more complicated.
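
The second boolean from item 2 depends on every controller having the same
cgroup path. A minimal sketch of such a comparison, assuming a hypothetical
helper name and a NULL-terminated array of paths (neither is taken from the
patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: `paths` is a NULL-terminated array holding one
 * cgroup path per controller. Returns true only if the array is
 * non-empty and every entry equals the first one. */
bool all_cgroup_paths_equal(const char *const *paths)
{
	size_t i;

	if (!paths || !paths[0])
		return false;

	for (i = 1; paths[i]; i++) {
		if (strcmp(paths[0], paths[i]) != 0)
			return false;
	}
	return true;
}
```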

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

fix: grep not match interface listed by `ip link list`

Interfaces listed by `ip link list` are prefixed with the index
identifier. The pattern "^$BRNAME" does not match.

- dependencies on ifconfig and ip removed
- wait until interface flagged with IFF_UP

Ref: https://github.com/torvalds/linux/blob/master/include/uapi/linux/if.h
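
To illustrate the IFF_UP test without ifconfig or ip, here is a small sketch
in C. It assumes the flags are available as a hexadecimal string such as
"0x1003", which is the format of /sys/class/net/<iface>/flags. Whether the
script reads that file is an assumption, and flags_string_is_up() is a
hypothetical name.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Value of IFF_UP in include/uapi/linux/if.h (bit 0). */
#define SKETCH_IFF_UP 0x1UL

/* Hypothetical helper: decide whether a flags string such as "0x1003"
 * has the IFF_UP bit set. */
bool flags_string_is_up(const char *text)
{
	char *end;
	unsigned long flags;

	if (!text)
		return false;

	/* Base 0 lets strtoul() accept the leading "0x". */
	flags = strtoul(text, &end, 0);
	if (end == text)
		return false; /* nothing could be parsed */

	return (flags & SKETCH_IFF_UP) != 0;
}
```

A caller waiting for the bridge would poll this check until it returns true
or a timeout expires.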

Signed-off-by: Joshua Brunner <j.brunner@nexbyte.com>

tests: Fix unpriv test

Don't use $TUSER as it's not defined. Also don't include
lxc-test-usernic in extra_DIST.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

change version to 1.1.0.alpha1 in configure.ac

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

pivot_root: umount ., not /

This fixes pivot_root on 3.11 and older kernels.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

sysconfig/lxc: Reverse sourcing logic

This prevents scripts running with -e from failing when lxc-net doesn't
exist.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

apparmor: restrict signal and ptrace for processes

Restrict signal and ptrace for processes running under the container
profile. Rules based on AppArmor base abstraction. Add unix rules for
processes running under the container profile.

Signed-off-by: Jamie Strandboge <jamie@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

add file/func/line to debug info

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

fixups to init script rework

- move action() from common to sysvinit wrapper since it's only really
  applicable for sysvinit and not the other init systems

- fix bug in action() fallback, need to shift away msg before executing action

- make lxc-net 98 so it starts before lxc-container (99), otherwise the lxcbr0
  won't be available when containers are autostarted

- make the default RUNTIME_PATH be /var/run instead of /run. On older
  distros (like ol6.5) /run doesn't exist. lxc-net will create this directory
  and attempt to create the dnsmasq.pid file in it, but this will fail when
  SELinux is enabled because the directory will have the default_t type.
  Newer systems have /var/run symlinked to /run so you get to the same place
  in that case.

- add %postun to remove lxc-dnsmasq user when pkgs are removed

- fix bug in lxc-oracle template that was creating /var/lock/subsys/lxc as
  a dir and interfering with the init scripts

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Rework init scripts

This commit is based on the work of:
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
A generic changelog would be:
- Bring support for lxcbr0 to all distributions
- Share the container startup and network configuration logic across
   distributions and init systems.
- Have all the init scripts call the helper script.
- Support for the various different distro-specific configuration
   locations to configure lxc-net and container startup.

Changes on top of Mike's original version:
- Remove sysconfig/lxc-net as it's apparently only there as a
   workaround for an RPM limitation and is breaking Debian systems by
   including a useless file which will get registered as a package provided
   conffile in the dpkg database and will therefore cause conffile prompts
   on upgrades...
- Go with a consistent coding style in the various init scripts.
- Split out the common logic from the sysvinit scripts and ship both in
   their respective location rather than have them be copies.
- Fix the upstart jobs so they actually work (there's no such thing as
   libexec on Debian systems).

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

doc: Drop lxc.pivotdir from Japanese lxc.container.conf(5)

Update for commit 2d489f9

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

apparmor: silence 'silent' mount denials

Newer lxc uses 'silent' when remounting on shutdown. Silence that denial too.

Author: Jamie Strandboge <jamie@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Include network prefix when ipv4/ipv6 keys are queried

Signed-off-by: Sergio Jimenez <tripledes@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

add src/python-lxc/setup.py into .gitignore

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix presentation of IPv6 addresses and gateway

Signed-off-by: Andre Nathan <andre@digirati.com.br>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Initialize cgroups on lxc-checkpoint -r

With cgmanager, the cgroups are polled on demand, so these steps aren't needed.
However, with cgfs, lxc doesn't know about the cgroups for a container and so
it can't report any of the statistics about e.g. how much memory or CPU a
container is using.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-checkpoint should fail if criu gets signal

The ->checkpoint() API call didn't exit correctly if criu was killed by a
signal instead of exiting, so lxc-checkpoint didn't fail correctly as a result.
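
A minimal sketch of the distinction, not the patch itself: a wait status
must be checked with WIFSIGNALED() as well as WIFEXITED(), because on Linux
WEXITSTATUS() of a signal-terminated child can read as 0. The helper names
child_succeeded() and run_child() are hypothetical.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* True only when the child exited normally with status 0. A child killed
 * by a signal counts as a failure. */
bool child_succeeded(int status)
{
	if (WIFSIGNALED(status))
		return false;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Demo helper: fork a child that either kills itself with `sig` (when
 * non-zero) or exits with `code`, and return its raw wait status.
 * Returns -1 if fork() or waitpid() fails. */
int run_child(int code, int sig)
{
	int status = -1;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (sig)
			raise(sig);
		_exit(code);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```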

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Update Japanese lxc-top(1) for porting C version

Update for commit 7dc6f6e

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Add lxc.aa_allow_incomplete flag to Japanese man

Update Japanese lxc.container.conf(5) for commit 93c709b

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

port lxc-top from lua to C for wider availability

- keep but rename the lua version as an example of how to use the lua API

- got rid of the fairly useless --max argument

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

download: Make --keyserver actually work

Reported-by: NeilGreenwood <neil.greenwood@gmail.com>
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Add description about ignoring lxc.cgroup.use when using cgmanager

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix typo in lsm.h breaking android build

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Fix the unprivileged tests cgroup management

To cover all the cases we have around, we need to:
- Attempt to use cgm if present (preferred)
- Attempt to use cgmanager directly over dbus otherwise
- Fallback to cgroupfs

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

document the new lxc.aa_allow_incomplete flag

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix build error(ISO C90 specs violation) in lxc.c

This patch fixes the following build errors.

running build_ext
building '_lxc' extension
creating build/temp.linux-x86_64-3.4
gcc -pthread -Wno-unused-result -Werror=declaration-after-statement -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I../../src -I../../src -I/usr/include/python3.4m -c lxc.c -o ./build/temp.linux-x86_64-3.4/lxc.o
lxc.c: In function ‘convert_tuple_to_char_pointer_array’:
lxc.c:49:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     char **result = (char**) calloc(argc + 1, sizeof(char*));
     ^
lxc.c:60:9: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
         char *str = NULL;
         ^
lxc.c: In function ‘Container_get_cgroup_item’:
lxc.c:822:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     char* value = (char*) malloc(sizeof(char)*len + 1);
     ^
lxc.c: In function ‘Container_get_config_item’:
lxc.c:861:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     char* value = (char*) malloc(sizeof(char)*len + 1);
     ^
lxc.c: In function ‘Container_get_keys’:
lxc.c:903:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     char* value = (char*) malloc(sizeof(char)*len + 1);
     ^
cc1: some warnings being treated as errors
error: command 'gcc' failed with exit status 1
Makefile:472: recipe for target 'all' failed
make[3]: *** [all] Error 1
make[3]: Leaving directory '/home/masami/codes/lxc/src/python-lxc'
Makefile:394: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/home/masami/codes/lxc/src'
Makefile:338: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/masami/codes/lxc/src'
Makefile:484: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

build env:
distribution: Arch Linux
gcc version 4.9.1 20140903 (prerelease) (GCC)

Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

apparmor: make sure sysfs and securityfs are mounted when checking for mount feature

Otherwise the check will return false if securityfs was not mounted
by the container's configuration. In the past we let that quietly
proceed, but unconfined. Now that we restrict such container
starts, this caused lxc-test-apparmor to fail.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>

apparmor: improve behavior when kernel lacks mount restrictions (v2)

(Dwight, I took the liberty of adding your Ack but the code did
change a bit to continue passing the char *label from attach.
Tested that "lxc-start -n u1 -s lxc.aa_profile=p2; lxc-attach -n u1"
does attach you to the p2 profile)

Apparmor policies require mount restrictions to fulfill many of
their promises - for instance if proc can be mounted anywhere,
then 'deny /proc/sysrq-trigger w' prevents only accidents, not
malice.

The mount restrictions are not available in the upstream kernel.
We can detect their presence through /sys.  In the past, when
we detected it missing, we would not enable apparmor.  But that
prevents apparmor from helping to prevent accidents.

At the same time, if the user accidentally boots a kernel which
has regressed, we do not want them starting the container thinking
they are more protected than they are.

This patch:

1. adds a lxc.aa_allow_incomplete = 1 container config flag.  If it is
not set and the kernel lacks mount restrictions, then any container
which is not set to run unconfined will refuse to run.   If it is set,
then the container will run with the (incomplete) apparmor protection
that is available.

2. to pass this flag to the apparmor driver, we pass the container
configuration (lxc_conf) to the lsm_label_set hook.

3. add a testcase.  To test the case where a kernel does not
provide mount restrictions, we mount an empty directory over
the /sys/kernel/security/apparmor/features/mount directory.  In
order to have that not be unmounted in a new namespace, we must
test using unprivileged containers (which cannot remove bind mounts
that hide existing mount contents).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

pivot_root: switch to a new mechanism (v2)

This idea came from Andy Lutomirski. Instead of using a
temporary directory for the pivot_root put-old, use "." both
for new-root and old-root. Then fchdir into the old root
temporarily in order to unmount the old-root, and finally
chdir back into our '/'.

Drop lxc.pivotdir from the lxc.container.conf manpage.

Warn when we see a lxc.pivotdir entry (but keep it in the
lxc.conf for now).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

log: fix quiet mode

Quiet mode was overridden by the double call of lxc_log_init
(see lxc_container_new).

Use lxc_log_options_no_override in order to fix this.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: William Dauchy <william@gandi.net>

support use of 'all' containers when cgmanager supports it

Introduce a new list of controllers just containing "all".

Make the lists of controllers null-terminated.

If the cgmanager api version is high enough, use the 'all' controller
rather than walking all controllers, which should greatly reduce the
amount of dbus overhead.  This will be especially important for
those going through a cgproxy.

Also remove the call to cleanup cgroups when a cgroup existed.  That
usually fails (and failure is ignored) since the to-be-cleaned-up
cgroup is busy, but we shouldn't even be trying.  Note this can
leave extra cgroups that are never cleaned up; however, it's better than us
accidentally removing a cgroup that someone else had created and was
about to use.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-checkpoint should actually log things

Looks like lxc-checkpoint was missing the log initialization code, so it never
actually logged anything when the options were provided.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

tests: require criu >= 1.3.1 for criu test

CRIU 1.3 has a pretty crippling deadlock which will cause dumping containers to
fail fairly often. This is fixed in criu 1.3.1, so we shouldn't run the tests
on anything less than that.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

c/r: use --restore-sibling option in CRIU

In order for LXC to be the parent of the restored process, CRIU needs to
restore init as its sibling, not as its child. This was previously accomplished
essentially via luck :). CRIU now has a --restore-sibling option which forces
this behavior that LXC expects. See more discussion in this thread:
http://lists.openvz.org/pipermail/criu/2014-September/thread.html#16330

v2: don't pass --restore-sibling to dump. This is mostly cosmetic, but will
look less confusing in the logs if people ever look at them.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc_map_ids: add a comment

Explain why we insist that root use newuidmap if it is available.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-gentoo: keep original uid/gid of files/dirs when installing

Call tar with the --numeric-owner option so that numeric IDs are used for
user/group names, because every uid/gid in the rootfs should stay exactly
as in the original stage3 tarball and private portage.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

finalize handler in lxcapi_restore

We can also narrow the scope of this, since we only need it in the process that
is actually going to use it.

Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Exit on errors in restore()'s worker

If we just return here, we end up with two processes executing the caller's
code, which is not good.
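
A minimal sketch of the pattern, assuming hypothetical names
(run_in_worker(), worker_ok(), worker_fails()) that are not part of the
patch. The forked child always leaves through _exit(), so it can never fall
back into the caller's code.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `work` in a forked child. Returns the child's exit code to the
 * parent, or -1 if fork()/waitpid() fails or the child dies from a
 * signal. */
int run_in_worker(int (*work)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: never return from here. Returning would leave two
		 * processes running the caller's code. */
		_exit(work() == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Example workers for the demo. */
int worker_ok(void) { return 0; }
int worker_fails(void) { return -1; }
```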

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Allow criu >= 1.3 in c/r test

criu version 1.3 has been tagged, which has the minimal set of patches to allow
checkpointing and restoring containers. lxc-test-checkpoint-restore is now
skipped on any version of criu lower than 1.3.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-checkpoint: use --force-irmap criu option

This option is required when migrating containers across hosts; it is used to
restore inotify via file paths instead of file handles, which aren't preserved
across hosts.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-plamo: keep original uid/gid of files/dirs when installing

Regardless of whether the "installpkg" command exists or not, install the
command temporarily with a statically linked tar command into the lxc cache
directory to keep the original uid/gid of files/directories. Also,
use the sed command instead of the ed command for simplicity.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

config: fix the handling of lxc.hook and hwaddrs in unexpanded config

And add a testcase.

The code to update hwaddrs in a clone was walking through the container
configuration and re-printing all network entries. However network
entries from an include file which should not be printed out were being
added to the unexpanded config. With this patch, at clone we simply
update the hwaddr in-place in the unexpanded configuration file, making
sure to make the same update to the expanded network configuration.

The code to update lxc.hook statements had the same problem.
We also update it in-place in the unexpanded configuration, though
we mirror the logic we use when updating the expanded configuration.
(Perhaps that should be changed, to simplify future updates)

This code isn't particularly easy to review, so testcases are added
to make sure that (1) extra lxc.network entries are not added (or
removed), even if they are present in an included file, (2) lxc.hook
entries are not added, (3) hwaddr entries are updated, and (4)
the lxc.hook entries are properly updated (only when they should be).

Reported-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Discontinue the use of in-line comments

Those aren't supported, it's just a lucky coincidence that they weren't
causing problems.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Report container exit status to monitord

When managing containers, I need to take action based on container
exit status. For instance, if it exited abnormally (status!=0), I
sometimes want to respawn it automatically. Or, when invoking
`lxc-stop`, I want to know if it terminated gracefully (i.e. on `SIGTERM`)
or on `SIGKILL` after a timeout.

This patch adds a new message type, `lxc_msg_exit_code`, to preserve
ABI. It sends the raw status code as returned by `waitpid`, so a
listening application may want to apply `WEXITSTATUS` first. This is
what `lxc-monitor` does.

Signed-off-by: Jean-Tiare LE BIGOT <jean-tiare.le-bigot@ovh.net>

lxc-cgm: fix issue with nested chowning

To ask cgmanager to chown files as an unpriv user, we must send the
request from the container's namespace (with our own userid also
mapped in).  However when we create a new namespace then we must
open a new dbus connection, so that our credential and the credential
on the dbus socket match.  Otherwise the proxy will refuse the request.

Because we were warning about this failure but not exiting, the failure
was not noticed until the unprivileged container went on to try to
administer its cgroups, i.e. creating a container inside itself.

Fix this by having the do_chown_cgroup create a new cgmanager connection.
In order to reduce the number of connections, since the list of subsystems
is global anyway, don't call do_chown_cgroup once for each controller,
just call it once and have it run over all controllers.

(This patch does not change the fact that we don't fail if the
chown failed.  I think we should change that, but let's do it in a
later patch)

Reported-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Translate lxc-checkpoint(1) into Japanese

Update for commit 735f2c6

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Add lxc-restore-net to extra_DIST

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Fix build failure due to wrong test name

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

handle hashed command socket names (v2)

With the new hashed command socket names (e85898415c), it's possible to
have something like the following:

[caglar@qop:~/go/src/github.com/lxc/go-lxc(master)] cat /proc/net/unix | grep lxc
0000000000000000: 00000002 00000000 00010000 0001 01 53465 @lxc/d086e835c86f4b8d/command
[...]

list_active_containers reads /proc/net/unix to find all running
containers but this new format no longer includes the container name or
its lxcpath.

This patch introduces two new commands (LXC_CMD_GET_NAME and
LXC_CMD_GET_LXCPATH) and starts to use those in list_active_containers
call.

changes since v1:
- added sanity check proposed by Serge
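
To make the problem concrete, here is a small sketch that picks the hashed
command socket name out of one /proc/net/unix line. The helper name is
hypothetical and the matching rule (an "@lxc/" prefix and a "/command"
suffix) is an assumption based only on the sample line above. The result
carries neither the container name nor its lxcpath, which is why the new
commands are needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given one line of /proc/net/unix, return a pointer
 * to the abstract socket name if it looks like a hashed LXC command
 * socket ("@lxc/<hash>/command"), or NULL otherwise. */
const char *lxc_command_socket_in_line(const char *line)
{
	static const char prefix[] = "@lxc/";
	static const char suffix[] = "/command";
	const size_t plen = sizeof(prefix) - 1;
	const size_t slen = sizeof(suffix) - 1;
	const char *name;
	size_t len;

	if (!line)
		return NULL;

	name = strstr(line, prefix);
	if (!name)
		return NULL;

	/* Ignore a trailing newline when checking the suffix. */
	len = strcspn(name, "\n");
	if (len <= plen + slen)
		return NULL; /* no room for a hash between prefix and suffix */
	if (strncmp(name + len - slen, suffix, slen) != 0)
		return NULL;

	return name;
}
```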

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Add support for checkpoint and restore via CRIU

This patch adds support for checkpointing and restoring containers via CRIU.
It adds two api calls, ->checkpoint and ->restore, which are wrappers around
the CRIU CLI. CRIU has an RPC API, but reasons for preferring exec() are
discussed in [1].

To checkpoint, users specify a directory into which the container metadata
(CRIU dump files, plus some additional information about veth pairs and which
bridges they are attached to) is dumped. On restore, this
information is read out of the directory, a CRIU command line is constructed,
and CRIU is exec()d. CRIU uses the lxc-restore-net callback (which in turn
inspects the image directory with the NIC data) to properly restore the
network.
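As a rough illustration of "a CRIU command line is constructed", the sketch below builds argument vectors in Python without executing anything. The helper names are hypothetical, and the option set is deliberately minimal: --tree, --images-dir and --action-script are CRIU options, but the exact arguments the patch passes are not listed in this message and are likely more extensive.

```python
def criu_dump_argv(init_pid, image_dir):
    """Build a minimal 'criu dump' command line (illustrative only)."""
    return [
        "criu", "dump",
        "--tree", str(init_pid),      # root of the process tree to dump
        "--images-dir", image_dir,    # where the dump files are written
    ]


def criu_restore_argv(image_dir, net_callback="lxc-restore-net"):
    """Build a minimal 'criu restore' command line (illustrative only)."""
    return [
        "criu", "restore",
        "--images-dir", image_dir,        # directory written at dump time
        "--action-script", net_callback,  # callback that restores the network
    ]
```

The caller would pass such a list to exec(); keeping it as a list avoids any shell quoting issues.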

This will only work with the current git master of CRIU; anything as of
a152c843 should work. There is a known bug where containers which have been
restored cannot be checkpointed [2].

[1]: http://lists.openvz.org/pipermail/criu/2014-July/015117.html
[2]: http://lists.openvz.org/pipermail/criu/2014-August/015876.html

v2: fixed some problems with the s/int/bool return code from api function
v3: added a testcase, fixed up the man page synopsis
v4: fix a small typo in lxc-test-checkpoint-restore
v5: remove a reference to the old CRIU_PATH, and a bad error about the same

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

build: Make setup.py run from srcdir to avoid distutils errors

distutils can't handle paths to source files containing '..'. It will
try to navigate away from the build directory and fail. To fix that,
before building the python module, transform all the path variables then
cd to the srcdir, and set the build directory manually.

This is hopefully the last needed fix to use separate build and
source directories.

Signed-off-by: Daniel Miranda <danielkza2@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

build: don't remove configuration template on clean

Now that default.conf is generated/linked during the configuration
phase, it should no longer be removed in the 'clean' stage, or
subsequent builds will fail. Only remove it during 'dist-clean'.

Signed-off-by: Daniel Miranda <danielkza2@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

tests: Copy the download cache when available [v2]

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Prevent compiler warning by initializing ifindex

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-user-nic: be more paranoid

Just setting path isn't enough. Clear the whole environment, and only set
$PATH. It's all we need - ovs-vsctl is running fine this way.
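The idea of dropping the whole environment and keeping only $PATH can be sketched in Python as below. This is not the lxc-user-nic code (which is C); the SAFE_PATH value and the function name are assumptions chosen for the example.

```python
import subprocess

# Assumption: an illustrative fixed search path, not the value used by LXC.
SAFE_PATH = "/usr/sbin:/usr/bin:/sbin:/bin"


def run_with_clean_env(argv):
    """Run argv with an environment that contains nothing but PATH."""
    return subprocess.run(
        argv,
        env={"PATH": SAFE_PATH},   # replaces, rather than extends, os.environ
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
```

Running `run_with_clean_env(["env"])` is a quick way to confirm that nothing else from the caller's environment leaks through to the child.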

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-archlinux: Properly set default locale in /etc/locale.conf

Signed-off-by: Bill Kolokithas <kolokithas.b@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Additional checks in ALTLinux template

Added a check for services in the container before starting or stopping them.
Added a check that the syslog config exists before changing it.

Signed-off-by: Denis Pynkin <dans@altlinux.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Update the openvswitch bridge attach code

1. don't determine ovs-vsctl path at configure time, do it at runtime

2. lxc-user-nic: set a sane path to protect from unpriv users

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

use lxcpath as unprivileged containers log directory

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

statvfs: do nothing if statvfs does not exist (android/bionic)

If statvfs does not exist, then don't recalculate mount flags
at remount.

If someone does need this, they could replace the code (only
if !HAVE_STATVFS) with code parsing /proc/self/mountinfo (which
exists in the recent git history)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc_mount_auto_mounts: honor existing nodev etc at remounts

Same problem as we had with mount_entry(). lxc_mount_auto_mounts()
sometimes does bind mount followed by remount to change options.
With recent kernels it must pass any preexisting NODEV/NOSUID/etc
flags.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

mount_entry: use statvfs

Use statvfs instead of parsing /proc/self/mountinfo to check for the
flags we need to add to the bind mount flags. This will be faster
and the code is cleaner.
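For illustration, the same lookup is available from Python through os.statvfs(). The sketch below is not the C code in mount_entry(); the function name and the short flag names are chosen for the example.

```python
import os

# Map statvfs f_flag bits to the familiar mount option names.
FLAG_NAMES = {
    os.ST_RDONLY: "ro",
    os.ST_NOSUID: "nosuid",
    os.ST_NODEV: "nodev",
    os.ST_NOEXEC: "noexec",
}


def existing_mount_flags(path):
    """Return the ro/nosuid/nodev/noexec options set on path's filesystem."""
    flags = os.statvfs(path).f_flag
    return {name for bit, name in FLAG_NAMES.items() if flags & bit}
```

A single system call replaces reading and parsing every line of /proc/self/mountinfo.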

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

build: Fix support for split build and source dirs

Building LXC in a separate target directory, by running configure from
outside the source tree, failed with multiple errors, mostly in the
Python and Lua extensions, due to assuming the source dir and build dir
are the same in a few places. To fix that:

- Pre-process setup.py with the appropriate directories at configure
  time
- Introduce the build dir as an include path in the Lua Makefile
- Link the default container configuration file from the alternatives
  in the configure stage, instead of setting a variable and using it
  in the Makefile

Signed-off-by: Daniel Miranda <danielkza2@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

chmod container dir to 0770 (v2)

This prevents u2 from going into /home/u1/.local/share/lxc/u1/rootfs
and running setuid-root applications to get write access to u1's
container rootfs.

v2: set umask to 002 for the mkdir. Otherwise if umask happens to be,
say, 022, then user does not have write permissions under the container
dir and creation of $containerdir/partial file will fail.
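The interaction between the requested mode and the umask can be checked with a small Python sketch (the function name is hypothetical; the patch itself is C):

```python
import os


def make_container_dir(path):
    """Create path with mode 0770, forcing umask 002 for the mkdir."""
    old_umask = os.umask(0o002)
    try:
        os.mkdir(path, 0o770)
    finally:
        os.umask(old_umask)   # always restore the caller's umask
    # Report only the permission bits of the new directory.
    return os.stat(path).st_mode & 0o777
```

The effective mode is the requested mode with the umask bits cleared: 0770 with umask 002 stays 0770, while 0770 with umask 022 becomes 0750 and loses group write.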

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

load_config_locked: update unexp network

When we read a lxc.network.hwaddr line, if it contained any 'x's then
those get quietly filled in at config_network_hwaddr. If that happens
then we want to save the autogenerated hwaddr in the unexpanded config
so that when we write it to disk, it is saved.

This patch dumbly re-generates the network configuration in the
unexp configuration every time we load a config file, just as we do
after every clone.
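A minimal Python sketch of the template filling shows why the generated value has to be written back. The function name is hypothetical and this is not the code in config_network_hwaddr; it only replaces lowercase 'x' as described above.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def fill_hwaddr_template(template, rng=random):
    """Replace every 'x' in a hwaddr template with a random hex digit."""
    return "".join(
        rng.choice(HEX_DIGITS) if ch == "x" else ch
        for ch in template
    )
```

Each call on "00:16:3e:xx:xx:xx" yields a different address, so unless the filled-in result is stored in the unexpanded config, the container would get a new hwaddr every time the template is expanded. Once filled, the value has no 'x' left and stays stable.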

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

ignore SIGINT (CTRL-C) and SIGQUIT (CTRL-\) - issue #313

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

show additional info if btrfs subvolume deletion fails (issue #315)

Unprivileged users require the "-o user_subvol_rm_allowed" mount option for btrfs.
Promote the INFO level message to ERROR to make this clear; it now reads as follows:

[caglar@qop:~] lxc-destroy -n rubik
lxc_container: Is the rootfs mounted with -o user_subvol_rm_allowed?
lxc_container: Error destroying rootfs for rubik
Destroying rubik failed

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc_map_ids: don't do bogus check for newgidmap

If we didn't find newuidmap, then simply require the caller to be
root and write to /proc/self/uid_map manually. Checking for
newgidmap to exist is bogus.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

Update plamo template

- If the "installpkg" command does not exist, lxc-plamo temporarily
  installs the command, with a statically linked tar command, into the lxc
  cache directory.  The tar command does not refer to passwd/group
  files, which means that only a few files/directories are extracted
  with wrong user/group ownership.  To avoid this, the installpkg
  command now uses the standard tar command in the system.
- Change mode to 666 for $rootfs/dev/null to allow write access for
  all users.
- Small fix in usage message.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: KATOH Yasufumi <karma@jazz.email.ne.jp>

doc: Fix Japanese translation of lxc.containers.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

tests: Call sync before testing a shutdown

This should avoid test failures when the machine running the tests has
either very slow disks or a lot of data waiting to be flushed.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

do_mount_entry: add noexec, nosuid, nodev, rdonly flags if needed at remount

See http://lkml.org/lkml/2014/8/13/746 and its history.  The kernel now refuses
mounts if we don't add ro,nosuid,nodev,noexec flags if they were already there.

Also use the newly found info to skip remount if unneeded.  For background, if
you want to create a read-only bind mount, then you must first mount(2) with
MS_BIND to create the bind mount, then re-mount(2) again to get the new mount
options to apply.  So if this wasn't a bind mount, or no new mount options were
introduced, then we don't do the second mount(2).
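The decision described above can be sketched as a pure function. This is a simplified Python model, not the code in do_mount_entry(): the function name is hypothetical, only the four flags named in the subject line are tracked, and the MS_* values are the ones from the Linux mount(2) headers.

```python
# Flag values as defined by the Linux kernel headers.
MS_RDONLY, MS_NOSUID, MS_NODEV, MS_NOEXEC = 1, 2, 4, 8
MS_REMOUNT = 32
MS_BIND = 4096
PRESERVED = MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC


def remount_flags(requested, existing):
    """Return the flags for the second mount(2), or None to skip it.

    requested: flags from the mount entry.
    existing:  flags already present on the source mount (e.g. via statvfs).
    """
    if not requested & MS_BIND:
        return None                 # not a bind mount: one mount(2) is enough
    already = existing & PRESERVED
    if (requested & PRESERVED) & ~already == 0:
        return None                 # no new options to apply
    # Carry over the pre-existing flags so the kernel accepts the remount.
    return requested | MS_REMOUNT | already
```

For a read-only bind mount of a source that is already nosuid,nodev, the remount therefore asks for ro,nosuid,nodev together rather than ro alone.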

null_endofword() and get_field() were not changed, only moved up in
the file.

(Note, while I can start containers inside a privileged container with
this patch, most of the lxc tests still fail with the kernel in question;
Andy's patch seems to still be needed - a kernel with that patch is available
at https://launchpad.net/~serge-hallyn/+archive/ubuntu/userns-natty
ppa:serge-hallyn/userns-natty)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

monitor: fix sockname calculation for long lxcpaths

A long enough lxcpath (and small PATH_MAX through crappy defines) can cause
the creation of the string to be hashed to fail.  So just use alloca to
get the size string we need.

More importantly, while I can't explain it, if lxcpath is too long, setting
sockname[sizeof(addr->sun_path)-2] to \0 simply doesn't seem to work.  So set
sockname[sizeof(addr->sun_path)-3] to \0, which does work.

With this, and with

lxc.lxcpath = /opt/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789

in /etc/lxc/lxc.conf, I can run lxc-wait just fine.  Without it, it fails
(as does lxc-start -d, which uses lxc_wait to verify the container started)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

command socket: use hash if needed

The container command socket is an abstract unix socket containing
the lxcpath and container name. Those can be too long. In that case,
use the hash of the lxcpath and lxcname. Continue to use the path and
name if possible to avoid any back compat issues.
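The fallback can be sketched as follows. Everything here is an assumption made for illustration: the commit message does not say which hash is used (64-bit FNV-1a is picked because it yields 16 hex digits), the exact length check and hashed input may differ from the patch, and the function names are hypothetical.

```python
SUN_PATH_LEN = 108  # sizeof(sun_path) in struct sockaddr_un on Linux


def fnv1a_64(data):
    """64-bit FNV-1a hash of a bytes object."""
    h = 0xCBF29CE484222325                # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF   # FNV prime, mod 2**64
    return h


def command_socket_name(lxcpath, name):
    """Return the abstract socket name, hashing it only if it is too long."""
    plain = "{}/{}/command".format(lxcpath, name)
    # Leave room for the leading NUL of an abstract socket and a trailing NUL.
    if len(plain.encode()) <= SUN_PATH_LEN - 2:
        return plain                      # keep the old, readable form
    digest = fnv1a_64("{}/{}".format(lxcpath, name).encode())
    return "lxc/{:016x}/command".format(digest)
```

Short paths keep the name older tools expect, and only paths that would overflow sun_path get the fixed-length hashed form.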

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Revert "chmod container dir to 0770"

This commit broke the testsuite for unprivileged containers as the
container directory is now 0750 with the owner being the container root
and the group being the user's group, meaning that the parent user can
only enter the directory, not create entries in there.

This reverts commit c86da6a3ac517b78e6f710df7efe2f51d153b73c.

Fix typo in the previous commit...

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Add extra debugging

This is a hybrid between Micahel's original patch and me making the new
debugging statements look like our existing ones.

Signed-off-by: "Micahel J. Evans" <mjevans1983@gmail.com>
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>