git.proxmox.com Git - mirror_lxc.git/log
6 years agoInclude -devel suffix in version string
Stéphane Graber [Fri, 5 Jan 2018 20:20:55 +0000 (15:20 -0500)]
Include -devel suffix in version string

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
6 years agoFix broken indentation
Stéphane Graber [Fri, 5 Jan 2018 20:19:30 +0000 (15:19 -0500)]
Fix broken indentation

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
6 years agoMerge pull request #2067 from brauner/2018-01-03/allow_fully_unprivileged_containers
Serge Hallyn [Thu, 4 Jan 2018 16:26:01 +0000 (10:26 -0600)]
Merge pull request #2067 from brauner/2018-01-03/allow_fully_unprivileged_containers

conf: write "deny" to /proc/[pid]/setgroups

6 years agoMerge pull request #2068 from brauner/2018-01-03/cleanup_command_after_revert
Serge Hallyn [Thu, 4 Jan 2018 16:21:17 +0000 (10:21 -0600)]
Merge pull request #2068 from brauner/2018-01-03/cleanup_command_after_revert

commands: fully revert set_running_config_item()

6 years agocgfsng: only establish mapping once
Christian Brauner [Thu, 4 Jan 2018 14:28:12 +0000 (15:28 +0100)]
cgfsng: only establish mapping once

When we deleted cgroups for unprivileged containers we used to allocate a new
mapping and clone a new user namespace each time we deleted a cgroup. This of
course meant - on a cgroup v1 system - doing this >= 10 times when all
controllers were used. Let's not do this and only allocate and establish a
mapping once.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: rework userns_exec_1()
Christian Brauner [Thu, 4 Jan 2018 14:01:06 +0000 (15:01 +0100)]
conf: rework userns_exec_1()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: non-functional changes
Christian Brauner [Thu, 4 Jan 2018 13:59:42 +0000 (14:59 +0100)]
conf: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: write "deny" to /proc/[pid]/setgroups
Christian Brauner [Wed, 3 Jan 2018 15:28:40 +0000 (16:28 +0100)]
conf: write "deny" to /proc/[pid]/setgroups

When fully unprivileged users run a container that only maps their own {g,u}id
and they do not have access to setuid new{g,u}idmap binaries we will write the
idmapping directly. This however requires us to write "deny" to
/proc/[pid]/setgroups otherwise any write to /proc/[pid]/gid_map will be
denied.

On a sidenote, this patch enables fully unprivileged containers. If you now set
lxc.net.[i].type = empty no privilege whatsoever is required to run a container.

Enhances #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Felix Abecassis <fabecassis@nvidia.com>
Cc: Jonathan Calmels <jcalmels@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
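
The ordering described in this commit message can be sketched in C roughly as follows. This is a minimal illustration, not the code from conf.c: the `sketch_*` names are hypothetical, and mapping the caller's ids to id 0 inside the namespace is an assumption made for the example. The only point it tries to show is that "deny" is written to /proc/[pid]/setgroups before /proc/[pid]/gid_map is touched.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a one-entry id map: "<id inside ns> <id on host> 1\n". */
int sketch_format_idmap(char *buf, size_t len, unsigned int nsid,
			unsigned int hostid)
{
	return snprintf(buf, len, "%u %u 1\n", nsid, hostid);
}

/* Write a whole string to a /proc file with a single write(). */
int sketch_write_proc(const char *path, const char *data)
{
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	ret = write(fd, data, strlen(data));
	close(fd);
	return ret == (ssize_t)strlen(data) ? 0 : -1;
}

/* Hypothetical helper: map the caller's own ids in the userns of @pid. */
int sketch_map_own_ids(pid_t pid, uid_t hostuid, gid_t hostgid)
{
	char path[64], map[64];

	/* Must come first: without "deny" the gid_map write is refused. */
	snprintf(path, sizeof(path), "/proc/%d/setgroups", (int)pid);
	if (sketch_write_proc(path, "deny") < 0)
		return -1;

	snprintf(path, sizeof(path), "/proc/%d/gid_map", (int)pid);
	sketch_format_idmap(map, sizeof(map), 0, hostgid);
	if (sketch_write_proc(path, map) < 0)
		return -1;

	snprintf(path, sizeof(path), "/proc/%d/uid_map", (int)pid);
	sketch_format_idmap(map, sizeof(map), 0, hostuid);
	return sketch_write_proc(path, map);
}
```

In this sketch `sketch_map_own_ids()` would be called by the parent on a child that has already been placed in a new user namespace; that setup is left out on purpose.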
6 years agoMerge pull request #2069 from stgraber/master
Christian Brauner [Thu, 4 Jan 2018 09:29:43 +0000 (10:29 +0100)]
Merge pull request #2069 from stgraber/master

gentoo: Add support for .xz tarballs

6 years agoMerge pull request #2070 from hallyn/2018-01-03/staticlibcap
Christian Brauner [Thu, 4 Jan 2018 09:29:18 +0000 (10:29 +0100)]
Merge pull request #2070 from hallyn/2018-01-03/staticlibcap

configure.ac: fix the check for static libcap

6 years agoconfigure.ac: fix the check for static libcap
Serge Hallyn [Thu, 4 Jan 2018 03:02:53 +0000 (21:02 -0600)]
configure.ac: fix the check for static libcap

The existing check doesn't work, because when you statically
link a program against libc, any functions not called are not
included.  So cap_init() which we check for is not there in
the built binary.

So instead just check whether a "gcc -lcap -static" works.
If libcap.a is not available it will fail, if it is it will
succeed.

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
6 years agogentoo: Add support for .xz tarballs
Stéphane Graber [Wed, 3 Jan 2018 23:06:33 +0000 (18:06 -0500)]
gentoo: Add support for .xz tarballs

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
6 years agocommands: fully revert set_running_config_item()
Christian Brauner [Wed, 3 Jan 2018 17:28:58 +0000 (18:28 +0100)]
commands: fully revert set_running_config_item()

The noop implementation is pointless.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2065 from brauner/2017-01-01/revert_set_running_config_item
Stéphane Graber [Wed, 3 Jan 2018 17:12:39 +0000 (12:12 -0500)]
Merge pull request #2065 from brauner/2017-01-01/revert_set_running_config_item

lxccontainer: revert set_running_config_item()

6 years agoMerge pull request #2066 from brauner/2017-01-02/support_no_root_mappings
Serge Hallyn [Wed, 3 Jan 2018 03:42:06 +0000 (21:42 -0600)]
Merge pull request #2066 from brauner/2017-01-02/support_no_root_mappings

Support configurations without root mapping

6 years agoconf: detect if devpts can be mounted with gid=5
Christian Brauner [Tue, 2 Jan 2018 23:11:38 +0000 (00:11 +0100)]
conf: detect if devpts can be mounted with gid=5

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocgfsng: use init {g,u}id
Christian Brauner [Tue, 2 Jan 2018 22:41:10 +0000 (23:41 +0100)]
cgfsng: use init {g,u}id

If no id mapping for the container's root id is defined try to use the id
mappings specified via lxc.init.{g,u}id.

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf{ile}: detect ns{g,u}id mapping for root
Christian Brauner [Tue, 2 Jan 2018 22:27:55 +0000 (23:27 +0100)]
conf{ile}: detect ns{g,u}id mapping for root

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: adapt userns_exec_1()
Christian Brauner [Tue, 2 Jan 2018 21:31:16 +0000 (22:31 +0100)]
conf: adapt userns_exec_1()

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: adapt idmap helpers
Christian Brauner [Tue, 2 Jan 2018 21:15:17 +0000 (22:15 +0100)]
conf: adapt idmap helpers

- mapped_hostid_entry()
- idmap_add()

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agolxccontainer: revert set_running_config_item()
Christian Brauner [Mon, 1 Jan 2018 20:56:23 +0000 (21:56 +0100)]
lxccontainer: revert set_running_config_item()

- As discussed we will have a proper API extension that will allow updating
  various parts of a running container. The prior approach wasn't a good idea.

- Reverting this is not a problem since we haven't released any version with the
  set_running_config_item() API extension.

- I'm not simply reverting so that master users can still call into new
  liblxc's without crashing the container. This is achieved by keeping the
  commands callback struct member number identical.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2062 from brauner/2017-12-25/capture_output_of_short_lived_init_p...
Serge Hallyn [Sat, 30 Dec 2017 23:27:48 +0000 (17:27 -0600)]
Merge pull request #2062 from brauner/2017-12-25/capture_output_of_short_lived_init_process

mainloop: capture output of short-lived init procs

6 years agomainloop: use epoll_create1(EPOLL_CLOEXEC)
Christian Brauner [Tue, 26 Dec 2017 19:57:12 +0000 (20:57 +0100)]
mainloop: use epoll_create1(EPOLL_CLOEXEC)

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
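
For readers unfamiliar with the flag, a small sketch of what the commit title refers to: `epoll_create1(EPOLL_CLOEXEC)` creates the epoll instance with close-on-exec already set, so no separate `fcntl()` call is needed afterwards. The `sketch_*` function names are made up for this example and are not taken from mainloop.c.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance whose fd is closed automatically on exec(). */
int sketch_epoll_open(void)
{
	return epoll_create1(EPOLL_CLOEXEC);
}

/* Return 1 if @fd has the close-on-exec flag set, 0 if not, -1 on error. */
int sketch_fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;

	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```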
6 years agoconsole: do not allow non-pty devices on open()
Christian Brauner [Tue, 26 Dec 2017 17:00:08 +0000 (18:00 +0100)]
console: do not allow non-pty devices on open()

We don't allow non-pty devices anyway so don't let open() create unneeded
files.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: properly cleanup mainloop
Christian Brauner [Tue, 26 Dec 2017 12:45:12 +0000 (13:45 +0100)]
start: properly cleanup mainloop

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2063 from marcosps/lxcconfig_help
Christian Brauner [Sat, 30 Dec 2017 20:05:41 +0000 (21:05 +0100)]
Merge pull request #2063 from marcosps/lxcconfig_help

lxc_config: Add -h and --help flags handler

6 years agolxc_config: Add -h and --help flags handler
Marcos Paulo de Souza [Sat, 30 Dec 2017 18:35:52 +0000 (16:35 -0200)]
lxc_config: Add -h and --help flags handler

As the other tools already do, show the usage message when -h or --help
is used.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
6 years agomainloop: capture output of short-lived init procs
Christian Brauner [Mon, 25 Dec 2017 13:53:40 +0000 (14:53 +0100)]
mainloop: capture output of short-lived init procs

The handler for the signal fd will detect when the init process of a container
has exited and cause the mainloop to close. However, this can happen before the
console handlers - or any other events for that matter - are handled. So in the
case of init exiting we still need to allow for all buffered input to the
console to be handled before exiting. This allows us to capture output from
short-lived init processes.

This is conceptually equivalent to my implementation of ExecReaderToChannel()
https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L527

Closes #1694.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
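
The idea of draining buffered output after the exit has been noticed can be illustrated with a plain pipe. This is only a sketch under simplifying assumptions: it uses fork() and a pipe instead of the mainloop, signal fd and console handlers, `sketch_capture_short_lived()` is a hypothetical name, and it only works for messages that fit into the pipe buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes @msg to a pipe and exits at once, then collect
 * what it wrote. Returns the number of bytes captured or -1 on error.
 * @msg must be small enough to fit into the pipe buffer.
 */
ssize_t sketch_capture_short_lived(const char *msg, char *buf, size_t len)
{
	int pipefd[2];
	size_t total = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(pipefd) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Stand-in for a short-lived init process. */
		close(pipefd[0]);
		if (write(pipefd[1], msg, strlen(msg)) < 0)
			_exit(1);
		_exit(0);
	}

	close(pipefd[1]);

	/* Learn about the exit first, as the signal fd handler would ... */
	if (waitpid(pid, NULL, 0) != pid) {
		close(pipefd[0]);
		return -1;
	}

	/* ... but keep reading until EOF so buffered output is not lost. */
	while (total < len &&
	       (n = read(pipefd[0], buf + total, len - total)) > 0)
		total += (size_t)n;

	close(pipefd[0]);
	return (ssize_t)total;
}
```

The child has already been reaped when the read loop starts, yet its output is still delivered, which is the behavior the commit message asks for.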
6 years agomainloop: add mainloop macros
Christian Brauner [Mon, 25 Dec 2017 13:52:39 +0000 (14:52 +0100)]
mainloop: add mainloop macros

This makes it clearer why handlers return what value.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2058 from brauner/2017-12-22/bugfixes
Serge Hallyn [Fri, 22 Dec 2017 22:10:14 +0000 (16:10 -0600)]
Merge pull request #2058 from brauner/2017-12-22/bugfixes

start: fix death signal

6 years agostart: handle setting death signal smarter
Christian Brauner [Fri, 22 Dec 2017 21:52:42 +0000 (22:52 +0100)]
start: handle setting death signal smarter

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: fix death signal
Christian Brauner [Fri, 22 Dec 2017 21:17:44 +0000 (22:17 +0100)]
start: fix death signal

On set{g,u}id() the kernel does:

  /* dumpability changes */
  if (!uid_eq(old->euid, new->euid) ||
      !gid_eq(old->egid, new->egid) ||
      !uid_eq(old->fsuid, new->fsuid) ||
      !gid_eq(old->fsgid, new->fsgid) ||
      !cred_cap_issubset(old, new)) {
          if (task->mm)
                  set_dumpable(task->mm, suid_dumpable);
          task->pdeath_signal = 0;
          smp_wmb();
  }

which means we need to re-enable the death signal after the set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
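
A user space sketch of the resulting call order, assuming the parent death signal is set with prctl(PR_SET_PDEATHSIG). The function names are hypothetical and the error handling is reduced to the minimum; the only point is that the prctl() call comes after the credential change rather than before it.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch to @gid/@uid and then re-arm the parent death signal, because a
 * credential change may have reset it to 0. Returns 0 on success.
 */
int sketch_switch_ids_keep_pdeathsig(uid_t uid, gid_t gid, int sig)
{
	if (setgid(gid) < 0)
		return -1;

	if (setuid(uid) < 0)
		return -1;

	/* Re-enable after set{g,u}id(), not before. */
	return prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0UL, 0UL, 0UL);
}

/* Read back the currently configured parent death signal. */
int sketch_get_pdeathsig(void)
{
	int sig = 0;

	if (prctl(PR_GET_PDEATHSIG, (unsigned long)&sig, 0UL, 0UL, 0UL) < 0)
		return -1;

	return sig;
}
```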
6 years agoMerge pull request #2057 from brauner/2017-12-22/bugfixes
Serge Hallyn [Fri, 22 Dec 2017 19:50:59 +0000 (13:50 -0600)]
Merge pull request #2057 from brauner/2017-12-22/bugfixes

start: simplify cgroup namespace preservation

6 years agostart: simplify cgroup namespace preservation
Christian Brauner [Fri, 22 Dec 2017 16:18:50 +0000 (17:18 +0100)]
start: simplify cgroup namespace preservation

Since we are now dumpable we can open /proc/<child-pid>/ns/cgroup so let's
avoid the overhead of sending around fds.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: make us dumpable
Christian Brauner [Fri, 22 Dec 2017 16:11:45 +0000 (17:11 +0100)]
start: make us dumpable

When we call set{u,g}id() the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users
use debuggers on the child process before the exec as well and also allows us
to open /proc/<child-pid> files in lieu of the child.
Note, that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If our effective uid on the host is 0 then
we will keep all capabilities in the child user namespace across set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
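
A minimal sketch of the prctl() call mentioned above. `sketch_make_dumpable()` is a hypothetical name, and checking geteuid() in the current namespace is a simplification of the "effective uid on the host" condition from the commit message.

```c
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Restore the dumpable flag after a credential change. Following the commit
 * message, this is only attempted when the effective uid is not 0.
 * Returns the resulting flag value as reported by the kernel, or -1.
 */
int sketch_make_dumpable(void)
{
	if (geteuid() != 0 &&
	    prctl(PR_SET_DUMPABLE, 1UL, 0UL, 0UL, 0UL) < 0)
		return -1;

	return prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);
}
```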
6 years agoMerge pull request #2042 from brauner/2017-12-15/bugfixes
Serge Hallyn [Thu, 21 Dec 2017 22:30:11 +0000 (16:30 -0600)]
Merge pull request #2042 from brauner/2017-12-15/bugfixes

start: tweaks + bugfixes

6 years agoMerge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression
Serge Hallyn [Thu, 21 Dec 2017 22:08:18 +0000 (16:08 -0600)]
Merge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression

btrfs: fix unprivileged snapshot creation

6 years agostart: log closing cmd socket and STOPPED state
Christian Brauner [Sat, 16 Dec 2017 13:39:12 +0000 (14:39 +0100)]
start: log closing cmd socket and STOPPED state

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: use lxc_raw_clone_cb() where possible
Christian Brauner [Fri, 15 Dec 2017 16:42:31 +0000 (17:42 +0100)]
start: use lxc_raw_clone_cb() where possible

This way we can rely on the kernel's copy-on-write support similar to fork().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_clone_cb()
Christian Brauner [Fri, 15 Dec 2017 16:35:43 +0000 (17:35 +0100)]
namespace: add lxc_raw_clone_cb()

This is a copy-on-write (no stack passed) variant of lxc_clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: comment lxc_{raw_}clone()
Christian Brauner [Fri, 15 Dec 2017 16:35:07 +0000 (17:35 +0100)]
namespace: comment lxc_{raw_}clone()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agotree-wide: s/getpid()/lxc_raw_getpid()/g
Christian Brauner [Sat, 16 Dec 2017 01:07:43 +0000 (02:07 +0100)]
tree-wide: s/getpid()/lxc_raw_getpid()/g

This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_getpid()
Christian Brauner [Sat, 16 Dec 2017 00:23:17 +0000 (01:23 +0100)]
namespace: add lxc_raw_getpid()

Because of older glibc's pid cache (up to 2.25) whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
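
One way such a helper can be written is to issue the getpid system call directly through syscall(2), which avoids any value cached by libc. The sketch below uses a hypothetical name and is not claimed to match the helper added by this commit line for line.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel directly, bypassing any pid value cached by libc. */
pid_t sketch_raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```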
6 years agotests: expand lxc_raw_clone() tests
Christian Brauner [Fri, 15 Dec 2017 16:03:09 +0000 (17:03 +0100)]
tests: expand lxc_raw_clone() tests

- test CLONE_VFORK
- test CLONE_FILES

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement
Serge Hallyn [Thu, 21 Dec 2017 21:56:51 +0000 (15:56 -0600)]
Merge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement

attach: simplify significantly

6 years agoattach: handle /proc with hidepid={1,2} property
Christian Brauner [Wed, 20 Dec 2017 23:42:37 +0000 (00:42 +0100)]
attach: handle /proc with hidepid={1,2} property

Receive the fd for the LSM security module before we set{g,u}id(). The reason is
that on set{g,u}id() a) the kernel will make us undumpable and b) we will change
our effective uid. This means our effective uid will be different from the
effective uid of the process that created us which means that this process no
longer has capabilities in our namespace including CAP_SYS_PTRACE. This means
we will not be able to read any /proc/<pid> files for the process anymore when
/proc is mounted with hidepid={1,2}. So let's get the lsm label fd before the
set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoattach: use lxc_raw_clone()
Christian Brauner [Wed, 20 Dec 2017 12:14:33 +0000 (13:14 +0100)]
attach: use lxc_raw_clone()

This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoattach: simplify significantly
Christian Brauner [Mon, 18 Dec 2017 01:46:10 +0000 (02:46 +0100)]
attach: simplify significantly

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2055 from marcosps/cgfsng_debug
Christian Brauner [Wed, 20 Dec 2017 13:19:57 +0000 (14:19 +0100)]
Merge pull request #2055 from marcosps/cgfsng_debug

cgfsng: Add new macro to print errors

6 years agoMerge pull request #2013 from 3XX0/oci-dhcp-improvements
Christian Brauner [Wed, 20 Dec 2017 01:48:04 +0000 (02:48 +0100)]
Merge pull request #2013 from 3XX0/oci-dhcp-improvements

Improve the dhclient hook for OCI compat

6 years agocgfsng: Add new macro to print errors
Marcos Paulo de Souza [Wed, 20 Dec 2017 01:43:47 +0000 (23:43 -0200)]
cgfsng: Add new macro to print errors

At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init(cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
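
A sketch of what an environment-controlled error macro could look like. The macro and function names are hypothetical, printing to stderr is an assumption, and so is the rule that the variable merely has to be set (with any value). The `##__VA_ARGS__` form is a GNU extension accepted by gcc and clang.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Return 1 when the LXC_DEBUG_CGFSNG environment variable is set. */
int sketch_cgfsng_debug_enabled(void)
{
	return getenv("LXC_DEBUG_CGFSNG") != NULL;
}

/*
 * Print to stderr only when debugging is requested via the environment,
 * since the regular log is not set up yet inside a constructor.
 */
#define SKETCH_CGFSNG_DEBUG(format, ...)                                   \
	do {                                                               \
		if (sketch_cgfsng_debug_enabled())                         \
			fprintf(stderr, "cgfsng: " format, ##__VA_ARGS__); \
	} while (0)
```

With this sketch, running a tool as `LXC_DEBUG_CGFSNG=1 lxc-start ...` would make the messages visible even though no log level has been configured yet.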
6 years agolxc-oci: add DHCP option leveraging dhclient hooks
Jonathan Calmels [Mon, 11 Dec 2017 21:53:15 +0000 (13:53 -0800)]
lxc-oci: add DHCP option leveraging dhclient hooks

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agolxc-oci: read configuration from oci.common.conf if available
Jonathan Calmels [Fri, 8 Dec 2017 06:24:48 +0000 (22:24 -0800)]
lxc-oci: read configuration from oci.common.conf if available

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agolxc-net: add LXC_DHCP_PING boolean option
Jonathan Calmels [Fri, 8 Dec 2017 06:15:10 +0000 (22:15 -0800)]
lxc-net: add LXC_DHCP_PING boolean option

Excerpt from dnsmasq(8):
By default, the DHCP server will attempt to ensure that an address is not
in use before allocating it to a host. It does this by sending an ICMP echo
request (aka "ping") to the address in question. If it gets a reply, then the
address must already be in use, and another is tried. This flag disables this check.

This is useful if one expects all the containers to get an IP address
from the LXC authoritative DHCP server and wants to speed up the process
of getting a lease.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agohooks: dhclient hook improvements
Jonathan Calmels [Fri, 8 Dec 2017 06:04:36 +0000 (22:04 -0800)]
hooks: dhclient hook improvements

- Merge dhclient-start and dhclient-stop into a single hook.
- Wait for a lease before returning from the hook.
- Generate a logfile when LXC log level is either DEBUG or TRACE.
- Rely on namespace file descriptors for the stop hook.
- Use settings from /<sysconf>/lxc/dhclient.conf if available.
- Attempt to cleanup if dhclient fails to shutdown properly.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agoMerge pull request #2048 from duguhaotian/master
Christian Brauner [Tue, 19 Dec 2017 14:09:41 +0000 (15:09 +0100)]
Merge pull request #2048 from duguhaotian/master

[monitor] wrong statement of break

6 years agoMerge pull request #2015 from flx42/nvidia-mount-hook
Christian Brauner [Tue, 19 Dec 2017 14:06:20 +0000 (15:06 +0100)]
Merge pull request #2015 from flx42/nvidia-mount-hook

hooks: add mount hook to configure access to NVIDIA GPUs

6 years agoMerge pull request #2050 from tanyifeng/small_fix
Christian Brauner [Tue, 19 Dec 2017 13:24:40 +0000 (14:24 +0100)]
Merge pull request #2050 from tanyifeng/small_fix

conf.c: small fix for args of mount_entry

6 years agoMerge pull request #2053 from tenforward/japanese
Christian Brauner [Tue, 19 Dec 2017 11:07:09 +0000 (12:07 +0100)]
Merge pull request #2053 from tenforward/japanese

Update Japanese lxc.container.conf(5)

6 years agodoc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:54:15 +0000 (19:54 +0900)]
doc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)

and:
* remove empty paragraph in English man
* untabify in Japanese man

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Translate the hook of network into Japanese in lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:36:48 +0000 (19:36 +0900)]
doc: Translate the hook of network into Japanese in lxc.container.conf(5)

Update for commit 14a7b0f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add the description of new style hook to Japanese lxc.containers.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:08:22 +0000 (19:08 +0900)]
doc: Add the description of new style hook to Japanese lxc.containers.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add proc section to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 06:54:23 +0000 (15:54 +0900)]
doc: Add proc section to Japanese lxc.container.conf(5)

Update for commit 61d7a73

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add sysctl section to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 06:41:17 +0000 (15:41 +0900)]
doc: Add sysctl section to Japanese lxc.container.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agobtrfs: fix unprivileged snapshot creation
Christian Brauner [Tue, 19 Dec 2017 10:59:52 +0000 (11:59 +0100)]
btrfs: fix unprivileged snapshot creation

We already fixed privileged btrfs snapshot creation in:

commit 1c7222c084769a1d9406ca7dab943d8a5f016a56
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Tue Nov 28 13:51:03 2017 +0100

    btrfs: fix btrfs_snapshot()

    Closes #1956.

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Adrian Reber <areber@redhat.com>

but missed unprivileged btrfs snapshot creation. Fix it too.

Follow-up to #1956.
Closes #2051.

Reported-by: Oleg Freedhom overlayfs@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf.c: small fix for args of mount_entry
Yifeng Tan [Tue, 19 Dec 2017 09:35:01 +0000 (17:35 +0800)]
conf.c: small fix for args of mount_entry

Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
6 years ago[monitor] wrong statement of break
独孤昊天 [Mon, 18 Dec 2017 06:52:25 +0000 (14:52 +0800)]
[monitor] wrong statement of break

If lxc_abstract_unix_connect() fails and returns -1, this code never jumps to retry.

Signed-off-by: liuhao <liuhao27@huawei.com>
6 years agohooks: add mount hook to configure access to NVIDIA GPUs
Felix Abecassis [Tue, 19 Dec 2017 00:17:23 +0000 (16:17 -0800)]
hooks: add mount hook to configure access to NVIDIA GPUs

This hook requires the nvidia-container-cli tool provided by libnvidia-container:
https://github.com/nvidia/libnvidia-container

For containers that do not have CUDA_VERSION or NVIDIA_VISIBLE_DEVICES
set in the environment, the hook will be a no-op.

To enable in the configuration file:
lxc.hook.mount = /usr/local/share/lxc/hooks/nvidia

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
6 years agoMerge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process
Serge Hallyn [Mon, 18 Dec 2017 16:49:50 +0000 (10:49 -0600)]
Merge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process

start: reap intermediate process

6 years agostart: reap intermediate process
Christian Brauner [Mon, 18 Dec 2017 13:08:02 +0000 (14:08 +0100)]
start: reap intermediate process

When we inherit namespaces we need to reap the attaching process.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2031 from tanyifeng/mask_and_readonly_path
Christian Brauner [Mon, 18 Dec 2017 11:12:59 +0000 (12:12 +0100)]
Merge pull request #2031 from tanyifeng/mask_and_readonly_path

conf.c: add relative option for lxc.mount.entry

6 years agoconf.c: add relative option for lxc.mount.entry
Yifeng Tan [Mon, 18 Dec 2017 16:50:58 +0000 (00:50 +0800)]
conf.c: add relative option for lxc.mount.entry

Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
6 years agoMerge pull request #2040 from brauner/2017-12-14/bugfixes
Serge Hallyn [Fri, 15 Dec 2017 02:10:39 +0000 (20:10 -0600)]
Merge pull request #2040 from brauner/2017-12-14/bugfixes

lxc_init: fix cgroup parsing

6 years agoMerge pull request #2034 from brauner/2017-12-14/use_clone_in_run_command
Serge Hallyn [Thu, 14 Dec 2017 22:29:04 +0000 (16:29 -0600)]
Merge pull request #2034 from brauner/2017-12-14/use_clone_in_run_command

utils: use lxc_raw_clone() in run_command()

6 years agolxc_init: fix cgroup parsing
Christian Brauner [Thu, 14 Dec 2017 22:00:04 +0000 (23:00 +0100)]
lxc_init: fix cgroup parsing

coverity: #1426132
coverity: #1426133

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agotools: add missing break to lxc-execute
Christian Brauner [Thu, 14 Dec 2017 21:45:56 +0000 (22:45 +0100)]
tools: add missing break to lxc-execute

coverity: #1426131

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2039 from brauner/2017-12-14/fix_command_socket_race
Serge Hallyn [Thu, 14 Dec 2017 21:56:24 +0000 (15:56 -0600)]
Merge pull request #2039 from brauner/2017-12-14/fix_command_socket_race

commands: fix race when open()/close() cmd socket

6 years agoutils: use lxc_raw_clone() in run_command()
Christian Brauner [Thu, 14 Dec 2017 15:42:00 +0000 (16:42 +0100)]
utils: use lxc_raw_clone() in run_command()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_clone()
Christian Brauner [Thu, 14 Dec 2017 14:31:54 +0000 (15:31 +0100)]
namespace: add lxc_raw_clone()

This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that guarantees that
no pthread_atfork() handlers are run. As long as clone() in glibc doesn't
run pthread_atfork() handlers we should be fine but there's no guarantee that
this won't change in the future. So let's do the syscall directly - or as
directly as we can. An additional nice feature is that we get fork() behavior,
i.e. lxc_raw_clone() returns 0 in the child and the child pid in the parent.

Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
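
The fork()-like behavior described above can be sketched with syscall(2). This is an illustration under stated assumptions, not the implementation from namespace.c: the `sketch_*` names are hypothetical, the raw argument order (flags first, then stack) is the common one and differs on a few architectures, and rejecting CLONE_VM is a safety choice made for this example because no separate stack is passed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * fork()-like clone: returns 0 in the child and the child pid in the parent.
 * No stack is passed, so the child continues on a copy-on-write copy of the
 * parent's stack. Sharing the address space would be unsafe here.
 */
pid_t sketch_raw_clone(unsigned long flags)
{
	if (flags & CLONE_VM) {
		errno = EINVAL;
		return -1;
	}

	return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, 0UL);
}

/* Run a child that exits with @code and return the observed exit status. */
int sketch_child_exit_status(int code)
{
	int status;
	pid_t pid;

	pid = sketch_raw_clone(0);
	if (pid < 0)
		return -1;

	if (pid == 0)
		_exit(code); /* child: leave without running atexit handlers */

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```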
6 years agoMerge pull request #2008 from tych0/share-ns-in-execute
Christian Brauner [Thu, 14 Dec 2017 20:37:41 +0000 (21:37 +0100)]
Merge pull request #2008 from tych0/share-ns-in-execute

add --share-$NS= support to lxc-execute

6 years agoMerge pull request #2037 from hallyn/2017-12-14/dir_detect_eperm
Christian Brauner [Thu, 14 Dec 2017 20:07:22 +0000 (21:07 +0100)]
Merge pull request #2037 from hallyn/2017-12-14/dir_detect_eperm

dir_detect: warn on eperm

6 years agoMerge pull request #2035 from adrianreber/master
Christian Brauner [Thu, 14 Dec 2017 20:06:17 +0000 (21:06 +0100)]
Merge pull request #2035 from adrianreber/master

criu: add feature check capability

6 years agocommands: fix race when open()/close() cmd socket
Christian Brauner [Thu, 14 Dec 2017 19:57:15 +0000 (20:57 +0100)]
commands: fix race when open()/close() cmd socket

When we report STOPPED to a caller and then close the command socket it is
technically possible - and I've seen this happen on the test builders - that a
container start() right after a wait() will receive ECONNREFUSED because it
called open() before we close(). So for all new state clients simply close the
command socket. This will inform all state clients that the container is
STOPPED and also prevents a race between an open()/close() on the command socket
causing a new process to get ECONNREFUSED because we haven't yet closed the
command socket.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocriu: add a test case for the criu feature check support
Adrian Reber [Wed, 13 Dec 2017 11:14:58 +0000 (12:14 +0100)]
criu: add a test case for the criu feature check support

This adds a simple test case which verifies that the new migrate() API
command 'MIGRATE_FEATURE_CHECK' works as expected.

If a feature does not exist on the currently running
architecture/kernel/criu combination it does not report an error as this
is a valid scenario.

Signed-off-by: Adrian Reber <areber@redhat.com>
6 years agocriu: add feature check capability
Adrian Reber [Wed, 13 Dec 2017 11:04:02 +0000 (12:04 +0100)]
criu: add feature check capability

For migration optimization features like pre-copy or post-copy migration
the support cannot be determined by simply looking at the CRIU version.
Features like that depend on the architecture/kernel/criu combination
and CRIU offers a feature checking interface to query if it is
supported.

This adds an LXC interface to query CRIU for those features via the
migrate() API call. For the recent pre-copy migration support in LXD
this can be used to automatically detect if pre-copy migration should be
used.

In addition to the existing migrate() API commands this adds a new
command: 'MIGRATE_FEATURE_CHECK'.

The migrate_opts{} structure is extended by the member features_to_check
which is a bitmask defining which CRIU features should be queried.

Currently only the querying of the features FEATURE_MEM_TRACK and
FEATURE_LAZY_PAGES is supported.

Signed-off-by: Adrian Reber <areber@redhat.com>
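
How a caller might work with such a bitmask is sketched below. Everything prefixed with `sketch_` or `SKETCH_` is a hypothetical stand-in so that the example compiles without liblxc: the bit values, the way results are reported back (unsupported bits cleared from the mask), and the link between memory tracking and pre-copy migration are assumptions of this sketch, not statements about the real API.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in liblxc's headers. */
#define SKETCH_FEATURE_MEM_TRACK  (1ULL << 0)
#define SKETCH_FEATURE_LAZY_PAGES (1ULL << 1)

struct sketch_migrate_opts {
	uint64_t features_to_check; /* bitmask of features to query */
};

/*
 * Mock of a feature query: keep only the requested bits that the
 * (pretend) architecture/kernel/criu combination supports.
 */
void sketch_feature_check(struct sketch_migrate_opts *opts, uint64_t supported)
{
	opts->features_to_check &= supported;
}

/* Decide on pre-copy migration based on the memory tracking feature. */
int sketch_use_pre_copy(const struct sketch_migrate_opts *opts)
{
	return (opts->features_to_check & SKETCH_FEATURE_MEM_TRACK) != 0;
}
```

A missing feature simply leaves its bit cleared in this sketch, which mirrors the point made in the test case commit that an unsupported feature is a valid result rather than an error.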
6 years agodir_detect: warn on eperm
Serge Hallyn [Thu, 14 Dec 2017 19:16:02 +0000 (13:16 -0600)]
dir_detect: warn on eperm

If a user has lxc.rootfs.path = /some/path/foo, but can't access
some piece of that path, then we'll get an unhelpful "failed to
mount" without any indication of the problem.

At least show that there is a permission problem.

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
6 years agothe bike shed should be brilliant purple
Tycho Andersen [Thu, 14 Dec 2017 17:38:16 +0000 (17:38 +0000)]
the bike shed should be brilliant purple

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoMerge pull request #2026 from brauner/2017-12-12/lxc_hook_version
Serge Hallyn [Thu, 14 Dec 2017 15:27:46 +0000 (09:27 -0600)]
Merge pull request #2026 from brauner/2017-12-12/lxc_hook_version

confile: add lxc.hook.version

6 years agonetwork: pass name of peer veth device
Christian Brauner [Thu, 14 Dec 2017 13:19:27 +0000 (14:19 +0100)]
network: pass name of peer veth device

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: simplify run_script_argv()
Christian Brauner [Thu, 14 Dec 2017 13:15:18 +0000 (14:15 +0100)]
conf: simplify run_script_argv()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonetwork: pass info in env if hook version is 1
Christian Brauner [Tue, 12 Dec 2017 12:30:54 +0000 (13:30 +0100)]
network: pass info in env if hook version is 1

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: pass namespaces as environment variables
Christian Brauner [Sun, 10 Dec 2017 12:53:32 +0000 (13:53 +0100)]
start: pass namespaces as environment variables

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: execute hooks based on lxc.hooks.version
Christian Brauner [Sun, 10 Dec 2017 11:54:00 +0000 (12:54 +0100)]
conf: execute hooks based on lxc.hooks.version

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: set LXC_HOOK_VERSION
Christian Brauner [Mon, 11 Dec 2017 11:10:37 +0000 (12:10 +0100)]
start: set LXC_HOOK_VERSION

This can be used by scripts to detect which version of the hooks is used.

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
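
A hook written in C could read the variable roughly like this. The function name is hypothetical, and treating an unset or malformed LXC_HOOK_VERSION as version 0 is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Return the hook version announced through LXC_HOOK_VERSION. Treating an
 * unset or malformed variable as version 0 is an assumption of this sketch.
 */
int sketch_hook_version(void)
{
	const char *val = getenv("LXC_HOOK_VERSION");
	char *end;
	long version;

	if (!val || *val == '\0')
		return 0;

	errno = 0;
	version = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' || version < 0 || version > INT_MAX)
		return 0;

	return (int)version;
}
```

A hook can then branch on the returned value, for example reading its inputs from the environment when the version is 1 and from its arguments otherwise.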
6 years agoconfile: add lxc.hook.version
Christian Brauner [Sun, 10 Dec 2017 11:53:25 +0000 (12:53 +0100)]
confile: add lxc.hook.version

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2030 from brauner/2017-12-13/fix_cgroup_namsepace_recording
Serge Hallyn [Thu, 14 Dec 2017 06:45:52 +0000 (00:45 -0600)]
Merge pull request #2030 from brauner/2017-12-13/fix_cgroup_namsepace_recording

start: fix cgroup namespace preservation

6 years agoSHARE_NS options should be before OPT_USAGE
Tycho Andersen [Thu, 14 Dec 2017 00:57:48 +0000 (00:57 +0000)]
SHARE_NS options should be before OPT_USAGE

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoinit: don't kill(-1) if we aren't in a pid ns
Tycho Andersen [Fri, 8 Dec 2017 23:23:26 +0000 (23:23 +0000)]
init: don't kill(-1) if we aren't in a pid ns

...otherwise we'll kill everyone on the machine. Instead, let's explicitly
try to kill our children. Let's do a best effort against fork bombs by
disabling forking via the pids cgroup if it exists. This is best effort for
a number of reasons:

* the pids cgroup may not be available
* the container may have bind mounted /dev/null over pids.max, so the write
  doesn't do anything

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoMerge pull request #2017 from brauner/generic/patch_testing
Stéphane Graber [Wed, 13 Dec 2017 18:26:32 +0000 (13:26 -0500)]
Merge pull request #2017 from brauner/generic/patch_testing

coverity: bugfixes

6 years agoMerge pull request #2025 from brauner/2017-12-12/fix_network_attach_and_detach
Stéphane Graber [Wed, 13 Dec 2017 18:22:31 +0000 (13:22 -0500)]
Merge pull request #2025 from brauner/2017-12-12/fix_network_attach_and_detach

lxccontainer: only attach netns on netdev detach