git.proxmox.com Git - mirror_lxc.git/log
6 years agoMerge pull request #2042 from brauner/2017-12-15/bugfixes
Serge Hallyn [Thu, 21 Dec 2017 22:30:11 +0000 (16:30 -0600)]
Merge pull request #2042 from brauner/2017-12-15/bugfixes

start: tweaks + bugfixes

6 years agoMerge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression
Serge Hallyn [Thu, 21 Dec 2017 22:08:18 +0000 (16:08 -0600)]
Merge pull request #2052 from brauner/2017-12-19/unprivileged_btrfs_regression

btrfs: fix unprivileged snapshot creation

6 years agostart: log closing cmd socket and STOPPED state
Christian Brauner [Sat, 16 Dec 2017 13:39:12 +0000 (14:39 +0100)]
start: log closing cmd socket and STOPPED state

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: use lxc_raw_clone_cb() where possible
Christian Brauner [Fri, 15 Dec 2017 16:42:31 +0000 (17:42 +0100)]
start: use lxc_raw_clone_cb() where possible

This way we can rely on the kernel's copy-on-write support similar to fork().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_clone_cb()
Christian Brauner [Fri, 15 Dec 2017 16:35:43 +0000 (17:35 +0100)]
namespace: add lxc_raw_clone_cb()

This is a copy-on-write (no stack passed) variant of lxc_clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
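
A rough sketch of what a callback variant without a stack can look like, not the code from this commit. The names raw_clone_cb_sketch() and bump() are made up for illustration, and the argument order for the clone syscall is assumed to be the common one (flags first, then stack), as on x86-64 or arm64. Because no stack is passed, flags that share memory (such as CLONE_VM) must not be used here.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback variant: no stack is passed, so the child runs on a
 * copy-on-write copy of the parent's memory, as it would after fork(). */
pid_t raw_clone_cb_sketch(int (*fn)(void *), void *arg, unsigned long flags)
{
	pid_t pid = (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);

	if (pid == 0)
		_exit(fn(arg)); /* child: run the callback and never return */

	return pid; /* parent: child pid, or -1 on error */
}

/* Example callback: changes its private copy of *arg and exits with it. */
int bump(void *arg)
{
	int *value = arg;

	*value += 1;
	return *value;
}
```

The parent's copy of the argument stays untouched because the write in the child only affects the child's copy-on-write pages.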
6 years agonamespace: comment lxc_{raw_}clone()
Christian Brauner [Fri, 15 Dec 2017 16:35:07 +0000 (17:35 +0100)]
namespace: comment lxc_{raw_}clone()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agotree-wide: s/getpid()/lxc_raw_getpid()/g
Christian Brauner [Sat, 16 Dec 2017 01:07:43 +0000 (02:07 +0100)]
tree-wide: s/getpid()/lxc_raw_getpid()/g

This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_getpid()
Christian Brauner [Sat, 16 Dec 2017 00:23:17 +0000 (01:23 +0100)]
namespace: add lxc_raw_getpid()

Because of older glibc's pid cache (up to 2.25), whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
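
A minimal sketch of such a helper, assuming it simply issues the getpid syscall directly. The name raw_getpid() is a stand-in and not necessarily how lxc_raw_getpid() is written.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for lxc_raw_getpid(): ask the kernel directly so a
 * stale libc pid cache cannot hand the parent's pid to a clone()d child. */
static inline pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

In a process that has not been cloned behind libc's back, this returns the same value as getpid().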
6 years agotests: expand lxc_raw_clone() tests
Christian Brauner [Fri, 15 Dec 2017 16:03:09 +0000 (17:03 +0100)]
tests: expand lxc_raw_clone() tests

- test CLONE_VFORK
- test CLONE_FILES

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement
Serge Hallyn [Thu, 21 Dec 2017 21:56:51 +0000 (15:56 -0600)]
Merge pull request #2047 from brauner/2017-12-18/attach_lsm_confinement

attach: simplify significantly

6 years agoattach: handle /proc with hidepid={1,2} property
Christian Brauner [Wed, 20 Dec 2017 23:42:37 +0000 (00:42 +0100)]
attach: handle /proc with hidepid={1,2} property

Receive the fd for the LSM security module before we set{g,u}id(). The reason is
that on set{g,u}id() the kernel will a) make us undumpable and b) change our
effective uid. Our effective uid will then differ from the effective uid of the
process that created us, which means that this process no longer has
capabilities in our namespace, including CAP_SYS_PTRACE. As a result we will not
be able to read any /proc/<pid> files for the process anymore when /proc is
mounted with hidepid={1,2}. So let's get the LSM label fd before the
set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoattach: use lxc_raw_clone()
Christian Brauner [Wed, 20 Dec 2017 12:14:33 +0000 (13:14 +0100)]
attach: use lxc_raw_clone()

This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoattach: simplify significantly
Christian Brauner [Mon, 18 Dec 2017 01:46:10 +0000 (02:46 +0100)]
attach: simplify significantly

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2055 from marcosps/cgfsng_debug
Christian Brauner [Wed, 20 Dec 2017 13:19:57 +0000 (14:19 +0100)]
Merge pull request #2055 from marcosps/cgfsng_debug

cgfsng: Add new macro to print errors

6 years agoMerge pull request #2013 from 3XX0/oci-dhcp-improvements
Christian Brauner [Wed, 20 Dec 2017 01:48:04 +0000 (02:48 +0100)]
Merge pull request #2013 from 3XX0/oci-dhcp-improvements

Improve the dhclient hook for OCI compat

6 years agocgfsng: Add new macro to print errors
Marcos Paulo de Souza [Wed, 20 Dec 2017 01:43:47 +0000 (23:43 -0200)]
cgfsng: Add new macro to print errors

At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init() (cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
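
A minimal sketch of an environment-controlled error macro, assuming the check is a plain getenv() lookup. The names cgfsng_debug_enabled() and CGFSNG_DEBUG_SKETCH are hypothetical and are not the macro added by this commit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 when LXC_DEBUG_CGFSNG is set, 0 otherwise. Reading the
 * environment works even inside a constructor, where no log level has
 * been configured yet. */
int cgfsng_debug_enabled(void)
{
	return getenv("LXC_DEBUG_CGFSNG") != NULL;
}

/* Hypothetical macro: prints to stderr only when debugging is enabled. */
#define CGFSNG_DEBUG_SKETCH(...)                      \
	do {                                          \
		if (cgfsng_debug_enabled())           \
			fprintf(stderr, __VA_ARGS__); \
	} while (0)
```

With something like this, running a tool with LXC_DEBUG_CGFSNG=1 in its environment would make the early messages visible on stderr.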
6 years agolxc-oci: add DHCP option leveraging dhclient hooks
Jonathan Calmels [Mon, 11 Dec 2017 21:53:15 +0000 (13:53 -0800)]
lxc-oci: add DHCP option leveraging dhclient hooks

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agolxc-oci: read configuration from oci.common.conf if available
Jonathan Calmels [Fri, 8 Dec 2017 06:24:48 +0000 (22:24 -0800)]
lxc-oci: read configuration from oci.common.conf if available

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agolxc-net: add LXC_DHCP_PING boolean option
Jonathan Calmels [Fri, 8 Dec 2017 06:15:10 +0000 (22:15 -0800)]
lxc-net: add LXC_DHCP_PING boolean option

Excerpt from dnsmasq(8):
By default, the DHCP server will attempt to ensure that an address is not
in use before allocating it to a host. It does this by sending an ICMP echo
request (aka "ping") to the address in question. If it gets a reply, then the
address must already be in use, and another is tried. This flag disables this check.

This is useful if one expects all the containers to get an IP address
from the LXC authoritative DHCP server and wants to speed up the process
of getting a lease.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agohooks: dhclient hook improvements
Jonathan Calmels [Fri, 8 Dec 2017 06:04:36 +0000 (22:04 -0800)]
hooks: dhclient hook improvements

- Merge dhclient-start and dhclient-stop into a single hook.
- Wait for a lease before returning from the hook.
- Generate a logfile when LXC log level is either DEBUG or TRACE.
- Rely on namespace file descriptors for the stop hook.
- Use settings from /<sysconf>/lxc/dhclient.conf if available.
- Attempt to clean up if dhclient fails to shut down properly.

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agoMerge pull request #2048 from duguhaotian/master
Christian Brauner [Tue, 19 Dec 2017 14:09:41 +0000 (15:09 +0100)]
Merge pull request #2048 from duguhaotian/master

[monitor] wrong statement of break

6 years agoMerge pull request #2015 from flx42/nvidia-mount-hook
Christian Brauner [Tue, 19 Dec 2017 14:06:20 +0000 (15:06 +0100)]
Merge pull request #2015 from flx42/nvidia-mount-hook

hooks: add mount hook to configure access to NVIDIA GPUs

6 years agoMerge pull request #2050 from tanyifeng/small_fix
Christian Brauner [Tue, 19 Dec 2017 13:24:40 +0000 (14:24 +0100)]
Merge pull request #2050 from tanyifeng/small_fix

conf.c: small fix for args of mount_entry

6 years agoMerge pull request #2053 from tenforward/japanese
Christian Brauner [Tue, 19 Dec 2017 11:07:09 +0000 (12:07 +0100)]
Merge pull request #2053 from tenforward/japanese

Update Japanese lxc.container.conf(5)

6 years agodoc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:54:15 +0000 (19:54 +0900)]
doc: Add relative option for lxc.mount.entry to Japanese lxc.container.conf(5)

and:
* remove empty paragraph in English man
* untabify in Japanese man

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Translate the hook of network into Japanese in lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:36:48 +0000 (19:36 +0900)]
doc: Translate the hook of network into Japanese in lxc.container.conf(5)

Update for commit 14a7b0f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add the description of new style hook to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 10:08:22 +0000 (19:08 +0900)]
doc: Add the description of new style hook to Japanese lxc.container.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add proc section to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 06:54:23 +0000 (15:54 +0900)]
doc: Add proc section to Japanese lxc.container.conf(5)

Update for commit 61d7a73

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agodoc: Add sysctl section to Japanese lxc.container.conf(5)
KATOH Yasufumi [Tue, 19 Dec 2017 06:41:17 +0000 (15:41 +0900)]
doc: Add sysctl section to Japanese lxc.container.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
6 years agobtrfs: fix unprivileged snapshot creation
Christian Brauner [Tue, 19 Dec 2017 10:59:52 +0000 (11:59 +0100)]
btrfs: fix unprivileged snapshot creation

We already fixed privileged btrfs snapshot creation in:

commit 1c7222c084769a1d9406ca7dab943d8a5f016a56
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Tue Nov 28 13:51:03 2017 +0100

    btrfs: fix btrfs_snapshot()

    Closes #1956.

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Adrian Reber <areber@redhat.com>

but missed unprivileged btrfs snapshot creation. Fix it too.

Follow-up to #1956.
Closes #2051.

Reported-by: Oleg Freedhom <overlayfs@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf.c: small fix for args of mount_entry
Yifeng Tan [Tue, 19 Dec 2017 09:35:01 +0000 (17:35 +0800)]
conf.c: small fix for args of mount_entry

Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
6 years ago[monitor] wrong statement of break
独孤昊天 [Mon, 18 Dec 2017 06:52:25 +0000 (14:52 +0800)]
[monitor] wrong statement of break

If lxc_abstract_unix_connect() fails and returns -1, this code never jumps to retry.

Signed-off-by: liuhao <liuhao27@huawei.com>
6 years agohooks: add mount hook to configure access to NVIDIA GPUs
Felix Abecassis [Tue, 19 Dec 2017 00:17:23 +0000 (16:17 -0800)]
hooks: add mount hook to configure access to NVIDIA GPUs

This hook requires the nvidia-container-cli tool provided by libnvidia-container:
https://github.com/nvidia/libnvidia-container

For containers that do not have CUDA_VERSION or NVIDIA_VISIBLE_DEVICES
set in the environment, the hook will be a no-op.

To enable in the configuration file:
lxc.hook.mount = /usr/local/share/lxc/hooks/nvidia

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
6 years agoMerge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process
Serge Hallyn [Mon, 18 Dec 2017 16:49:50 +0000 (10:49 -0600)]
Merge pull request #2049 from brauner/2017-12-18/start_reap_attacher_process

start: reap intermediate process

6 years agostart: reap intermediate process
Christian Brauner [Mon, 18 Dec 2017 13:08:02 +0000 (14:08 +0100)]
start: reap intermediate process

When we inherit namespaces we need to reap the attaching process.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2031 from tanyifeng/mask_and_readonly_path
Christian Brauner [Mon, 18 Dec 2017 11:12:59 +0000 (12:12 +0100)]
Merge pull request #2031 from tanyifeng/mask_and_readonly_path

conf.c: add relative option for lxc.mount.entry

6 years agoconf.c: add relative option for lxc.mount.entry
Yifeng Tan [Mon, 18 Dec 2017 16:50:58 +0000 (00:50 +0800)]
conf.c: add relative option for lxc.mount.entry

Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
6 years agoMerge pull request #2040 from brauner/2017-12-14/bugfixes
Serge Hallyn [Fri, 15 Dec 2017 02:10:39 +0000 (20:10 -0600)]
Merge pull request #2040 from brauner/2017-12-14/bugfixes

lxc_init: fix cgroup parsing

6 years agoMerge pull request #2034 from brauner/2017-12-14/use_clone_in_run_command
Serge Hallyn [Thu, 14 Dec 2017 22:29:04 +0000 (16:29 -0600)]
Merge pull request #2034 from brauner/2017-12-14/use_clone_in_run_command

utils: use lxc_raw_clone() in run_command()

6 years agolxc_init: fix cgroup parsing
Christian Brauner [Thu, 14 Dec 2017 22:00:04 +0000 (23:00 +0100)]
lxc_init: fix cgroup parsing

coverity: #1426132
coverity: #1426133

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agotools: add missing break to lxc-execute
Christian Brauner [Thu, 14 Dec 2017 21:45:56 +0000 (22:45 +0100)]
tools: add missing break to lxc-execute

coverity: #1426131

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2039 from brauner/2017-12-14/fix_command_socket_race
Serge Hallyn [Thu, 14 Dec 2017 21:56:24 +0000 (15:56 -0600)]
Merge pull request #2039 from brauner/2017-12-14/fix_command_socket_race

commands: fix race when open()/close() cmd socket

6 years agoutils: use lxc_raw_clone() in run_command()
Christian Brauner [Thu, 14 Dec 2017 15:42:00 +0000 (16:42 +0100)]
utils: use lxc_raw_clone() in run_command()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonamespace: add lxc_raw_clone()
Christian Brauner [Thu, 14 Dec 2017 14:31:54 +0000 (15:31 +0100)]
namespace: add lxc_raw_clone()

This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that guarantees that no
pthread_atfork() handlers are run. As long as clone() in glibc doesn't run
pthread_atfork() handlers we are fine, but there's no guarantee that this won't
change in the future. So let's do the syscall directly - or as directly as we
can. An additional nice feature is that we get fork() behavior, i.e.
lxc_raw_clone() returns 0 in the child and the child pid in the parent.

Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
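
For illustration, a stripped-down sketch of the idea, not the implementation from this commit. It assumes an architecture whose clone syscall takes the flags first and the stack second (for example x86-64 or arm64) and ignores the architectures that need special handling. The names raw_clone_sketch() and run_and_wait() are hypothetical.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Calls the clone syscall directly, so no pthread_atfork() handlers can run.
 * A NULL stack gives fork()-like copy-on-write behavior, which is why flags
 * such as CLONE_VM must not be passed to this sketch. */
pid_t raw_clone_sketch(unsigned long flags)
{
	return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/* Usage example: fork()-like control flow. Returns the exit code of the
 * child, or -1 if anything went wrong. */
int run_and_wait(int code)
{
	int status;
	pid_t pid = raw_clone_sketch(0);

	if (pid < 0)
		return -1;

	if (pid == 0)
		_exit(code); /* child: 0 was returned */

	/* parent: the child pid was returned */
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Arguments that could follow the stack are all passed as NULL here, which matches the note above that they are not of interest.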
6 years agoMerge pull request #2008 from tych0/share-ns-in-execute
Christian Brauner [Thu, 14 Dec 2017 20:37:41 +0000 (21:37 +0100)]
Merge pull request #2008 from tych0/share-ns-in-execute

add --share-$NS= support to lxc-execute

6 years agoMerge pull request #2037 from hallyn/2017-12-14/dir_detect_eperm
Christian Brauner [Thu, 14 Dec 2017 20:07:22 +0000 (21:07 +0100)]
Merge pull request #2037 from hallyn/2017-12-14/dir_detect_eperm

dir_detect: warn on eperm

6 years agoMerge pull request #2035 from adrianreber/master
Christian Brauner [Thu, 14 Dec 2017 20:06:17 +0000 (21:06 +0100)]
Merge pull request #2035 from adrianreber/master

criu: add feature check capability

6 years agocommands: fix race when open()/close() cmd socket
Christian Brauner [Thu, 14 Dec 2017 19:57:15 +0000 (20:57 +0100)]
commands: fix race when open()/close() cmd socket

When we report STOPPED to a caller and then close the command socket it is
technically possible - and I've seen this happen on the test builders - that a
container start() right after a wait() will receive ECONNREFUSED because it
called open() before we close(). So for all new state clients simply close the
command socket. This will inform all state clients that the container is
STOPPED and also prevents a race between an open()/close() on the command socket
causing a new process to get ECONNREFUSED because we haven't yet closed the
command socket.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocriu: add a test case for the criu feature check support
Adrian Reber [Wed, 13 Dec 2017 11:14:58 +0000 (12:14 +0100)]
criu: add a test case for the criu feature check support

This adds a simple test case which verifies that the new migrate() API
command 'MIGRATE_FEATURE_CHECK' works as expected.

If a feature does not exist on the currently running
architecture/kernel/criu combination it does not report an error as this
is a valid scenario.

Signed-off-by: Adrian Reber <areber@redhat.com>
6 years agocriu: add feature check capability
Adrian Reber [Wed, 13 Dec 2017 11:04:02 +0000 (12:04 +0100)]
criu: add feature check capability

For migration optimization features like pre-copy or post-copy migration
the support cannot be determined by simply looking at the CRIU version.
Features like that depend on the architecture/kernel/criu combination
and CRIU offers a feature checking interface to query if it is
supported.

This adds an LXC interface to query CRIU for those features via the
migrate() API call. For the recent pre-copy migration support in LXD
this can be used to automatically detect if pre-copy migration should be
used.

In addition to the existing migrate() API commands this adds a new
command: 'MIGRATE_FEATURE_CHECK'.

The migrate_opts{} structure is extended by the member features_to_check
which is a bitmask defining which CRIU features should be queried.

Currently only the querying of the features FEATURE_MEM_TRACK and
FEATURE_LAZY_PAGES is supported.

Signed-off-by: Adrian Reber <areber@redhat.com>
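
A self-contained sketch of how a caller might build and inspect such a bitmask. Everything prefixed with SKETCH_ or sketch_ is a stand-in defined here for illustration. The bit values are assumptions, and so is the idea that the result is reported back through the same field; the real definitions live in the LXC headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values, assumed for this sketch only. */
#define SKETCH_FEATURE_MEM_TRACK  (1ULL << 0)
#define SKETCH_FEATURE_LAZY_PAGES (1ULL << 1)

/* Stand-in for the one member of migrate_opts{} discussed above. */
struct sketch_migrate_opts {
	uint64_t features_to_check;
};

/* Fill the bitmask with the features the caller wants to query. */
void request_features(struct sketch_migrate_opts *opts, bool mem_track,
		      bool lazy_pages)
{
	opts->features_to_check = 0;

	if (mem_track)
		opts->features_to_check |= SKETCH_FEATURE_MEM_TRACK;

	if (lazy_pages)
		opts->features_to_check |= SKETCH_FEATURE_LAZY_PAGES;
}

/* Test a single bit, e.g. after the feature check has run. */
bool feature_reported(const struct sketch_migrate_opts *opts, uint64_t feature)
{
	return (opts->features_to_check & feature) == feature;
}
```

A caller would fill the mask, pass the options to the migrate() API call with the MIGRATE_FEATURE_CHECK command, and then test the bits it cares about.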
6 years agodir_detect: warn on eperm
Serge Hallyn [Thu, 14 Dec 2017 19:16:02 +0000 (13:16 -0600)]
dir_detect: warn on eperm

If the user has lxc.rootfs.path = /some/path/foo, but can't access
some piece of that path, then we'll get an unhelpful "failed to
mount" without any indication of the problem.

At least show that there is a permission problem.

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
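
A sketch of the kind of check described, assuming detection is based on stat(). The function name dir_detect_sketch() is hypothetical, and treating EACCES the same as EPERM is an addition of this sketch rather than something the commit states.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns true if path is an accessible directory. On a permission error a
 * warning is printed, so the later "failed to mount" is easier to explain. */
bool dir_detect_sketch(const char *path)
{
	struct stat st;

	if (stat(path, &st) == -1) {
		if (errno == EPERM || errno == EACCES)
			fprintf(stderr, "dir_detect: cannot access \"%s\": %s\n",
				path, strerror(errno));
		return false;
	}

	return S_ISDIR(st.st_mode);
}
```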
6 years agothe bike shed should be brilliant purple
Tycho Andersen [Thu, 14 Dec 2017 17:38:16 +0000 (17:38 +0000)]
the bike shed should be brilliant purple

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoMerge pull request #2026 from brauner/2017-12-12/lxc_hook_version
Serge Hallyn [Thu, 14 Dec 2017 15:27:46 +0000 (09:27 -0600)]
Merge pull request #2026 from brauner/2017-12-12/lxc_hook_version

confile: add lxc.hook.version

6 years agonetwork: pass name of peer veth device
Christian Brauner [Thu, 14 Dec 2017 13:19:27 +0000 (14:19 +0100)]
network: pass name of peer veth device

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: simplify run_script_argv()
Christian Brauner [Thu, 14 Dec 2017 13:15:18 +0000 (14:15 +0100)]
conf: simplify run_script_argv()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agonetwork: pass info in env if hook version is 1
Christian Brauner [Tue, 12 Dec 2017 12:30:54 +0000 (13:30 +0100)]
network: pass info in env if hook version is 1

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: pass namespaces as environment variables
Christian Brauner [Sun, 10 Dec 2017 12:53:32 +0000 (13:53 +0100)]
start: pass namespaces as environment variables

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoconf: execute hooks based on lxc.hooks.version
Christian Brauner [Sun, 10 Dec 2017 11:54:00 +0000 (12:54 +0100)]
conf: execute hooks based on lxc.hooks.version

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: set LXC_HOOK_VERSION
Christian Brauner [Mon, 11 Dec 2017 11:10:37 +0000 (12:10 +0100)]
start: set LXC_HOOK_VERSION

This can be used by scripts to detect what version of the hooks are used.

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
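
A hook written in C could detect the version roughly like this. The function name is hypothetical, and falling back to 0 for a missing or malformed value is an assumption of this sketch.

```c
#include <stdlib.h>

/* Returns the version announced in LXC_HOOK_VERSION. Falls back to 0 when
 * the variable is unset, empty, or not a plain non-negative number. */
int hook_version_from_env(void)
{
	const char *value = getenv("LXC_HOOK_VERSION");
	char *end;
	long version;

	if (!value || *value == '\0')
		return 0;

	version = strtol(value, &end, 10);
	if (*end != '\0' || version < 0 || version > 1000)
		return 0;

	return (int)version;
}
```

A hook can then branch on the result, for example reading its parameters from the environment when the version is 1 and from its arguments otherwise.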
6 years agoconfile: add lxc.hook.version
Christian Brauner [Sun, 10 Dec 2017 11:53:25 +0000 (12:53 +0100)]
confile: add lxc.hook.version

Unblocks #2013.
Unblocks #2015.
Closes #1766.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2030 from brauner/2017-12-13/fix_cgroup_namsepace_recording
Serge Hallyn [Thu, 14 Dec 2017 06:45:52 +0000 (00:45 -0600)]
Merge pull request #2030 from brauner/2017-12-13/fix_cgroup_namsepace_recording

start: fix cgroup namespace preservation

6 years agoSHARE_NS options should be before OPT_USAGE
Tycho Andersen [Thu, 14 Dec 2017 00:57:48 +0000 (00:57 +0000)]
SHARE_NS options should be before OPT_USAGE

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoinit: don't kill(-1) if we aren't in a pid ns
Tycho Andersen [Fri, 8 Dec 2017 23:23:26 +0000 (23:23 +0000)]
init: don't kill(-1) if we aren't in a pid ns

...otherwise we'll kill everyone on the machine. Instead, let's explicitly
try to kill our children. Let's do a best effort against fork bombs by
disabling forking via the pids cgroup if it exists. This is best effort for
a number of reasons:

* the pids cgroup may not be available
* the container may have bind mounted /dev/null over pids.max, so the write
  doesn't do anything

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
6 years agoMerge pull request #2017 from brauner/generic/patch_testing
Stéphane Graber [Wed, 13 Dec 2017 18:26:32 +0000 (13:26 -0500)]
Merge pull request #2017 from brauner/generic/patch_testing

coverity: bugfixes

6 years agoMerge pull request #2025 from brauner/2017-12-12/fix_network_attach_and_detach
Stéphane Graber [Wed, 13 Dec 2017 18:22:31 +0000 (13:22 -0500)]
Merge pull request #2025 from brauner/2017-12-12/fix_network_attach_and_detach

lxccontainer: only attach netns on netdev detach

6 years agoMerge pull request #2024 from brauner/2017-11-12/fix_lxc_execute
Stéphane Graber [Wed, 13 Dec 2017 18:03:42 +0000 (13:03 -0500)]
Merge pull request #2024 from brauner/2017-11-12/fix_lxc_execute

tools: block using lxc-execute without config file

6 years agoMerge pull request #2022 from 3XX0/exec-run-script
Stéphane Graber [Wed, 13 Dec 2017 18:02:03 +0000 (13:02 -0500)]
Merge pull request #2022 from 3XX0/exec-run-script

conf: avoid spawning unnecessary subshells

6 years agoMerge pull request #2029 from brauner/2017-12-12/do_not_unconditionally_dup_stdfds_fo...
Stéphane Graber [Wed, 13 Dec 2017 17:58:58 +0000 (12:58 -0500)]
Merge pull request #2029 from brauner/2017-12-12/do_not_unconditionally_dup_stdfds_for_execute

start: do not unconditionally dup std{in,out,err}

6 years agoMerge pull request #2010 from tanyifeng/set_oom_score_adj
Christian Brauner [Wed, 13 Dec 2017 10:24:47 +0000 (11:24 +0100)]
Merge pull request #2010 from tanyifeng/set_oom_score_adj

confile: add lxc.proc.* to set proc filesystem

6 years agoconfile: add lxc.proc.* to set proc filesystem
Yifeng Tan [Thu, 7 Dec 2017 18:00:47 +0000 (02:00 +0800)]
confile: add lxc.proc.* to set proc filesystem

Signed-off-by: Yifeng Tan <tanyifeng1@huawei.com>
6 years agostart: fix cgroup namespace preservation
Christian Brauner [Tue, 12 Dec 2017 23:22:47 +0000 (00:22 +0100)]
start: fix cgroup namespace preservation

Prior to this patch we raced with a very short-lived init process. Essentially,
the init process could exit before we had time to record the cgroup namespace
causing the container to abort and report ABORTING to the caller when it
actually started just fine. Let's not do this.

(This uses syscall(SYS_getpid) in the child to retrieve the pid just in case
we're on an older glibc version and we end up in the namespace sharing branch
of the actual lxc_clone() call.)

Additionally this fixes the shortlived tests. They were faulty so far and
should have actually failed because of the cgroup namespace recording race but
the ret variable used to return from the function was not correctly
initialized. This fixes it.
Furthermore, the shortlived tests used the c->error_num variable to determine
success or failure but this is actually not correct when the container is
started daemonized.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agotools: exit success when lxc-execute is daemonized
Christian Brauner [Tue, 12 Dec 2017 20:05:39 +0000 (21:05 +0100)]
tools: exit success when lxc-execute is daemonized

The error_num value doesn't tell us anything since the container hasn't exited.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: do not unconditionally dup std{in,out,err}
Christian Brauner [Tue, 12 Dec 2017 19:09:06 +0000 (20:09 +0100)]
start: do not unconditionally dup std{in,out,err}

Starting with commit

    commit c5b93afba1d79c6861a6f45db2943b6f3cfbdab4
    Author: Li Feng <lifeng68@huawei.com>
    Date:   Mon Jul 10 17:19:52 2017 +0800

        start: dup std{in,out,err} to pty slave

        In the case the container has a console with a valid slave pty file descriptor
        we duplicate std{in,out,err} to the slave file descriptor so console logging
        works correctly. When the container does not have a valid slave pty file
        descriptor for its console and is started daemonized we should dup to
        /dev/null.

        Closes #1646.

        Signed-off-by: Li Feng <lifeng68@huawei.com>
        Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

we made std{err,in,out} a duplicate of the slave file descriptor of the console
if it existed. This meant we also duplicated all of them when we executed
application containers in the foreground even if some std{err,in,out} file
descriptor did not refer to a {p,t}ty. This blocked use cases such as:

    echo foo | lxc-execute -n -- cat

which are very valid and common with application containers but less common
with system containers where we don't have to care about this. So my suggestion
is to unconditionally duplicate std{err,in,out} to the console file descriptor
if we are either running daemonized - this ensures that daemonized application
containers with a single bash shell keep on working - or when we are not
running an application container. In other cases we only duplicate those file
descriptors that actually refer to a {p,t}ty. This logic is similar to what we
do for lxc-attach already.

Refers to #1690.
Closes #2028.

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
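
The rule described above can be sketched as follows. This is an illustration under assumptions, not the code from this commit: both function names are hypothetical, and the "unconditional" parameter stands for "running daemonized, or not running an application container".

```c
#include <unistd.h>

/* Redirect fd to console_fd only if fd currently refers to a terminal.
 * Returns 1 if redirected, 0 if left alone, -1 if dup2() failed. */
int dup_to_console_if_tty(int console_fd, int fd)
{
	if (!isatty(fd))
		return 0; /* e.g. a pipe: keep it so "echo foo | ..." works */

	return dup2(console_fd, fd) < 0 ? -1 : 1;
}

/* Apply the rule to std{in,out,err}. Returns 0 on success, -1 on error. */
int setup_stdfds_sketch(int console_fd, int unconditional)
{
	int fd;

	for (fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
		int ret;

		if (unconditional)
			ret = dup2(console_fd, fd) < 0 ? -1 : 1;
		else
			ret = dup_to_console_if_tty(console_fd, fd);

		if (ret < 0)
			return -1;
	}

	return 0;
}
```

With this rule a piped stdin is left untouched in the foreground application container case, so the data written by the left side of the pipe still reaches the command.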
6 years agoconf: non-functional changes
Christian Brauner [Sun, 10 Dec 2017 11:52:30 +0000 (12:52 +0100)]
conf: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1426028
Christian Brauner [Sat, 9 Dec 2017 19:04:46 +0000 (20:04 +0100)]
coverity: #1426028

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425857
Christian Brauner [Sat, 9 Dec 2017 19:00:40 +0000 (20:00 +0100)]
coverity: #1425857

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425858
Christian Brauner [Sat, 9 Dec 2017 18:59:11 +0000 (19:59 +0100)]
coverity: #1425858

free allocated memory

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425859
Christian Brauner [Sat, 9 Dec 2017 18:53:43 +0000 (19:53 +0100)]
coverity: #1425859

check return value of snprintf()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425860
Christian Brauner [Sat, 9 Dec 2017 18:51:55 +0000 (19:51 +0100)]
coverity: #1425860

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425861
Christian Brauner [Sat, 9 Dec 2017 18:51:03 +0000 (19:51 +0100)]
coverity: #1425861

free allocated memory

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425862
Christian Brauner [Sat, 9 Dec 2017 18:48:48 +0000 (19:48 +0100)]
coverity: #1425862

initialize handler

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425863
Christian Brauner [Sat, 9 Dec 2017 18:32:03 +0000 (19:32 +0100)]
coverity: #1425863

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425866
Christian Brauner [Sat, 9 Dec 2017 18:26:52 +0000 (19:26 +0100)]
coverity: #1425866

free allocated memory

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425867
Christian Brauner [Sat, 9 Dec 2017 18:22:32 +0000 (19:22 +0100)]
coverity: #1425867

do not pass NULL pointer to chdir()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425869
Christian Brauner [Sat, 9 Dec 2017 18:18:09 +0000 (19:18 +0100)]
coverity: #1425869

do not unmap prematurely

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425870
Christian Brauner [Sat, 9 Dec 2017 18:16:25 +0000 (19:16 +0100)]
coverity: #1425870

check snprintf() return value

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agolxccontainer: cleanup {attach,detach}_interface()
Christian Brauner [Sun, 10 Dec 2017 01:45:54 +0000 (02:45 +0100)]
lxccontainer: cleanup {attach,detach}_interface()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agolxccontainer: only attach netns on netdev detach
Christian Brauner [Sun, 10 Dec 2017 01:41:14 +0000 (02:41 +0100)]
lxccontainer: only attach netns on netdev detach

Detaching network namespaces as an unprivileged user is currently not possible
and attaching to the user namespace will mean we are not allowed to move the
network device into an ancestor network namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425874 + cleanup
Christian Brauner [Sat, 9 Dec 2017 18:12:48 +0000 (19:12 +0100)]
coverity: #1425874 + cleanup

- check for memory allocation failure
- free allocated memory
- cleanup function

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agoMerge pull request #2021 from 3XX0/overlay-oob-copy
Christian Brauner [Tue, 12 Dec 2017 10:40:21 +0000 (11:40 +0100)]
Merge pull request #2021 from 3XX0/overlay-oob-copy

overlay: fix out-of-bounds copy

6 years agoconf: avoid spawning unnecessary subshells
Jonathan Calmels [Mon, 11 Dec 2017 22:43:06 +0000 (14:43 -0800)]
conf: avoid spawning unnecessary subshells

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agotools: block using lxc-execute without config file
Christian Brauner [Tue, 12 Dec 2017 00:38:40 +0000 (01:38 +0100)]
tools: block using lxc-execute without config file

Moving away from internal symbols we can't do hacks like we currently do in
lxc-start and call internal functions like lxc_conf_init(). This is unsafe
anyway. Instead, we should simply error out if the user didn't give us a
configuration file to use. lxc-start refuses to start in that case already.

Relates to discussion in https://github.com/lxc/go-lxc/pull/96#discussion_r155075560 .
Closes #2023.

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agooverlay: fix out-of-bounds copy
Jonathan Calmels [Mon, 11 Dec 2017 22:49:57 +0000 (14:49 -0800)]
overlay: fix out-of-bounds copy

Signed-off-by: Jonathan Calmels <jcalmels@nvidia.com>
6 years agoMerge pull request #2020 from brauner/2017-12-11/clone
Serge Hallyn [Mon, 11 Dec 2017 19:52:05 +0000 (13:52 -0600)]
Merge pull request #2020 from brauner/2017-12-11/clone

start: intelligently use clone() on ns sharing

6 years agotests: add namespace sharing tests
Christian Brauner [Mon, 11 Dec 2017 13:47:24 +0000 (14:47 +0100)]
tests: add namespace sharing tests

This also ensures that the new more efficient clone() way of sharing namespaces
is tested.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agostart: intelligently use clone() on ns sharing
Christian Brauner [Mon, 11 Dec 2017 11:55:23 +0000 (12:55 +0100)]
start: intelligently use clone() on ns sharing

When I first solved this problem I went for a fork() + setns() + clone() model.
This works fine but has unnecessary overhead for a couple of reasons:

- doing a full fork() including copying file descriptor table and virtual
  memory
- using pipes to retrieve the pid of the second child (the actual container
  process)

This can all be avoided by being a little smart in how we employ the clone()
syscall:

- using CLONE_VM will let us get rid of using pipes since we can simply write
  to the handler because we share the memory with our parent
- using CLONE_VFORK will also let us get rid of using pipes since the execution
  of the parent is suspended until the child returns
- using CLONE_VM will not cause virtual memory to be copied
- using CLONE_FILES will not cause the file descriptor table to be copied

Note that the intermediate clone() is used with CLONE_VM. Some glibc versions
used to reset the pid/tid to -1 when CLONE_VM was used without CLONE_THREAD.
But since the memory between parent and child is shared on CLONE_VM this would
invalidate the getpid() cache that glibc used to maintain and so getpid() in
the child would return the parent's pid. This is all fixed in newer glibc
versions where the getpid() cache is removed and the pid/tid is not reset
anymore. However, if for whatever reason you - dear committer - somehow need to
get the pid of the dummy intermediate process for do_share_ns() you need to
call syscall(__NR_getpid) directly. The next lxc_clone() call does not employ
CLONE_VM and will be fine.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
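
To illustrate why the pipes become unnecessary, here is a simplified sketch using glibc's clone() wrapper with a separate stack. It is not the code from this commit: struct handler_sketch and both function names are hypothetical, and the child only records its own pid instead of cloning the actual container process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for clone() and the CLONE_* flags */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct handler_sketch {
	pid_t pid; /* written by the child, read by the parent */
};

/* Runs in the intermediate child. Memory is shared because of CLONE_VM, so
 * this store is directly visible to the parent without any pipe. */
static int record_pid(void *arg)
{
	struct handler_sketch *handler = arg;

	/* Use the raw syscall to bypass any libc pid cache. */
	handler->pid = (pid_t)syscall(SYS_getpid);
	return 0;
}

/* Returns the child pid if it matches what the child recorded, else -1. */
pid_t clone_and_record(struct handler_sketch *handler)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	pid_t pid;

	if (!stack)
		return -1;

	/* CLONE_VFORK suspends the parent until the child exits, so
	 * handler->pid is already set when clone() returns. The stack grows
	 * down on most architectures, hence the pointer to its top. */
	pid = clone(record_pid, stack + stack_size,
		    CLONE_VM | CLONE_VFORK | CLONE_FILES | SIGCHLD, handler);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	(void)waitpid(pid, NULL, 0); /* reap the intermediate process */
	free(stack);

	return handler->pid == pid ? pid : -1;
}
```

Because CLONE_VM shares the address space, the child needs its own stack here; the parent can free it once the child has exited.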
6 years agocoverity: #1425879
Christian Brauner [Sat, 9 Dec 2017 18:00:37 +0000 (19:00 +0100)]
coverity: #1425879

do not double close file descriptor

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425883
Christian Brauner [Sat, 9 Dec 2017 17:54:28 +0000 (18:54 +0100)]
coverity: #1425883

ensure \0-termination

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1425884
Christian Brauner [Sat, 9 Dec 2017 17:46:56 +0000 (18:46 +0100)]
coverity: #1425884

free allocated memory

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
6 years agocoverity: #1428855
Christian Brauner [Sat, 9 Dec 2017 17:45:47 +0000 (18:45 +0100)]
coverity: #1428855

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>