Dwight Engen [Thu, 13 Feb 2014 21:13:03 +0000 (16:13 -0500)]
create fd, stdin, stdout, stderr symlinks in /dev
The kernel's Documentation/devices.txt says that these symlinks should
exist in /dev (they are listed in the "Compulsory" section). I'm not
currently adding nfsd and X0R since they are only required for iBCS, but
they can be easily added to the array later if need be.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Michael H. Warfield <mhw@WittsEnd.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
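
A minimal sketch of the array-driven approach described above, written in Python rather than the C used by LXC. The link targets are the ones listed in devices.txt; the function name and the choice to skip entries that already exist are assumptions made for illustration, not taken from the patch.

```python
import os

# (name, target) pairs from the "Compulsory" section of devices.txt.
# nfsd and X0R are left out, as in the commit message.
DEV_SYMLINKS = [
    ("fd", "/proc/self/fd"),
    ("stdin", "fd/0"),
    ("stdout", "fd/1"),
    ("stderr", "fd/2"),
]


def create_dev_symlinks(dev_dir):
    """Create each symlink under dev_dir and return the names created.

    Hypothetical helper: names that already exist are left untouched.
    """
    created = []
    for name, target in DEV_SYMLINKS:
        link = os.path.join(dev_dir, name)
        if os.path.lexists(link):
            continue
        os.symlink(target, link)
        created.append(name)
    return created
```

Adding another link later is then a one-line change to the table.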
Stéphane Graber [Thu, 13 Feb 2014 17:42:21 +0000 (12:42 -0500)]
lxc-start-ephemeral: Use attach
With this change, systems that support it will use attach to run any
provided command.
This doesn't change the default behaviour of attaching to tty1, but it
does make it much easier to script or even get a quick shell with:
lxc-start-ephemeral -o p1 -n p2 -- /bin/bash
I'm doing the setgid,initgroups,setuid,setenv magic in python rather
than using the attach_wait parameters as I need access to the pwd module
in the target namespace to grab the required information.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
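
The setgid, initgroups, setuid, setenv sequence mentioned above could look roughly like the sketch below. It is not the code from lxc-start-ephemeral: the function names and the set of environment variables are assumptions. The point it illustrates is that the pwd lookup has to happen where the target passwd database is visible, and that the uid is dropped last.

```python
import os
import pwd


def lookup_identity(username):
    """Read account details from the passwd database visible to this process."""
    entry = pwd.getpwnam(username)
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


def become(identity):
    """Switch to the given identity. Needs root; the order matters."""
    os.setgid(identity["gid"])                        # primary group first
    os.initgroups(identity["name"], identity["gid"])  # then supplementary groups
    os.setuid(identity["uid"])                        # uid last, it gives up the right to do the above
    # Which variables to export is an assumption of this sketch.
    os.environ["HOME"] = identity["home"]
    os.environ["USER"] = identity["name"]
    os.environ["SHELL"] = identity["shell"]
```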
Stéphane Graber [Thu, 13 Feb 2014 16:17:48 +0000 (11:17 -0500)]
coverity: Do chdir following chroot
We used to do chdir(path), chroot(path). That's correct but not properly
handled by coverity, so do chroot(path), chdir("/") instead as that's the
recommended way.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
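
For illustration, the recommended ordering in Python (the actual change is in C). The injectable chroot and chdir parameters are an assumption added so the ordering can be exercised without root.

```python
import os


def enter_root(path, chroot=os.chroot, chdir=os.chdir):
    """chroot into path, then move the working directory inside the new root."""
    chroot(path)
    # Without this the working directory could still point outside the new root.
    chdir("/")
```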
Serge Hallyn [Thu, 13 Feb 2014 06:52:52 +0000 (00:52 -0600)]
overlayfs_clonepaths: if unpriv then rsync in a userns
This allows lxc-snapshot and lxc-clone -s from an overlayfs container
to work unprivileged. (lxc-clone -s from a directory backed container
already did work)
Stéphane Graber [Wed, 12 Feb 2014 22:46:06 +0000 (17:46 -0500)]
Fix some configure.ac issues
- Run on distro without lsb_release
- Don't try and interpret with_runtime_path as a command
- Don't print stuff on screen while in the middle of a check
Stéphane Graber [Wed, 12 Feb 2014 22:30:12 +0000 (17:30 -0500)]
travis: Build using the daily PPA
Now that we depend on seccomp2, the backport currently in precise is too
old to allow for a successful build, so instead use ppa:ubuntu-lxc/daily
which contains recent versions of all needed build-dependencies.
Serge Hallyn [Wed, 12 Feb 2014 21:50:20 +0000 (15:50 -0600)]
seccomp: introduce v2 policy (v2)
v2 allows specifying system calls by name, and specifying
architecture. A policy looks like:
2
whitelist
open
read
write
close
mount
[x86]
open
read
Also use SCMP_ACT_KILL by default rather than SCMP_ACT_ERRNO(31) -
which confusingly returns 'EMLINK' on x86_64. Note this change
is also done for v1 as I think it is worthwhile.
With this patch, I can in fact use a seccomp policy like:
2
blacklist
mknod errno 0
after which 'sudo mknod null c 1 3' silently succeeds without
creating the null device.
changelog v2:
add blacklist support
support default action
support per-rule action
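
A rough Python sketch of how a policy in the format shown above might be read. It is not LXC's parser (which is C and hands the rules to libseccomp), and it only covers what the two examples show: a version line, a whitelist or blacklist line, rules with an optional per-rule action, and [arch] sections. The default action syntax is not shown in this message, so it is not handled. The "all" key for rules outside any section is a naming choice of this sketch.

```python
def parse_policy(text):
    """Parse a v2-style policy into a dict; rules are grouped per architecture."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("policy needs a version line and a mode line")
    version = int(lines[0])
    mode = lines[1]
    if mode not in ("whitelist", "blacklist"):
        raise ValueError("unknown policy mode: %r" % mode)

    rules = {"all": []}  # rules listed before any [arch] section
    arch = "all"
    for ln in lines[2:]:
        if ln.startswith("[") and ln.endswith("]"):
            arch = ln[1:-1]
            rules.setdefault(arch, [])
            continue
        fields = ln.split()
        # (syscall name, per-rule action words), e.g. ("mknod", ["errno", "0"])
        rules[arch].append((fields[0], fields[1:]))
    return {"version": version, "mode": mode, "rules": rules}
```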
Stéphane Graber [Wed, 12 Feb 2014 16:58:15 +0000 (11:58 -0500)]
lxc-start-ephemeral: Allow unprivileged run
This allows running lxc-start-ephemeral using overlayfs. aufs remains
blocked as it hasn't been looked at and patched to work in the kernel at
this point (not sure if it ever will).
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 12 Feb 2014 04:20:03 +0000 (22:20 -0600)]
check for access to lxcpath
The previous check for access to rootfs->path failed in the case of
overlayfs or loop backing stores. Instead just check early on for
access to lxcpath.
Instead force a copy clone. Else if the user makes a change
to the original container, the snapshot will be affected.
The user should first create a snapshot clone, then use
and snapshot that clone while leaving the original container
untouched.
TAMUKI Shoichi [Sat, 8 Feb 2014 09:15:40 +0000 (18:15 +0900)]
lxc-plamo: various small changes
- Change redirection of fd 200 to 9 (fds greater than 9 may conflict with
fds the shell uses internally)
- Replace numeric line addressing in ed with regular expressions to avoid
correcting the line addresses at each modification of the init scripts
- Correct the option order (trivial)
Serge Hallyn [Fri, 7 Feb 2014 19:00:50 +0000 (13:00 -0600)]
add_device_node: act in a chroot
The goal is to avoid an absolute symlink in the guest redirecting
us to the host's /dev. Thanks to the libvirt team for considering
that possibility!
We want to work on kernels which do not support setns, so we simply
chroot into the container before doing any rm/mknod. If /dev/vda5
is a symlink to /XXX, or /dev is a symlink to /etc, this is now
correctly resolved locally in the chroot.
We would have preferred to use realpath() to check that the resolved
path is not changed, but realpath across /proc/pid/root does not
work as expected.
Stéphane Graber [Thu, 6 Feb 2014 22:11:51 +0000 (17:11 -0500)]
download: Fix previous change
The previous change to support http proxies only worked when http_proxy
was set... Instead add some detection code and only use :80 when using
http_proxy.
That's a bit of a workaround, but it's the only way I could find to get
GPG to work with http_proxy.
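
The detection described above amounts to something like the following sketch. The template itself is a shell script; the hkp:// scheme, the function name and the example host are assumptions used only for illustration.

```python
import os


def keyserver_url(host, environ=None):
    """Return the keyserver URL, adding :80 only when http_proxy is set."""
    env = os.environ if environ is None else environ
    if env.get("http_proxy"):
        return "hkp://%s:80" % host
    return "hkp://%s" % host
```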
Dwight Engen [Wed, 5 Feb 2014 21:59:26 +0000 (16:59 -0500)]
split cgroup handling into discrete backends
- refactor cgroup into two backends, the classic cgfs driver and the new
cgmanager. Instead of lxc_handler knowing about the internals of each,
have it just store an opaque pointer to a struct that is private to
each backend.
- rename a couple of cgroup functions for consistency: those that are
considered an API (ie. exported by lxc.h) begin with lxc_ and those that
are not are just cgroup_*
- made as many backend routines static as possible, only cg*_ops_init is
exported
- made a nrtasks op which is needed by the utmp code for monitoring
container shutdown, currently only implemented for the cgfs backend
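
The shape of this refactoring can be sketched in Python, although the real code is C with a struct of function pointers. All class, method and field names below are hypothetical. What it shows is a handler that keeps a backend and a piece of backend-private data it never inspects, plus an nrtasks operation that only one backend implements.

```python
class CgfsBackend:
    """Stand-in for the classic cgfs driver."""

    def init(self, container_name):
        # Backend-private state; the handler treats it as opaque.
        return {"path": "/lxc/" + container_name, "tasks": [1]}

    def nrtasks(self, data):
        return len(data["tasks"])


class CgmanagerBackend:
    """Stand-in for the cgmanager driver."""

    def init(self, container_name):
        return {"cgroup": container_name}

    def nrtasks(self, data):
        # Per the commit message, nrtasks exists only for cgfs so far.
        raise NotImplementedError("nrtasks is not implemented for cgmanager")


class Handler:
    """Knows which ops to call, not what the backend stores."""

    def __init__(self, name, ops):
        self.ops = ops
        self.cgroup_data = ops.init(name)  # opaque to the handler

    def nrtasks(self):
        return self.ops.nrtasks(self.cgroup_data)
```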
TAMUKI Shoichi [Thu, 6 Feb 2014 10:38:39 +0000 (19:38 +0900)]
templates: improve refusing to run unprivileged
For all templates except lxc-ubuntu-cloud and lxc-download, detect not
only --mapped-uid but also --mapped-gid and error out. Detection stops
at the -- parameter, since anything after it is a non-option parameter.
Also, change the mode of lxc-archlinux.in 100755 to 100644.
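
The templates are shell scripts, but the detection logic can be sketched in Python as follows. The function name and the handling of the --option=value form are assumptions of this sketch.

```python
def find_mapped_id_option(argv):
    """Return the first --mapped-uid/--mapped-gid option seen before "--", else None."""
    for arg in argv:
        if arg == "--":
            # Everything after "--" is a non-option parameter; stop looking.
            return None
        name = arg.split("=", 1)[0]
        if name in ("--mapped-uid", "--mapped-gid"):
            return name
    return None
```

A template would error out as soon as this returns something other than None.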
lxc.id_map bug when writing directly to /proc/pid/[ug]id_map [PATCH]
There's some code in src/lxc/conf.c that sets up the UID/GID mapping. It
can use the external newuidmap/newgidmap tools, or it can write to
/proc/pid/[ug]id_map directly. The latter case is broken: lines are written
without a newline (\n) at the end. This patch fixes that. Note that
I did not check if the newuidmap/newgidmap case still works. It should,
but I wasn't able to test it.
Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
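
To make the fix concrete, here is a small Python sketch of the text that ends up in /proc/pid/[ug]id_map when writing directly. The function name is hypothetical; the three-field layout (id inside the namespace, id outside, range length) is the kernel's documented format, and the trailing newline on every line is the part this patch adds.

```python
def format_id_map(entries):
    """Format (inside_id, outside_id, count) tuples, one newline-terminated line each."""
    return "".join(
        "%d %d %d\n" % (inside, outside, count)
        for inside, outside, count in entries
    )
```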
Stéphane Graber [Tue, 4 Feb 2014 18:03:05 +0000 (13:03 -0500)]
logging: Add lxc_log_options_no_override function
In current LXC, loglevel and logfile are write-once functions.
That behaviour was appropriate when those two were first introduced
(pre-API) but with current API, one would expect to be able to
set_config_item those multiple times.
So instead, introduce lxc_log_options_no_override which when called
turns those two config keys read-only and have all existing binaries
which use log_init call that function once they're done setting the
value requested by the user.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
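
The behaviour described above can be modelled in a few lines of Python. This is only a model of the idea, not the C implementation: the class and method names are made up, and returning False for a rejected write is an assumption.

```python
class LogOptions:
    """Log level and file can be set repeatedly until no_override() is called."""

    def __init__(self):
        self.level = None
        self.file = None
        self._locked = False

    def _set(self, key, value):
        if self._locked:
            return False  # read-only after no_override()
        setattr(self, key, value)
        return True

    def set_level(self, level):
        return self._set("level", level)

    def set_file(self, path):
        return self._set("file", path)

    def no_override(self):
        """Called by a binary once it has applied the user's command-line values."""
        self._locked = True
```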
Stéphane Graber [Tue, 4 Feb 2014 16:16:07 +0000 (11:16 -0500)]
templates: Refuse to run unprivileged
Only the download and ubuntu-cloud templates work with unprivileged
containers, for all others, detect --mapped-uid and error out as early
as possible, recommending the use of the download template.
Harald Dunkel [Sun, 2 Feb 2014 20:33:15 +0000 (21:33 +0100)]
support a custom CentOS repository
This change introduces a flag --repo to the lxc-centos template
to allow using a local repository (e.g. a loop mounted installer
iso on your web server).
Signed-off-by: Harald Dunkel <harri@afaics.de> Acked-by: Michael H. Warfield <mhw@WittsEnd.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Mon, 3 Feb 2014 21:11:16 +0000 (15:11 -0600)]
cgmanager: have root escape to root cgroup before starting
If a user in cgroup /a/b/c does 'lxc-start -n u1', then u1
should be started under /a/b/c/u1. However if he does
'sudo lxc-start -n u1', then that cgroup should start under
/lxc/u1.
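
As a sketch of the rule above (hypothetical function name, Python rather than C):

```python
import posixpath


def container_cgroup(current_cgroup, name, is_root):
    """Pick the cgroup a container starts in.

    Root escapes to the root cgroup and uses /lxc; other users stay
    under the cgroup they are currently in.
    """
    base = "/lxc" if is_root else current_cgroup
    return posixpath.join(base, name)
```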
Stéphane Graber [Fri, 31 Jan 2014 13:56:55 +0000 (13:56 +0000)]
shutdown: Rework API and lxc-stop
With this change, shutdown() will no longer call stop() after the
timeout, instead it'll just return false and it's up to the caller to
then call stop() if appropriate.
This also updates the bindings, tests and other scripts.
lxc-stop is then updated to do proper option checking and use shutdown,
stop or reboot as appropriate.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
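
With this API, a caller that wants the old behaviour has to do the fallback itself. A minimal sketch, assuming a container object with shutdown(timeout) and stop() methods that return booleans, as in the Python bindings; the helper name, its return values and the default timeout are illustrative only.

```python
def stop_container(container, timeout=30, force=True):
    """Try a clean shutdown first, then fall back to stop() if allowed."""
    if container.shutdown(timeout):
        return "shutdown"
    # shutdown() no longer calls stop() after the timeout; that is our job now.
    if force and container.stop():
        return "stopped"
    return "running"
```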
Serge Hallyn [Fri, 31 Jan 2014 13:03:44 +0000 (13:03 +0000)]
cgmanager: chmod the container's base directory 775
In order for attach to work, the container owner must be able to
write to the tasks file. Therefore we make the container's cgroup
owned by the container root group, but the container owner uid.
So for the container root to be allowed to create new cgroups, it
needs group write perms.
With this patch, an unprivileged container with an
lxc.mount.auto = cgroup entry can run the cgproxy and pass
all cgmanager tests.
ACLs would have been another way to do this, but are not yet being
used/exported by cgmanager.
Serge Hallyn [Thu, 30 Jan 2014 14:18:30 +0000 (14:18 +0000)]
cgmanager: support lxc.mount.auto = cgroup
If it (or any variation thereof) is in the container configuration,
then mount /sys/fs/cgroup/cgmanager.lower (if it exists) or
/sys/fs/cgroup/cgmanager into the container so it can run a
cgproxy.
Also make sure to clear our groups when we start or attach to a
container. Else with unprivileged containers we end up with
lots of nogroups listed in /proc/1/status.
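
The choice of mount source described in this commit could be sketched as below. The function name and the base parameter are assumptions; the two directory names come from the message. Clearing supplementary groups, the other half of the change, corresponds to os.setgroups([]) in Python and needs privileges, so it is only mentioned in a comment here.

```python
import os


def cgmanager_mount_source(base="/sys/fs/cgroup"):
    """Prefer cgmanager.lower when it exists, otherwise use cgmanager."""
    lower = os.path.join(base, "cgmanager.lower")
    if os.path.exists(lower):
        return lower
    return os.path.join(base, "cgmanager")

# Before starting or attaching, a privileged caller would also drop its
# supplementary groups, e.g. os.setgroups([]).
```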