Edvinas Klovas [Sat, 3 May 2014 17:15:36 +0000 (19:15 +0200)]
archlinux template: added sigpwr handling to systemd (lxc-stop)
archlinux is using systemd and systemd's configuration does not have any
services set up to handle the sigpwr hook which is sent by the lxc-stop
command. By enabling the sigpwr service we make sure that lxc-stop will work.
Serge Hallyn [Thu, 1 May 2014 20:27:55 +0000 (15:27 -0500)]
cgmanager: use absolute cgroup path to switch cgroups at attach
If an unprivileged user does 'lxc-start -n u1' in one
login session, followed by 'lxc-attach -n u1' in another
session, the attach will fail if the sessions are in different
cgroups. The same is true of lxc-cgroup commands.
Address this by using the GetPidCgroupAbs and MovePidAbs methods,
which work with the containers' cgroup path relative to
the cgproxy.
Since GetPidCgroupAbs is new to api version 3 in cgmanager,
use the old method if we are on an older cgmanager.
Serge Hallyn [Fri, 2 May 2014 18:36:32 +0000 (13:36 -0500)]
cgmanager: also handle named subsystems (like name=systemd)
Read /proc/self/cgroup instead of /proc/cgroups, so as to catch
named subsystems. Otherwise the containers will not be fully
moved into the container cgroups.
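A minimal sketch of why /proc/self/cgroup catches named subsystems, assuming the usual cgroup v1 line format "hierarchy-id:controllers:path". The helper name and the sample data are hypothetical and are not taken from the LXC sources.

```python
def subsystems_from_proc_self_cgroup(text):
    """Return the controllers field of every hierarchy listed in the text.

    Each line is expected to look like "hierarchy-id:controllers:path".
    Named hierarchies show up as "name=<something>" in the controllers field.
    """
    subsystems = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3 or not parts[1]:
            continue  # skip malformed lines and entries with no controller
        subsystems.append(parts[1])
    return subsystems


# Hypothetical sample content of /proc/self/cgroup.
sample = (
    "4:memory:/lxc/u1\n"
    "3:cpu,cpuacct:/lxc/u1\n"
    "1:name=systemd:/user/1000.user/c2.session\n"
)
print(subsystems_from_proc_self_cgroup(sample))
```

The name=systemd entry never appears in /proc/cgroups, which only lists kernel controllers, so a parser based on that file would miss it.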
lxc.mount.auto: improve defaults for cgroup and cgroup-full
If the user specifies cgroup or cgroup-full without a specifier (:ro,
:rw or :mixed), this changes the behavior. Previously, these were
simple aliases for the :mixed variants; now they depend on whether the
container also has CAP_SYS_ADMIN; if it does they resolve to the :rw
variants, if it doesn't to the :mixed variants (as before).
If a container has CAP_SYS_ADMIN privileges, any filesystem can be
remounted read-write from within, so initially mounting the cgroup
filesystems partially read-only as a default creates a false sense of
security. It is better to default to full read-write mounts to show the
administrator what keeping CAP_SYS_ADMIN entails.
If an administrator really wants both CAP_SYS_ADMIN and the :mixed
variant of cgroup or cgroup-full automatic mounts, they can still
specify that explicitly; this commit just changes the default without
specifier.
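The new default can be sketched as a small resolution step. This is only an illustration of the rule described above; the function name is hypothetical and the real logic lives in LXC's C code.

```python
def resolve_cgroup_automount(entry, has_cap_sys_admin):
    """Expand a bare 'cgroup' or 'cgroup-full' entry to its effective variant."""
    if ":" in entry:
        return entry  # an explicit :ro, :rw or :mixed is left untouched
    if entry not in ("cgroup", "cgroup-full"):
        return entry  # other lxc.mount.auto entries are not affected
    # Without a specifier the default now depends on CAP_SYS_ADMIN.
    return entry + (":rw" if has_cap_sys_admin else ":mixed")


print(resolve_cgroup_automount("cgroup", True))
print(resolve_cgroup_automount("cgroup-full", False))
print(resolve_cgroup_automount("cgroup:mixed", True))
```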
Currently, setup_caps and dropcaps_except both use the same parsing
logic for parsing capabilities (try to identify by name, but allow
numerical specification). Since this is a common routine, separate it
out to improve maintainability and reusability.
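A rough sketch of such a shared routine: try the name first, then fall back to a numeric value. The table holds only a small subset of capabilities, and the function name and the last_cap bound are assumptions made for illustration, not LXC's implementation.

```python
# Illustrative subset of Linux capability numbers (see linux/capability.h).
CAPS = {"chown": 0, "net_admin": 12, "sys_admin": 21, "mknod": 27}


def parse_cap(value, last_cap=37):
    """Return the capability number for a name or a numeric string, or -1.

    last_cap is an assumed upper bound; the real limit depends on the kernel.
    """
    name = value.strip().lower()
    if name in CAPS:
        return CAPS[name]
    try:
        number = int(name, 0)  # accepts decimal and 0x-prefixed hex
    except ValueError:
        return -1
    return number if 0 <= number <= last_cap else -1


print(parse_cap("sys_admin"), parse_cap("21"), parse_cap("bogus"))
```

Both callers (dropping a list of capabilities and keeping a list of capabilities) could then share this one lookup instead of duplicating it.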
Signed-off-by: Christian Seiler <christian@iwakd.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Ubuntu containers have had trouble with automatic cgroup mounting that
was not read-write (i.e. lxc.mount.auto = cgroup{,-full}:{ro,mixed}) in
containers without CAP_SYS_ADMIN. Ubuntu's mountall program reads
/lib/init/fstab, which contains an entry for /sys/fs/cgroup. Since
there is no ro option specified for that filesystem, mountall will try
to remount it readwrite if it is already mounted. Without
CAP_SYS_ADMIN, that fails and mountall will interrupt boot and wait for
user input on whether to proceed anyway or to manually fix it,
effectively hanging container bootup.
This patch makes sure that /sys/fs/cgroup is always a readwrite tmpfs,
but that the actual cgroup hierarchy paths (/sys/fs/cgroup/$subsystem)
are readonly if :ro or :mixed is used. This still has the desired
effect within the container (no cgroup escalation possible and programs
get errors if they try to do so anyway), while keeping Ubuntu
containers happy.
Stéphane Graber [Tue, 6 May 2014 03:34:04 +0000 (22:34 -0500)]
python-lxc: minor fixes to __init__.py
Set a base class for the network object and set the encoding in the
header. Neither of those changes is required for python3 but they do
make it easier for anyone trying to make a python2 binding.
Stéphane Graber [Mon, 5 May 2014 15:51:19 +0000 (10:51 -0500)]
lxc-ls: Force running against containers without python
When using --nesting, we exec ourselves in the container context. If we
somehow need to dynamically load modules from there, things break. So
make sure we pre-load everything we may need.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Fri, 2 May 2014 16:35:10 +0000 (11:35 -0500)]
cgfs: don't mount /sys/fs/cgroup readonly
/sys/fs/cgroup is just a size-limited tmpfs, and making it ro does
nothing to affect our ability to alter mount settings of its subdirs.
OTOH making it ro can upset mountall in the container which tries
to remount it rw, which may be refused.
Stéphane Graber [Fri, 2 May 2014 15:16:51 +0000 (11:16 -0400)]
lxc-ls: Allow the use of --groups without --fancy
There wasn't a good reason for that limit, we can simply make the code
slightly slower when --groups is passed and still have the expected
output even without --fancy.
Stéphane Graber [Thu, 1 May 2014 22:35:21 +0000 (18:35 -0400)]
lxc-ls: Update lxc.group handling
This introduces a new -g/--group argument to filter containers based on
their groups.
This supports the rather obvious: --group blah
Which will only list containers that are in group blah.
It may also be passed multiple times: --group blah --group bleh
Which will list containers that are in either (or both) blah or bleh.
And it also takes: --group blah,bleh --group doh
Which will list containers that are either in BOTH blah and bleh or in doh.
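The matching rule can be sketched as follows: OR across repeated --group arguments, AND within one comma-separated argument. The function name is hypothetical; this is not the lxc-ls source.

```python
def matches_groups(container_groups, group_args):
    """Return True if the container satisfies at least one --group argument.

    Each argument is a comma-separated list, and every group named in that
    list must be one of the container's groups.
    """
    if not group_args:
        return True  # no filter requested
    members = set(container_groups)
    return any(set(arg.split(",")) <= members for arg in group_args)


args = ["blah,bleh", "doh"]
print(matches_groups(["blah"], args))
print(matches_groups(["blah", "bleh"], args))
print(matches_groups(["doh"], args))
```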
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Michael H. Warfield <mhw@WittsEnd.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
lxc-oracle: fix warnings/errors from some rpm scriptlets
- Some scriptlets expect fstab to exist so create it before doing the
yum install
- Set the rootfs selinux label same as the hosts or else the PREIN script
from initscripts will fail when running groupadd utmp, which prevents
creation of OL4.x containers on hosts > OL6.x.
- Move creation of devices into a separate function
When outputting the lxc.arch setting, use i686 instead of x86 since the
latter is not a valid input to setarch, nor will the kernel output
UTS_MACHINE as x86. The kernel sets utsname.machine to i[3456]86, which
all map to PER_LINUX32.
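As an illustration only, the normalization could look like the sketch below. The function name is hypothetical, and folding every i[3456]86 value into i686 is an assumption made here for simplicity; the commit itself only states that x86 is replaced by i686.

```python
import re


def lxc_arch_for(machine):
    """Map a 32-bit x86 machine string to a value that setarch accepts."""
    if machine == "x86" or re.fullmatch(r"i[3456]86", machine):
        return "i686"
    return machine  # anything else (for example x86_64) is passed through


print(lxc_arch_for("x86"), lxc_arch_for("i386"), lxc_arch_for("x86_64"))
```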
This only converts punctuation marks from FULLWIDTH COMMA/FULL STOP to
IDEOGRAPHIC COMMA/FULL STOP in Japanese man pages. The contents of man
pages do not change at all.
When attempting to create the compulsory symlinks in /dev,
check for the existence of the link using stat first before
blindly attempting to create the link.
This works around an apparent quirk in the kernel VFS on read-only
file systems where the returned error code might be EEXIST or EROFS
depending on previous access to the /dev directory and its entries.
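A minimal sketch of the check-before-create pattern. It uses lstat rather than stat so that a dangling link also counts as existing, which is a choice made for this example; the function name and the temporary directory standing in for /dev are hypothetical.

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Create link_path -> target unless something already exists there.

    Returns True if the link was created, False if it was already present.
    """
    try:
        os.lstat(link_path)
        return False  # already there, so never call symlink() at all
    except FileNotFoundError:
        pass
    os.symlink(target, link_path)
    return True


with tempfile.TemporaryDirectory() as dev:
    link = os.path.join(dev, "fd")
    first = ensure_symlink("/proc/self/fd", link)
    second = ensure_symlink("/proc/self/fd", link)
    print(first, second)
```

Because the second call returns before attempting the symlink, it never has to interpret whether a failure would have been EEXIST or EROFS.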
Reported-by: William Dauchy <william@gandi.net>
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Tested-by: William Dauchy <william@gandi.net>
Originally we kept snapshots under /var/lib/lxcsnaps. If a
separate btrfs is mounted at /var/lib/lxc, then we can't
make btrfs snapshots under /var/lib/lxcsnaps.
This patch moves the default directory to /var/lib/lxc/lxcsnaps.
If /var/lib/lxcsnaps already exists, then use that. Don't allow
any container to be used with the name 'lxcsnaps'.
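The directory selection described above can be sketched like this. The function names and the injectable exists parameter are assumptions for illustration; the paths are the ones named in the message.

```python
import os


def snapshot_base(lxcpath="/var/lib/lxc", legacy="/var/lib/lxcsnaps",
                  exists=os.path.isdir):
    """Pick the snapshot directory, preferring an existing legacy location."""
    if exists(legacy):
        return legacy
    return os.path.join(lxcpath, "lxcsnaps")


def is_valid_container_name(name):
    """Reject the one name that would collide with the snapshot directory."""
    return name != "lxcsnaps"


print(snapshot_base(exists=lambda path: False))
print(snapshot_base(exists=lambda path: True))
print(is_valid_container_name("lxcsnaps"))
```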
lxc startup: manually mark every shared mount entry as slave
If you 'ip netns add x1', this creates /run/netns and /run/netns/x1
as shared mounts. When a container starts, it umounts these after
pivot_root, and the umount is propagated to the host.
Worse, doing mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) does not
suffice to change those, even after binding /proc/mounts onto
/etc/mtab.
So, I give up. Do this manually, walking over /proc/self/mountinfo
and changing the mount propagation on everything marked as shared.
With this patch, lxc-start no longer unmounts /run/netns/* on the
host.
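A sketch of the walk over /proc/self/mountinfo, assuming the documented format in which optional fields such as shared:N sit between the sixth field and the "-" separator. The function name and sample data are hypothetical. In LXC each returned path would then have its propagation changed to slave; that step is not shown because it requires privileges.

```python
def shared_mount_points(mountinfo_text):
    """Return the mount points whose optional fields include shared:N."""
    shared = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at the "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # malformed or truncated line
        if any(field.startswith("shared:") for field in fields[6:sep]):
            shared.append(fields[4])  # field 5 is the mount point
    return shared


# Hypothetical sample content of /proc/self/mountinfo.
sample = (
    "22 1 0:20 / /run rw,nosuid shared:5 - tmpfs tmpfs rw\n"
    "23 22 0:3 / /run/netns/x1 rw shared:6 - proc proc rw\n"
    "24 1 8:1 / /srv rw master:1 - ext4 /dev/sda1 rw\n"
)
print(shared_mount_points(sample))
```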
This makes it so that the host doesn't need to have an old, compat
version of db43_load installed by using the db_load from the just
installed container. Some newer distributions do not even have an old
enough compat-db4 package available.
lxc-oracle: allow installing from arbitrary yum repo
With this change, you can install a container from a mounted .iso, or any
yum repo with the necessary packages. Unlike the --url option, the repo
does not need to be a mirror of public-yum, but the arch and release must
be specified. For example to install OL6.5 from an .iso image:
mount -o loop OracleLinux-R6-U5-Server-x86_64-dvd.iso /mnt
lxc-create -n OL6.5 -t oracle -- --baseurl=file:///mnt -a x86_64 -R 6.5
The template will create two yum .repo files within the container such that
additional packages can be installed from local media, or the container can
be updated from public-yum, whichever is available. Local media must be bind
mounted from the host onto the containers' /mnt for the former .repo to work:
Recent fixes in the apparmor kernel code are now making at least the CI
environment and quite possibly some others fail due to an invalid path
in the pivot_root stanza.
So update both lines to allow a more generic pivot_root call for
anything in LXC's work directory.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
In this patch I tried to stick with each file's coding style, however I
think we should probably change that. Every main() should always not
return and only exit; they should always return EXIT_SUCCESS or EXIT_FAILURE
with the only exceptions being cases where we are returning a child's
exit status (lxc_execute, lxc_attach, lxc_init).
When rebooting an unprivileged container, netpipe starts out
as not -1. If count_veths somehow changed this could lead
to trying to send data over a nonexistent pipe. (Ok can't
*really* happen, as it currently stands, but it's an open
end)
Leonid Isaev [Tue, 1 Apr 2014 02:24:31 +0000 (22:24 -0400)]
archlinux: Code cleanups (v2)
Cleanups:
1. Do not modify container's /etc/hosts (archlinux uses /etc/nsswitch.conf)
2. Remove duplicate lines from config
3. Print a nicer final message
4. Get rid of some grep's
Signed-off-by: Leonid Isaev <lisaev@umail.iu.edu>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Leonid Isaev [Mon, 31 Mar 2014 21:14:34 +0000 (17:14 -0400)]
archlinux: Code cleanups
Cleanups:
1. Do not modify container's /etc/hosts (archlinux uses /etc/nsswitch.conf)
2. Remove duplicate lines from config
3. Print a nicer final message
4. Get rid of some grep's in favor of bash regex
Signed-off-by: Leonid Isaev <lisaev@umail.iu.edu>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
When lxc-info's stdout is not line buffered (ie. "lxc-info -n foo |more")
the first three lines will be duplicated. This is because c->get_ips()
comes next and it forks and the child will exit() causing its fds to be
closed which flushes out its (fork duplicated) stdio buffers. The lines are
then duplicated when the parent actually gets around to flushing out its
stdio. This causes problems for programs (such as the lxc-webpanel) which
are popen()ing lxc-info.
The fix here isn't necessarily the right one, but does show what the
problem is. Seems like maybe we should fix this inside of get_ips(), for
other API callers as well.
Allow writes to kernel.shm*, net.*, kernel/domainname and
kernel/hostname.
Also fix a bug in the lxc-generate-aa-rules.py script in a
path which wasn't being exercised before, which returned a
path element rather than its child.
This should help it run better on slow test environments like the LXC CI
armhf builder.
- Wait longer for the container to start
- Wait longer for the container to shutdown
- On failure to shutdown, kill the container
- Always destroy the container if it's around
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Sat, 29 Mar 2014 02:05:31 +0000 (21:05 -0500)]
apparmor: auto-generate the blacklist rules
This uses the generate-apparmor-rules.py script I sent out some time
ago to auto-generate apparmor rules based on a higher level set of
block/allow rules.
Add apparmor policy testcase to make sure that some of the paths we
expect to be denied (and allowed) write access to are in fact in
effect in the final policy.
With this policy, libvirt in a container is able to start its
default network, which previously it could not.
v2: address feedback from stgraber
put lxc-generate-aa-rules.py into EXTRA_DIST
add lxc-test-apparmor, container-base and container-rules to .gitignore
take lxc-test-apparmor out of EXTRA_DIST
make lxc-generate-aa-rules.py pep8-compliant
don't automatically generate apparmor rules
This is only bc we can't be guaranteed that python3 will be
available.
Dwight Engen [Thu, 27 Mar 2014 20:46:38 +0000 (16:46 -0400)]
add yum plugin to repatch rootfs on yum update
oracle-template: Split patching rootfs vs one time setup into separate
shell functions so the template can be run with --patch.
oracle-template: Update to install the yum plugin and itself (as lxc-patch)
into a container. The plugin just runs lxc-patch --patch <path> so it is
fairly generic, but in this case it is running a copy of the template inside
the container.
Serge Hallyn [Thu, 27 Mar 2014 15:36:06 +0000 (10:36 -0500)]
move lxc-init to /sbin/init.lxc
Using the multiarch dir causes problems when running lxc-execute
on amd64 with an i386 container. /sbin/lxc-init is a more confusing
name and will show up in 'lxc<tab>'. /sbin/init.lxc should be quite
obvious as an init for lxc.
Add LXC_NET_NONE to known lxc_network_types, so parsing a config
file with lxc.network.type = none does not result in failure
(e.g. doc/examples/lxc-no-netns.conf). Options have also been
reordered to match the enum in conf.h.
Serge Hallyn [Tue, 25 Mar 2014 20:50:06 +0000 (15:50 -0500)]
commands: handle epipe
If we start a lxc_wait on a container while it is exiting, it is
possible that we open the command socket, then the command socket
monitor closes all its mainloop sockets and exits, then we send our
credentials. Then we get killed by SIGPIPE.
Handle that case, recognizing that if we get sigpipe then the
container is (now) stopped.
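The race can be reproduced in miniature with a socket pair. This is a sketch, not the LXC command code: Python ignores SIGPIPE by default, so the failure surfaces as an exception instead of killing the process, and the function name is hypothetical.

```python
import socket


def send_or_stopped(sock, payload):
    """Send payload; report 'stopped' if the peer has already gone away."""
    try:
        sock.sendall(payload)
        return "sent"
    except (BrokenPipeError, ConnectionResetError):
        # The other end closed its socket first: treat that as "stopped".
        return "stopped"


client, monitor = socket.socketpair()
monitor.close()  # the monitor side exits before we send our credentials
result = send_or_stopped(client, b"credentials")
client.close()
print(result)
```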
Added root_password_expired password control tuning knob.
Added the environment variable "root_password_expired" to
control if the initial, temporary, root password is initially
set up as "expired". If set to "yes" (default), the root password
is set as "expired" and the user must change it at first login.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Set timezone for new container if not previously defined.
If the container does not already contain an /etc/localtime
timezone definition, then copy a definition from the host to
the container. This is often a symlink to an appropriate
system timezone definition file and is presumed to exist in
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Fix arch cross-build when running distro cross-build.
Corner case existed when building a cross-arch container (i686 on x86_64)
on a cross-distro host (Fedora container on Ubuntu host). Fixed the
arch "fixup" code to do the right thing when running from the bootstrap.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Dwight Engen [Tue, 11 Mar 2014 19:44:54 +0000 (15:44 -0400)]
fix console stdin,stdout,stderr fds
The fds for stdin,stdout,stderr that we were leaving open for /sbin/init
in the container were those from /dev/tty or lxc.console (if given), which
wasn't right. Inside the container it should only have access to the pty
that lxc creates representing the console.
This was noticed because busybox's init was resetting the termio on its
stdin which was affecting the actual user's terminal instead of the pty.
This meant it was setting icanon so we were not passing keystrokes
immediately to the pty, and hence command line history/editing wasn't
working.
Fix by dup'ing the console pty to stdin,stdout,stderr just before
exec()ing /sbin/init. Fix fd leak in error handling that I noticed while
going through this code.
Also tested with lxc.console = none, lxc.console = /dev/tty7 and no
lxc.console specified.
V2: The first version was getting EBADF sometimes on dup2() because
lxc_console_set_stdfds() was being called after lxc_check_inherited()
had already closed the fds for the pty. Fix by calling
lxc_check_inherited() as late as possible which also extends coverage
of open fd checked code.
V3: Don't move lxc_check_inherited() since it needs to be called while
the tmp proc mount is still mounted. Move call to lxc_console_set_stdfds()
just before it.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Thu, 20 Mar 2014 04:55:00 +0000 (23:55 -0500)]
mutex cgmanager access
It looks like either libdbus or libnih is showing some corruption with
threaded access to the cgmanager-client library. Until we can
straighten that out, mutex access to the cgmanager.
The worst part of this is having to take and drop the mutex at every
fork. This also means that we can't keep a connection open for the
duration of container startup, since that would deadlock forks.
If we were going to keep it like this, then we could get rid of some
code in start.c. However we take a performance hit here which I
really hope we can rectify soon.
The other approach we could take would be to keep a global count of
references to cgroup_manager. Mutex the open, close, and each use
of the cgroup_manager proxy (and the inc/dec of the refcount). This
way we could in fact keep the connection open for the duration of
container start. The atfork handler child_fn would have to close
the connection if open.
Holger Amann [Wed, 19 Mar 2014 06:06:13 +0000 (07:06 +0100)]
debian: Symlink /etc/mtab
/etc/mtab doesn’t exist after bootstrapping a debian container, and will
be created as regular file after first start.
That leads to at least two errors:
- output of `mount` is wrong and gets messed up the more often you
start/stop the container
- /dev/pts/ptmx has wrong permissions