1. in container_free, set c->privlock to NULL before calling
sem_destroy, to prevent a window where another thread could call
sem_wait(c->privlock) while c->privlock is not NULL but is already
destroyed.
2. in container_get, check for numthreads < 0 before calling lxclock.
Once numthreads is 0, it never goes back up.
/*
 * When the get()er checks numthreads the first time, one of the following
 * is true:
 * 1. freer has set numthreads = 0. get() returns 0.
 * 2. freer is between lxclock and setting numthreads to 0. get()er will
 * sem_wait on privlock, get lxclock after freer() drops it, then see
 * numthreads is 0 and exit without touching lxclock again.
 * 3. freer has not yet locked privlock. If get()er runs first, then put()er
 * will see --numthreads = 1 and not call lxc_container_free().
 */
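A minimal sketch of the "once numthreads reaches 0 it never goes back up" rule. Note that it swaps in C11 atomics for the privlock semaphore and lxclock the commit describes, so it is a lock-free stand-in rather than the actual fix; `struct container`, `container_get()` and `container_put()` here are hypothetical simplifications.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical container: only the reference count is modelled here. */
struct container {
    atomic_int numthreads; /* starts at 1; once it reaches 0 it never rises */
};

/* Take a reference. Fails if the container is already being freed. */
static bool container_get(struct container *c)
{
    int n = atomic_load(&c->numthreads);

    while (n > 0) {
        /* Increment only if nobody changed the count since we read it;
         * on failure n is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&c->numthreads, &n, n + 1))
            return true;
    }
    return false; /* count already hit 0: never hand out a new reference */
}

/* Drop a reference. Returns true if the caller dropped the last one and
 * must now free the container. */
static bool container_put(struct container *c)
{
    return atomic_fetch_sub(&c->numthreads, 1) == 1;
}
```

Because the increment only happens while the observed count is still positive, a get() racing with the final put() either wins (and the put() no longer sees the last reference) or fails cleanly.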
oracle template: install additional user specified pkgs
Fix lxc-create so that it does not word-split template arguments. This makes
lxc-create -n ol -t oracle -- -r "at cronie wget" work since the argument
to -r will be passed as one arg instead of three.
Fix oracle template -u option to shift the correct amount.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Commit 37c3dfc9 sets the wait status only for the child pid. It was
intended to match the pid only once, to protect against pid reuse, but it
doesn't, because the indicator was reset to 0 at the top of every
iteration of the loop. If the child pid is reused, the wait status is set again.
Fix by setting the indicator outside the loop.
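A small sketch of the corrected loop shape. In the real code each iteration calls waitpid(). Here an array of reaped (pid, status) pairs stands in for those calls so the pid-reuse case can be exercised deterministically. `struct reaped` and `child_status()` are hypothetical names.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical record of one process reaped by waitpid(). */
struct reaped {
    pid_t pid;
    int status;
};

/* Scan reaped processes in order and keep the status of child_pid only the
 * first time it appears. `found` is initialised once, before the loop, so a
 * later process that reuses the same pid cannot overwrite the saved status. */
static bool child_status(const struct reaped *ev, int n, pid_t child_pid,
                         int *status)
{
    bool found = false;

    for (int i = 0; i < n; i++) {
        if (!found && ev[i].pid == child_pid) {
            *status = ev[i].status;
            found = true;
        }
    }
    return found;
}
```

Moving `bool found = false;` inside the `for` body would reproduce the bug: the third entry below would overwrite the status with 3.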
Commit e3642c43 added lxc_copy_file for use in 64e1ae63. The use of it
was removed in commit 1bc60a65. Removing it reduces dead code and the
footprint of liblxc.
API shouldn't be calling create for already-defined containers or destroy for undefined ones
Currently it always calls create/destroy, which can be confusing for code
that checks the return value of those calls to determine whether the operation
completed successfully.
>>> c = lxc.Container("r")
>>> c.create("ubuntu")
True
>>> c.create("ubuntu")
True
>>> c.create("ubuntu")
True
>>> c.create("ubuntu")
True
>>> c.create("ubuntu")
>>> c.destroy()
True
>>> c.destroy()
lxc-destroy: 'r' does not exist
False
>>> c.destroy()
lxc-destroy: 'r' does not exist
False
Make lxc.functions return the default lxcpath if /etc/lxc/lxc.conf doesn't provide one
Currently it returns the default path only if /etc/lxc/lxc.conf is missing.
Since the default lxc.conf doesn't contain an lxcpath variable (at least on Ubuntu), all tools fail unless -P is given:
caglar@qgq:~/Project/lxc/examples$ sudo /usr/bin/lxc-create -n test
lxc-create: no configuration path defined
Signed-off-by: S.Çağlar Onur <caglar@10ur.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
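The fix itself lives in the lxc.functions shell script. The following is only a C sketch of the same fallback rule: use the default both when the file is missing and when it exists but sets no lxcpath. `DEFAULT_LXCPATH` is assumed to be `/var/lib/lxc`, and the simple `key = value` parsing is an assumption, not LXC's real config parser.

```c
#include <stdio.h>
#include <string.h>

#define DEFAULT_LXCPATH "/var/lib/lxc" /* assumed compiled-in default */

/* Copy the lxcpath value from conf_file into out, falling back to the
 * default when the file is missing OR doesn't set lxcpath at all. */
static void get_lxcpath(const char *conf_file, char *out, size_t len)
{
    char line[4096];
    FILE *f;

    /* Start from the default; a missing file or missing key keeps it. */
    snprintf(out, len, "%s", DEFAULT_LXCPATH);
    f = fopen(conf_file, "r");
    if (!f)
        return;
    while (fgets(line, sizeof(line), f)) {
        char *p = line + strspn(line, " \t");

        if (strncmp(p, "lxcpath", 7) != 0)
            continue;
        p += 7;
        p += strspn(p, " \t");
        if (*p != '=')
            continue; /* e.g. "lxcpathfoo = ..." is a different key */
        p++;
        p += strspn(p, " \t");
        p[strcspn(p, "\r\n")] = '\0';
        if (*p) {
            snprintf(out, len, "%s", p);
            break;
        }
    }
    fclose(f);
}
```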
Christian Seiler [Sat, 30 Mar 2013 14:45:39 +0000 (15:45 +0100)]
lxc-attach: Implement --clear-env and --keep-env
This patch introduces the --clear-env and --keep-env options for
lxc-attach, which allow the user to specify whether the environment
should be passed on inside the container or not.
This is to be expanded upon in later versions; this patch only
introduces the most basic functionality.
Signed-off-by: Christian Seiler <christian@iwakd.de> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Christian Seiler [Sat, 30 Mar 2013 14:45:38 +0000 (15:45 +0100)]
lxc-shutdown: Make all processes exit before timeout if shutdown works
The following rationale is for using the -t option:
Currently, lxc-shutdown uses a subprocess for the timeout handling,
where a 'sleep $TIMEOUT' is executed, which will kill the main process
after the timeout has occurred, thus causing the main process to stop
the container hard with lxc-stop.
On the other hand, if the timeout is not reached, the main process
kills the subprocess. The trouble now is that if you kill a shell that
is running in the background, the kill will only take effect as soon as
the program currently running in the shell exits.
This in turn means that the subprocess will never terminate before
reaching the timeout. In an interactive shell, this does not matter,
since people will just not notice the process and lxc-shutdown returns
immediately. In a non-interactive environment, however, there may be
circumstances that cause the calling program to wait until even that
subprocess is terminated, which means that shutdown will always take as
long as the timeout, even if the container shuts down quite a bit
earlier.
This change makes sure that the main process also kills all subprocesses
of the background process. This immediately terminates the background
process, thus ensuring the desired behaviour.
Signed-off-by: Christian Seiler <christian@iwakd.de> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Thu, 28 Mar 2013 15:34:06 +0000 (10:34 -0500)]
rcfile shouldn't be recorded in lxc_conf if the attempt to load a config file fails
Though it's more subtle than that. If the file doesn't exist or we
can't access it, then don't record it. But if we have parse errors,
then do.
This is mainly to help out API users who try to read a container
configuration file before calling c->create(). If the file doesn't
exist, then without this patch the subsequent create() will not
use the default /etc/lxc/default.conf. The API user could check
for the file ahead of time, but this check makes his life easier
without costing us anything.
When you shut that down, the container sticks around and can be
restarted. Now lxc-clone will recognize such a container by the
presence of the delta0/ directory, which contains the read-write overlayfs
layer. This means you can do incremental development of containers,
i.e.
lxc-create -t ubuntu -n r1
lxc-start-ephemeral --keep-data -o r1 -n r1-2
# make some changes, poweroff
lxc-clone -o r1-2 -n r1-3
# make some changes...
lxc-clone -o r1-3 -n r1-4
# etc...
Now, as for design changes... from a higher level
1. lxc-clone should be re-written in c and exported through the
api.
2. lxc-clone should support overlayfs and aufs
3. lxc-start-ephemeral should become a thin layer which clones a
container, starts and stops and destroys it.
at a lower level,
1. the api should support container->setup_mounts
2. lxc-clone should be written as a set of backend classes which
can copy mounts to each other. So when you load a container
which is lvm-backed, it creates an lvm backend class. That
class instance can be converted into a loopback or qemu-nbd
or directory-backed class. A directory-backed class can be
converted into an overlayfs- or aufs-backed class, which (a)
uses the directory-backed class as the read-only base, and (b)
pins the base container (so it can't be deleted until all
snapshots are deleted).
David Ward [Wed, 27 Mar 2013 01:27:52 +0000 (21:27 -0400)]
Set all mounts to MS_SLAVE when starting a container without a rootfs
If the filesystem mounts on the host have the MS_SHARED or MS_SLAVE
flag set, and a container without a rootfs is started, then any new
mounts created inside the container are currently propagated into
the host. In addition to mounts placed in the configuration file of
the container or performed manually after startup, the automatic
mounting of /proc by lxc-execute will propagate back into the host,
effectively crippling the entire system. This can be prevented by
setting the MS_SLAVE flag on all mounts (inside the container's own
mount namespace) during startup if a rootfs is not configured.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
== lxc-ubuntu-cloud support per architecture ==
amd64: amd64, i386
i386: i386
armel: armel, armhf
armhf: armhf, armel
Note that most of the foreign architectures on x86 are supported
through the use of qemu-user-static. This, however, isn't yet
supported for cloud images (I'll send a patch for 1.0).
Also, qemu-user-static is technically able to emulate amd64 on i386
but qemu-debootstrap doesn't appear to know that and fails quite miserably.
We may also want to add a test for amd64 kernel but i386 userspace, which
is a valid combination that allows running an amd64 container on an i386
host without requiring emulation, but that's for another patch.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 26 Mar 2013 16:38:47 +0000 (12:38 -0400)]
EXTRA_DIST: Fix missing files with "make dist"
I recently noticed that the generated tarballs with "make dist"
were incomplete unless the configure script was run on a machine
with all possible build dependencies.
That's wrong as you clearly don't need those dependencies to generate
the tarball. This change fixes that.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 26 Mar 2013 15:03:47 +0000 (11:03 -0400)]
python: Fix runtime failure on armhf
Recent testing on Ubuntu armhf showed that the python module was
failing to import. After some time tracking the issue down, the problem
was identified as being a non-terminated list of get/setters.
This commit fixes that issue as well as a few other potential ones that
were identified during debugging.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Ryota Ozaki [Sun, 17 Mar 2013 14:21:31 +0000 (23:21 +0900)]
Use $localstatedir/log/lxc for default log path
When we install lxc manually (configure; make; make install),
all files are installed under /usr/local/. Configuration files
and settings files of containers are stored under /usr/local/ too;
however, log files alone are stored under /var/log/ rather than
/usr/local/var/log.
This patch changes the default log path to $localstatedir/log/lxc
(by default $localstatedir is /usr/local/var), an ordinary location
that users would probably expect.
Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 14 Mar 2013 03:21:15 +0000 (23:21 -0400)]
Add missing config.h includes.
conf.h and start.h weren't explicitly including config.h which meant that
depending on the ordering of the includes in whatever was including conf.h
or start.h, some pieces of the structs defined in those may be missing.
Among other problems, this led to the lxc_conf struct layout being off by 8 bytes
for functions from commands.c, causing lxc-stop to always fail.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 13 Mar 2013 02:34:26 +0000 (21:34 -0500)]
cgroups: don't mount under init's cgroup
1. deeper hierarchy has steep performance costs
2. init may be under /init, but containers should be under /lxc
3. in a nested container we like to bind-mount $cgroup_path/$c/$c.real
into $cgroup_path - but task 1's cgroup is $c/$c.real, so a nested
container would be in $c/$c.real/lxc, which would become
/$c/$c.real/$c/$c.real/lxc when expanded
4. this removes quite a bit of code (of mine), which is always nice
Dwight Engen [Mon, 11 Mar 2013 20:36:25 +0000 (16:36 -0400)]
uidmap: fix writing multiple ranges
The kernel requires a single atomic write for setting the /proc
idmap files. We were calling write(2) more than once when multiple
ranges were configured so instead build a buffer to pass in one write(2)
call.
Change id types to unsigned long to handle large id mappings gracefully.
Fix max id in example comment.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
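A sketch of the single-write approach: format every configured range into one buffer, then hand it to the kernel with one write(2). `struct id_map`, `build_idmap()` and `write_idmap()` are hypothetical names, and the 4096-byte buffer is an assumed upper bound. The `unsigned long` ids follow the commit's type change.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct id_map {
    unsigned long nsid, hostid, range;
};

/* Format every range as "nsid hostid range\n" into one buffer.
 * Returns the number of bytes used, or -1 if buf is too small. */
static int build_idmap(const struct id_map *maps, int n, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int r = snprintf(buf + used, len - used, "%lu %lu %lu\n",
                         maps[i].nsid, maps[i].hostid, maps[i].range);
        if (r < 0 || (size_t)r >= len - used)
            return -1;
        used += (size_t)r;
    }
    return (int)used;
}

/* Write the whole mapping to /proc/<pid>/<file> ("uid_map" or "gid_map")
 * in a single write(2), since the kernel accepts only one write. */
static int write_idmap(pid_t pid, const char *file,
                       const struct id_map *maps, int n)
{
    char path[64], buf[4096];
    int len = build_idmap(maps, n, buf, sizeof(buf));
    ssize_t w;
    int fd;

    if (len < 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, file);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    w = write(fd, buf, (size_t)len);
    close(fd);
    return w == len ? 0 : -1;
}
```

Only `build_idmap()` is exercised below. `write_idmap()` needs a child in a fresh user namespace and the matching privileges.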
I remember a discussion about implementing a proper way to shut down
guests using different signals, so here's a patch proposal.
It allows specific signal numbers to be used to shut down guests
gracefully; for example, SIGRTMIN+4 starts poweroff.target in
systemd.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
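A minimal sketch of what "use a specific signal number" amounts to at the syscall level. `request_shutdown()` is a hypothetical helper, and falling back to SIGPWR when no signal is configured is an assumption about the traditional sysvinit-style default, not something this patch states.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask a container's init to shut down cleanly by
 * sending it the signal it understands. A sig of 0 selects SIGPWR, assumed
 * here as the traditional sysvinit-style default; systemd instead starts
 * poweroff.target on SIGRTMIN+4. */
static int request_shutdown(pid_t init_pid, int sig)
{
    if (sig == 0)
        sig = SIGPWR;
    return kill(init_pid, sig);
}
```

Real-time signals are computed from SIGRTMIN at run time, so a configuration value like "SIGRTMIN+4" has to be translated to a number when it is read rather than hard-coded.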
Dwight Engen [Tue, 12 Mar 2013 17:04:35 +0000 (13:04 -0400)]
oracle template: fixes for older releases
This fixes some issues found by Oracle QA, including several cosmetic
errors seen during container bootup.
The rpm database needs moving on Debian hosts, as on Ubuntu.
I took Serge's suggestions: Do the yum install in an unshared
mount namespace so the /proc mount done during OL4 install doesn't
pollute the host. No need to blacklist ipv6 modules.
Make the default release 6.3, unless the host is OL, then default
to the same version as the host (same as Ubuntu template does).
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 6 Mar 2013 19:41:04 +0000 (13:41 -0600)]
attach: handle apparmor transitions in !NEWNS cases
If we're not attaching to the mount ns, then don't enter the
container's apparmor policy. Since we're running binaries from the host
and not the container, that actually seems the sane thing to do (besides
also being the lazier thing).
If we don't apply this patch, then we will need to move the apparmor attach
past the procfs remount, will need to also mount securityfs if available,
and for the !remount_proc_sys case we'll want to mount those just long
enough to do the apparmor transition.
lxc-attach: User namespaces: Use init's user & group id when attaching
When attaching to a container with a user namespace, try to detect the
user and group ids of init via /proc and attach as that same user. Only
if that is unsuccessful, fall back to (0, 0).
Signed-off-by: Christian Seiler <christian@iwakd.de>
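A sketch of one way to "detect the user and group ids of init via /proc". Parsing the `Uid:`/`Gid:` lines of `/proc/<pid>/status` is an assumption about the mechanism, and `get_proc_ids()` and `attach_ids()` are hypothetical names. The (0, 0) fallback matches the commit text.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the real uid and gid of pid from /proc/<pid>/status.
 * Returns 0 on success, -1 if either could not be determined. */
static int get_proc_ids(pid_t pid, uid_t *uid, gid_t *gid)
{
    char path[64], line[256];
    unsigned long v;
    int found = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* Lines look like "Uid:\t<real>\t<effective>\t<saved>\t<fs>". */
        if (sscanf(line, "Uid: %lu", &v) == 1) {
            *uid = (uid_t)v;
            found |= 1;
        } else if (sscanf(line, "Gid: %lu", &v) == 1) {
            *gid = (gid_t)v;
            found |= 2;
        }
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}

/* Ids to attach as: init's ids if readable, otherwise fall back to (0, 0). */
static void attach_ids(pid_t init_pid, uid_t *uid, gid_t *gid)
{
    if (get_proc_ids(init_pid, uid, gid) < 0) {
        *uid = 0;
        *gid = 0;
    }
}
```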
lxc-attach: Default to /bin/sh if shell cannot be determined or exec'd
If getpwuid() fails, the fallback of spawning a 'getent' process also
fails, and the user specified no command to execute, default to
/bin/sh and only fail if even that is not available. This should ensure
that, unless the container is *really* weird, the user always ends up
with a shell when calling lxc-attach with no further arguments.
Signed-off-by: Christian Seiler <christian@iwakd.de>
lxc-attach: Try really hard to determine login shell
If no command is specified, and using getpwuid() to determine the login
shell fails, try to spawn a process that executes the utility 'getent'.
getpwuid() may fail because of incompatibilities between the NSS
implementations on the host and in the container.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Serge Hallyn [Fri, 1 Mar 2013 20:53:20 +0000 (14:53 -0600)]
cgroup: improve support for multiple lxcpaths (v3)
Add a monitor command to get the cgroup for a running container. This
allows container r1 started from /var/lib/lxc and container r1 started
from /home/ubuntu/lxcbase to pick unique cgroup directories (which
will be /sys/fs/cgroup/$subsys/lxc/r1 and .../r1-1), and all the lxc-*
tools to get that path over the monitor at lxcpath.
Rework the cgroup code. Before, if /sys/fs/cgroup/$subsys/lxc/r1
already existed, it would be moved to 'deadXXXXX', and a new r1 created.
Instead, if r1 exists, use r1-1, r1-2, etc.
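A sketch of the r1, r1-1, r1-2, ... naming scheme. It works on any base directory (the test uses a temporary one rather than /sys/fs/cgroup/$subsys/lxc). `create_unique_cgroup()` and its cap of 1000 attempts are hypothetical. Relying on mkdir(2) failing with EEXIST makes "is the name free?" and "claim it" one atomic step.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create base/name, or base/name-1, base/name-2, ... if taken, and store
 * the chosen path in out. mkdir(2) failing with EEXIST is the "taken"
 * test, so checking and claiming a name is a single atomic step.
 * Returns 0 on success, -1 on error. */
static int create_unique_cgroup(const char *base, const char *name,
                                char *out, size_t len)
{
    for (int i = 0; i < 1000; i++) {
        int r = (i == 0) ? snprintf(out, len, "%s/%s", base, name)
                         : snprintf(out, len, "%s/%s-%d", base, name, i);

        if (r < 0 || (size_t)r >= len)
            return -1;
        if (mkdir(out, 0755) == 0)
            return 0;
        if (errno != EEXIST)
            return -1;
    }
    return -1;
}
```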
I ended up removing both the use of cgroup.clone_children and support
for ns cgroup. Presumably we'll want to put support for ns cgroup
back in for older kernels. Instead of guessing whether or not we
have clone_children support, just always explicitly do the only thing
that feature buys us - set cpuset.{cpus,mems} for newly created cgroups.
Note that upstream kernel is working toward strict hierarchical
limit enforcements, which will be good for us.
NOTE - I am changing the lxc_answer struct size. This means that
if you upgrade to this version while containers are running, lxc_*
commands on those already-running containers will fail.
Changelog: (v3)
implement cgroup attach
fix a subtle bug arising when lxc_get_cgpath() returned
STOPPED rather than -1 (STOPPED is 0, and 0 meant success).
Rename some functions and add detailed comments above most.
Drop all my lxc_attach changes in favor of those by Christian
Seiler (which are mostly the same, but improved).
When you clone a new user_ns, the child cannot write to the fds
opened by the parent. Handle this by doing an extra fork. The
grandparent hangs around and waits for its child to tell it the
pid of the grandchild, which will be the one attached to the
container. The grandparent then moves the grandchild into the
right cgroup, then waits for the child, which in turn is waiting on
the grandchild to complete.
Secondly, when attaching to a new user namespace, your old uid is
not valid, so you are uid -1. This patch simply does setgid+setuid
to 0 if that is the case. We probably want to be smarter, but
for now this allows lxc-attach to work.
Signed-off-by: Christian Seiler <christian@iwakd.de>
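A sketch of the grandparent/child/grandchild structure described above. It shows only the processes and pipes. No namespaces are entered, and the cgroup move is a placeholder comment. The grandchild blocks on a second pipe until the grandparent has "placed" it, then exits with `code` as a stand-in for the attached command. `run_via_intermediate()` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the grandchild's exit code (relayed by the intermediate child)
 * and stores the grandchild pid, or returns -1 on error. */
static int run_via_intermediate(int code, pid_t *grandchild)
{
    int pidpipe[2], gopipe[2], st;
    pid_t child, gc = -1;

    if (pipe(pidpipe) < 0)
        return -1;
    if (pipe(gopipe) < 0) {
        close(pidpipe[0]);
        close(pidpipe[1]);
        return -1;
    }
    child = fork();
    if (child < 0) {
        close(pidpipe[0]); close(pidpipe[1]);
        close(gopipe[0]); close(gopipe[1]);
        return -1;
    }
    if (child == 0) {
        /* Intermediate child: in lxc-attach this is where setns() happens. */
        close(pidpipe[0]);
        close(gopipe[1]);
        gc = fork();
        if (gc < 0)
            _exit(127);
        if (gc == 0) {
            char go;
            close(pidpipe[1]);
            /* Block until the grandparent has moved us into the cgroup. */
            if (read(gopipe[0], &go, 1) != 1)
                _exit(127);
            _exit(code);
        }
        close(gopipe[0]);
        /* Tell the grandparent the grandchild's pid, then wait for it. */
        if (write(pidpipe[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
            _exit(127);
        close(pidpipe[1]);
        if (waitpid(gc, &st, 0) != gc || !WIFEXITED(st))
            _exit(127);
        _exit(WEXITSTATUS(st));
    }
    /* Grandparent: it still holds fds the namespaced child can't use. */
    close(pidpipe[1]);
    close(gopipe[0]);
    if (read(pidpipe[0], &gc, sizeof(gc)) == (ssize_t)sizeof(gc)) {
        /* Real code would write gc into the container's cgroup here. */
        ssize_t w = write(gopipe[1], "g", 1);
        (void)w;
    }
    close(pidpipe[0]);
    close(gopipe[1]);
    if (waitpid(child, &st, 0) != child || !WIFEXITED(st))
        return -1;
    *grandchild = gc;
    return WEXITSTATUS(st);
}
```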
Stéphane Graber [Fri, 1 Mar 2013 16:12:20 +0000 (11:12 -0500)]
python api_test: Drop use of @LXCPATH@
The python api test script was using @LXCPATH@ for one of its checks.
Now that the lxcpath is exposed by the lxc python module directly, this
can be dropped and api_test.py can now become a simple python file without
needing pre-processing by autoconf.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 28 Feb 2013 23:04:46 +0000 (18:04 -0500)]
lxc-ls: Implement support for nested containers
Add initial support for showing and querying nested containers.
This is done through a new --nesting argument to lxc-ls and uses
lxc-attach to go look for sub-containers.
Known limitations include the dependency on setns support for the PID
and NETWORK namespaces and the assumption that LXCPATH for the sub-containers
matches that of the host.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 19 Feb 2013 20:44:19 +0000 (15:44 -0500)]
Add example hooks from Ubuntu package
We've been shipping those two hooks for a while in Ubuntu.
Yesterday I reworked them to use the new environment variables and
avoid hardcoding any path that we have available as a variable.
I tested both to work on Ubuntu 13.04 but they should work just as well
on any distro shipping with the cgroup hierarchy in /sys/fs/cgroup and
with ecryptfs available.
These are intended as examples and distros are free to drop them; they
should, however, work without any changes, at least on Ubuntu.
Serge Hallyn [Tue, 19 Feb 2013 20:39:31 +0000 (14:39 -0600)]
remove redundant, too-early call to clearenv in api_start call.
OK, I took a look. What happened was that the clearenv calls used to be
in lxc_start, lxccontainer and lxc_execute (the lxc_start() callers)
themselves. I moved those into do_start(), but the calls in
lxccontainer.c were never removed.
They should simply be removed altogether. Trivial patch follows.
Stéphane Graber [Mon, 18 Feb 2013 23:59:42 +0000 (18:59 -0500)]
lxc-ubuntu{-cloud}: Config layout tweaking
This commit tweaks the layout of the config file for the Ubuntu templates.
With this, we now get a clear network config group, then a path-related group,
then a bunch of miscellaneous config options, and the config ends with apparmor,
capabilities and cgroups.
Serge Hallyn [Thu, 14 Feb 2013 16:30:55 +0000 (10:30 -0600)]
lxc_monitor_open: prepend lxcpath
This is needed for lxc_wait and lxc_monitor to handle lxcpath. However,
the full path name is limited to 108 bytes. Should we use an md5sum of
the lxcpath instead of the path itself?
In any case, with this patch, lxc-wait and lxc-monitor work right with
respect to multiple lxcpaths.
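A sketch of why the 108-byte limit matters when the lxcpath is embedded in the monitor socket name. The socket lives in Linux's abstract namespace (sun_path[0] is NUL), so the usable name is one byte shorter than sun_path. The `"%s/monitor-sock"` suffix and `monitor_sockaddr()` are hypothetical, and the md5sum alternative raised above is not shown.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill addr with an abstract-namespace unix socket name derived from
 * lxcpath. Returns the address length for bind()/connect(), or -1 if the
 * name does not fit in sun_path (108 bytes on Linux, including the
 * leading NUL that marks the abstract namespace). */
static int monitor_sockaddr(const char *lxcpath, struct sockaddr_un *addr)
{
    size_t room = sizeof(addr->sun_path) - 1; /* sun_path[0] stays '\0' */
    int n;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    n = snprintf(addr->sun_path + 1, room, "%s/monitor-sock", lxcpath);
    if (n < 0 || (size_t)n >= room)
        return -1;
    /* Abstract names are length-delimited, so no trailing NUL is counted. */
    return (int)(offsetof(struct sockaddr_un, sun_path) + 1 + (size_t)n);
}
```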
The lxcpath is added to the lxc_handler to make it available most of the
places we need it.
I also remove function prototypes in monitor.h for two functions which
are not defined or used anywhere.
TODO: make cgroups tolerate multiple same-named containers.
all working with the right containers (modulo cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Dwight Engen [Tue, 12 Feb 2013 20:54:47 +0000 (15:54 -0500)]
legacy ls: only output appropriate directories/containers
For lxc-ls without --active, only output a directory in lxc_path if it
contains a file named config. This avoids extra directories that may
exist in lxc_path, for example .snapshot if lxc_path is an nfs mount.
For lxc-ls with --active, don't output . if there are no active
containers.
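Legacy lxc-ls is a shell script. The following C sketch shows the same filtering rule for the non---active case: an entry of lxc_path is listed only if it contains a regular file named `config`, so stray directories such as an NFS `.snapshot` are skipped. `list_containers()` is a hypothetical name.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Print each entry of lxcpath that is a directory containing a regular
 * file named "config". Returns the number printed, or -1 on error. */
static int list_containers(const char *lxcpath, FILE *out)
{
    char path[4096];
    struct dirent *e;
    struct stat st;
    int count = 0;
    DIR *d = opendir(lxcpath);

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        /* stat() only succeeds if d_name is a directory holding config. */
        snprintf(path, sizeof(path), "%s/%s/config", lxcpath, e->d_name);
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode)) {
            fprintf(out, "%s\n", e->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```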
Dwight Engen [Mon, 11 Feb 2013 22:31:39 +0000 (17:31 -0500)]
Update Lua API
Add [gs]et_config_path from API to Lua binding. Add additional optional
parameter to container_new(). Add tests for these new Lua API bindings.
Commit 2a59a681 changed the meaning of lxc_path_get() in the binding,
causing lua script breakage. Reinstate original behavior of
lxc_path_get() and rename it to lxc_default_config_path_get() to make
its intent clearer.