Ryota Ozaki [Sun, 17 Mar 2013 14:21:31 +0000 (23:21 +0900)]
Use $localstatedir/log/lxc for default log path
When we install lxc manually (configure; make; make install),
all files are installed under /usr/local/. Configuration files
and setting files of containers are stored under /usr/local/ too;
however, log files alone are stored under /var/log/ rather than
/usr/local/var/log.
This patch changes the default log path to $localstatedir/log/lxc
(by default $localstatedir is /usr/local/var), which is an ordinary
location and probably what users expect.
Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 14 Mar 2013 03:21:15 +0000 (23:21 -0400)]
Add missing config.h includes.
conf.h and start.h weren't explicitly including config.h which meant that
depending on the ordering of the includes in whatever was including conf.h
or start.h, some pieces of the structs defined in those may be missing.
Amongst other problems, this led to the lxc_conf struct being off by 8 bytes
for functions from commands.c, which caused lxc-stop to always fail.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 13 Mar 2013 02:34:26 +0000 (21:34 -0500)]
cgroups: don't mount under init's cgroup
1. a deeper hierarchy has steep performance costs
2. init may be under /init, but containers should be under /lxc
3. in a nested container we like to bind-mount $cgroup_path/$c/$c.real
into $cgroup_path - but task 1's cgroup is $c/$c.real, so a nested
container would be in $c/$c.real/lxc, which would become
/$c/$c.real/$c/$c.real/lxc when expanded
4. this removes quite a bit of code (of mine), which is always nice
Dwight Engen [Mon, 11 Mar 2013 20:36:25 +0000 (16:36 -0400)]
uidmap: fix writing multiple ranges
The kernel requires a single atomic write for setting the /proc
idmap files. We were calling write(2) more than once when multiple
ranges were configured so instead build a buffer to pass in one write(2)
call.
Change id types to unsigned long to handle large id mappings gracefully.
Fix max id in example comment.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
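The single-write rule can be sketched as follows. This is a minimal illustration rather than the patch itself: `struct id_range`, `format_id_map()` and `write_id_map()` are hypothetical names, and the 4096-byte buffer is an assumption. Each range becomes one `ns_id host_id count` line, the layout of /proc/<pid>/uid_map, and the whole buffer reaches the kernel in one write(2) call:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical representation of one mapping line (unsigned long, as in the commit). */
struct id_range {
    unsigned long ns_id;    /* first id inside the namespace */
    unsigned long host_id;  /* first id on the host */
    unsigned long count;    /* number of consecutive ids */
};

/* Format every range as "ns_id host_id count\n" into buf.
 * Returns the number of bytes used, or -1 if buf is too small. */
int format_id_map(char *buf, size_t len, const struct id_range *r, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        int ret = snprintf(buf + used, len - used, "%lu %lu %lu\n",
                           r[i].ns_id, r[i].host_id, r[i].count);
        if (ret < 0 || (size_t)ret >= len - used)
            return -1;
        used += (size_t)ret;
    }
    return (int)used;
}

/* Hand the whole map to the kernel with exactly one write(2) call. */
int write_id_map(int fd, const struct id_range *r, size_t n)
{
    char buf[4096];
    int len = format_id_map(buf, sizeof(buf), r, n);

    if (len < 0)
        return -1;
    return write(fd, buf, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

In real use `fd` would be /proc/<pid>/uid_map or gid_map opened for writing; a second write to the same file would be rejected by the kernel, which is the failure the commit fixes.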
I remember discussion about implementing a proper way to shut down
guests using different signals, so here's a patch proposal.
It allows specific signal numbers to be used to shut down guests
gracefully; for example, SIGRTMIN+4 starts poweroff.target in
systemd.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
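Before a guest can be sent a signal such as SIGRTMIN+4, the textual spec has to become a signal number for kill(2). A small sketch of that parsing step; `parse_signal_spec()` is a hypothetical helper, and it only accepts plain numbers, `SIGRTMIN+N` and `SIGRTMAX-N`:

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn "15", "SIGRTMIN+4" or "SIGRTMAX-2" into a
 * signal number. Returns -1 for anything it does not understand or that
 * falls outside 1..SIGRTMAX. Names such as "SIGPWR" are not handled. */
int parse_signal_spec(const char *spec)
{
    const char *num = spec;
    long base = 0, sign = 1;
    char *end;

    if (strncmp(spec, "SIGRTMIN+", 9) == 0) {
        base = SIGRTMIN;
        num = spec + 9;
    } else if (strncmp(spec, "SIGRTMAX-", 9) == 0) {
        base = SIGRTMAX;
        sign = -1;
        num = spec + 9;
    }

    long n = strtol(num, &end, 10);
    /* Require at least one digit and nothing after the number. */
    if (end == num || *end != '\0' || n < 0 || n > SIGRTMAX)
        return -1;

    long sig = base + sign * n;
    if (sig < 1 || sig > SIGRTMAX)
        return -1;
    return (int)sig;
}
```

A caller would then do something like `kill(init_pid, sig)`, where `init_pid` (hypothetical) is the container's init as seen from the host. Whether a given signal triggers a clean shutdown depends entirely on the guest's init.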
Dwight Engen [Tue, 12 Mar 2013 17:04:35 +0000 (13:04 -0400)]
oracle template: fixes for older releases
This fixes some issues found by Oracle QA, including several cosmetic
errors seen during container bootup.
The rpm database needs moving on Debian hosts, as it does on Ubuntu.
I took Serge's suggestions: Do the yum install in an unshared
mount namespace so the /proc mount done during OL4 install doesn't
pollute the host. No need to blacklist ipv6 modules.
Make the default release 6.3, unless the host is OL, then default
to the same version as the host (same as Ubuntu template does).
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 6 Mar 2013 19:41:04 +0000 (13:41 -0600)]
attach: handle apparmor transitions in !NEWNS cases
If we're not attaching to the mount ns, then don't enter the
container's apparmor policy. Since we're running binaries from the host
and not the container, that actually seems the sane thing to do (besides
also being the lazier thing).
Without this patch, we would need to move the apparmor attach
past the procfs remount, also mount securityfs if available,
and, for the !remount_proc_sys case, mount those just long
enough to do the apparmor transition.
lxc-attach: User namespaces: Use init's user & group id when attaching
When attaching to a container with a user namespace, try to detect the
user and group ids of init via /proc and attach as that same user. Only
if that is unsuccessful, fall back to (0, 0).
Signed-off-by: Christian Seiler <christian@iwakd.de>
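One way to detect init's ids is to read the `Uid:` and `Gid:` lines of /proc/<pid>/status, where the first number on each line is the real id. The sketch below assumes the caller has already opened that file for the container's init; `read_init_ids()` is a hypothetical name. It applies the (0, 0) fallback described above when either line is missing:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the real uid and gid from an already opened
 * /proc/<pid>/status stream. Falls back to 0/0 if either line is missing. */
void read_init_ids(FILE *status, uid_t *uid, gid_t *gid)
{
    char line[256];
    unsigned long u = 0, g = 0;
    int have_uid = 0, have_gid = 0;

    while (fgets(line, sizeof(line), status)) {
        /* Lines look like "Uid:\t1000\t1000\t1000\t1000"; the first value is the real id. */
        if (sscanf(line, "Uid: %lu", &u) == 1)
            have_uid = 1;
        else if (sscanf(line, "Gid: %lu", &g) == 1)
            have_gid = 1;
    }

    if (have_uid && have_gid) {
        *uid = (uid_t)u;
        *gid = (gid_t)g;
    } else {
        *uid = 0;
        *gid = 0;
    }
}
```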
lxc-attach: Default to /bin/sh if shell cannot be determined or exec'd
If getpwuid() fails, the fallback of spawning a 'getent' process also
fails, and the user specified no command to execute, default to
/bin/sh and only fail if even that is not available. This should ensure
that, unless the container is *really* weird, the user always ends up
with a shell when calling lxc-attach with no further arguments.
Signed-off-by: Christian Seiler <christian@iwakd.de>
lxc-attach: Try really hard to determine login shell
If no command is specified, and using getpwuid() to determine the login
shell fails, try to spawn a process that executes the utility 'getent'.
getpwuid() may fail because of incompatibilities between the NSS
implementations on the host and in the container.
Signed-off-by: Christian Seiler <christian@iwakd.de>
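Whichever lookup succeeds, the shell ends up coming from a passwd(5)-format line such as the one `getent passwd 0` prints, with the shell as its seventh colon-separated field. A sketch of that last step, assuming the getent output has already been read from the child process into a string (spawning getent is omitted, and `shell_from_passwd_line()` is a hypothetical name). An empty or malformed entry yields /bin/sh, matching the fallback above:

```c
#include <stdio.h>
#include <string.h>

/* Extract the login shell (7th colon-separated field) from one passwd(5)
 * line. Writes "/bin/sh" when the line is malformed, the field is empty,
 * or the value does not fit in the output buffer. */
void shell_from_passwd_line(const char *line, char *shell, size_t len)
{
    const char *p = line;

    /* Skip the first six fields: name, password, uid, gid, gecos, home. */
    for (int i = 0; i < 6 && p; i++) {
        p = strchr(p, ':');
        if (p)
            p++;
    }

    size_t n = p ? strcspn(p, "\n") : 0;
    if (n == 0 || n >= len) {
        snprintf(shell, len, "%s", "/bin/sh");
        return;
    }
    memcpy(shell, p, n);
    shell[n] = '\0';
}
```

A caller would still want to check that the result is executable, for example with access(shell, X_OK), before deciding whether to fail.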
Serge Hallyn [Fri, 1 Mar 2013 20:53:20 +0000 (14:53 -0600)]
cgroup: improve support for multiple lxcpaths (v3)
Add a monitor command to get the cgroup for a running container. This
allows container r1 started from /var/lib/lxc and container r1 started
from /home/ubuntu/lxcbase to pick unique cgroup directories (which
will be /sys/fs/cgroup/$subsys/lxc/r1 and .../r1-1), and all the lxc-*
tools to get that path over the monitor at lxcpath.
Rework the cgroup code. Before, if /sys/fs/cgroup/$subsys/lxc/r1
already existed, it would be moved to 'deadXXXXX', and a new r1 created.
Instead, if r1 exists, use r1-1, r1-2, etc.
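The r1, r1-1, r1-2 naming can be sketched with mkdir(2) alone: a failure with EEXIST means the name is taken, so the next suffix is tried, and two concurrent starts cannot both claim the same directory. `create_unique_cgroup()` and its `max_tries` limit are hypothetical; the real code works per subsystem under /sys/fs/cgroup/$subsys/lxc:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create <parent>/<name>, or <parent>/<name>-1, <name>-2, ... if it is
 * already taken. Returns the suffix used (0 for none) with the full path
 * in out, or -1 on any other error or after max_tries attempts. */
int create_unique_cgroup(const char *parent, const char *name,
                         char *out, size_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int ret = (i == 0) ? snprintf(out, len, "%s/%s", parent, name)
                           : snprintf(out, len, "%s/%s-%d", parent, name, i);
        if (ret < 0 || (size_t)ret >= len)
            return -1;
        if (mkdir(out, 0755) == 0)
            return i;          /* we own this directory now */
        if (errno != EEXIST)
            return -1;         /* a real error, not a name clash */
    }
    return -1;
}
```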
I ended up removing both the use of cgroup.clone_children and support
for ns cgroup. Presumably we'll want to put support for ns cgroup
back in for older kernels. Instead of guessing whether or not we
have clone_children support, just always explicitly do the only thing
that feature buys us - set cpuset.{cpus,mems} for newly created cgroups.
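Setting cpuset.{cpus,mems} explicitly amounts to copying the parent cgroup's values into the new child before any task joins it (a cpuset with empty cpus or mems cannot hold tasks). A sketch with hypothetical helper names, assuming the parent and child cgroupfs directories are passed in:

```c
#include <stdio.h>

/* Copy the first line of one control file into another. */
int copy_cgroup_file(const char *from, const char *to)
{
    char value[4096];
    FILE *f = fopen(from, "r");

    if (!f)
        return -1;
    if (!fgets(value, sizeof(value), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    f = fopen(to, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    /* cgroupfs rejects bad values in write(2), which stdio only issues
     * when the buffer is flushed, so the result of fclose() matters. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: give a new cgroup its parent's cpuset.cpus and
 * cpuset.mems, the values clone_children would have inherited. */
int init_cpuset(const char *parent_dir, const char *child_dir)
{
    static const char *const files[] = { "cpuset.cpus", "cpuset.mems" };
    char from[4096], to[4096];

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
        int a = snprintf(from, sizeof(from), "%s/%s", parent_dir, files[i]);
        int b = snprintf(to, sizeof(to), "%s/%s", child_dir, files[i]);
        if (a < 0 || b < 0 || (size_t)a >= sizeof(from) || (size_t)b >= sizeof(to))
            return -1;
        if (copy_cgroup_file(from, to) < 0)
            return -1;
    }
    return 0;
}
```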
Note that the upstream kernel is working toward strict hierarchical
limit enforcement, which will be good for us.
NOTE - I am changing the lxc_answer struct size. This means that
after upgrading to this version while containers are running,
lxc_* commands on those already-running containers will fail.
Changelog: (v3)
implement cgroup attach
fix a subtle bug arising when lxc_get_cgpath() returned
STOPPED rather than -1 (STOPPED is 0, and 0 meant success).
Rename some functions and add detailed comments above most.
Drop all my lxc_attach changes in favor of those by Christian
Seiler (which are mostly the same, but improved).
When you clone a new user_ns, the child cannot write to the fds
opened by the parent. Handle this by doing an extra fork. The
grandparent hangs around and waits for its child to tell it the
pid of the grandchild, which will be the one attached to the
container. The grandparent then moves the grandchild into the
right cgroup, then waits for the child, who in turn is waiting on
the grandchild to complete.
Secondly, when attaching to a new user namespace, your old uid is
not valid, so you are uid -1. This patch simply does setgid+setuid
to 0 if that is the case. We probably want to be smarter, but
for now this allows lxc-attach to work.
Signed-off-by: Christian Seiler <christian@iwakd.de>
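The extra fork can be sketched as below. This only shows the process and pipe layout, assuming the grandchild simply runs a command with execvp(); the setns() calls and the actual cgroup move are left as comments, and `attach_run()` is a hypothetical name:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Grandparent -> child -> grandchild. The child reports the grandchild's
 * pid over a pipe so the grandparent can place it in a cgroup.
 * Returns the command's exit status, or -1 on error. */
int attach_run(char *const argv[], pid_t *grandchild_out)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Intermediate child: entering the container's namespaces with
         * setns() would happen around here (omitted). */
        close(fds[0]);
        pid_t gc = fork();
        if (gc < 0)
            _exit(127);
        if (gc == 0) {
            close(fds[1]);
            execvp(argv[0], argv);
            _exit(127);               /* exec failed */
        }
        if (write(fds[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
            _exit(127);
        close(fds[1]);
        int status;
        if (waitpid(gc, &status, 0) < 0 || !WIFEXITED(status))
            _exit(127);
        _exit(WEXITSTATUS(status));   /* pass the command's status up */
    }

    /* Grandparent. */
    close(fds[1]);
    pid_t gc;
    ssize_t n = read(fds[0], &gc, sizeof(gc));
    close(fds[0]);
    if (n == (ssize_t)sizeof(gc)) {
        *grandchild_out = gc;
        /* Hypothetical: write gc into the container's cgroup tasks file here. */
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return n == (ssize_t)sizeof(gc) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the grandchild may start running before the grandparent has moved it; a real implementation would have it block on a second pipe until the move is done.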
Stéphane Graber [Fri, 1 Mar 2013 16:12:20 +0000 (11:12 -0500)]
python api_test: Drop use of @LXCPATH@
The python api test script was using @LXCPATH@ for one of its checks.
Now that the lxcpath is exposed by the lxc python module directly, this
can be dropped and api_test.py can now become a simple python file without
needing pre-processing by autoconf.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 28 Feb 2013 23:04:46 +0000 (18:04 -0500)]
lxc-ls: Implement support for nested containers
Add initial support for showing and querying nested containers.
This is done through a new --nesting argument to lxc-ls and uses
lxc-attach to go look for sub-containers.
Known limitations include the dependency on setns support for the PID
and NETWORK namespaces and the assumption that LXCPATH for the sub-containers
matches that of the host.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 19 Feb 2013 20:44:19 +0000 (15:44 -0500)]
Add example hooks from Ubuntu package
We've been shipping those two hooks for a while in Ubuntu.
Yesterday I reworked them to use the new environment variables and
avoid hardcoding any path that we have available as a variable.
I tested both on Ubuntu 13.04, but they should work just as well
on any distro that ships the cgroup hierarchy in /sys/fs/cgroup and
has ecryptfs available.
These are intended as examples and distros are free to drop them; they
should, however, work without any changes, at least on Ubuntu.
Serge Hallyn [Tue, 19 Feb 2013 20:39:31 +0000 (14:39 -0600)]
remove redundant, too-early call to clearenv in api_start call.
OK, I took a look. What happened was that the clearenv calls used to be
in the lxc_start() callers themselves (lxc_start, lxccontainer and
lxc_execute). I moved those into do_start(), but the calls in
lxccontainer.c were never removed.
They should simply be removed altogether. Trivial patch follows.
Stéphane Graber [Mon, 18 Feb 2013 23:59:42 +0000 (18:59 -0500)]
lxc-ubuntu{-cloud}: Config layout tweaking
This commit tweaks the layout of the config file for the Ubuntu templates.
With this, we now get a clear network config group, then a path-related group,
then a group of miscellaneous config options, and finally apparmor,
capabilities and cgroups at the end of the config.
Serge Hallyn [Thu, 14 Feb 2013 16:30:55 +0000 (10:30 -0600)]
lxc_monitor_open: prepend lxcpath
This is needed for lxc_wait and lxc_monitor to handle lxcpath. However,
the full path name is limited to 108 bytes (the size of sun_path in
struct sockaddr_un on Linux). Should we use an md5sum of the lxcpath
instead of the path itself?
In any case, with this patch, lxc-wait and lxc-monitor work right with
respect to multiple lxcpaths.
The lxcpath is added to the lxc_handler to make it available most of the
places we need it.
I also remove function prototypes in monitor.h for two functions which
are not defined or used anywhere.
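The 108-byte limit is sizeof(sun_path) on Linux, and an abstract name loses one more byte to its leading NUL. The sketch below builds the readable name when it fits and otherwise falls back to a fixed-length hash, which is the idea floated above. It swaps in 64-bit FNV-1a for md5 only to stay dependency-free, and the `/monitor-sock` suffix and `monitor_addr()` name are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* FNV-1a, a small non-cryptographic hash, used here in place of md5. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Build an abstract unix socket address for the monitor of lxcpath.
 * Abstract names start with a NUL byte and are not NUL-terminated, so
 * at most sizeof(sun_path) - 1 bytes are usable. Returns the length to
 * pass to bind() or connect(). */
socklen_t monitor_addr(struct sockaddr_un *addr, const char *lxcpath)
{
    char name[sizeof(addr->sun_path)];
    size_t max = sizeof(addr->sun_path) - 1;
    int len = snprintf(name, sizeof(name), "%s/monitor-sock", lxcpath);

    if (len < 0 || (size_t)len > max)   /* too long: use a hashed name */
        len = snprintf(name, sizeof(name), "lxc-monitor-%016llx",
                       (unsigned long long)fnv1a64(lxcpath));

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which marks the name as abstract. */
    memcpy(addr->sun_path + 1, name, (size_t)len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + (size_t)len);
}
```

Any hash opens a small chance of two lxcpaths colliding on one socket name; a fixed-length name is the price of staying under the limit.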
TODO: make cgroups tolerate multiple same-named containers.
all working with the right containers (modulo cgroup stuff).
To do:
* lxc monitor needs to be made to handle cgroups.
This is another very invasive one. I started doing this as
a part of this set, but that gets hairy, so I'm sending this
separately. Note that lxc-wait and lxc-monitor don't work
without this, and there may be niggles in what I said works
above - since start.c is doing lxc_monitor_send_state etc
to the shared abstract unix domain socket.
* Need to handle the cgroup conflicts.
Dwight Engen [Tue, 12 Feb 2013 20:54:47 +0000 (15:54 -0500)]
legacy ls: only output appropriate directories/containers
For lxc-ls without --active, only output a directory in lxc_path if it
contains a file named config. This avoids extra directories that may
exist in lxc_path, for example .snapshot if lxc_path is an nfs mount.
For lxc-ls with --active, don't output '.' if there are no active
containers.
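The config-file rule can be sketched in C with opendir(3) and stat(2). The legacy lxc-ls is a shell script, so this is only an illustration of the rule, and `list_containers()` is a hypothetical name:

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Print the entries of lxcpath that look like containers: directories
 * containing a regular file named "config". Anything else (for example
 * .snapshot on an NFS mount) is skipped. Returns the number of
 * containers found, or -1 if lxcpath cannot be opened. */
int list_containers(const char *lxcpath, FILE *out)
{
    DIR *dir = opendir(lxcpath);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        const char *d = ent->d_name;
        char path[4096];
        struct stat st;

        /* Skip "." and "..". */
        if (d[0] == '.' && (d[1] == '\0' || (d[1] == '.' && d[2] == '\0')))
            continue;
        int n = snprintf(path, sizeof(path), "%s/%s/config", lxcpath, d);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode)) {
            if (out)
                fprintf(out, "%s\n", d);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```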
Dwight Engen [Mon, 11 Feb 2013 22:31:39 +0000 (17:31 -0500)]
Update Lua API
Add [gs]et_config_path from API to Lua binding. Add additional optional
parameter to container_new(). Add tests for these new Lua API bindings.
Commit 2a59a681 changed the meaning of lxc_path_get() in the binding,
causing lua script breakage. Reinstate original behavior of
lxc_path_get() and rename it to lxc_default_config_path_get() to make
its intent clearer.
Serge Hallyn [Mon, 11 Feb 2013 20:43:41 +0000 (14:43 -0600)]
pass lxcpath to lxc_command
The previous lxcpath patches added support for a custom LXCPATH set
through a system-wide configuration file.
This was also exposed through the C api, so that a custom lxcpath could
be set at container object instantiation time, or set at runtime.
However the command sock filename was always located under the global
lxcpath, which could be confusing, and would be a problem for users
with insufficient perms to the system-wide lxc path (i.e. if setting
lxcpath to $HOME/lxcbase). This patch changes that by passing the
lxcpath to all callers of lxc_command().
It remains to add an lxcpath command line argument to most of the
command line tools (which are not using the C api) - lxc-start,
lxc-info, lxc-stop, etc.
At this point it becomes tempting to do something like
c = lxc.Container("r1", "/var/lib/lxc")
c2 = lxc.Container("r1", "$HOME/lxcbase")
However, that's problematic - those two will use the same directory
names for cgroup directories.
What would be the best way to handle this? One way (which I kind
of like) is to give up on naming the cgroups after the container:
use mkstemp for the cgroup name, let lxc keep track of the cgroup
name based on the command socket, and make users use lxc-cgroup to get
and change settings.
Stéphane Graber [Mon, 11 Feb 2013 18:45:20 +0000 (13:45 -0500)]
python-lxc: Update for new calls
Add the two new calls to the API and add the new container_path
parameter to the constructor (optional).
This also extends list_containers to support the config_path parameter.
At this point none of the actual tools are changed to make use of those
as we'll probably want to make sure all the tools get the extra option
at once.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Tested-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Fri, 8 Feb 2013 22:06:32 +0000 (16:06 -0600)]
lxc api: fix some config_path oddities
1. When calling c->set_config_path(), update configfile. I.e. if we
are setting the config_path to /var/lib/lxc, then the configfile should
be changed to /var/lib/lxc/$container/config
2. Add an optional configpath argument to lxc_container_new. If NULL,
then the default will be used (as before). If set, then the passed-in
path will be used. This way you can do
Stéphane Graber [Sat, 9 Feb 2013 19:52:12 +0000 (14:52 -0500)]
lxc-create: Improve the layout of the config
This simply adds an extra blank line between the original lxc config
and the template generated options.
In typical use cases, this means that we'll now get the header, then
a blank line, then default.conf content, then a blank line and finally
the template generated config.
The wording of the header is also changed slightly so that it fits in
the usual 80 columns.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Fri, 8 Feb 2013 21:01:02 +0000 (16:01 -0500)]
lxc.functions isn't a shell script
lxc.functions.in is meant to be sourced, not to be called as a script.
So as it's not executable and not meant to be, it shouldn't have
a /bin/sh shebang.
This fixes an error reported by lintian.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Fri, 8 Feb 2013 16:07:53 +0000 (11:07 -0500)]
Drop lxc-setcap and lxc-setuid
As discussed earlier this week, lxc-setcap and lxc-setuid have been
in pretty bad shape lately. Most if not all distros recommend against
using them or don't ship them at all.
With the ongoing work to get user namespaces working in upstream LXC,
we think it's best to drop those two now as we prepare to land proper
setuid helpers to deal with user namespaces.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 6 Feb 2013 21:11:19 +0000 (15:11 -0600)]
Switch from use of LXCPATH to a configurable default_lxc_path
Here is a patch to introduce a configurable system-wide
lxcpath. It seems to work with lxc-create, lxc-start,
and basic python3 lxc usage through the api.
For shell functions, a new /usr/share/lxc/lxc.functions is
introduced which sets some of the basic global variables,
including evaluating the right place for lxc_path.
I have not converted any of the other python code, as I was
not sure where we should keep the common functions (i.e.
for now just default_lxc_path()).
configure.ac: add an option for setting the global config file name.
utils: add a default_lxc_path() function
Use default_lxc_path in .c files
define get_lxc_path() and set_lxc_path() in C api
use get_lxc_path() in lua api
create sh helper for getting default path from config file
fix up scripts to use lxc.functions
Changelog:
feb6:
fix lxc_path in lxc.functions
utils.c: as Dwight pointed out, don't close a NULL fin.
utils.c: fix the parsing of lxcpath line
lxc-start: print which rcfile we are using
commands.c: As Dwight alluded to, the sockname handling was just
ridiculous. Clean that up.
use Dwight's recommendation for lxc.functions path: $datadir/lxc
make lxccontainer->get_config_path() return const char *
Per Dwight's suggestion, much nicer than returning strdup.
feb6 (v2):
lxccontainer: set c->config_path before using it.
convert legacy lxc-ls
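A sketch of the lxcpath lookup in default_lxc_path(), assuming a config line of the form `lxcpath = /some/path` (the key name and file location are assumptions here, and `read_lxcpath()` is hypothetical). It also reflects the changelog's NULL-fin fix by treating a missing file as "not found", leaving the caller to fall back to the compiled-in LXCPATH:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for a global config line such as
 *     lxcpath = /home/ubuntu/lxcbase
 * Returns 1 and copies the value into out if the key is found, 0 if not.
 * A NULL stream (config file absent) means "not found"; it is never closed. */
int read_lxcpath(FILE *fin, char *out, size_t len)
{
    char line[4096];

    if (!fin)
        return 0;
    while (fgets(line, sizeof(line), fin)) {
        char *p = line;

        while (isspace((unsigned char)*p))
            p++;
        if (strncmp(p, "lxcpath", 7) != 0)
            continue;
        p += 7;
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '=')
            continue;           /* e.g. "lxcpathfoo = ..." */
        p++;
        while (isspace((unsigned char)*p))
            p++;

        /* Strip trailing whitespace, including the newline. */
        char *end = p + strlen(p);
        while (end > p && isspace((unsigned char)end[-1]))
            *--end = '\0';
        if (*p == '\0' || (size_t)(end - p) >= len)
            return 0;
        memcpy(out, p, (size_t)(end - p) + 1);
        return 1;
    }
    return 0;
}
```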
Dwight Engen [Thu, 24 Jan 2013 16:42:22 +0000 (11:42 -0500)]
add lua binding for the lxc API
The lua binding is based closely on the python binding. Also included are
a test program for exercising the binding, and an lxc-top utility for
showing statistics on running containers.
Serge Hallyn [Thu, 24 Jan 2013 18:04:54 +0000 (12:04 -0600)]
use a default per-container logfile
Until now, if an lxc-* command (e.g. lxc-start) did not specify a logfile
(with -o logfile), the default was effectively 'none'. With this patch,
the default becomes a per-container log file.
If a container config file specifies 'lxc.logfile', that will override
the default. If a '-o logfile' argument is specified at lxc-start,
then that will override both the default and the configuration file
entry. Finally, '-o none' can be used to avoid having a logfile at
all (in other words, the previous default), and that will override
an lxc.logfile entry in the container configuration file.
If the user does not have rights to open the default, then 'none' will
be used. However, in that case an error will show up on the console. (We
can work on removing that if it annoys people, but I think it is
helpful, at least while we're still ironing this set out.) If the user
or container configuration file specified a logfile, and the user does
not have rights to open it, then the action will fail.
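The precedence rules above fit in a few lines. A sketch with a hypothetical `choose_logfile()`; it only decides which path to use and whether failing to open it is fatal, leaving the open itself to the caller:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Precedence: '-o' argument > lxc.logfile in the config > default.
 * '-o none' disables logging (*path = NULL). *must_open is true when the
 * path came from the user or the config, in which case failing to open
 * it should abort; a default that cannot be opened is treated as 'none'. */
void choose_logfile(const char *cli_arg, const char *conf_entry,
                    const char *default_path,
                    const char **path, bool *must_open)
{
    if (cli_arg) {
        *path = strcmp(cli_arg, "none") == 0 ? NULL : cli_arg;
        *must_open = *path != NULL;
    } else if (conf_entry) {
        *path = conf_entry;
        *must_open = true;
    } else {
        *path = default_path;
        *must_open = false;
    }
}
```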
One slight "mis-behavior" which I have not fixed (and may not fix) is
that if an lxc.logfile is specified, the default logfile will still
get created before we read the configuration file to find out there
is an lxc.logfile entry.
changelog: Jan 24:
add --enable-configpath-log configure option
When we log to /var/lib/lxc/$container/$container.log, several things
need to be done differently than when we log into /var/log/lxc (for
instance). So give it a configure option so we know what to do.
When the user specifies a logfile, we bail if we can't open it. But
when opening the default logfile, the user may not have rights to
open it, so in that case ignore it and continue as if using 'none'.
When using /var/lib/lxc/$c/$c.log, we use $LOGPATH/$name/$name.log.
Otherwise, we use $LOGPATH/$name.log.
When using /var/lib/lxc/$c/$c.log, don't try to create the log path
/var/lib/lxc/$c. It can only be missing if the container doesn't
exist, and we don't want to create the directory in that case. When
using /var/log/lxc, we do want to create the path if it does
not exist.
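The two layouts and the directory-creation rule can be sketched as follows. `default_log_path()` is hypothetical, and its `per_container_dir` flag stands in for the effect of --enable-configpath-log. Rejecting names that contain '/' is an extra safety check that is not part of the patch:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the default log path for container `name` under `logpath`:
 *   per_container_dir: $LOGPATH/$name/$name.log  (never create the dir)
 *   otherwise:         $LOGPATH/$name.log        (creating $LOGPATH is fine)
 * Returns 0, or -1 for a bad name or a too-small buffer. */
int default_log_path(char *buf, size_t len, const char *logpath,
                     const char *name, bool per_container_dir,
                     bool *create_dir)
{
    int n;

    if (name[0] == '\0' || strchr(name, '/'))
        return -1;
    if (per_container_dir)
        n = snprintf(buf, len, "%s/%s/%s.log", logpath, name, name);
    else
        n = snprintf(buf, len, "%s/%s.log", logpath, name);
    /* A missing per-container directory means the container doesn't exist. */
    *create_dir = !per_container_dir;
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```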
Serge Hallyn [Wed, 16 Jan 2013 05:02:20 +0000 (23:02 -0600)]
use a default per-container logfile
[ Thanks to Stéphane and Dwight for the feedback on the previous patch ]
Until now, if an lxc-* command (e.g. lxc-start) did not specify a logfile
(with -o logfile), the default was effectively 'none'. With this patch,
the default becomes $LOGPATH/<container>/<container>.log. LOGPATH is
specified at configure time with '--with-log-path='. If unspecified, it
is $LXCPATH, so that logs for container r2 will show up at
/var/lib/lxc/r2/r2.log. LOGPATH must exist, while lxc will make sure to
create $LOGPATH/<name>. As another example, Ubuntu will likely specify
--with-log-path=/var/log/lxc (and place /var/log/lxc into
debian/lxc.dirs), placing r2's logs in /var/log/lxc/r2/r2.log.
If a container config file specifies 'lxc.logfile', that will override
the default. If a '-o logfile' argument is specified at lxc-start,
then that will override both the default and the configuration file
entry. Finally, '-o none' can be used to avoid having a logfile at
all (in other words, the previous default), and that will override
an lxc.logfile entry in the container configuration file.
Matthias Brugger [Tue, 22 Jan 2013 18:00:41 +0000 (19:00 +0100)]
lxc-setcap.in: Set path to lxc-init
In lxc-setcap the path to lxc-init wasn't set right, so that
a call to the script failed with an error. This patch sets
the path to the right directory.
Dwight Engen [Tue, 22 Jan 2013 20:59:44 +0000 (15:59 -0500)]
use which instead of type
This is for consistency with the rest of lxc, and also because type checks for
shell builtins, a behavior that we do not want in these cases. Also ensure that
stderr from which is redirected to /dev/null.
Serge Hallyn [Thu, 17 Jan 2013 15:53:33 +0000 (09:53 -0600)]
don't leak the rootfs.pin fd into the container
Only the container parent needs to keep that fd open. Close it
as soon as the container's first task is spawned. Otherwise it can
show up in /proc/$$/fd in the container.
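The fix boils down to closing the inherited descriptor on the child side while the parent keeps its own copy. A minimal sketch with fork(2) standing in for the container clone; `spawn_without_pin()` is hypothetical, and the child reports whether the fd is really gone on its side:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn a stand-in for the container's first task. The child closes its
 * copy of pin_fd straight away; the parent's copy stays open. The child
 * exits 0 if the fd is invalid on its side afterwards.
 * Returns the child's exit status, or -1 on error. */
int spawn_without_pin(int pin_fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(pin_fd);
        /* ... container setup and exec of init would follow here ... */
        _exit(fcntl(pin_fd, F_GETFD) == -1 && errno == EBADF ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Opening the pin file with O_CLOEXEC is a related technique that makes the kernel close it at execve() time, though the descriptor would then remain visible until init is executed.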
Stéphane Graber [Tue, 15 Jan 2013 17:44:50 +0000 (12:44 -0500)]
conf.c: Cast st_uid and st_gid to int
In eglibc st_uid and st_gid are defined as unsigned int, while in bionic they
are defined as unsigned long (which is inconsistent with the kernel's
definition as a 32-bit unsigned integer).
To work around this problem, simply cast those two to int.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Tue, 15 Jan 2013 00:03:06 +0000 (18:03 -0600)]
Implement userid mappings (enable user namespaces)
The 3.8 kernel now supports uid mappings, so I believe it's appropriate
to proceed with this patchset.
The container config supports new entries of the form:
lxc.id_map = U 100000 0 10000
lxc.id_map = G 100000 0 10000
meaning map 'virtual' uids (in the container) 0-9999 to uids
100000-109999 on the host, and the same for gids. So long as there are
mappings specified in the container config, then CONFIG_NEWUSER will
be used when the container is cloned. This means that container
setup is no longer done with root privilege on the host, only root
privilege in the container. Therefore cgroup setup is moved from the
init task to the monitor task.
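A sketch of how such an entry could be parsed and applied, taking the field order (type, first host id, first container id, count) from the example above. `struct id_map` and the helpers are hypothetical. The kernel's uid_map/gid_map lines put the namespace id first, so the order flips on output, and a count of 10000 covers container ids 0 through 9999:

```c
#include <stdio.h>

/* One parsed lxc.id_map entry, fields in the order of the example. */
struct id_map {
    char type;               /* 'U' for uids, 'G' for gids */
    unsigned long host_id;   /* first id on the host */
    unsigned long ns_id;     /* first id inside the container */
    unsigned long range;     /* number of ids mapped */
};

/* Parse the value of an lxc.id_map line, e.g. "U 100000 0 10000".
 * Returns 0 on success, -1 on malformed input or trailing junk. */
int parse_id_map(const char *value, struct id_map *m)
{
    char extra;

    if (sscanf(value, " %c %lu %lu %lu %c", &m->type, &m->host_id,
               &m->ns_id, &m->range, &extra) != 4)
        return -1;
    if ((m->type != 'U' && m->type != 'G') || m->range == 0)
        return -1;
    return 0;
}

/* Host id backing a container id, or -1 if the id is outside the range. */
long map_to_host(const struct id_map *m, unsigned long ns_id)
{
    if (ns_id < m->ns_id || ns_id - m->ns_id >= m->range)
        return -1;
    return (long)(m->host_id + (ns_id - m->ns_id));
}

/* Format the kernel's "ns_id host_id range\n" line. Returns its length or -1. */
int format_kernel_line(const struct id_map *m, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lu %lu %lu\n", m->ns_id, m->host_id, m->range);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```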
To use this patchset, you currently need to either use the raring
kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
from git://kernel.ubuntu.com/serge/quantal-userns.git.
(Alternatively you can use Eric's tree at the latest userns-always-map-*
branch at
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
but you will likely want to at least enable tmpfs mounts in user namespaces)
You also need to chown the files in the container rootfs into the
mapped range. There is a utility at
https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
uidmapshift does the chowning, while the container-userns-convert
script nicely wraps that program. So I simply
will create a container which is shifted so uid 0 in the container
is uid 200000 on the host.
TODO: when doing setuid(0), need to only do that if 0 is one of the
ids we map to. Similarly, when dropping capabilities, need to only
not do that if 0 is one of the ids we map to. However, the question
of what to do for 'weird' containers in private user namespaces is
one I'm punting for later.
Serge Hallyn [Mon, 14 Jan 2013 23:32:44 +0000 (23:32 +0000)]
setup cgroups from parent
This is a first step to enabling user namespaces. When starting a
container in a new user namespace, the child will not have the
rights to write to the cgroup fs. (We can give it that right, but
don't always want to have to).
At the parent, we don't want to setup_cgroups() before the child
has set itself up. But we also don't want to wait until it has
started running its init, since that is racy.
Therefore introduce a new sync point. The child will let the
parent know when it is ready to be confined, and wait for the
parent to respond that it has done so. Then the child will finish
constraining itself with LSM and seccomp and execute init.
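The new sync point is a two-message handshake, sketched here with a socketpair(2). Message values and function names are hypothetical, and the parent's cgroup setup, the LSM/seccomp steps and the exec of init are left as comments:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { SYNC_READY = 1, SYNC_CONFINED = 2 };   /* hypothetical message values */

static int sync_send(int fd, int msg)
{
    return write(fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? 0 : -1;
}

static int sync_expect(int fd, int want)
{
    int msg;
    return (read(fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) && msg == want) ? 0 : -1;
}

/* Start a child that announces it is ready to be confined and waits for
 * the parent's go-ahead. Returns the child's exit status, or -1. */
int start_with_sync(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        /* Child: its own setup would run first, then it asks to be confined. */
        if (sync_send(sv[1], SYNC_READY) < 0 || sync_expect(sv[1], SYNC_CONFINED) < 0)
            _exit(1);
        /* Now apply LSM and seccomp restrictions and exec init (omitted). */
        _exit(0);
    }

    close(sv[1]);
    int rc = sync_expect(sv[0], SYNC_READY);
    /* Parent: setup_cgroups() for pid would run here, with the parent's
     * privileges, before the child is allowed to continue. */
    if (rc == 0)
        rc = sync_send(sv[0], SYNC_CONFINED);
    close(sv[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return rc == 0 ? WEXITSTATUS(status) : -1;
}
```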