Dwight Engen [Thu, 24 Jan 2013 16:42:22 +0000 (11:42 -0500)]
add lua binding for the lxc API
The lua binding is based closely on the python binding. Also included are
a test program for exercising the binding, and an lxc-top utility for
showing statistics on running containers.
Serge Hallyn [Thu, 24 Jan 2013 18:04:54 +0000 (12:04 -0600)]
use a default per-container logfile
Until now, if an lxc-* command (e.g. lxc-start) did not specify a logfile
(with -o logfile), the default was effectively 'none'. With this patch,
the default becomes a per-container log file.
If a container config file specifies 'lxc.logfile', that will override
the default. If a '-o logfile' argument is specified at lxc-start,
then that will override both the default and the configuration file
entry. Finally, '-o none' can be used to avoid having a logfile at
all (in other words, the previous default), and that will override
a lxc.logfile entry in the container configuration file.
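The precedence described above can be sketched in C. This is only an illustration under assumptions: pick_logfile() and its parameters are hypothetical names, not functions from the lxc sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not lxc code): choose the log file from the three
 * sources described above. Returns NULL when logging is disabled.
 */
const char *pick_logfile(const char *cli_opt,      /* value of -o, or NULL */
                         const char *conf_entry,   /* lxc.logfile, or NULL */
                         const char *default_path) /* per-container default */
{
    if (cli_opt != NULL) {
        /* '-o none' disables logging and overrides lxc.logfile. */
        if (strcmp(cli_opt, "none") == 0)
            return NULL;
        /* Any other -o value overrides both the config and the default. */
        return cli_opt;
    }
    /* A config entry overrides the default. */
    if (conf_entry != NULL)
        return conf_entry;
    return default_path;
}
```

For example, pick_logfile(NULL, "/c.log", "/d.log") yields the config entry, while pick_logfile("none", "/c.log", "/d.log") yields NULL.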
If the user does not have rights to open the default, then 'none' will
be used. However, in that case an error will show up on console. (We
can work on removing that if it annoys people, but I think it is
helpful, at least while we're still ironing this set out.) If the user
or container configuration file specified a logfile, and the user does
not have rights to open it, then the action will fail.
One slight "mis-behavior" which I have not fixed (and may not fix) is
that if a lxc.logfile is specified, the default logfile will still
get created before we read the configuration file to find out there
is a lxc.logfile entry.
changelog: Jan 24:
add --enable-configpath-log configure option
When we log to /var/lib/lxc/$container/$container.log, several things
need to be done differently than when we log into /var/log/lxc (for
instance). So give it a configure option so we know what to do.
When the user specifies a logfile, we bail if we can't open it. But
when opening the default logfile, the user may not have rights to
open it, so in that case ignore it and continue as if using 'none'.
When using /var/lib/lxc/$c/$c.log, we use $LOGPATH/$name/$name.log.
Otherwise, we use $LOGPATH/$name.log.
When using /var/lib/lxc/$c/$c.log, don't try to create the log path
/var/lib/lxc/$c. It can only not exist if the container doesn't
exist. We don't want to create the directory in that case. When
using /var/log/lxc, then we do want to create the path if it does
not exist.
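The two default path layouts above could be built roughly as follows. This is a sketch under assumptions: default_log_path() is a hypothetical helper, and its per_container_dir flag merely stands in for the effect of --enable-configpath-log.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not lxc code): build the default log path.
 * per_container_dir != 0 models logging under the container's own
 * directory ($LOGPATH/$name/$name.log); otherwise $LOGPATH/$name.log.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int default_log_path(char *buf, size_t len, const char *logpath,
                     const char *name, int per_container_dir)
{
    int n;

    if (per_container_dir)
        n = snprintf(buf, len, "%s/%s/%s.log", logpath, name, name);
    else
        n = snprintf(buf, len, "%s/%s.log", logpath, name);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```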
Serge Hallyn [Wed, 16 Jan 2013 05:02:20 +0000 (23:02 -0600)]
use a default per-container logfile
[ Thanks to Stéphane and Dwight for the feedback on the previous patch ]
Until now, if an lxc-* command (e.g. lxc-start) did not specify a logfile
(with -o logfile), the default was effectively 'none'. With this patch,
the default becomes $LOGPATH/<container>/<container>.log. LOGPATH is
specified at configure time with '--with-log-path='. If unspecified, it
is $LXCPATH, so that logs for container r2 will show up at
/var/lib/lxc/r2/r2.log. LOGPATH must exist, while lxc will make sure to
create $LOGPATH/<name>. As another example, Ubuntu will likely specify
--with-log-path=/var/log/lxc (and place /var/log/lxc into
debian/lxc.dirs), placing r2's logs in /var/log/lxc/r2/r2.log.
If a container config file specifies 'lxc.logfile', that will override
the default. If a '-o logfile' argument is specified at lxc-start,
then that will override both the default and the configuration file
entry. Finally, '-o none' can be used to avoid having a logfile at
all (in other words, the previous default), and that will override
a lxc.logfile entry in the container configuration file.
Matthias Brugger [Tue, 22 Jan 2013 18:00:41 +0000 (19:00 +0100)]
lxc-setcap.in: Set path to lxc-init
In lxc-setcap the path to lxc-init wasn't set right, so that
a call to the script failed with an error. This patch sets
the path to the right directory.
Dwight Engen [Tue, 22 Jan 2013 20:59:44 +0000 (15:59 -0500)]
use which instead of type
This is for consistency with the rest of lxc, and also because type checks for
shell builtins, a behavior that we do not want in these cases. Ensure stderr
for which is redirected to /dev/null also.
Serge Hallyn [Thu, 17 Jan 2013 15:53:33 +0000 (09:53 -0600)]
don't leak the rootfs.pin fd into the container
Only the container parent needs to keep that fd open. Close it
as soon as the container's first task is spawned. Else it can
show up in /proc/$$/fd in the container.
Stéphane Graber [Tue, 15 Jan 2013 17:44:50 +0000 (12:44 -0500)]
conf.c: Cast st_uid and st_gid to int
In eglibc st_uid and st_gid are defined as unsigned integers; in bionic they
are defined as unsigned long (which is inconsistent with the kernel's
definition, which is uint_32).
To work around this problem, simply cast those two to int.
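A minimal sketch of the cast, assuming the ids are printed with a "%d" format; format_owner() is a hypothetical helper, not the code touched in conf.c.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: write "uid:gid" of `path` into buf.
 * Casting to int keeps the "%d" conversions correct whether the libc
 * declares the fields as unsigned int or as unsigned long.
 * Returns 0 on success, -1 on stat failure or truncation.
 */
int format_owner(const char *path, char *buf, size_t len)
{
    struct stat st;
    int n;

    if (stat(path, &st) < 0)
        return -1;

    n = snprintf(buf, len, "%d:%d", (int)st.st_uid, (int)st.st_gid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```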
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Tue, 15 Jan 2013 00:03:06 +0000 (18:03 -0600)]
Implement userid mappings (enable user namespaces)
The 3.8 kernel now supports uid mappings, so I believe it's appropriate
to proceed with this patchset.
The container config supports new entries of the form:
lxc.id_map = U 100000 0 10000
lxc.id_map = G 100000 0 10000
meaning map 'virtual' uids (in the container) 0-10000 to uids
100000-110000 on the host, and same for gids. So long as there are
mappings specified in the container config, then CLONE_NEWUSER will
be used when the container is cloned. This means that container
setup is no longer done with root privilege on the host, only root
privilege in the container. Therefore cgroup setup is moved from the
init task to the monitor task.
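To illustrate how such an entry can be read, here is a small sketch. The field order (type, first host id, first container id, range) is inferred from the example above; struct id_map_entry, parse_id_map() and map_to_host() are hypothetical names rather than the patch's own code.

```c
#include <stdio.h>

/* Hypothetical representation of one lxc.id_map value. */
struct id_map_entry {
    char type;            /* 'U' for uids, 'G' for gids */
    unsigned long host;   /* first id on the host */
    unsigned long ns;     /* first id inside the container */
    unsigned long range;  /* number of consecutive ids mapped */
};

/* Parse the value part of an entry, e.g. "U 100000 0 10000". */
int parse_id_map(const char *value, struct id_map_entry *e)
{
    if (sscanf(value, " %c %lu %lu %lu",
               &e->type, &e->host, &e->ns, &e->range) != 4)
        return -1;
    if (e->type != 'U' && e->type != 'G')
        return -1;
    return 0;
}

/* Translate a container id to a host id; -1 if it is not mapped. */
long map_to_host(const struct id_map_entry *e, unsigned long id)
{
    if (id < e->ns || id - e->ns >= e->range)
        return -1;
    return (long)(e->host + (id - e->ns));
}
```

With the entry above, container uid 0 translates to host uid 100000 under this reading.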
To use this patchset, you currently need to either use the raring
kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
from git://kernel.ubuntu.com/serge/quantal-userns.git.
(Alternatively you can use Eric's tree at the latest userns-always-map-*
branch at
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
but you will likely want to at least enable tmpfs mounts in user namespaces)
You also need to chown the files in the container rootfs into the
mapped range. There is a utility at
https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
uidmapshift does the chowning, while the container-userns-convert
script nicely wraps that program. So I simply
will create a container which is shifted so uid 0 in the container
is uid 200000 on the host.
TODO: when doing setuid(0), need to only do that if 0 is one of the
ids we map to. Similarly, when dropping capabilities, need to only
not do that if 0 is one of the ids we map to. However, the question
of what to do for 'weird' containers in private user namespaces is
one I'm punting for later.
Serge Hallyn [Mon, 14 Jan 2013 23:32:44 +0000 (23:32 +0000)]
setup cgroups from parent
This is a first step to enabling user namespaces. When starting a
container in a new user namespace, the child will not have the
rights to write to the cgroup fs. (We can give it that right, but
don't always want to have to).
At the parent, we don't want to setup_cgroups() before the child
has set itself up. But we also don't want to wait until it has
started running its init, since that is racy.
Therefore introduce a new sync point. The child will let the
parent know when it is ready to be confined, and wait for the
parent to respond that it has done so. Then the child will finish
constraining itself with LSM and seccomp and execute init.
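The handshake can be sketched with a socketpair and fork. This is not the lxc sync code: run_with_sync(), the one-byte messages, and the confine callbacks are all assumptions made for illustration.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-ins for the parent's cgroup setup (hypothetical). */
int confine_ok(pid_t child)   { (void)child; return 0; }
int confine_fail(pid_t child) { (void)child; return -1; }

/*
 * Fork a child that reports "ready to be confined" and then blocks until
 * the parent answers. Returns 0 if the handshake completed and the child
 * exited cleanly, -1 otherwise.
 */
int run_with_sync(int (*confine)(pid_t child))
{
    int sv[2], status;
    char msg;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: announce readiness, then wait for the parent's verdict. */
        close(sv[0]);
        msg = 'R';
        if (write(sv[1], &msg, 1) != 1)
            _exit(1);
        if (read(sv[1], &msg, 1) != 1 || msg != 'C')
            _exit(1);
        /* A real child would now apply LSM/seccomp and exec init. */
        _exit(0);
    }

    /* Parent: wait for the child, confine it, then release it. */
    close(sv[1]);
    if (read(sv[0], &msg, 1) == 1 && msg == 'R' && confine(pid) == 0)
        msg = 'C';   /* confined: child may continue */
    else
        msg = 'F';   /* failed: child must give up */
    /* MSG_NOSIGNAL avoids SIGPIPE if the child has already gone away. */
    (void)send(sv[0], &msg, 1, MSG_NOSIGNAL);
    close(sv[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the child blocks on read() until the parent replies, it cannot reach init before the restrictions are in place.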
Serge Hallyn [Mon, 14 Jan 2013 23:32:43 +0000 (23:32 +0000)]
clean up syncs
Always unblock parent when child setup fails, rather than just
exiting.
Also remove a duplicate call to setup_cgroup(). We'll want it
close to there for userns, but not right there - that's too late,
and could happen after container init has done something bad
without cgroup restrictions.
Christian Seiler [Tue, 15 Jan 2013 13:44:25 +0000 (14:44 +0100)]
Multiple IP addresses: add them in the correct order
Make sure that when configuring containers that have interfaces containing
multiple IP addresses they are added in the order of the configuration file
(i.e. the first being the primary one) and not the reverse order.
Signed-off-by: Christian Seiler <christian@iwakd.de> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Ok... Here's the patch again. Since Serge is removing the loglevel
structure member, this patch no longer references that element.
From the original description:
1) Removes run_makedev() and the call to it from conf.c per discussion.
2) Adds an lxc.hook.autodev hook.
Note: This hook is very close (one routine level abstracted) to where
the run_makedev was called. Anyone really rrreeeaaalllyyy needing
MAKEDEV can add it in with a small shim script to do whatever they want
under whatever distro they're using, so no functionality is lost there.
3) Added a number of environment variables for all the hook scripts to
reference to assist in execution. Things like LXC_ROOTFS_MOUNT could be
very useful but others were added as well. Room for more if anyone has
an itch. All in one spot in lxc_start.c.
4) clearenv and putenv( "container=lxc" ) calls were moved to just after
the "start" hook in the container just prior to actually firing up the
container so we could use environment variables prior to that and have
them flushed before firing up init. Nice side effect is that you
can define environment variables and then call lxc-start and have them
show up in those hook scripts.
5) I actually DID update the man page for lxc.conf! I guess I lied when
I said I wouldn't get that done.
[... and ...]
I added the rcfile to the lxc_conf structure as suggested and moved the
setenv bundle from lxc-start.c over to start.c just prior to calling
run_lxc_hooks for the pre-start hook.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Fri, 11 Jan 2013 18:39:31 +0000 (12:39 -0600)]
remove logfile and loglevel from struct lxc_conf
The options are still supported in the lxc configuration file.
However they are stored only in local variables in src/lxc/log.c,
which can be read using two new functions:
int lxc_log_get_level(void);
const char *lxc_log_get_file(void);
Changelog: jan 14:
have lxc_log_init use lxc_log_set_file(), have lxc_log_set_file() take
a const char *, and have it keep its own strdup'd copy of the filename.
Stéphane Graber [Sun, 13 Jan 2013 20:29:26 +0000 (15:29 -0500)]
lxcutmp.c: Fix typo causing build failure
In a previous change I added an ifdef for HAVE_SYS_TIMERFD_h
rather than HAVE_SYS_TIMERFD_H, leading to a missing include of
sys/timerfd.h on platforms that support it and ultimately to a build
failure.
Stéphane Graber [Fri, 11 Jan 2013 17:29:55 +0000 (12:29 -0500)]
Build lxcutmp.c without timerfd.h or utmpx.h
This adds a local implementation of the bits we need from timerfd.h and
utmpx.h so that the LXC utmp watch can be used with libcs that don't implement
the same functions as eglibc.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Thu, 10 Jan 2013 20:45:22 +0000 (15:45 -0500)]
use pkg-config to ensure python3-devel is installed
The Python.h header varies in location by distribution, so instead use
pkg-config to ensure the python3 devel package is installed. Tested with
Ubuntu 12.04 and Fedora 17. Fixes --enable-python on Fedora 17.
Add 'config' option to lxc-archlinux template and fix getopt string
This option allows user to control installation repository and options
using alternative pacman configuration file.
Also remove unnecessary sed invocation during container configuration.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Stéphane Graber [Thu, 10 Jan 2013 22:10:51 +0000 (17:10 -0500)]
utmp.h: Don't fail when utmpx.h isn't present
Following a comment on the mailing-list, I made utmp.h return -1
when it's disabled. The problem with that is that it prevents the
container from starting completely, which isn't quite what I wanted.
This change makes the function succeed, the container will therefore
start but without utmp handler.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Fri, 4 Jan 2013 18:56:13 +0000 (13:56 -0500)]
Don't call setup_mount_entries if the list is empty
There's no good reason to call setup_mount_entries if we don't have any
lxc.mount.entry. This also avoids an issue on bionic where the tmpfile()
call in setup_mount_entries requires the presence of /tmp which isn't the
case by default.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 16:51:52 +0000 (11:51 -0500)]
lxc_unshare: Replace getpw*_r by getpw*
Bionic and maybe some other libc implementations lack the _r nss functions.
This replaces our current getpwnam_r and getpwuid_r calls by getpwnam and
getpwuid.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 17:24:18 +0000 (12:24 -0500)]
caps.h: Rename __errno to ___errno
At least bionic defines __errno, so this was causing a conflict in caps.h
leading to build failure. Renaming to ___errno avoids that conflicting
definition.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 17:24:16 +0000 (12:24 -0500)]
Add a bionic_alphasort function on bionic
alphasort doesn't have the right signature on bionic which causes the build to
fail. This implements a new bionic_alphasort function when building on bionic
providing the right signature and a functional equivalent of glibc's alphasort.
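A comparator with the signature that scandir() expects could be as small as the sketch below. It is an assumption-laden illustration: sketch_alphasort() and count_entries_sorted() are hypothetical names, and strcmp is used for simplicity whereas glibc's alphasort compares with strcoll.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Comparator with the const struct dirent ** signature scandir() expects. */
int sketch_alphasort(const struct dirent **a, const struct dirent **b)
{
    return strcmp((*a)->d_name, (*b)->d_name);
}

/*
 * Scan `path` with the comparator above and return the number of entries
 * (including "." and ".."), or -1 on error. The list is freed before
 * returning; a real caller would walk it in sorted order first.
 */
int count_entries_sorted(const char *path)
{
    struct dirent **list;
    int n = scandir(path, &list, NULL, sketch_alphasort);

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++)
        free(list[i]);
    free(list);
    return n;
}
```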
This signature problem with alphasort was fixed in upstream bionic but hasn't
been released yet. This commit can therefore be reverted as soon as the
following commit hits the Android NDK: 40e467ec668b59be25491bd44bf348a884d6a68d
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 17:24:14 +0000 (12:24 -0500)]
Workaround missing functions in other libc
Some libc implementations (e.g. bionic) lack some of the syscall functions
that are present in glibc.
For those, detect at build time that they are missing and implement a minimal
syscall() wrapper that will essentially give the same result as the glibc
function.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 17:24:13 +0000 (12:24 -0500)]
personality.h: Make the personality code optional
Some platforms don't have personality.h in their C library. This change
adds buildtime detection for the header and turns off the personality setting
code in those cases.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 20 Dec 2012 15:11:03 +0000 (16:11 +0100)]
Don't hard depend on capability.h and libcap
In the effort to make LXC work with non-standard Linux distros, this change
allows for the user to build LXC without capability support through a new
--disable-capabilities option to configure.
This effectively will cause LXC not to link against libcap and will turn all
the _cap_ functions into no-ops.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 3 Jan 2013 17:24:06 +0000 (12:24 -0500)]
No need to link against rt and util on bionic
When building on bionic, -lrt and -lutil only cause a build failure.
Dropping those fixes the build, so it appears that the symbols are defined
in the main library.
This commit moves -lrt and -lutil under a !IS_BIONIC check.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 8 Jan 2013 17:02:53 +0000 (12:02 -0500)]
Replace all references to ushort by unsigned short
ushort appears to be a glibc-specific type which doesn't exist in
bionic, so this commit simply replaces all occurrences with the equivalent
unsigned short type.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Set umask before populating /dev and restore it after.
According to docs, mknod clears each permission bit whose
corresponding bit in the process umask is set, so we should fix it
before creating device nodes.
Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Wed, 19 Dec 2012 00:15:33 +0000 (19:15 -0500)]
fix lxc-wait waiting forever for FREEZING, FROZEN, THAWED states
These states are kept by the kernel in the freezer.state cgroup item, and
are never set in handler->state with lxc_set_state(). If lxc transitions
a container to/from the freezer after an lxc-wait for one of the above
states has already started, the lxc-wait will never see the new state. This
change has lxc send the new state to the lxc-monitor socket.
Dwight Engen [Fri, 14 Dec 2012 20:38:35 +0000 (15:38 -0500)]
oracle template: add support for creating ol4 container from ovm template
Also: disable the interactive part of ovmd so ol5,6 containers won't
hang if started for the first time with -d. Don't let containers do rawio,
or have access to /dev/rtc0, since they can mess up the host's system clock among
other things.
Dwight Engen [Thu, 27 Dec 2012 22:01:26 +0000 (17:01 -0500)]
separate console device from console log
lxc-start -c makes the named file/device the container's console, but using
this with a regular file in order to get a log of the console output does
not work very well if you also want to login on the console. This change
implements an additional option (-L) to simply log the console's output to
a file.
Both options can be used separately or together. For example to get a usable
console and log: lxc-start -n name -c /dev/tty8 -L console.log
The console state is cleaned up more when lxc_delete_console is called, and
some of the clean up paths in lxc_create_console were fixed.
The lxc_priv and lxc_unpriv macros were modified to make use of gcc's local
label feature so they can be expanded more than once in the same function.
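For readers unfamiliar with gcc's local labels, here is a small illustration of why they allow a macro containing a label to be expanded twice in one function. TRY_TWICE, fail_first() and use_macro_twice() are hypothetical and unrelated to the actual lxc_priv/lxc_unpriv definitions; __label__ is a GNU C extension.

```c
/*
 * __label__ declares `retry` as local to the enclosing block, so each
 * expansion of the macro gets its own label instead of a duplicate.
 */
#define TRY_TWICE(call, result)                 \
    do {                                        \
        __label__ retry;                        \
        int attempts_ = 0;                      \
    retry:                                      \
        (result) = (call);                      \
        if ((result) < 0 && ++attempts_ < 2)    \
            goto retry;                         \
    } while (0)

static int calls;

/* Fails on the first call after `calls` is reset, then succeeds. */
static int fail_first(void)
{
    return calls++ == 0 ? -1 : 0;
}

/* Expands the macro twice in one function; returns 0 if both retries worked. */
int use_macro_twice(void)
{
    int a, b;

    calls = 0;
    TRY_TWICE(fail_first(), a);   /* first expansion */
    calls = 0;
    TRY_TWICE(fail_first(), b);   /* second expansion, no label clash */
    return a + b;
}
```

With an ordinary label, the second expansion would fail to compile because the label would be defined twice in the same function.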
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Wed, 2 Jan 2013 18:47:18 +0000 (13:47 -0500)]
kill -s expects the signal name without SIG
The previous lxc-shutdown change replaced 'kill SIG<name>' by
'kill -s SIG<name>'. Although this works with busybox where it was tested,
it doesn't actually work with all kill implementations, some of which require just
the signal name without the prefix.
This changes "-s SIG<name>" to just "-s <name>". Tested with busybox and
standard kill.
Natanael Copa [Wed, 26 Dec 2012 21:31:56 +0000 (22:31 +0100)]
lxc-ps: use posix shell and awk instead of bash
Use awk to parse the output of 'ps' and the tasks files for the
containers.
Use awk fields to find PID column rather than assume that the PID field
is exactly 5 chars wide and has a leading space ' PID'. This works as
long as the PID field is before the command or any other field that includes
spaces. This also makes it work with busybox 'ps'.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Natanael Copa [Tue, 25 Dec 2012 16:08:55 +0000 (17:08 +0100)]
lxc-clone: use posix shell instead of bash
- avoid getopt --longoptions
- use 'which' instead of 'type' to detect existence of tools
- use 'grep -q -w' instead of bash substring variable expansion
${line:0:18}
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Thu, 20 Dec 2012 05:58:44 +0000 (23:58 -0600)]
Support MS_SHARED /
(I'll be out until Jan 2, but in the meantime, here is hopefully a
little New Year's gift - this seems to allow lxc-start with / being
MS_SHARED on the host)
When / is MS_SHARED (for instance with f18 and modern arch), lxc-start
fails on pivot_root. The kernel enforces that, when doing pivot_root,
the parent of current->fs->root (as well as the new root and the putold
location) not be MS_SHARED.
To work around this, check /proc/self/mountinfo for a 'shared:' in
the '/' line. If it is there, then create a tiny MS_SLAVE tmpfs dir to
serve as parent of /, recursively bind mount / into /root under that dir,
make it rslave, and chroot into it.
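The detection step (looking for 'shared:' on the '/' line of /proc/self/mountinfo) can be sketched as follows. Both functions are hypothetical illustrations rather than the patch's code; they assume the documented mountinfo layout, where the mount point is the fifth field and optional fields precede the " - " separator.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if one mountinfo line describes "/" and carries a shared: tag. */
int line_is_shared_root(const char *line)
{
    char mountpoint[256];
    const char *sep, *tag;

    /* Layout: "id parent major:minor root mountpoint options [optional] - ..." */
    if (sscanf(line, "%*s %*s %*s %*s %255s", mountpoint) != 1)
        return 0;
    if (strcmp(mountpoint, "/") != 0)
        return 0;

    /* Optional fields such as "shared:N" appear before the " - " separator. */
    sep = strstr(line, " - ");
    tag = strstr(line, " shared:");
    return tag != NULL && (sep == NULL || tag < sep);
}

/* Return 1 if / is shared, 0 if not, -1 if mountinfo cannot be read. */
int root_is_shared(void)
{
    char line[4096];
    int shared = 0;
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (line_is_shared_root(line)) {
            shared = 1;
            break;
        }
    }
    fclose(f);
    return shared;
}
```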
Tested with ubuntu raring image after doing 'mount --make-rshared /'.
Dwight Engen [Tue, 18 Dec 2012 21:12:34 +0000 (16:12 -0500)]
lxc-destroy container only if it is in the STOPPED state
Currently, lxc-destroy will attempt to destroy a container if it is not in
the RUNNING state, but doing so is not good when the container is FROZEN, or
in other transitional states.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Tue, 11 Dec 2012 22:05:11 +0000 (17:05 -0500)]
Fix race/corruption with multiple lxc-start, lxc-execute
If you start more than one lxc-start/lxc-execute with the same name at the
same time, or just do an lxc-start/lxc-execute with the name of a container
that is already running, lxc doesn't figure out that the container with this
name is already running until fairly late in the initialization process: i.e.
when __lxc_start() -> lxc_poll() -> lxc_command_mainloop_add() attempts to
create the same abstract socket name.
By this point a fair amount of initialization has been done that actually
messes up the running container. For example __lxc_start() -> lxc_spawn() ->
lxc_cgroup_create() -> lxc_one_cgroup_create() -> try_to_move_cgname() moves
the running container's cgroup to a name of deadXXXXXX.
The solution in this patch is to use the atomic existence of the abstract
socket name as the indicator that the container is already running. To do
so, I just refactored lxc_command_mainloop_add() into an lxc_command_init()
routine that attempts to bind the socket, and ensure this is called earlier
before much initialization has been done.
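The idea of using an abstract socket name as an atomic "already running" indicator can be sketched as below. claim_container_name() and the "sketch/<name>/command" naming are assumptions for illustration, not the actual lxc_command_init() code or lxc's socket name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Try to bind an abstract unix socket derived from `name`.
 * Returns the bound fd on success. Returns -1 on failure; errno is
 * EADDRINUSE when another process (or fd) already holds the name.
 */
int claim_container_name(const char *name)
{
    struct sockaddr_un addr;
    socklen_t len;
    int fd, n, saved;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which selects the abstract namespace. */
    n = snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
                 "sketch/%s/command", name);
    if (n < 0 || (size_t)n >= sizeof(addr.sun_path) - 1)
        return -1;
    len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + (size_t)n);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* bind() either claims the name or fails; there is no window between. */
    if (bind(fd, (struct sockaddr *)&addr, len) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The kernel releases an abstract name when the last descriptor for the socket is closed, so keeping the fd open for the whole run is what makes the name a reliable indicator.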
In testing, I verified that maincmd_fd was still open at the time of lxc_fini,
so the entire lifetime of the container's run should be covered. The only
explicit close of this fd was in the reboot case of lxcapi_start(), which is
now moved to lxc_fini(), which I think is more appropriate.
Even though it is not checked any more, set maincmd_fd to -1 instead of 0 to
indicate it's not open, since 0 could be a valid fd.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Tue, 11 Dec 2012 17:39:16 +0000 (12:39 -0500)]
Don't attempt to symlink kmsg without rootfs->path
For example doing "lxc-execute -n tmpct /bin/bash" will call setup_kmsg(), but
in this case rootfs->mount/dev directory doesn't even exist so the call to
symlink fails with ENOENT. Commit f62b3449 made this failure not fatal, but
we should not even try it when we know it will fail. See similar code in
setup_tty(), setup_console(), etc.