arguments: remove trailing slashes for the input lxcpath
In lxc_cmd(), we use
snprintf(path, len, "%s/%s/command", lxcpath ? lxcpath : inpath, name);
to build the socket name. This assumes lxcpath has no trailing slashes, so
if we use
lxc-info -n test -P /usr/local/var/lib/lxc_anon/
to get a running container's state, we get state: STOPPED, which
is wrong, because we build the wrong socket name.
To fix this, just remove trailing slashes when parsing arguments.
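A minimal sketch of the idea, not the code from the patch. The helper names strip_trailing_slashes() and build_command_path() are hypothetical; only the snprintf() format string is taken from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Strip trailing '/' characters in place; a lone "/" is left untouched. */
static void strip_trailing_slashes(char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);

    while (len > 1 && path[len - 1] == '/')
        path[--len] = '\0';
}

/* Hypothetical wrapper around the format string quoted above.
 * Returns 0 on success, -1 if the result would not fit in buf. */
static int build_command_path(char *buf, size_t len,
                              const char *lxcpath, const char *name)
{
    int ret = snprintf(buf, len, "%s/%s/command", lxcpath, name);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

Normalizing the path once, at argument parsing time, means every later consumer of lxcpath sees the same string.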
The API header was included in a variety of ways before, standardize
those to "include <lxc/lxccontainer.h>" as this will always work both in
tree and on a system with the headers installed.
Expose underlying close_all_fds config value via API
Being able to set close_all_fds via the API is useful in
situations such as running an application (say, a web server)
that controls the lifecycle of the container using the LXC API.
We don't want the forked process to inherit the parent's resources (files, sockets, ...).
Signed-off-by: S.Çağlar Onur <caglar@10ur.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
get_ips accepts an interface name as a parameter but there was no
way to get the interfaces names from the container. This patch
introduces a new get_interfaces call to the API so that users
can obtain the name of the interfaces.
Support for the Python bindings is also introduced as part of this version.
pthread_mutex_lock() will only return an error if it was set to
PTHREAD_MUTEX_ERRORCHECK and we are recursively calling it (and
would otherwise have deadlocked). If that's the case then log a
message for future debugging and exit. Trying to "recover" is
nonsense at that point.
process_lock() was held over too long a time in lxcapi_start()
in the daemonize case. (note the non-daemonized case still needs a
check to enforce that it must NOT be called while threaded). Add
process_lock() at least across all open/close/socket() calls.
Anything done after a fork() doesn't need the locks as it is no
longer threaded - so some open/close/dups()s are not locked for
that reason. However, some common functions are called from both
threaded and non-threaded contexts. So after doing a fork(), do
a possibly-extraneous process_unlock() to make sure that, if we
were forked while pthread mutex was held, we aren't deadlocked by
nobody.
Tested that lp:~serge-hallyn/+junk/lxc-test still works with this
patch.
Christian Seiler [Thu, 12 Sep 2013 20:00:34 +0000 (22:00 +0200)]
Change rootfs pinning mechanism
Change pinning mechanism: Use $rootfs/lxc.hold instead of $rootfs.hold
(in case $rootfs is a mountpoint itself), but delete the file
immediately after creating it (but keep it open). This will keep the
root filesystem busy but does not leave any unnecessary files lying
around.
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Christian Seiler [Wed, 11 Sep 2013 23:44:44 +0000 (01:44 +0200)]
Support for automatic mounting of filesystems
This patch adds the lxc.mount.auto configuration option that allows the
user to specify that certain standard filesystems should be
automatically pre-mounted when the container is started.
Currently, four things are implemented:
- /proc (mounted read-write)
- /sys (mounted read-only)
- /sys/fs/cgroup (special logic, see mailing list discussions)
- /proc/sysrq-trigger (see below)
/proc/sysrq-trigger may be used from within a container to trigger a
forced host reboot (echo b > /proc/sysrq-trigger) or do other things
that a container shouldn't be able to do. The logic here is to
bind-mount /dev/null over /proc/sysrq-trigger, so that that cannot
happen. This obviously only protects fully if CAP_SYS_ADMIN is not
available inside the container (otherwise that bind-mount could be
removed).
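The bind-mount step can be sketched with mount(2) as below. The function name mask_with_dev_null() is hypothetical, and the call needs the privileges that mounting normally requires.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Bind-mount /dev/null over 'target' so that writes to it are discarded.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)). */
static int mask_with_dev_null(const char *target)
{
    assert(target != NULL);
    if (mount("/dev/null", target, NULL, MS_BIND, NULL) < 0) {
        perror("bind-mount /dev/null");
        return -1;
    }
    return 0;
}
```

In the scenario above the target would be /proc/sysrq-trigger inside the container's mount namespace.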
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Christian Seiler [Wed, 11 Sep 2013 23:44:43 +0000 (01:44 +0200)]
cgroup: Add lxc_setup_mount_cgroup to setup /sys/fs/cgroup inside the container
Add function to mount cgroup filesystem hierarchy into the container,
allowing only access to the parts that the container should have access
to, but none else.
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Christian Seiler [Wed, 11 Sep 2013 23:44:42 +0000 (01:44 +0200)]
cgroup: Split legacy 'ns' cgroup handling off from main cgroup handling
This patch splits off ns legacy cgroup handling from main cgroup
handling. It moves the creation of the cgroups before clone(), so that
the child will easily know which cgroups it will later belong to. Since
this is not possible for the renaming of the 'ns' cgroup, keep that
part after clone.
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
If a cgroup hierarchy has ns cgroup composed, then we need to treat
that differently:
1. The container init will have already been switched to a new cgroup
called after its pid.
2. We can't move the container init to new deeper cgroup directories.
So, if we detect an ns cgroup, don't bother trying to construct a new
name according to the pattern. Just rename the current one to the
container name, and save that path for us to later enter and remove.
Note I'm not dealing with the subpaths so nested containers probably
won't work. However as ns cgroup is very much legacy, that should be
ok. Eventually we should be able to drop ns cgroup support altogether,
but not just yet.
apparmor.c: drop newline when reading current profile
Otherwise we fail to recognize if we are already unconfined. Then,
if we want to *start* unconfined, and /proc is readonly, start fails
even though it should be able to proceed.
With this patch, that situation works.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Reported-by: Andre Nathan <andre@digirati.com.br>
This patch rewrites most of the cgroup logic. It creates a set of data
structures to store the kernel state of the cgroup hierarchies and
their mountpoints.
Mainly, everything is now grouped with respect to the hierarchies of
the system. Multiple controllers may be mounted together or separately
to different hierarchies, the data structures reflect this.
Each hierarchy may have multiple mount points (that were created
previously using the bind mount method) and each of these mount points
may point to a different prefix inside the cgroup tree. The current
code does not make any assumptions regarding the mount points, it just
parses /proc/self/mountinfo to acquire the relevant information.
The only requirement is that the current cgroup of either init (if
cgroup.pattern starts with '/' and the tools are executed as root) or
the current process (otherwise) are accessible. The root cgroup need
not be accessible.
The configuration option cgroup.pattern is introduced. For
root-executed containers, it specifies which format the cgroups should
be in. Example values may include '/lxc/%n', 'lxc/%n', '%n' or
'/machine/%n.lxc'. Any occurrence of '%n' is replaced with the name of
the container (and if clashes occur in any hierarchy, -1, -2, etc. are
appended globally). If the pattern starts with /, new containers'
cgroups will be located relative to init's cgroup; if it doesn't, they
will be located relative to the current process's cgroup.
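The '%n' substitution can be illustrated with the sketch below. It is not the patch's implementation: expand_cgroup_pattern() is a hypothetical name, and the sketch covers neither the -1, -2 clash suffixes nor the choice between init's cgroup and the current process's cgroup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of 'pattern' in which every "%n" is
 * replaced by 'name'. The caller frees the result. NULL on allocation
 * failure. */
static char *expand_cgroup_pattern(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    size_t name_len = strlen(name);
    size_t out_len = 0;
    const char *p;
    char *out, *w;

    /* First pass: compute the length of the expanded string. */
    for (p = pattern; *p; ) {
        if (p[0] == '%' && p[1] == 'n') {
            out_len += name_len;
            p += 2;
        } else {
            out_len++;
            p++;
        }
    }

    out = malloc(out_len + 1);
    if (!out)
        return NULL;

    /* Second pass: copy, substituting the container name. */
    for (p = pattern, w = out; *p; ) {
        if (p[0] == '%' && p[1] == 'n') {
            memcpy(w, name, name_len);
            w += name_len;
            p += 2;
        } else {
            *w++ = *p++;
        }
    }
    *w = '\0';
    return out;
}
```

For a container named c1, the example patterns from the text expand to /lxc/c1, lxc/c1, c1 and /machine/c1.lxc.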
Some changes to the cgroup.h API have been made to make it more
consistent, both with respect to naming and with respect to the
parameters. This causes some changes in other parts of the code that
are included in the patch.
There has been some testing of this functionality, but there are
probably still quite a few bugs in there, especially for people with
different configurations.
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Newer glibc versions (that we can't require) allow for an additional
letter 'e' in the fopen mode that will cause the file to be opened with
the O_CLOEXEC flag, so that it will be closed if the program exec()s
away. This is important because if liblxc is used in a multithreaded
program, another thread might want to run a program. This option
prevents the leakage of file descriptors from LXC. This patch adds an
emulation of it that uses the open(2) syscall and fdopen(3). At some
later point in time, it may be dropped in favor of fopen(..., "...e").
This commit also converts all fopen() calls in utils.c (where the
function is added) to fopen_cloexec(). Subsequently, other calls to
fopen() and open() should also be adapted.
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
global config: Unify parsing, add additional checks
Instead of duplicating the code for parsing the global config file for
each option, write one main function, lxc_global_config_value, that
does the parsing for an arbitrary option name and just call that
function from the existing ones.
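The "one lookup function for any option name" idea might look roughly like this. It is a sketch under assumptions: config_lookup() is a hypothetical name, and the "key = value" line format with '#' comments is assumed rather than taken from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scan 'fp' for a line of the form "key = value" and copy the value
 * (without surrounding whitespace) into 'out'.
 * Returns 0 if found, -1 if not found or if the value does not fit. */
static int config_lookup(FILE *fp, const char *key, char *out, size_t outlen)
{
    char line[1024];
    size_t keylen = strlen(key);

    assert(fp != NULL && out != NULL && outlen > 0);
    while (fgets(line, sizeof(line), fp)) {
        char *p = line;
        size_t len;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '#' || *p == '\0')
            continue;               /* comment or blank line */
        if (strncmp(p, key, keylen) != 0)
            continue;
        p += keylen;
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '=')
            continue;               /* longer key or malformed line */
        p++;
        while (isspace((unsigned char)*p))
            p++;

        len = strlen(p);
        while (len > 0 && isspace((unsigned char)p[len - 1]))
            len--;                  /* trim newline and trailing blanks */
        if (len >= outlen)
            return -1;
        memcpy(out, p, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

Per-option accessors then reduce to one call each with their own key and default value.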
Signed-off-by: Christian Seiler <christian@iwakd.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
error.c: don't return error if container init signaled
We log that at INFO level in case it is needed. However, in a modern
kernel a container which was shut down using 'shutdown' will always
have been signaled with SIGINT. Making lxc-start return an error to
reflect that seems overkill.
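A sketch of the resulting status mapping, assuming a hypothetical helper exit_code_from_status() that takes a wait status as returned by waitpid(2). The log line stands in for whatever INFO-level logging the real code uses.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Map the wait status of the container init to a return code.
 * A normal exit keeps its exit code; death by signal is only logged
 * and reported as success. Returns -1 for any other status. */
static int exit_code_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);

    if (WIFSIGNALED(status)) {
        fprintf(stderr, "INFO: container init was killed by signal %d\n",
                WTERMSIG(status));
        return 0;
    }

    return -1;
}
```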
It's *conceivable* that someone is depending on this behavior, so I'm
sending this out for anyone to NACK, but if I hear no complaints I'll
apply.
Hopefully someone else will come in and spruce it up :) This
version is as simple as can be
lxc-snapshot -n a1
create a snapshot of a1
echo "second commit" > /tmp/a
lxc-snapshot -n a1 -c /tmp/a
create a snapshot of a1 with /tmp/a as a commit comment
lxc-snapshot -n a1 -L
list a1's snapshots
lxc-snapshot -n a1 -L -C
list a1's snapshots along with commit comments
lxc-snapshot -n a1 -r snap0 a2
restore snapshot 0 of a1 as container a2
Some easy nice-to-haves:
1. sort snapshots in the list
2. allow a comment to be given in-line
3. an option to remove a snapshot?
Removing a snapshot can just as well be done with
lxc-destroy -P /var/lib/lxcsnaps/c1 -n snap2
so I leave it to others to decide whether they really want
it, and provide the patch if so.
The api allows for creating, listing, and restoring of container
snapshots. Snapshots are created as snapshot clones of the
original container - i.e. btrfs and lvm will be done as snapshot,
a directory-backed container will have overlayfs snapshots. A
restore is a copy-clone, using the same backing store as the
original container had.
Changelog:
. remove lxcapi_snap_open, which wasn't defined anyway.
. rename get_comment to get_commentpath
. if no newname is specified at restore, use c->name (as we meant to)
rather than segving.
. when choosing a snapshot index, use the correct path to check for.
Natanael Copa [Fri, 6 Sep 2013 07:08:45 +0000 (09:08 +0200)]
lua: fix logic to enable lua support in configure
When there is no --enable-lua or --with-lua-pc, Lua should not be
enabled.
This fixes a bug introduced with 12e93188 (configure/makefile:
Allow specify Lua pkg-config file with --with-lua-pc) that caused
the configure script to fail if the Lua headers were missing.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Natanael Copa [Thu, 5 Sep 2013 15:13:07 +0000 (17:13 +0200)]
configure/makefile: Allow specify Lua pkg-config file with --with-lua-pc
Enable support for both Lua 5.1 and 5.2 by letting user specify the Lua
pkg-config package name. By default it will use 'lua' and try to figure
out which version it is.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Fri, 14 Jun 2013 03:43:01 +0000 (22:43 -0500)]
introduce lxc.cap.keep
The lxc configuration file currently supports 'lxc.cap.drop', a list of
capabilities to be dropped (using the bounding set) from the container.
The problem with this is that over time new capabilities are added. So
an older container configuration file may, over time, become insecure.
Walter has in the past suggested replacing lxc.cap.drop with
lxc.cap.preserve, which would have the inverse sense - any capabilities
in that set would be kept, any others would be dropped.
Realistically both have the same problem - the sendmail capabilities
bug proved that running code with unexpectedly dropped privilege can be
dangerous. This patch gives the admin a choice: You can use either
lxc.cap.keep or lxc.cap.drop, not both.
Both continue to be ignored if a user namespace is in use.
lxc-commands: add a comment explaining CMD_* rules
We wish to ensure that, henceforth, newer lxc tools are always compatible
with older lxc monitors. Add a comment to commands.c to explain the
rule we wish to enforce to this end.
Serge Hallyn [Thu, 29 Aug 2013 15:41:19 +0000 (10:41 -0500)]
start.c: handle potential signal flood
Signalfd does not guarantee that we'll get an event for every signal.
So if 3 tasks exit at the same time, we may get only one sigchld
event. Therefore, in signal_handler(), always check whether init has
exited. Do this with WNOWAIT so that we can still wait4 to clean up
the init after lxc_poll() exits (rather than complicating the code).
Note - there is still a race in the kernel which can cause the
container init to become a defunct child of the host init (!). This
doesn't solve that, but is a potential (if very unlikely) race which
apw pointed out while we were trying to create a reproducer for the
kernel bug.
Serge Hallyn [Thu, 22 Aug 2013 15:27:40 +0000 (10:27 -0500)]
api: convert lxc_start
Normal lxc-start usage tends to be "lxc-start -n name [-P lxcpath]".
This causes $lxcpath/$name/config to be the configuration for the
container. However, lxc-start is more flexible than that. You can
specify a custom configuration file, in which case $lxcpath/$name/config
is not used. You can also (in addition or in place of either of these)
specify configuration entries one-by-one using "-s lxc.utsname=xxx".
To support this using the API, if we are not using
$lxcpath/$name/config then we put ourselves into a custom lxcpath
called (configurable using LXCPATH) /var/lib/lxc_anon. To stop a
container so created, then, you would use
lxc-stop -P /var/lib/lxc_anon -n name
TODO: we should walk over the list of &defines by hand and set them
using c->set_config_item. I haven't done that in this patch.
Scott Moser [Thu, 22 Aug 2013 19:38:48 +0000 (15:38 -0400)]
hooks/ubuntu-cloud-prep: add hostname to meta-data
Prior to my enabling of the clone hook, the setting of the hostname
was being done by writing to /etc/hostname. Instead of relying on that
we're now writing 'local-hostname' into the metadata for the instance.
cloud-init then reads this and sets the hostname properly.
We are also writing /etc/hostname with the new hostname explicitly. This is
useful/necessary because on network bringup of eth0, dhclient will submit its
hostname. The updating done by cloud-init occurs too late, and thus
the dhcp request goes out with the un-configured hostname and dns doesn't
work correctly.
Signed-off-by: Scott Moser <smoser@ubuntu.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 21 Aug 2013 19:43:52 +0000 (14:43 -0500)]
Track snapshot dependencies (v2)
(Will push in a bit barring any objections)
lvm, btrfs, and zfs snapshots each do an ok job of handling deletions
for us - a btrfs snapshot does fine after the original is removed,
while zfs and lvm will both refuse to allow the original to be deleted
while the snapshot exists.
Overlayfs doesn't do this for us. So, for overlayfs snapshots, track
the dependencies.
When c2 is created as an overlayfs snapshot of dir-backed c1, then
1. c2's lxc_rdepends file will contain
c1_lxcpath
c1_lxcname
2. c1's lxc_snapshots will contain "1"
c1 cannot be deleted so long as lxc_snapshots exists and contains
a non-zero number.
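The deletion check could be sketched as follows. The helper name has_dependent_snapshots() is hypothetical, it takes the full path to the lxc_snapshots file, and locking (container_disk_lock() in the text) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Read the snapshot counter stored in the file at 'path'.
 * Returns 1 if it holds a non-zero number, 0 if the file is missing
 * or holds zero, -1 on any other error (including unparseable data). */
static int has_dependent_snapshots(const char *path)
{
    FILE *fp;
    int count = 0;
    int parsed;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (!fp)
        return errno == ENOENT ? 0 : -1;

    parsed = fscanf(fp, "%d", &count);
    fclose(fp);
    if (parsed != 1)
        return -1;

    return count != 0;
}
```

A delete operation would refuse to proceed whenever this returns anything other than 0.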
The contents of lxc_snapshots and lxc_rdepends are protected by
container_disk_lock() and at lxc_clone by the new container not yet
being accessible.
(Originally I was going to keep them in the container config, but the
problem with using $lxcpath/$name/config is that api users could end up
calling c->save_config() with a cached old value of snapshots/rdepends.)
Changelog:
aug 21: check for fprintf and fclose failures