Natanael Copa [Wed, 12 Jun 2013 09:18:04 +0000 (11:18 +0200)]
lxc-init: continue even if we fail to mount /dev/mqueue
The 'lxc-init' (a lightweight init process used by lxc-execute in place
of upstart etc) tries to mount /dev/mqueue during startup. If that fails
(for instance due to missing support for mqueue in kernel) then it
aborts execution and returns -1. This is unreasonable as very few
applications actually need /dev/mqueue.
This is similar to what we do with /dev/shm.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
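The behaviour described in this commit can be sketched as a mount helper that warns on failure instead of aborting. This is a minimal illustration, not the lxc-init source: the name mount_optional() and its signature are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not taken from lxc-init): try to mount an optional
 * filesystem and report a failure without treating it as fatal.
 * Always returns 0 so that the caller carries on with startup.
 */
static int mount_optional(const char *fstype, const char *target)
{
	if (mount("none", target, fstype, 0, NULL) < 0)
		fprintf(stderr, "warning: failed to mount %s on %s: %s (continuing)\n",
			fstype, target, strerror(errno));
	return 0;
}
```

A caller would use it as mount_optional("mqueue", "/dev/mqueue") and ignore the result, which matches the "continue even if we fail" intent.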
Qiang Huang [Fri, 7 Jun 2013 07:27:32 +0000 (15:27 +0800)]
lxc-execute: allow lxc-init to log only when we have a valid log level
Right now if we use lxc-execute without log level set, we get error:
lxc: invalid log priority NOTSET.
Because we set log level manually in execute_start(), but didn't
check if we have a valid log level or not, so fix it.
Serge Hallyn [Mon, 3 Jun 2013 16:19:01 +0000 (18:19 +0200)]
lxc_create: support 'lxc-create -t <template> -h'
With the lxc-create script, 'lxc-create -t template -h' used to call
'template -h' to get template-specific help. The api based lxc-create
did not yet support that.
Add a 'helpfn' method to the lxc_arguments, which is called at the end
of printhelp, and passed the lxc_arguments. Use that in lxc_create to
reintroduce the desired behavior.
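A rough sketch of the callback idea follows. The struct and function names are hypothetical stand-ins, not the real lxc_arguments definition, and the callback only prints a hint where a real one would presumably run the template with -h.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct lxc_arguments. */
struct example_arguments {
	const char *progname;
	const char *help;
	const char *template_name;	/* value of -t, or NULL */
	/* Optional callback invoked at the end of the help output. */
	void (*helpfn)(const struct example_arguments *args);
};

/*
 * Print the generic help, then let the program append its own.
 * Returns 1 if a helpfn was called, 0 otherwise.
 */
static int example_print_help(const struct example_arguments *args)
{
	printf("Usage: %s %s\n", args->progname, args->help);
	if (!args->helpfn)
		return 0;
	args->helpfn(args);
	return 1;
}

/* What a create tool might register to point at the template's help. */
static void create_helpfn(const struct example_arguments *args)
{
	if (args->template_name)
		printf("See '%s -h' for template-specific options.\n",
		       args->template_name);
}
```

Keeping helpfn optional means tools that have nothing to add leave it NULL and get the generic help unchanged.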
Natanael Copa [Tue, 28 May 2013 08:25:14 +0000 (10:25 +0200)]
lxc-alpine: download a static package manager if it's missing
If the package manager, apk-tools, is missing, then:
- download a static binary and public keys
- verify the keys against embedded checksum
- verify the signature of the static binary against the downloaded keys
- use the verified static binary
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Fri, 31 May 2013 14:02:33 +0000 (16:02 +0200)]
configure/makefile: rename default_conf to distro_conf
configure/makefile: rename default_conf to distro_conf, since it is a per-distro
default. Then we'll be able to use the symbol LXC_DEFAULT_CONF in the code to
refer to the installed file.
Serge Hallyn [Thu, 30 May 2013 16:22:16 +0000 (11:22 -0500)]
waitpid at abort to make sure we can rmdir cgroups
If we abort the container start, and don't wait for the init task to be
reaped after we kill it, then we can't remove the container cgroup
because it is not empty.
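The kill-then-reap pattern this commit relies on looks roughly like the sketch below. The helper name kill_and_reap() is made up for illustration and the actual lxc abort path may be structured differently.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Kill a child process and block until it has been reaped.
 * Returns 0 on success, -1 on error.
 */
static int kill_and_reap(pid_t pid)
{
	int status;

	if (kill(pid, SIGKILL) < 0)
		return -1;

	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}
```

Only after this returns would the caller attempt to rmdir the container's cgroup.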
Serge Hallyn [Wed, 29 May 2013 17:26:25 +0000 (12:26 -0500)]
lxccontainer: don't lock around getstate and freeze/unfreeze (v2)
Those go through commands.c and are already mutex'ed that way.
Also remove an unmatched container_disk_unlock in lxcapi_create.
Since is_stopped uses getstate which is no longer locked, rename
it to drop the _locked suffix.
And convert save_config to taking the disk lock. This way the
save_ and load_config are mutexing each other, as they should.
Changelog: May 29:
Per Dwight's comment, take the lock before opening the config
FILE *.
Only take disklock at load and save_config when we're using the
container's config file, not when read/writing from/to another
file.
Dwight Engen [Tue, 28 May 2013 19:25:41 +0000 (15:25 -0400)]
add console to lxc api
Make lxc_cmd_console() return the fd from the socket connection to the
caller. This fd keeps the tty slot allocated until the caller closes
it. Returning the fd allows for a long lived process to close the fd
and reuse consoles.
Serge Hallyn [Tue, 28 May 2013 20:27:42 +0000 (15:27 -0500)]
api_clone: call is_stopped_locked() to avoid deadlock.
Technically as Dwight has mentioned we should probably drop the locking
from api_state() altogether, since those are protected through the
lxc command system.
Serge Hallyn [Fri, 17 May 2013 21:23:17 +0000 (23:23 +0200)]
Move container creation fully into the api
1. implement bdev->create:
python and lua: send NULL for bdevtype and bdevspecs.
They'll want to be updated to pass those in in a way that makes
sense, but I can't think about that right now.
2. templates: pass --rootfs
If the container is backed by a device which must be mounted (i.e.
lvm) then pass the actual rootfs mount destination to the
templates.
Note that the lxc.rootfs can be a mounted block device. The template
should actually be installing the rootfs under the path where the
lxc.rootfs is *mounted*.
Still, some people like to run templates by hand and assume purely
directory backed containers, so continue to support that use case
(i.e. if no --rootfs is listed).
Make sure the templates don't re-write lxc.rootfs if it is
already in the config. (Most were already checking for that)
3. Replace lxc-create script with lxc_create.c program.
Changelog:
May 24: when creating a container, create $lxcpath/$name/partial,
and flock it. When done, close that file and unlink it. In
lxc_container_new() and lxcapi_start(), check for this file. If
it is locked, create is ongoing. If it exists but is not locked,
create() was killed - remove the container.
May 24: don't disk-lock during lxcapi_create. The partial lock
is sufficient.
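The "partial" marker protocol from the May 24 changelog entry can be sketched with flock(2) as follows. All names here (partial_begin, partial_end, partial_check and the enum) are hypothetical; the sketch also treats any open or lock error as "absent" or "in progress" respectively, which real code would want to distinguish.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

enum partial_state { PARTIAL_NONE, PARTIAL_IN_PROGRESS, PARTIAL_STALE };

/*
 * Create the marker file and take an exclusive flock on it.
 * Returns the open fd (kept open while creating), or -1 on error.
 */
static int partial_begin(const char *path)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Creation finished: close the marker (dropping the lock) and unlink it. */
static void partial_end(const char *path, int fd)
{
	close(fd);
	unlink(path);
}

/* Classify the marker the way the changelog entry describes. */
static enum partial_state partial_check(const char *path)
{
	enum partial_state state;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return PARTIAL_NONE;		/* no marker: nothing pending */
	if (flock(fd, LOCK_EX | LOCK_NB) < 0)
		state = PARTIAL_IN_PROGRESS;	/* someone still holds the lock */
	else
		state = PARTIAL_STALE;		/* creator died without cleanup */
	close(fd);
	return state;
}
```

A PARTIAL_STALE result corresponds to the "create() was killed - remove the container" case.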
Serge Hallyn [Fri, 17 May 2013 05:20:10 +0000 (07:20 +0200)]
destroy: implement in the api
This requires implementing bdev->ops->destroy() for each of the backing
store types. Then implementing lxcapi_clone(), writing lxc_destroy.c
using the api, and removing the lxc-destroy.in script.
(this also has a few other cleanups, like marking some functions
static)
Changelog:
fold into destroy: fix zfs destroy
destroy: use correct program name in help
Serge Hallyn [Thu, 16 May 2013 21:03:47 +0000 (23:03 +0200)]
lxc-stop: use api, remove lxc_shutdown, extend lxc-stop functionality
implement c->reboot(c) in the api.
Also if the container is not running, return -2. Currently
lxc-stop will return 0, so you cannot tell the difference
between successful stopping and noop.
Per stgraber's email:
- Remove lxc-shutdown
- Change lxc-stop so that:
* Default behaviour is to call shutdown(), wait 15s for STOPPED, if
not STOPPED, print a message to the user and call stop() [ NOTE:
actually 60 seconds per followup thread]
* We have a -r option to reboot the container (with proper check that
the container indeed rebooted within the next 15s)
* We have a -s option to shutdown the container without the automatic
fallback to stop()
* Add a -k option allowing a user to just kill a container
(equivalent to old lxc-stop, no shutdown() call and no delay).
Serge Hallyn [Fri, 24 May 2013 21:03:22 +0000 (16:03 -0500)]
locking: update per Dwight's comment
Create three pairs of functions:
int process_lock(void);
void process_unlock(void);
int container_mem_lock(struct lxc_container *c)
void container_mem_unlock(struct lxc_container *c)
int container_disk_lock(struct lxc_container *c);
void container_disk_unlock(struct lxc_container *c);
and use those in lxccontainer.c
process_lock() is to protect the process state among multiple threads.
container_mem_lock() is to protect a struct container among multiple
threads. container_disk_lock is to protect a container on disk.
Also remove the lock in lxcapi_init_pid() as Dwight suggested.
Fix a typo (s/container/contain) spotted by Dwight.
More locking fixes are needed, but let's first get the fundamentals
right. How close does this get us?
Serge Hallyn [Wed, 22 May 2013 21:24:00 +0000 (16:24 -0500)]
lxclock: Replace named semaphore with flock
The problem: if a task is killed while holding a posix semaphore,
there appears to be no way to have the semaphore be reliably
automatically released. The only trick which seemed promising
is to store the pid of the lock holder in some file and have
later lock seekers check whether that task has died.
Instead of going down that route, this patch switches from a
named posix semaphore to flock. The advantage is that when
the task is killed, its fds are closed and locks are automatically
released.
The disadvantage of flock is that we can't rely on it to exclude
threads. Therefore c->slock must now always be wrapped inside
c->privlock.
This patch survived basic testing with the lxcapi_create patchset,
where now killing lxc-create while it was holding the lock did
not lock up future api commands.
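The nesting rule stated above (the flock-based lock always taken inside the thread lock) might look like the sketch below. The struct and function names are hypothetical and only loosely modelled on c->privlock and c->slock; this is not the lxclock.c implementation.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical stand-in for the two locks held by a container. */
struct example_lock {
	pthread_mutex_t privlock;	/* excludes threads in this process */
	const char *path;		/* lock file shared between processes */
	int fd;				/* open lock file, -1 when not held */
};

/* Take the thread lock first, then the flock. Returns 0 or -1. */
static int example_disk_lock(struct example_lock *l)
{
	pthread_mutex_lock(&l->privlock);
	l->fd = open(l->path, O_RDWR | O_CREAT, 0600);
	if (l->fd < 0 || flock(l->fd, LOCK_EX) < 0) {
		if (l->fd >= 0)
			close(l->fd);
		l->fd = -1;
		pthread_mutex_unlock(&l->privlock);
		return -1;
	}
	return 0;
}

/* Release in the reverse order. */
static void example_disk_unlock(struct example_lock *l)
{
	/* Closing the only descriptor drops the flock; the kernel does
	 * the same when the holding task is killed. */
	close(l->fd);
	l->fd = -1;
	pthread_mutex_unlock(&l->privlock);
}
```

The mutex covers the gap flock leaves between threads, while flock gives the automatic release on task death that the named semaphore lacked.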
Dwight Engen [Thu, 23 May 2013 19:44:39 +0000 (15:44 -0400)]
fix memory leaks in cgroup functions
There were several memory leaks in the cgroup functions, notably in the
success cases.
The cgpath test program was refactored and additional tests added to it.
It was used in various modes under valgrind to test that the leaks were
fixed.
Simplify lxc_cgroup_path_get() and cgroup_path_get by having them return a
char * instead of an int and an output char * argument. The only return
values ever used were -1 and 0, which are now handled with NULL and non-NULL
returns respectively.
Use consistent variable names of cgabspath when referring to an absolute path
to a cgroup subsystem or file, and cgrelpath when referring to a container
"group/name" within the cgroup hierarchy.
Remove unused subsystem argument to lxc_cmd_get_cgroup_path().
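The API simplification described in this commit (returning a char * instead of an int plus an output argument) can be illustrated with a small path builder. example_cgroup_path() is a hypothetical function, not lxc_cgroup_path_get() itself; only the cgabspath/cgrelpath naming is borrowed from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: join a cgroup mount point and a relative
 * "group/name" path into a newly allocated absolute path.
 * Returns NULL on failure; the caller frees the result.
 */
static char *example_cgroup_path(const char *mount, const char *cgrelpath)
{
	int len = snprintf(NULL, 0, "%s/%s", mount, cgrelpath);
	char *cgabspath;

	if (len < 0)
		return NULL;
	cgabspath = malloc((size_t)len + 1);
	if (!cgabspath)
		return NULL;
	snprintf(cgabspath, (size_t)len + 1, "%s/%s", mount, cgrelpath);
	return cgabspath;
}
```

With this shape the single return value carries both the result and the error state, so there is exactly one pointer for the caller to free and no way to forget to check a separate status code.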
Stéphane Graber [Thu, 23 May 2013 02:28:43 +0000 (22:28 -0400)]
python: Fix lxc-ls's usage of get_ips()
The recent port of get_ips() from pure python to the C API came with
a couple of API changes for that function call (as were highlighted in
the commit message).
I somehow didn't notice that lxc-ls was still calling with the old API
and so was crashing whenever it was asked to show the ipv4 or ipv6 address.
This is just some minor changes in the way the Fedora template is
synthesizing the target rootfs_path. Currently, the template uses a
path with the container in it twice like this:
/var/lib/lxc/rasputin/rasputin/rootfs
This happens because the container name is already contained in the
"path" and the template appends it a second time. This changes the
logic to be congruent with other templates such as lxc-arch. The new
behavior will be to create the rootfs like this:
/var/lib/lxc/rasputin/rootfs
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Tue, 21 May 2013 15:34:45 +0000 (11:34 -0400)]
oracle template: mount /dev/shm as tmpfs
sem_open(3) checks that /dev/shm is SHMFS_SUPER_MAGIC. Normally /dev/shm
is mounted in the initramfs created by dracut, but that won't be run for
a container so make sure that rc.sysinit mounts /dev/shm.
Serge Hallyn [Wed, 22 May 2013 01:31:04 +0000 (20:31 -0500)]
attach: and cgroup.c: be overly cautious
Realistically (as Dwight points out) it doesn't seem possible that
getline won't return at least one line in these functions, however
just to make absolutely sure we don't get a segv on free(NULL),
check line != NULL before freeing it on exit.
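A getline() loop with the defensive check from this commit might look like the following sketch; count_lines() is a made-up example function. Note that standard C defines free(NULL) as a no-op, so the guard is extra caution and not something the language requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines in a stream, releasing getline()'s buffer on exit. */
static long count_lines(FILE *f)
{
	char *line = NULL;
	size_t cap = 0;
	long n = 0;

	while (getline(&line, &cap, f) != -1)
		n++;

	/* Defensive check in the spirit of the commit. */
	if (line)
		free(line);
	return n;
}
```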
Dwight Engen [Fri, 17 May 2013 22:29:12 +0000 (18:29 -0400)]
extend command processor to handle generic data
Motivation for this change is to have the ability to get the run-time
configuration items from a container, which may differ from its current
on disk configuration, or might not be available any other way (for
example lxc.network.0.veth.pair). In adding this ability it seemed there
was room for refactoring improvements.
Genericize the command infrastructure so that both command requests and
responses can have arbitrary data. Consolidate all commands into command.c
and name them consistently. This allows all the callback routines to be
made static, reducing exposure.
Return the actual allocated tty for the console command. Don't print the
init pid in lxc_info if the container isn't actually running. Command
processing was made more thread safe by removing the static buffer from
receive_answer(). Refactored command response code to a common routine.
This adds a new get_ips call which takes a family (inet, inet6 or NULL),
a network interface (or NULL for all) and a scope (0 for global) and returns
a char** of all the IPs in the container.
This also adds a matching python3 binding (function result is a tuple) and
deprecates the previous pure-python get_ips() implementation.
WARNING: The python get_ips() call is quite different from the previous
implementation. The timeout argument has been removed, the family names are
slightly different (inet/inet6 vs ipv4/ipv6) and an extra scope parameter
has been added.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Patch to the lxc-fedora template to setup gettys on the ttys that are
enabled in the configuration. The area of the code already had some
modifications to that service that didn't seem to do anything and would
get wiped out by an update. I commented that out but subsumed the
change it was attempting into my command in case it does something on
another rev somewhere.
This is very similar to the logic in the OpenSuse template but doesn't
seem to appear in other templates, such as arch, which have to deal with
systemd. This isn't unique to Fedora. The templates for Fedora,
ArchLinux, and OpenSuse are the only three that seem to have any
reference to systemd at all.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Fri, 17 May 2013 22:28:12 +0000 (18:28 -0400)]
return lxc generated name for veth pair
Doing a get_config_item for lxc.network.0.veth.pair only returns the
pair name if explicitly given, but it can be useful to know the name
even if it is the one that lxc autogenerated.
Serge Hallyn [Tue, 14 May 2013 21:10:37 +0000 (16:10 -0500)]
lxc: add clone hook.
Add a clone hook called from api_clone. Pass arguments to it from
lxc_clone.c.
The clone update hook is called while the container's bdev is mounted.
Information about the container is passed in through environment
variables LXC_ROOTFS_PATH, LXC_NAME, LXC_ROOTFS_MOUNT, and
LXC_CONFIG_FILE.
So from the hook, updates to the container should be made under
$LXC_ROOTFS_MOUNT/ .
The hook also receives command line arguments as follows:
First argument is container name, second is always 'lxc', third
is the hook name (always clone), then come the arguments which
were passed to lxc-clone. I.e. when I did:
sudo lxc-clone demo2 demo3 -- hey there dude
the arguments passed in were "demo3 lxc clone hey there dude"
I personally would like to drop the first two arguments. The
name is available as $LXC_NAME, and the section argument ('lxc')
is meaningless. However, doing so risks invalidating existing
hooks.
Soon analogous create and destroy hooks will be added as well.
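The argument layout described above can be made concrete with a small parser that a hook written in C might use. The struct and function names are hypothetical, and the sketch assumes argv[0] is the hook's own path, as for any program.

```c
/* Hypothetical view of the arguments a clone hook receives. */
struct hook_args {
	const char *name;	/* argv[1]: container name */
	const char *section;	/* argv[2]: always "lxc" */
	const char *hook;	/* argv[3]: hook name, "clone" here */
	int nextra;		/* number of user-supplied arguments */
	char **extra;		/* arguments given after "--" to lxc-clone */
};

/*
 * Split a hook's argv into its fixed and user-supplied parts.
 * Returns 0 on success, -1 if the three fixed arguments are missing.
 */
static int parse_hook_args(int argc, char **argv, struct hook_args *out)
{
	if (argc < 4)
		return -1;
	out->name = argv[1];
	out->section = argv[2];
	out->hook = argv[3];
	out->nextra = argc - 4;
	out->extra = argv + 4;
	return 0;
}
```

For the lxc-clone invocation quoted above, name would be "demo3", hook would be "clone", and extra would hold "hey", "there" and "dude".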
Serge Hallyn [Wed, 15 May 2013 20:21:24 +0000 (15:21 -0500)]
cgroup: prevent DOS when a hierarchy is mounted multiple times
When starting a container, we walk through all cgroup mounts looking
for a unique directory name we can use for this container. If the
name we are trying is in use, we try another name. If it is not in
use in the first mount we check, we need to check other hierarchies
as it may exist there. But we weren't checking whether we have already
checked a subsystem - so that if freezer was mounted twice, we would
create it in the first mount, see it exists in the second, so start
over trying in the second mount.
To fix this, keep track of which subsystems we have already checked,
and do not re-check.
Note we still need to add, at the next: label, the removal of the
directories we've already created. I'm keeping that for later as
it's far lower priority than this fix, and I don't want to risk
introducing a regression for that.
Dwight Engen [Wed, 15 May 2013 16:27:34 +0000 (12:27 -0400)]
set non device cgroup items before the cgroup is entered
This allows some special cgroup items such as memory.kmem.limit_in_bytes
to be successfully set, since they must be set before any task is put
into the cgroup.
The devices cgroup is setup later giving the container a chance to mount
file systems before the device it might want to mount from becomes
unavailable.
lxc-fedora-template: autodev, hostname, ARM archs, Raspberry Pi fixes
This took a lot longer for me to get around to it... Sorry.
Patch to the lxc-fedora template.
I didn't get any further comments from my earlier proposal, weeks ago,
and did get one addition based on comments about properly setting the
hostname in /etc/hostname, which I've added. I could have broken them
into separate patches but most are pretty small and minor.
Changes:
* Map armv6l and armv7l architectures to "arm" for yum and repos to
function properly.
* Detect Fedora Remix distros with no "/etc/fedora-release" file
(Raspberry Pi) and find proper release versions when "remix" part of the
file context.
* Change default Fedora container on non-Fedora hosts to Fedora 17.
* Added code for autodev for Fedora systemd containers.
* Added code to set /etc/hostname for Fedora > 14 (systemd).
* Fix a few typos.
Signed-off-by: Michael H. Warfield <mhw@WittsEnd.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
lxc-busybox: check when bind-mounting host libdirs
The patch removes the behavior of automatically mounting /lib
and /usr/lib, since this is duplicated a few lines below. It will
also remove the risk of failing when one of these entries is not
present on the host - e.g. on a 64bit machine.
Serge Hallyn [Fri, 10 May 2013 05:52:22 +0000 (00:52 -0500)]
add lxc-cirros
Add a template to create a cirros container. One great thing about
cirros is that the image you download is 3.5M.
Thanks smoser!
Note by default /etc/inittab doesn't have a /dev/console entry, so you
don't get a login on the lxc-start console. Adding
console::respawn:/sbin/getty 115200 console
makes that work, but ctrl-c still gets forwarded to init which then
reboots. So I didn't bother adding console as part of the template
(yet). Instead I simply lxc-start -d, then lxc-console.
Signed-off-by: Scott Moser <scott.moser@canonical.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Dwight Engen [Mon, 13 May 2013 16:03:14 +0000 (12:03 -0400)]
serialize multiple threads doing lxcapi_start()
The problem is that the fd table is shared between threads and if a thread
forks() while another thread has an open fd to the monitor, the duped fd
in the fork()ed child will not get closed, thus causing monitord to stay
around since it thinks it still has a client. This only happened when
calling lxcapi_start() in the daemonized case since that is the only time
we try to get the status from the monitor.
Serge Hallyn [Wed, 8 May 2013 00:28:32 +0000 (19:28 -0500)]
lxc-ps: handle cgroup collisions
A few months ago cgroup handling in lxc was updated so that if
/sys/fs/cgroup/$cgroup/lxc/$container already exists (most often
due to another container by the same name under a different lxcpath),
then /sys/fs/cgroup/$cgroup/lxc/${container}-N would be used.
lxc-ps was never updated to handle this. Fix that.
(Note, the ns cgroup is being special cased there, but I don't
really believe ns cgroup works any more.)
It would be preferable to rewrite lxc-ps in python or in C, but
this at least makes the basic lxc-ps work in the case of multiple
containers with the same name.
Changelog:
fix missing fi.
replace 'z1' with '$container' as pointed out by Christian
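The ${container}-N naming scheme mentioned in this commit can be sketched as a search for the first unused directory name. This is an illustration only: pick_free_name() is a hypothetical function, and the starting suffix (1) and the cap of 100 attempts are assumptions, not values taken from lxc.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Find the first unused name among "<base>/<name>", "<base>/<name>-1",
 * "<base>/<name>-2", ... and write it to buf.
 * Returns 0 on success, -1 if no free name was found or buf is too small.
 */
static int pick_free_name(const char *base, const char *name,
			  char *buf, size_t len)
{
	int n, ret;

	for (n = 0; n < 100; n++) {	/* arbitrary cap for this sketch */
		if (n == 0)
			ret = snprintf(buf, len, "%s/%s", base, name);
		else
			ret = snprintf(buf, len, "%s/%s-%d", base, name, n);
		if (ret < 0 || (size_t)ret >= len)
			return -1;
		if (access(buf, F_OK) != 0)
			return 0;	/* does not exist yet: use it */
	}
	return -1;
}
```

Any tool that maps a container name back to its cgroup, as lxc-ps does, has to allow for the suffixed form as well as the plain name.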
Serge Hallyn [Tue, 7 May 2013 20:33:42 +0000 (15:33 -0500)]
conf.c: remove a break
commit ab81cef05338e7a553aacca141287034d6daf167 meant to remove the
added break, but apparently i had not done 'git add' before commit
--amend. Remove the added break.
Dwight Engen [Fri, 3 May 2013 17:41:40 +0000 (13:41 -0400)]
coverity: ftell returns a signed value
The check for flen < 0 could never have been true since flen was declared
to be size_t (unsigned). Declare flen to be long since that is what ftell
returns.
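The point of this fix can be shown with a short length helper. file_length() is a made-up name for illustration; the relevant detail is that flen has the signed type ftell() returns, so the error check can actually fire.

```c
#include <stdio.h>

/* Return the size of an open stream in bytes, or -1 on error. */
static long file_length(FILE *f)
{
	long flen;	/* signed, matching ftell()'s return type */

	if (fseek(f, 0, SEEK_END) < 0)
		return -1;
	flen = ftell(f);
	if (flen < 0)	/* would always be false if flen were a size_t */
		return -1;
	if (fseek(f, 0, SEEK_SET) < 0)
		return -1;
	return flen;
}
```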
Dwight Engen [Fri, 3 May 2013 16:04:01 +0000 (12:04 -0400)]
coverity: fix leak in error case
Since lxc_execute() is available through the library and is exposed via
the API we cannot be sure the caller will immediately exit, so we should
take care to free the allocated memory.
Weng Meiling [Fri, 3 May 2013 03:02:48 +0000 (11:02 +0800)]
lxc_start: free the conf if starting the container fails
When running lxc-start command with valgrind, it reports a memory leak error.
When lxc-start command fails, the conf which is from malloc has not been released.
This patch fixes the problem.
> Quoting Dwight Engen (dwight.engen@oracle.com):
> > So I did this, only to realize that lxc-init is passing "none" for
> > the file anyway, so it currently doesn't intend to log. This makes
> > me think that passing NULL for lxcpath is the right thing to do in
> > this patch. If you want me to make it so lxc-init can log, I can do
> > that but I think it should be in a different change :)
>
> That actually would be very useful, but as you say that's a different
> feature - thanks.
It's a tiny program (exported through the api) wrapping the util.c
helpers for reading /etc/lxc/lxc.conf variables, and replaces
the kludgy shell duplication in lxc.functions.in
Changelog: Apr 30: address feedback from Dwight
(exit error on failure, and use 'lxcpath' as name, not
'default_path').
for lvm and zfs, as we don't yet support passing options, only default
VG of 'lxc' and default zfsroot of 'tank' are supported when converting
another backing store type.
refuse deletion of container which has lvm or zfs snapshots.
Note that since a zfs clone must be made from a zfs snapshot,
which is made from the original zfs fs, even after we
lxc-destroy the snapshotted container we still must manually
remove the snapshot. This can be handled automatically, by
looking for snapshots where c1 is the original, c2 is the clone,
tank/c2 no longer exists, but tank/c1@c2 does. We can then
remove tank/c1@c2 and feel free to remove tank/c1. This patch
does NOT do that yet.
Make sure not to return when we're a forked child.
implement backend drivers and container clone API (v3)
1. commonize waitpid users to use a single helper. We frequently want
to run something in a clean namespace, or fork off a script. This
lets us keep the function doing fork/exec/waitpid simpler.
2. start a blockdev backend implementation. This will be used for
mounting, copying, and snapshotting container filesystems.
3. implement btrfs, lvm, directory, and overlayfs backends.
4. For overlayfs, support a new lxc.rootfs format of
'bdevtype:<extra>'. This means you can now use overlayfs-based
containers without using lxc-start-ephemeral, by using
lxc.rootfs = overlayfs:/readonly-dir:writeable-dir
5. add a set of simple clone testcases
6. Write a new lxc_clone.c based on api clone.
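The 'overlayfs:<lower>:<upper>' form of lxc.rootfs from item 4 can be split as in the sketch below. The function name and the fixed-size caller buffers are assumptions made for the example; the real bdev code may parse the value differently.

```c
#include <string.h>

/*
 * Split "overlayfs:<lower>:<upper>" into its two directories.
 * Both output buffers must be at least len bytes.
 * Returns 0 on success, -1 if the string does not have that shape.
 */
static int parse_overlayfs_rootfs(const char *rootfs,
				  char *lower, char *upper, size_t len)
{
	static const char prefix[] = "overlayfs:";
	const char *dirs, *sep;
	size_t lowerlen;

	if (strncmp(rootfs, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	dirs = rootfs + sizeof(prefix) - 1;
	sep = strchr(dirs, ':');
	if (!sep || sep == dirs || sep[1] == '\0')
		return -1;
	lowerlen = (size_t)(sep - dirs);
	if (lowerlen >= len || strlen(sep + 1) >= len)
		return -1;
	memcpy(lower, dirs, lowerlen);
	lower[lowerlen] = '\0';
	strcpy(upper, sep + 1);
	return 0;
}
```

A plain directory path such as /var/lib/lxc/c1/rootfs has no prefix and is rejected, which lets a caller fall back to the directory backend.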
Still to do (there's more, but off top of my head):
1. support zfs, aufs
2. have clone handle other mount entries (right now it only clones
the rootfs)
3. python, lua, and go bindings (not me :)
4. lxc-destroy: if lvm backing store, check for snapshots of it.
(what about directories which have overlayfs clones?)
Changes since v2:
Initialize random generator when picking new macaddr (reported
by caglar@10ur.org)
Fix wrong use of bitmask flags
On copy-clone of btrfs, create a subvolume
lxc_clone.c: respect the command line usage of the old script
lxc-clone(1): update documentation
Refuse to try changing backing stores except to overlayfs, as
it is not implemented (yet) anyway.