nspawn: introduce the new /machine/ tree in the cgroup tree and move containers there
Containers will now carry a label (normally derived from the root
directory name, but configurable by the user), and the container's root
cgroup is /machine/<label>. This label is called "machine name", and can
cover both containers and VMs (as soon as libvirt also makes use of
/machine/).
libsystemd-login can be used to query the machine name from a process.
This patch also includes numerous clean-ups for the cgroup code.
Kay Sievers [Mon, 15 Apr 2013 21:39:42 +0000 (23:39 +0200)]
bus: fix missing macro argument renaming
<fdo-vcs> systemd kay master * b1454bf src/libsystemd-bus/ bus-kernel.c kdbus.h
<fdo-vcs> systemd bus: catch up with kernel changes
<kmacleod> kay: randomly looked at your commit, it looks like in KDBUS_FOREACH_ITEM
you missed changing a (d) to an (i) in (uint8_t*) (d) < (uint8_t*) (k) + (k)->size; ?
<kay> kmacleod: hah, so there *is* a reason for using _foo in macros :)
<kay> kmacleod: thanks!
audit: since nspawn now sets CAP_AUDIT_CONTROL for containers we cannot use this anymore to skip audit session ID retrieval
As audit is still broken in containers we need a reliable way to
determine whether the audit data we read from /proc is actually useful.
Previously we used CAP_AUDIT_CONTROL for this, since nspawn removed that
from the nspawn container. This has changed a while back however, which
means we used audit data of the host system in the container.
This adds an explicit container check to the audit calls, so that all
audit data is turned off in containers.
This should fix session creation with pam_systemd/logind in nspawn containers.
core: always create /user and /machine top-level cgroup dirs
This allows clients to put inotify watches on these trees to watch for
state changes, without having to wait until these dirs are created.
This introduces the new top-level /machine cgroup dir as canonical
location where OS containers and VMs shall be located (as discussed with
the libvirt folks).
b8a2b0f76 'use initalization instead of explicit zeroing'
introduced a bug where only the first sizeof(uint_t*) bytes
would be zeroed out, instead of the whole array.
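A generic illustration of this class of bug, not the code from the commit: once an array is reached through a pointer, sizeof yields the pointer size, so only that many bytes get cleared. All function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Buggy variant: 'buf' is a pointer here, so sizeof(buf) is the size of
 * the pointer itself (typically 4 or 8), not the size of the array. */
static void zero_through_pointer(uint8_t *buf) {
        size_t n = sizeof(buf);
        memset(buf, 0, n);
}

/* Fixed variant: the caller passes the real size of the array. */
static void zero_with_size(uint8_t *buf, size_t n) {
        memset(buf, 0, n);
}

/* Fills a 64-byte array with 0xff, zeroes it with one of the two
 * variants, and returns how many bytes are still non-zero. */
static size_t nonzero_bytes_left(int use_fixed) {
        uint8_t a[64];
        size_t i, left = 0;

        memset(a, 0xff, sizeof(a));

        if (use_fixed)
                zero_with_size(a, sizeof(a));
        else
                zero_through_pointer(a);

        for (i = 0; i < sizeof(a); i++)
                if (a[i] != 0)
                        left++;

        return left;
}
```

With the buggy variant, all but the first sizeof(uint8_t*) bytes of the array stay untouched.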
"__attribute__((always_inline))" does not replace "inline" and they
still need to be used together. This fixes "always_inline function
might not be inlinable [-Wattributes]" warning in gcc 4.7
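A minimal sketch of the combination described above, assuming GCC or Clang. The macro and function names are made up for illustration and are not the ones used in the tree.

```c
/* Hypothetical shorthand for the GCC/Clang attribute. */
#define ALWAYS_INLINE_SKETCH __attribute__((always_inline))

/* The attribute is used together with "inline". Writing only
 * "static ALWAYS_INLINE_SKETCH int add_one(int x)" is what makes
 * gcc 4.7 emit the -Wattributes warning quoted above. */
static inline ALWAYS_INLINE_SKETCH int add_one(int x) {
        return x + 1;
}
```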
<fcntl.h> is POSIX. On Linux, <sys/fcntl.h> simply includes
<fcntl.h>, so there should be no difference. On Android
likewise, except that there is some more stuff. QNX has
only <fcntl.h>.
Tom Gundersen [Thu, 11 Apr 2013 19:14:40 +0000 (21:14 +0200)]
tmpfiles: create static device nodes before udev is started
Since v183, the contents of /usr/lib/udev/devices is no longer copied to /dev
on boot, rather systemd-tmpfiles should be used instead. However, as
systemd-tmpfiles --create is only run long after udevd has been started, it is
no longer possible to use udev rules to assign permissions to the static nodes.
This calls systemd-tmpfiles --create early, before udev is started, and
restricts the call to /dev, which is known to be mounted already.
In the future, this could also take over the creation of static device nodes
from systemd-udevd.
Make sure we compare errno against positive error codes.
The ones in hwclock.c and install.c can have an impact, the
rest are unlikely to be hit or in code that isn't widely
used.
Also check that errno > 0, to help gcc know that we are
returning a negative error code.
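The pattern can be sketched as follows. This is an illustration with hypothetical helper names, not the code touched by the commit; the choice of -EIO as fallback is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy check: errno holds positive codes, so comparing against a
 * negated constant never matches a real ENOENT failure. */
static bool errno_is_enoent_buggy(void) {
        return errno == -ENOENT;
}

/* Correct check: compare against the positive constant. */
static bool errno_is_enoent(void) {
        return errno == ENOENT;
}

/* Converts errno into a negative return value. The explicit
 * "errno > 0" test lets the compiler see that the result is always
 * negative; -EIO is an arbitrary fallback chosen for this sketch. */
static int negative_errno(void) {
        return errno > 0 ? -errno : -EIO;
}
```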
In case of zsh completion, new functionality is less useful
because of caching. Nevertheless, zsh completion for restart
is made to behave more-or-less the same as bash completion.
At least sockets can be restarted.
macro: make sure ALIGN() can be calculated constant by the compiler
If we pass a constant value to ALIGN() gcc should have the chance to
calculate the value during compilation rather than runtime, so let's
avoid a static inline call if we can.
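One way to get this effect is a plain macro, sketched here with hypothetical names (the actual definitions in macro.h may differ). Unlike a static inline function call, the macro expands to an integer constant expression when its arguments are constants, so it can be used where C requires one.

```c
#include <stddef.h>

/* Function form: correct, but a call to it is never an integer
 * constant expression. Assumes 'ali' is a power of two. */
static inline size_t align_to(size_t l, size_t ali) {
        return (l + ali - 1) & ~(ali - 1);
}

/* Macro form: same arithmetic, same power-of-two assumption. */
#define ALIGN_TO_CONST(l, ali) \
        ((((size_t) (l)) + (size_t) (ali) - 1) & ~((size_t) (ali) - 1))
#define ALIGN8_SKETCH(l) ALIGN_TO_CONST(l, 8)

/* An enumerator requires an integer constant expression, so this only
 * compiles because the macro is evaluated at compile time. */
enum { PADDED_SIZE = ALIGN8_SKETCH(13) };
```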
It is faster to use a bash built-in than to invoke an external
program. The problem of unit names starting with a dash is solved
by prepending a space. Spaces are ignored anyway.
For zsh, replace echo "$unit", which is vulnerable to dashes,
with echo " $unit".
systemctl: ellipsize job list only when necessary, highlight running
I was debugging systemd waiting on a missing disk, and noticed
that the job listing could use some polishing. Jobs that are
actually running are highlighted, so it's easier to see what
we're actually waiting for.
Also, the needed widths are precalculated, to use available columns
more economically.
There were old session state files accumulating in /run/systemd/session.
They confused e.g. "reboot", which thought there were still users logged
in. The files got created like this:
  session_stop(Session *s) ->
    ...
    unlink(s->state_file);
    ...
    seat_set_active(s->seat, NULL) ->
      session_save(...); /* re-creates the state file we just unlinked */
Fix it simply by clearing the s->started flag earlier to prevent
any further writes of the state file (session_save() checks the flag).
As it turns out if you pass a va_list to a function its state becomes
undefined after that function returns, and this actually does break on
x86-32.
Hence, let's reimplement message_read_ap() without the use of recursion.
Instead we now build our own stack of types in an array so that we can
decode the entire parameter list in a single stackframe.
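The idea of replacing recursion with an explicit stack can be sketched on a toy problem. This is not the message_read_ap() implementation: the function name, the signature syntax ('i' for an int, parentheses for nesting) and the depth limit are all assumptions made for the example. The point is that every va_arg() call happens in the same stack frame that called va_start().

```c
#include <stdarg.h>
#include <stddef.h>

#define MAX_DEPTH_SKETCH 8

/* Sums the int arguments described by a signature such as "i(ii)i".
 * Returns 0 and stores the total in *ret, or -1 if the signature is
 * malformed or nested too deeply. */
static int sum_by_signature(long *ret, const char *sig, ...) {
        long partial[MAX_DEPTH_SKETCH]; /* one running total per open '(' */
        size_t depth = 0;
        const char *p;
        va_list ap;
        int r = 0;

        partial[0] = 0;
        va_start(ap, sig);

        for (p = sig; *p; p++) {
                if (*p == 'i')
                        partial[depth] += va_arg(ap, int);
                else if (*p == '(') {
                        if (depth + 1 >= MAX_DEPTH_SKETCH) {
                                r = -1;
                                break;
                        }
                        partial[++depth] = 0; /* push a new level */
                } else if (*p == ')') {
                        if (depth == 0) {
                                r = -1;
                                break;
                        }
                        partial[depth - 1] += partial[depth]; /* pop */
                        depth--;
                } else {
                        r = -1;
                        break;
                }
        }

        va_end(ap);

        if (r == 0 && depth != 0)
                r = -1; /* unbalanced '(' */
        if (r == 0)
                *ret = partial[0];

        return r;
}
```

Because the va_list never leaves this function, its state is never left undefined by a callee.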