pam_cgfs: reimplement and add cgroupfs v2 support

This is a rewrite of pam_cgfs which leans on LXC's cgfsng.c. Various codepaths
have been adapted and made more appropriate.

The strategy of pam_cgfs v2 is to support cgroupfs v1, cgroupfs v2, and mixed
mounts where some controllers are mounted into a standard cgroupfs v1 hierarchy
location (/sys/fs/cgroup/<controller>) and other controllers are mounted into
the cgroupfs v2 hierarchy.

The functions and types for cgroupfs v1 and cgroupfs v2 have nearly all been
kept separate, even where they do nearly the exact same job. This is on purpose!
Although marked non-experimental, cgroupfs v2 is still too much of a moving
target. Extrapolating from current cgroupfs v2 standard behaviour seems risky
and error-prone, even more so when those assumptions complicate or oversimplify
cgroupfs v1 assumptions in a function that tries to handle both cgroupfs v1 and
cgroupfs v2. In short, the code duplication is currently intentional so that we
can easily adapt to changes in cgroupfs v2 behaviour without having to touch any
of the functions or types that deal with the basically standardized cgroupfs v1
behaviour.

A quick run-through of what the current pam_cgfs does (the same wording can be
found in the preamble/license of pam_cgfs.c):

When a user logs in, this pam module will create cgroups which the user may
administer. It handles both pure cgroupfs v1 and pure cgroupfs v2, as well as
mixed mounts, where some controllers are mounted in a standard cgroupfs v1
hierarchy location (/sys/fs/cgroup/<controller>) and others are in the cgroupfs
v2 hierarchy.
Writeable cgroups are either created for all controllers or, if specified, for
any controllers listed on the command line.
The cgroup created will be "user/$user/0" for the first session, "user/$user/1"
for the second, etc.
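
As an illustration of the naming scheme above, here is a minimal sketch that
builds the relative cgroup name for the n-th session of a user. The function
name session_cgroup_path() and the rejection of user names containing '/' are
assumptions made for this example, they are not taken from pam_cgfs.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from pam_cgfs.c): write "user/<user>/<idx>" into
 * buf. Returns 0 on success and -1 if the name is unusable or buf is too
 * small.
 */
int session_cgroup_path(char *buf, size_t len, const char *user,
			unsigned int idx)
{
	int n;

	/* Assumption: an empty name or one containing '/' would change the
	 * shape of the hierarchy, so this sketch refuses it. */
	if (!user || user[0] == '\0' || strchr(user, '/'))
		return -1;

	n = snprintf(buf, len, "user/%s/%u", user, idx);
	if (n < 0 || (size_t)n >= len)
		return -1; /* encoding error or truncated output */

	return 0;
}
```

A caller would use idx 0 for the first session, idx 1 for the second, and so
on, and append the result to the mountpoint of each requested controller.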

Systems with a systemd init system are treated specially, both with respect to
cgroupfs v1 and cgroupfs v2. For both, cgroupfs v1 and cgroupfs v2, we check
whether systemd already placed us in a cgroup it created, e.g.

user.slice/user-uid.slice/session-n.scope

by checking whether the uid in that path matches our uid. If it did, we simply
chown the last part (session-n.scope). If it did not, we create a cgroup as
outlined above (user/$user/n) and chown it to our uid.
The same holds for cgroupfs v2, where checking this assumption becomes crucial:
if systemd already created a cgroup and placed us in it, we __have to__ be
placed under it on login, otherwise things like starting an xserver or
similar will not work.
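
The uid check described above can be sketched roughly as follows. This is an
illustration only: the function name cg_path_belongs_to_uid() and the exact
parsing rules are assumptions, and the real code in pam_cgfs.c may differ.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return true if cgpath contains a
 * "user.slice/user-<uid>.slice" component whose <uid> equals uid.
 */
bool cg_path_belongs_to_uid(const char *cgpath, uid_t uid)
{
	static const char prefix[] = "user.slice/user-";
	const char *p;
	char *end;
	unsigned long parsed;

	p = strstr(cgpath, prefix);
	if (!p)
		return false; /* systemd did not place us in a user slice */
	p += strlen(prefix);

	/* strtoul() would accept signs and whitespace, so insist on a digit. */
	if (*p < '0' || *p > '9')
		return false;

	parsed = strtoul(p, &end, 10);
	if (strncmp(end, ".slice", strlen(".slice")) != 0)
		return false;

	return parsed == (unsigned long)uid;
}
```

If the check succeeds, only the last path component (session-n.scope) would be
chowned. If it fails, the user/$user/n cgroup is created instead.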

All requested cgroups must be mounted under /sys/fs/cgroup/$controller,
no messing around with finding mountpoints.
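
Because the mountpoint is fixed, one way to tell whether a given controller
directory is a cgroupfs v1 or a cgroupfs v2 mount is to compare the filesystem
magic reported by statfs(2). The sketch below assumes this approach; the names
cg_kind, cg_kind_from_magic() and cg_kind_of_controller() are hypothetical and
pam_cgfs itself may detect the layout differently.

```c
#include <stdio.h>
#include <sys/vfs.h>

/* Values as defined in <linux/magic.h>; repeated here so the sketch builds
 * even with older kernel headers. */
#ifndef CGROUP_SUPER_MAGIC
#define CGROUP_SUPER_MAGIC 0x27e0eb
#endif
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif

enum cg_kind { CG_NONE = 0, CG_V1, CG_V2 };

/* Map a statfs f_type value to the kind of cgroup filesystem. */
enum cg_kind cg_kind_from_magic(unsigned long f_type)
{
	if (f_type == CGROUP2_SUPER_MAGIC)
		return CG_V2;
	if (f_type == CGROUP_SUPER_MAGIC)
		return CG_V1;
	return CG_NONE;
}

/* Classify /sys/fs/cgroup/<controller>; CG_NONE if it is missing or is
 * not a cgroup mount. */
enum cg_kind cg_kind_of_controller(const char *controller)
{
	char path[4096];
	struct statfs sb;
	int n;

	n = snprintf(path, sizeof(path), "/sys/fs/cgroup/%s", controller);
	if (n < 0 || (size_t)n >= sizeof(path))
		return CG_NONE;

	if (statfs(path, &sb) < 0)
		return CG_NONE;

	return cg_kind_from_magic((unsigned long)sb.f_type);
}
```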

Note that, as of now, we do not necessarily deal correctly with weird corner
cases, such as the name=systemd cgroupfs v1 controller not being mounted at
/sys/fs/cgroup/systemd and an empty cgroupfs v2 hierarchy being mounted at that
location instead, which systemd then uses to track processes. This is left for
future commits.

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

author	Christian Brauner <christian.brauner@canonical.com>
	Sun, 13 Nov 2016 05:07:58 +0000 (06:07 +0100)
committer	Christian Brauner <christian.brauner@canonical.com>
	Wed, 16 Nov 2016 20:15:33 +0000 (21:15 +0100)
commit	e65cfafc7aa0d691eacf2ab4df2b9183889cbd90
tree	f3c8f12361034bc66bdc412898039ff9fd731d97
parent	17e0e36838447659e9679412a41a011b911a38ab