<para>
specify what kind of network virtualization to be used
for the container.
+ Must be specified before any other option(s) on the net device.
Multiple networks can be specified by using an additional index
<option>i</option>
after all <option>lxc.net.*</option> keys. For example,
the container at some <filename>path</filename>, and then mounts
under <filename>path</filename>, then a TOCTTOU attack would be
possible where the container user modifies a symbolic link under
- his home directory at just the right time.
+ their home directory at just the right time.
</para>
<variablelist>
<varlistentry>
</term>
<listitem>
<para>
- extra mount options to use when mounting the rootfs.
+ Specify extra mount options to use when mounting the rootfs.
+ The format of the mount options corresponds to the
+ format used in fstab. In addition, LXC supports the custom
+ <option>idmap=</option> mount option. This option can be used
+ to tell LXC to create an idmapped mount for the container's
+ rootfs. This is useful when the user doesn't want to recursively
+ chown the rootfs of the container to match the idmapping of the
+ user namespace the container is going to use. Instead an
+ idmapped mount can be used to handle this.
+ The argument for
+ <option>idmap=</option>
+ can either be a path pointing to a user namespace file that
+ LXC will open and use to idmap the rootfs or the special value
+ "container" which will instruct LXC to use
+ the container's user namespace to idmap the rootfs.
</para>
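+ <para>
+ As a minimal sketch of the two forms described above (the rootfs
+ path is taken from the example at the end of this page, and the
+ process ID 1234 is a purely hypothetical placeholder for a process
+ whose user namespace should be used):
+ <programlisting>
+ # idmap the rootfs with the container's own user namespace
+ lxc.rootfs.path = dir:/mnt/rootfs.complex
+ lxc.rootfs.options = idmap=container
+
+ # alternatively, point at a user namespace file (hypothetical PID)
+ # lxc.rootfs.options = idmap=/proc/1234/ns/user
+ </programlisting>
+ </para>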
</listitem>
</varlistentry>
</refsect2>
<refsect2>
- <title>Control group</title>
+ <title>Control groups ("cgroups")</title>
<para>
The control group section contains the configuration for the
different subsystem. <command>lxc</command> does not check the
started, but has the advantage of permitting any future
subsystem.
</para>
+
+ <para>
+ The kernel implementation of cgroups has changed significantly over the
+ years. Linux 4.5 added support for a new cgroup filesystem, usually
+ referred to as "cgroup2" or the "unified hierarchy". Since then, the
+ old cgroup filesystem has usually been referred to as "cgroup1" or the
+ "legacy hierarchies". Please see the cgroups manual page for a detailed
+ explanation of the differences between the two versions.
+ </para>
+
+ <para>
+ LXC distinguishes settings for the legacy and the unified hierarchy by
+ using different configuration key prefixes. To alter settings for
+ controllers in a legacy hierarchy the key prefix
+ <option>lxc.cgroup.</option> must be used and in order to alter the
+ settings for a controller in the unified hierarchy the
+ <option>lxc.cgroup2.</option> key prefix must be used. Note that LXC will
+ ignore <option>lxc.cgroup.</option> settings on systems that only use
+ the unified hierarchy. Conversely, it will ignore
+ <option>lxc.cgroup2.</option> options on systems that only use legacy
+ hierarchies.
+ </para>
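+
+ <para>
+ As an illustrative sketch, assuming the memory controller is
+ available and a limit of 256 MiB (an arbitrary example value) is
+ wanted, the same limit could be expressed for both hierarchies. The
+ file names "memory.limit_in_bytes" (cgroup1) and "memory.max"
+ (cgroup2) are the kernel's control files for the memory controller:
+ <programlisting>
+ # applied only where the legacy hierarchies are in use
+ lxc.cgroup.memory.limit_in_bytes = 268435456
+ # applied only where the unified hierarchy is in use
+ lxc.cgroup2.memory.max = 268435456
+ </programlisting>
+ Following the rule above, whichever line does not match the
+ hierarchy in use on the host is ignored.
+ </para>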
+
+ <para>
+ At its core a cgroup hierarchy is a way to hierarchically organize
+ processes. Usually a cgroup hierarchy will have one or more
+ "controllers" enabled. A "controller" in a cgroup hierarchy is usually
+ responsible for distributing a specific type of system resource along
+ the hierarchy. Controllers include the "pids" controller, the "cpu"
+ controller, the "memory" controller and others. Some controllers,
+ however, do not distribute a system resource; they are often
+ referred to as "utility" controllers. One utility controller is the
+ device controller. Instead of distributing a system resource, it
+ manages access to devices.
+ </para>
+
+ <para>
+ In the legacy hierarchy the device controller was implemented like most
+ other controllers as a set of files that could be written to. These
+ files were named "devices.allow" and "devices.deny". The legacy device
+ controller allowed the implementation of both "allowlists" and
+ "denylists".
+ </para>
+
+ <para>
+ An allowlist is a device program that by default blocks access to all
+ devices. In order to access specific devices "allow rules" for
+ particular devices or device classes must be specified. In contrast, a
+ denylist is a device program that by default allows access to all
+ devices. In order to restrict access to specific devices "deny rules"
+ for particular devices or device classes must be specified.
+ </para>
+
+ <para>
+ In the unified cgroup hierarchy the implementation of the device
+ controller has completely changed. Instead of files to read from and
+ write to, an eBPF program of type
+ <option>BPF_PROG_TYPE_CGROUP_DEVICE</option> can be attached to a
+ cgroup. Even though the kernel implementation has changed completely,
+ LXC tries to allow for the same semantics to be followed in the legacy
+ device cgroup and the unified eBPF-based device controller. The
+ following paragraphs explain the semantics for the unified eBPF-based
+ device controller.
+ </para>
+
+ <para>
+ As mentioned, the format for specifying device rules for the unified
+ eBPF-based device controller is the same as for the legacy cgroup
+ device controller; only the configuration key prefix has changed.
+ Specifically, device rules for the legacy cgroup device controller are
+ specified via <option>lxc.cgroup.devices.allow</option> and
+ <option>lxc.cgroup.devices.deny</option> whereas for the
+ cgroup2 eBPF-based device controller
+ <option>lxc.cgroup2.devices.allow</option> and
+ <option>lxc.cgroup2.devices.deny</option> must be used.
+ </para>
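+ <para>
+ For instance, a rule granting read, write and mknod access to
+ /dev/null (character device 1:3, the device used in the examples
+ further below) might be written for either hierarchy as follows;
+ this is only a sketch of the prefix difference, not a complete
+ device policy:
+ <programlisting>
+ # legacy cgroup device controller
+ lxc.cgroup.devices.allow = c 1:3 rwm
+ # unified, eBPF-based device controller
+ lxc.cgroup2.devices.allow = c 1:3 rwm
+ </programlisting>
+ </para>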
+ <para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ An allowlist device rule
+ <programlisting>
+ lxc.cgroup2.devices.deny = a
+ </programlisting>
+ will cause LXC to instruct the kernel to block access to all
+ devices by default. To grant access to devices, allow device rules
+ must be added via the <option>lxc.cgroup2.devices.allow</option>
+ key. This is referred to as an "allowlist" device program.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A denylist device rule
+ <programlisting>
+ lxc.cgroup2.devices.allow = a
+ </programlisting>
+ will cause LXC to instruct the kernel to allow access to all
+ devices by default. To deny access to devices, deny device rules
+ must be added via the <option>lxc.cgroup2.devices.deny</option> key.
+ This is referred to as a "denylist" device program.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Specifying either of the two aforementioned rules will cause all
+ previous rules to be cleared, i.e. the device list will be reset.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When an allowlist program is requested, i.e. access to all devices
+ is blocked by default, specific deny rules for individual devices
+ or device classes are ignored.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When a denylist program is requested, i.e. access to all devices
+ is allowed by default, specific allow rules for individual devices
+ or device classes are ignored.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
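+
+ <para>
+ As a sketch of the last two points, consider an allowlist program
+ that also contains a specific deny rule. The device 1:5 (/dev/zero)
+ is chosen here purely for illustration:
+ <programlisting>
+ # request an allowlist: block everything by default
+ lxc.cgroup2.devices.deny = a
+ # takes effect: /dev/null may be read, written and created
+ lxc.cgroup2.devices.allow = c 1:3 rwm
+ # ignored: specific deny rules have no effect in an allowlist
+ lxc.cgroup2.devices.deny = c 1:5 rwm
+ </programlisting>
+ Under the rules above, /dev/zero remains inaccessible in this
+ program simply because it was never allowed, not because of the
+ final deny rule.
+ </para>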
+
+ <para>
+ For example, the set of rules:
+ <programlisting>
+ lxc.cgroup2.devices.deny = a
+ lxc.cgroup2.devices.allow = c *:* m
+ lxc.cgroup2.devices.allow = b *:* m
+ lxc.cgroup2.devices.allow = c 1:3 rwm
+ </programlisting>
+ implements an allowlist device program, i.e. the kernel will block
+ access to all devices not specifically allowed in this list. This
+ particular program states that all character and block devices may be
+ created but only /dev/null may be read or written.
+ </para>
+
+ <para>
+ If we instead switch to the following set of rules:
+ <programlisting>
+ lxc.cgroup2.devices.allow = a
+ lxc.cgroup2.devices.deny = c *:* m
+ lxc.cgroup2.devices.deny = b *:* m
+ lxc.cgroup2.devices.deny = c 1:3 rwm
+ </programlisting>
+ then LXC would instruct the kernel to implement a denylist, i.e. the
+ kernel will allow access to all devices not specifically denied in
+ this list. This particular program states that no character devices or
+ block devices may be created and that /dev/null is not allowed
+ to be read, written, or created.
+ </para>
+
+ <para>
+ Now consider the same program but followed by a "global rule"
+ which determines the type of device program (allowlist or
+ denylist) as explained above:
+ <programlisting>
+ lxc.cgroup2.devices.allow = a
+ lxc.cgroup2.devices.deny = c *:* m
+ lxc.cgroup2.devices.deny = b *:* m
+ lxc.cgroup2.devices.deny = c 1:3 rwm
+ lxc.cgroup2.devices.allow = a
+ </programlisting>
+ The last line will cause LXC to reset the device list without changing
+ the type of device program.
+ </para>
+
+ <para>
+ If we specify:
+ <programlisting>
+ lxc.cgroup2.devices.allow = a
+ lxc.cgroup2.devices.deny = c *:* m
+ lxc.cgroup2.devices.deny = b *:* m
+ lxc.cgroup2.devices.deny = c 1:3 rwm
+ lxc.cgroup2.devices.deny = a
+ </programlisting>
+ instead, then the last line will cause LXC to reset the device list and
+ switch from a denylist program to an allowlist program.
+ </para>
<variablelist>
<varlistentry>
<term>
lxc.net.1.ipv6.address = 2003:db8:1:0:214:1234:fe0b:3596
lxc.net.2.type = phys
lxc.net.2.flags = up
- lxc.net.2.link = dummy0
+ lxc.net.2.link = random0
lxc.net.2.hwaddr = 4a:49:43:49:79:ff
lxc.net.2.ipv4.address = 10.2.3.6/24
lxc.net.2.ipv6.address = 2003:db8:1:0:214:1234:fe0b:3297
lxc.mount.fstab = /etc/fstab.complex
lxc.mount.entry = /lib /root/myrootfs/lib none ro,bind 0 0
lxc.rootfs.path = dir:/mnt/rootfs.complex
+ lxc.rootfs.options = idmap=container
lxc.cap.drop = sys_module mknod setuid net_raw
lxc.cap.drop = mac_override
</programlisting>