CONTAINER_INTERFACE[0] is something systemd people call their API and
we need to adapt to it a bit, even if it means doing stupid
unnecessary things, as otherwise systemd decides to regress and
suddenly breaks the network stack in CTs after an upgrade[1].
This mounts the parent /sys as mixed, which is:
> mount /sys as read-only but with /sys/devices/virtual/net writable.
-- man 5 lxc.container.conf
Allow users to override that with a features knob, as surely some
will run into other issues otherwise, and manually adding a
"lxc.mount.auto" entry in the container .conf is not a nice user
experience for most.
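The decision described above can be sketched in isolation as follows.
This is only an illustration, not the code from the patch itself:
sys_mount_config is a hypothetical helper name, and $unprivileged and
$features are assumed to be passed in the same shape the hunk below
uses them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): returns the extra raw LXC
# config line for the /sys mount, given the CT's settings.
sub sys_mount_config {
    my ($unprivileged, $features) = @_;
    $features //= {};

    my $raw = '';
    # Only unprivileged CTs are touched; force_rw_sys opts out of sys:mixed.
    if ($unprivileged && !$features->{force_rw_sys}) {
        $raw .= "lxc.mount.auto = sys:mixed\n";
    }
    return $raw;
}

print sys_mount_config(1, {});                    # prints the sys:mixed line
print sys_mount_config(1, { force_rw_sys => 1 }); # prints nothing
print sys_mount_config(0, {});                    # prints nothing
```

Only the unprivileged case without the knob gets the sys:mixed entry;
privileged CTs and CTs with force_rw_sys set are left as they were.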
Fixes the system regression in up-to-date Arch installations
introduced by [2].
[0]: https://systemd.io/CONTAINER_INTERFACE/
[1]: https://github.com/systemd/systemd/issues/15101#issuecomment-598607582
[2]: https://github.com/systemd/systemd/commit/bf331d87171b7750d1c72ab0b140a240c0cf32c3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
$raw .= "lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0\n";
}
+ if ($unprivileged && !$features->{force_rw_sys}) {
+ # unpriv. CTs default to sys:rw, but that doesn't always play well with
+ # systemd, e.g., systemd-networkd https://systemd.io/CONTAINER_INTERFACE/
+ $raw .= "lxc.mount.auto = sys:mixed\n";
+ }
+
# WARNING: DO NOT REMOVE this without making sure that loop device nodes
# cannot be exposed to the container with r/w access (cgroup perms).
# When this is enabled mounts will still remain in the monitor's namespace
." This requires a kernel with seccomp trap to user space support (5.3 or newer)."
." This is experimental.",
},
+ force_rw_sys => {
+ optional => 1,
+ type => 'boolean',
+ default => 0,
+ description => "Mount /sys in unprivileged containers as `rw` instead of `mixed`."
+ ." This can break networking with newer (>= v245) versions of systemd-networkd.",
+ },
};
my $confdesc = {