Stefan Hanreich [Mon, 20 Nov 2023 19:19:54 +0000 (20:19 +0100)]
create: Do not call create_ifaces_ipams_ips
Since create_vm already calls update_pct_config, which in turn calls
vmconfig_apply_pending, we do not need to explicitly create the IPAM
entries when creating a container from scratch.
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Stefan Hanreich [Mon, 20 Nov 2023 19:19:52 +0000 (20:19 +0100)]
network: Do not always reserve new IP in IPAM
Currently when updating the network configuration of a container, SDN
would always create a new entry in the IPAM. Only create a new entry
when the bridge or MAC changes or the NIC is completely new.
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Stefan Hanreich [Mon, 20 Nov 2023 19:19:51 +0000 (20:19 +0100)]
hotplug network: Only change IPAM when MAC or bridge changes
Currently a new IPAM entry is created every time a NIC config changes.
When editing properties other than MAC or Bridge this could lead to
duplicated entries in the IPAM. Only reserve a new IP when the bridge
or MAC changes or the NIC is completely new.
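A minimal sketch of that rule, not the actual implementation: the function name and the `bridge`/`hwaddr` keys are assumptions made for illustration, with each NIC modelled as a plain dictionary.

```python
def needs_new_ipam_entry(old_nic, new_nic):
    """Decide whether a NIC update should reserve a new IP in the IPAM.

    old_nic is None for a NIC that did not exist before. The 'bridge' and
    'hwaddr' keys are assumed names for the bridge and MAC properties.
    """
    if old_nic is None:
        # Completely new NIC: always needs an entry.
        return True
    # Only a changed bridge or MAC invalidates the existing entry.
    return any(old_nic.get(key) != new_nic.get(key) for key in ("bridge", "hwaddr"))
```

Changing any other property, such as a rate limit, leaves the existing IPAM entry alone.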
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Filip Schauer [Fri, 17 Nov 2023 10:28:16 +0000 (11:28 +0100)]
Add device passthrough
Add a dev[n] argument to the container config to pass devices through to
a container. A device can be passed by its path. Additionally the access
mode, uid and gid can be specified through their respective properties.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Thomas Lamprecht [Sun, 19 Nov 2023 18:10:34 +0000 (19:10 +0100)]
setup: handle getty services also via systemd-preset
Fixes an issue where the first boot of a Fedora 39 CT had no
container-getty because the default presets enabled the getty@
service instead; only on the second boot (where presets are no longer
applied) did our TTY handling actually take effect.
Note that presets aren't bothered by a service not existing. Still,
for older distro releases disabling getty@ could lead to problems; for
now we call this only for modern distro releases anyway, and it also
only affects newly created CTs.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 19 Nov 2023 16:42:38 +0000 (17:42 +0100)]
setup base: disable sysfs debug mounts via systemd presets
they will fail and are not really useful in the container, at least
not as default.
Just disable via the preset mechanism, so any user can easily start
that mount if it'd make sense for their use case.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Christoph Heiss [Mon, 25 Sep 2023 11:38:49 +0000 (13:38 +0200)]
setup: fix architecture detection for NixOS containers
NixOS is special and deviates in many places from a "standard" Linux
system. In this case, /bin/sh does not exist in the filesystem before
the initial activation (i.e. first boot), which creates a symlink at
/bin/sh.
Due to the currently existing fallback code, only an error message is
logged and the architecture is defaulted to x86_64. Still, this is not
something users might expect.
Thus try a bit harder to detect the architecture for NixOS containers by
inspecting the init script, which contains a shebang-line with the full
path to the system shell.
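A rough sketch of the idea, under stated assumptions: the interpreter path is taken from the init script's shebang line, and the architecture is then read from the `e_machine` field of that binary's ELF header. The function names and the architecture labels in the table are illustrative, not the names used in the code.

```python
import struct

# e_machine values from the ELF specification; the labels are illustrative.
ELF_MACHINES = {3: "i386", 40: "armhf", 62: "amd64", 183: "arm64"}


def shebang_interpreter(script_text):
    """Return the interpreter path from a script's shebang line, or None."""
    first_line = script_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def elf_architecture(header):
    """Map the first bytes of an ELF binary to an architecture label, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # EI_DATA (byte 5): 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine)
```

A caller would read the init script, resolve the returned path inside the container's root directory, and pass the first 20 bytes of that file to `elf_architecture`.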
This moves the architecture detection code to the end of the container
creation lifecycle, so that it can be implemented as a plugin
subroutine. Therefore this mechanism is now generic enough that it can
be adapted to other container OS's in the future if needed. AFAICS
`arch` is only used when writing the actual LXC config, so determining
it later during creation does not change anything.
detect_architecture() has been made a bit more generic; the LXC-specific
error was moved out of this function, as well as the chroot(). Ensuring
that it is executed from the correct rootdir/chroot should be handled by
the caller.
Tested by creating a NixOS and a Debian container (to verify that
nothing regressed) and checking if the warning "Architecure detection
failed: [..]" no longer appears for the NixOS CT and if `arch` in the
CT config is correct. Also tested restoring both containers from a local
and a PBS backup, as well as migrating both containers.
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Leo Nunner [Thu, 15 Jun 2023 09:43:31 +0000 (11:43 +0200)]
api: network: get interfaces from containers
Adds an 'interfaces' endpoint in the API
('/nodes/{node}/lxc/{vmid}/interfaces'), which returns a list of
interface names, together with a MAC, IPv4 and IPv6 address. This list
may be expanded in the future. Note that this is only returned for
*running* containers; stopped containers simply return an empty list.
Stoiko Ivanov [Fri, 23 Jun 2023 17:19:37 +0000 (19:19 +0200)]
setup: fedora: fix wrong systemd-networkd preset
The refactoring of the systemd-preset handling inadvertently changed
the preset for Fedora >= 37 to disabled in e11806e ("add
setup_systemd_preset helper, disable networkd for debian 12+")
Reported in our community forum:
https://forum.proxmox.com/threads/129395/
Aaron Lauterer [Mon, 19 Jun 2023 09:29:36 +0000 (11:29 +0200)]
migration: fail when aliased volume is detected
Aliased volumes (referencing the same volume multiple times) can lead to
unexpected behavior in a migration.
Therefore, stop the migration in such a case.
The check works by comparing the path returned by the storage plugin.
This means that we should be able to catch the common situations where
it can happen:
* by referencing the same volid multiple times
* having a different volid due to an aliased storage: different storage
name but pointing to the same location.
We decided against checking the storages themselves being aliased. It is
not possible to infer that reliably from just the storage configuration
options alone.
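The path comparison can be sketched as follows. This is an illustration only: the function name and the `(config_key, volid, path)` tuple layout are assumptions, with `path` standing for whatever the storage plugin returns for the volume.

```python
def find_aliased_volumes(references):
    """Group volume references by resolved path and return the aliased ones.

    references: iterable of (config_key, volid, path) tuples.
    Returns a dict mapping each path that is referenced more than once
    to the list of config keys that reference it.
    """
    by_path = {}
    for config_key, _volid, path in references:
        by_path.setdefault(path, []).append(config_key)
    return {path: keys for path, keys in by_path.items() if len(keys) > 1}
```

A migration would abort if the returned dictionary is non-empty. Both cases listed above are caught, since two volids on differently named storages still resolve to the same path.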
Aaron Lauterer [Mon, 19 Jun 2023 09:29:35 +0000 (11:29 +0200)]
migration: only migrate volumes used by the guest
When scanning all configured storages for volumes belonging to the
container, the migration could easily fail if a storage is not
available, but enabled. That storage might not even be used by the
container at all.
By not doing that and only looking at the disk images referenced in the
config, we can avoid that.
We need to add additional steps for pending volumes with checks if they
actually exist. Changing an existing mountpoint to a new volume
will only create the volume on the next start of the container.
The big change regarding behavior is that volumes not referenced in the
container config will be ignored. They are already orphans that used to
be migrated as well, but are now left where they are.
setup: enable systemd-networkd via preset for archlinux
Note that this is now done in `setup_init` which is a
pre-start hook rather than a one time template fixup,
however, the presets are only applied on first boot or if
the user requests them explicitly, and the usual mechanisms
to prevent the file from being written can be used.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Stoiko Ivanov [Wed, 14 Jun 2023 12:33:24 +0000 (14:33 +0200)]
tests: fix small syntax glitch
An adaptation to adhere to perlcritic's recommendation broke the
snapshot tests:
```
Undefined subroutine &Test::MockModule called at snapshot-test.pm line 300.
```
With this the snapshot tests still run and perlcritic seems happy
Stoiko Ivanov [Fri, 9 Jun 2023 13:05:51 +0000 (15:05 +0200)]
setup: systemd-network: use correct values for dhcp-modes
The change from v4->ipv4 happened in 2015 in systemd commit
cb9fc36a1211967e8c58b0502a26c42552ac8060, so by now it should be
safe to replace it for all containers relying on systemd-networkd.
Friedrich Weber [Mon, 15 May 2023 13:08:23 +0000 (15:08 +0200)]
lxc start: warn in case of conflicting lxc.idmap entries
Users can customize the mapping between host and container uids/gids
by providing `lxc.idmap` entries in the container config. The syntax
is described in lxc.container.conf(5). One source of errors is
conflicting entries for one or more uid/gids. An example:
    ...
    lxc.idmap: u 0 100000 65536
    lxc.idmap: u 1000 1000 10
    ...
Assuming `root:1000:10` is correctly added to /etc/subuid, starting
the container fails with an error that is hard to interpret:
In order to simplify troubleshooting, validate the mapping before
starting the container and print a warning if a conflict is detected.
For the above mapping:
    lxc.idmap: invalid map entry 'u 1000 1000 10':
    container uid 1000 is also mapped by entry 'u 0 100000 65536'
The warning appears in the task log and in the output of `pct start`.
The validation subroutine considers uid and gid mappings separately.
For each of the two types, it makes one pass to detect container id
conflicts and one pass to detect host id conflicts. The subroutine
dies with the first detected conflict.
A failed validation only prints a warning instead of erroring out, to
make sure buggy (or outdated) validation logic does not prevent
containers from starting.
Note that validation does not take /etc/sub{uid,gid} into account,
which, if misconfigured, could still prevent the container from
starting with an error like
"newuidmap: uid range [1000-1010) -> [1000-1010) not allowed"
If needed, validating /etc/sub{uid,gid} could be added in the future.
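The two-pass check can be sketched like this. It is a simplified illustration rather than the actual subroutine: the function name is hypothetical, entries are passed as the text after `lxc.idmap:`, and only the `u` and `g` types are handled.

```python
def validate_idmap(entries):
    """Raise ValueError on the first conflicting lxc.idmap entry.

    entries: strings such as 'u 0 100000 65536', meaning
    <type> <container start> <host start> <range>.
    """
    parsed = []
    for entry in entries:
        kind, ct_start, host_start, length = entry.split()
        parsed.append((kind, int(ct_start), int(host_start), int(length), entry))

    # uid and gid mappings are validated independently of each other.
    for kind, label in (("u", "uid"), ("g", "gid")):
        maps = [m for m in parsed if m[0] == kind]
        # One pass over container ids, one pass over host ids.
        for side, index in (("container", 1), ("host", 2)):
            previous = None
            for current in sorted(maps, key=lambda m: m[index]):
                # After sorting by start id, an overlap can only involve the
                # directly preceding range, because we stop at the first conflict.
                if previous is not None and current[index] < previous[index] + previous[3]:
                    raise ValueError(
                        f"invalid map entry '{current[4]}': "
                        f"{side} {label} {current[index]} "
                        f"is also mapped by entry '{previous[4]}'"
                    )
                previous = current
```

In keeping with the warn-only behaviour described above, a caller would catch the exception, print it as a warning, and start the container regardless.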
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Thomas Lamprecht [Wed, 26 Apr 2023 14:21:21 +0000 (16:21 +0200)]
memory: enforce memory.high also on hotplug changes
Factor out the calculation into a method to ensure it keeps in sync
and then use the newly added parameter of the change_memory_limit
PVE::CGroup method, bump the dependency in d/control respectively.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
memory: set cgroupv2 memory.high to ~99.2% of memory.max hard-limit
cgroup memory usage is limited by the hard 'max' limit (OOM-killer
enforced) and the soft 'high' limit (cgroup processes get throttled
and put under heavy reclaim pressure). Set the latter high limit to
1016/1024 (~99.2%) of the 'max' hard limit. This scales with CT
memory allocations and gives a decent 2^x based rest for 2^y memory
configs, which is still quite near the upper bound. Clamp the maximum
gap between high and max at 128 MiB so that huge containers do not pay
a high absolute cost.
A few examples of the difference between max and high for a few memory sizes:
- 2 MiB lower for 256 MiB max
- 16 MiB lower for 2 GiB max
- 128 MiB for 16 GiB and above
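The arithmetic behind these numbers, as a small sketch (the function name is hypothetical and values are in bytes): the gap is 8/1024 of the hard limit, capped at 128 MiB.

```python
MIB = 1024 * 1024


def memory_high(max_bytes):
    """Return the soft 'high' limit for a given hard 'max' limit in bytes."""
    # 1016/1024 of max leaves a gap of 8/1024 of max, clamped to 128 MiB.
    gap = min(max_bytes * 8 // 1024, 128 * MIB)
    return max_bytes - gap
```

For a 256 MiB container this gives 254 MiB, and from 16 GiB upward the gap stays fixed at 128 MiB.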
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 26 Apr 2023 14:22:35 +0000 (16:22 +0200)]
setup: avoid writing truncated machine-id if it didn't exist
Allows an admin to prepare a template that will have the first-boot
condition set on first start. We only want to disable the first-boot
condition, and also (re)generate a machine-id on clone, if the
machine-id already exists and isn't set to "uninitialized".
Christoph Heiss [Wed, 22 Feb 2023 12:49:02 +0000 (13:49 +0100)]
net: Add `link_down` config to allow setting interfaces as disconnected
If this network option is set, the host-side link will be forced down
and the interface won't be connected to the bridge.
Add a `Disconnect` option for network interfaces on LXC containers, much
like it already exists for VMs. This has been requested in #3413 [0] and
seems useful, especially considering we already support the same thing
for VMs.
One thing to note is that LXC does not seem to support the notion of
setting an interface down. The `flags` property would suggest that this
is possible [1], but AFAICS it does not work. I tried setting the value as
empty and to something other than "up" (since that is really the only
supported option [2][3]), which both had absolutely no effect.
Thus force the host-side link of the container network down and avoid
adding it to the designated bridge if the new option is set, effectively
disconnecting the container network.
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
[ T: paste cover letter as commit message ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Friedrich Weber [Mon, 20 Feb 2023 10:04:45 +0000 (11:04 +0100)]
fix #4470: pct fstrim: ignore bind or read-only mountpoints
Currently, `pct fstrim` will run `fstrim` on all mountpoints
of the container, including bind and read-only mountpoints.
However, trimming a bind mountpoint might trim a host
filesystem, which users may not expect. Also, trimming can
be considered a write operation, which users may not expect
to be carried out on a read-only mountpoint.
Hence, exclude bind mountpoints and read-only mountpoints
from trimming.
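As a sketch of the filter, assuming each mountpoint is represented as a dictionary with a `type` field and an optional `ro` flag (both names are assumptions made for this example):

```python
def trimmable_mountpoints(mountpoints):
    """Return the names of mountpoints that fstrim may safely be run on.

    mountpoints: dict mapping a name such as 'rootfs' or 'mp0' to a dict
    with an assumed 'type' key and an optional truthy 'ro' key.
    """
    return [
        name
        for name, mountpoint in mountpoints.items()
        # Skip bind mounts (could trim a host filesystem) and read-only mounts.
        if mountpoint.get("type") != "bind" and not mountpoint.get("ro")
    ]
```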
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Friedrich Weber [Wed, 25 Jan 2023 13:07:49 +0000 (14:07 +0100)]
fix: shutdown: if lxc-stop fails, wait for socket closing with timeout
When trying to shut down a hung container with `forceStop=0` (e.g. via
the Web UI), the shutdown task may run indefinitely while holding a lock
on the container config. The reason is that the shutdown subroutine
waits for the LXC command socket to close, even if the `lxc-stop`
command has failed due to timeout. This prevents other tasks (such as a
stop task) from acquiring the lock. In order to stop the container, the
shutdown task has to be explicitly killed first, which is inconvenient.
This occurs e.g. when trying to shut down a hung CentOS 7 container (with
systemd <v232) in a cgroupv2 environment.
This fix imposes a timeout on the socket polling operation if the
`lxc-stop` command has failed. Behavior in case `lxc-stop` succeeds is
unchanged. This reintroduces some behavior from b1bad293. The timeout
duration is the given shutdown timeout, meaning that the final task
duration in the scenario above is twice the shutdown timeout.
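The changed waiting logic can be sketched as below. This is an illustration under assumptions: `is_closed` is a hypothetical callable that reports whether the LXC command socket has closed, and the polling interval is arbitrary.

```python
import time


def wait_for_socket_close(is_closed, lxc_stop_failed, shutdown_timeout, poll_interval=1.0):
    """Poll until the command socket closes.

    If lxc-stop failed, give up after shutdown_timeout seconds and return
    False. If it succeeded, wait without a deadline, as before.
    """
    deadline = time.monotonic() + shutdown_timeout if lxc_stop_failed else None
    while not is_closed():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Because `lxc-stop` itself may already have used up the full shutdown timeout before this wait begins, the worst-case task duration is twice that timeout.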
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Friedrich Weber [Mon, 16 Jan 2023 16:52:34 +0000 (17:52 +0100)]
fix #4460: setup: centos: create /etc/hostname if it does not exist
Previously, the CentOS setup only wrote to /etc/hostname if the file
already existed. Many CT templates of Red Hat-derived distros do not
contain that file, so the containers ended up without /etc/hostname.
This caused systemd-hostnamed to report the "static hostname" to be
empty. If networking is handled by NetworkManager, the empty static
hostname caused DHCP requests to be sent without the "Hostname"
field, as reported in #4460.
With this fix, the CentOS setup module creates /etc/hostname if it
does not exist, so NetworkManager correctly reads the hostname and
includes it in DHCP requests.
Manually tested with the following CT templates (checking that
/etc/hostname exists and DHCP requests include the hostname):
- Distros using NetworkManager:
- Alma Linux 9 (almalinux-9-default_20221108_amd64.tar.xz)
- CentOS 8 (centos-8-default_20201210_amd64.tar.xz)
- CentOS 9 Stream (centos-9-stream-default_20221109_amd64.tar.xz)
- Rocky Linux 9 (rockylinux-9-default_20221109_amd64.tar.xz)
- Distros using network-scripts (here, DHCP requests already
contained the hostname without this fix, as network-scripts does
not rely on systemd-hostnamed):
- Alma Linux 8 (almalinux-8-default_20210928_amd64.tar.xz)
- CentOS 7 (centos-7-default_20190926_amd64.tar.xz)
- CentOS 8 Stream (centos-8-stream-default_20220327_amd64.tar.xz)
- Rocky Linux 8 (rockylinux-8-default_20210929_amd64.tar.xz)
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
[ T: slight touch-up of commit message format / wording ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
restore: also remove firewall config after failed restore
Before, a failed restore would only remove the container config, but
the firewall config would remain.
Now, the firewall config is also removed, except for the case when the
user only has the VM.Backup permission. In this case the firewall
would not have been restored/changed by us and is left as is.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
restore: clean up config when invalid source archive is given
Before, if a non-existent source archive parameter was passed when
restoring a container, the task would fail but leave an empty config
file behind. The same with invalid mount point configurations.
In both cases, the empty config will now be removed.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>