While some people write percentages as 0.XX, putting a % next to that is just
confusing. Also, combined with the format modifier this would be rather lossy,
and it would not match regular `df` output.
Thomas Lamprecht [Wed, 24 Apr 2024 12:21:20 +0000 (14:21 +0200)]
api: fix regression with locking start-after-create
The API now correctly forks a worker first and then locks the config
for the actual creation, but this broke the start-after-create
feature, as that also locks the config.
Use the same approach as in qemu-server and just do the start after
the create, outside of the lock.
While this has a small race window where another API call could lock
the newly created CT again, we never really guaranteed that the
start-after-create parameter is atomic.
Even if we want to guarantee that someday, we can still do so, but
this is a good stop-gap until then and worked fine for VMs since its
introduction.
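A minimal sketch of the ordering described above, not the actual pve-container code: `config_lock`, `create_ct` and `start_ct` are hypothetical names, and a plain non-reentrant `threading.Lock` stands in for the config lock.

```python
import threading

# Hypothetical stand-in for the per-container config lock (non-reentrant).
config_lock = threading.Lock()
events = []


def start_ct(vmid):
    # Starting takes the config lock itself, so it must not be called
    # while the caller still holds that lock.
    if not config_lock.acquire(timeout=0.1):
        raise TimeoutError(f"can't lock config of CT {vmid}")
    try:
        events.append(("start", vmid))
    finally:
        config_lock.release()


def create_ct(vmid, start_after_create=False):
    with config_lock:
        events.append(("create", vmid))
    # The lock is released here. Another caller could take it before the
    # start does, which is the small race window accepted above.
    if start_after_create:
        start_ct(vmid)


create_ct(100, start_after_create=True)
```

Calling `start_ct` while still inside the `with config_lock:` block would time out in this sketch, which mirrors the regression.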
Fixes: 7a73568 ("api: status: move config locking from API handler into worker")
Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Stefan Hanreich [Fri, 19 Apr 2024 09:42:35 +0000 (11:42 +0200)]
firewall: add handling for new nft firewall
When the nftables firewall is enabled, we do not need to create
firewall bridges.
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
[ TL: use a more meaningful variable name and add a comment ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The previous wording made it sound like all "visible" tasks were
aborted, which is not the case: A user with Sys.Audit but without
Sys.Modify may see a task that was started by a different user, but
overrule-shutdown would not abort the task.
Change wording to better reflect that not all visible tasks may be
aborted.
Also, add a full stop that was previously missing.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Thomas Lamprecht [Thu, 18 Apr 2024 13:19:36 +0000 (15:19 +0200)]
format disk: set FS root uid/gid for passed through /dev/ volumes
When calling mkfs one must pass the root uid/gid parameters alongside,
as they are used unconditionally, but this wasn't done for the edge
case where a block device from the host was used as volume for the CT,
causing an undef warning.
Note that this code branch is not reachable currently, but that might
change. For now add a FIXME comment to mark this for removal, as we
probably do not want to format devices from /dev/ in any way (and no
user reported that this was broken, so the use case seems to be
non-existent).
Fixes: d216e89 ("unprivileged: remove bad chown -R call")
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Filip Schauer [Wed, 17 Apr 2024 14:35:53 +0000 (16:35 +0200)]
fix #4846: Avoid the outdated noacl mount option on ext4
Do not use the 'noacl' mount option when mounting a container disk with
an ext4 file system. The option was deprecated in kernel commit
f70486055ee3 ("ext4: try to deprecate noacl and noxattr_user mount
options") (v3.4), as no other filesystem exposed disabling ACLs as a
mount option, and then finally got removed in commit 2d544ec923db ("ext4:
remove deprecated noacl/nouser_xattr options") (v6.1).
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Friedrich Weber [Fri, 12 Apr 2024 14:15:50 +0000 (16:15 +0200)]
fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint
The new `overrule-shutdown` parameter is boolean and defaults to 0. If
it is 1, all active `vzshutdown` tasks for the same CT (which are
visible to the user/token) are aborted before attempting to stop the
CT.
Passing `overrule-shutdown=1` is forbidden for HA resources.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Filip Schauer [Tue, 16 Apr 2024 09:27:17 +0000 (11:27 +0200)]
fix invalid device passthrough being added to config
Fix a bug that allows a device passthrough entry to be added to the
config despite the device path not pointing to a device. Previously,
adding an invalid device passthrough entry would throw an error, but the
entry would still be added to the config. This is fixed by moving the
respective checks from update_lxc_config to update_pct_config, which is
run before the entry is written to the config file.
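The validate-before-write ordering can be sketched as follows. This is an illustration only: `validate_device_path` and `add_device_entry` are hypothetical names, and a dict stands in for the container config.

```python
import os
import stat


def validate_device_path(path):
    """Raise ValueError unless path points to a character or block device."""
    try:
        mode = os.stat(path).st_mode
    except OSError as err:
        raise ValueError(f"cannot stat {path}: {err}") from err
    if not (stat.S_ISCHR(mode) or stat.S_ISBLK(mode)):
        raise ValueError(f"{path} is not a device")


def add_device_entry(config, key, path):
    # Validate first; the config is only touched once the check passed.
    validate_device_path(path)
    config[key] = path
    return config
```

With the check placed before the assignment, a failing path raises and leaves the config unchanged.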
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
[FE: drop hunk for use statements left-over from earlier version]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
mountpoint mount: activate PVE-managed volumes during preparation
Otherwise it was not possible to hotplug a volume that was previously
deactivated and requires activation, e.g. an LVM LV that was detached
after shutting down the container couldn't be hotplugged anymore
later.
Filip Schauer [Tue, 9 Apr 2024 09:26:22 +0000 (11:26 +0200)]
fix #5160: fix move_mount regression for mount point hotplug
Set up an AppArmor profile to allow moving mounts for mount point
hotplug.
This fixes a regression caused by
kernel commit 157a3537d6 ("apparmor: Fix regression in mount mediation").
The commit introduced move_mount mediation, which now requires
move_mount to be allowed in the AppArmor profile. Although it is allowed
for most paths in the /usr/bin/lxc-start profile, move_mount is called
with a file descriptor instead of a path in mountpoint_insert_staged,
thus it is not affected by the allow rules in
/etc/apparmor.d/abstractions/lxc/container-base.
To fix this, introduce a new AppArmor profile to allow move_mount on
every mount, specifically for mount point hotplug.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Friedrich Weber [Tue, 30 Jan 2024 17:10:53 +0000 (18:10 +0100)]
api: status: move config locking from API handler into worker
Previously, container start/stop/shutdown/suspend would try to acquire
the config lock in the API handler prior to forking a worker. If the
lock was currently held elsewhere, this would block the API handler
and thus the pvedaemon worker thread until the 10s timeout expired (or
the lock could be acquired).
To avoid blocking the API handler, immediately fork off a worker
process and try to acquire the config lock in that worker.
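As a rough illustration of why this keeps the handler responsive, the sketch below hands the lock acquisition to a worker. It uses a thread where the real code forks a process, and `api_start`, `worker` and `config_lock` are hypothetical names.

```python
import threading
import time

config_lock = threading.Lock()  # hypothetical stand-in for the config lock


def worker(vmid, results, timeout=10.0):
    # Only the worker waits for the lock, so a busy lock delays this task alone.
    if not config_lock.acquire(timeout=timeout):
        results[vmid] = "lock timeout"
        return
    try:
        results[vmid] = "started"
    finally:
        config_lock.release()


def api_start(vmid, results):
    # The handler hands the work off and returns right away.
    task = threading.Thread(target=worker, args=(vmid, results))
    task.start()
    return task


results = {}
config_lock.acquire()            # someone else currently holds the lock
began = time.monotonic()
task = api_start(100, results)
handler_time = time.monotonic() - began
config_lock.release()            # the lock becomes free, the worker proceeds
task.join()
```

Here `handler_time` stays small even though the lock was held when the handler ran; only the worker had to wait.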
Patch best viewed with `git show -w`.
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Folke Gleumes [Fri, 9 Feb 2024 13:17:09 +0000 (14:17 +0100)]
pct: add keep-env option to the 'enter' and 'exec' command
The keep-env option allows the user to define if the current
environment should be kept when running 'pct enter/exec'. pct will now
always set '--keep-env' or '--clear-env' when calling lxc-attach to
anticipate the upcoming change in default behavior.
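A sketch of building such a command line, assuming only the `lxc-attach` options `-n`, `--keep-env` and `--clear-env`; the helper name `lxc_attach_argv` is made up for illustration and is not pct's implementation.

```python
def lxc_attach_argv(vmid, command, keep_env):
    # Always pass one of the two flags explicitly, so the result does not
    # depend on lxc-attach's own default.
    env_flag = "--keep-env" if keep_env else "--clear-env"
    return ["lxc-attach", "-n", str(vmid), env_flag, "--", *command]
```

For example, `lxc_attach_argv(100, ["ls", "/"], keep_env=False)` yields a command that clears the environment regardless of which default lxc-attach ships with.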
Signed-off-by: Folke Gleumes <f.gleumes@proxmox.com>
[ TL: fix some extra whitespace, extend subject slightly ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Folke Gleumes [Mon, 29 Jan 2024 15:43:17 +0000 (16:43 +0100)]
fix #5194: pct: delete environment variables set by pve
proxmox-perl-rs sets SSL_CERT_{DIR,FILE}, which can break SSL in
containers if their certificate store can't be found in the same spot.
This patch explicitly unsets those variables before starting the
container.
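The idea can be sketched like this, assuming the environment is passed around as a plain mapping; `container_start_env` is a hypothetical helper name.

```python
import os

# Variables named in the commit message above.
PVE_SET_VARS = ("SSL_CERT_DIR", "SSL_CERT_FILE")


def container_start_env(env=None):
    """Return a copy of env (default: os.environ) without the PVE-set variables."""
    env = dict(os.environ if env is None else env)
    for name in PVE_SET_VARS:
        env.pop(name, None)  # no error if the variable was never set
    return env
```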
Stefan Hanreich [Mon, 20 Nov 2023 19:19:54 +0000 (20:19 +0100)]
create: Do not call create_ifaces_ipams_ips
Since create_vm already calls update_pct_config, which in turn calls
vmconfig_apply_pending, we do not need to explicitly create the IPAM
entries when creating a container from scratch.
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Stefan Hanreich [Mon, 20 Nov 2023 19:19:52 +0000 (20:19 +0100)]
network: Do not always reserve new IP in IPAM
Currently when updating the network configuration of a container, SDN
would always create a new entry in the IPAM. Only create a new entry
when the bridge or MAC changes or the NIC is completely new.
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Stefan Hanreich [Mon, 20 Nov 2023 19:19:51 +0000 (20:19 +0100)]
hotplug network: Only change IPAM when MAC or bridge changes
Currently a new IPAM entry is created every time a NIC config changes.
When editing properties other than MAC or bridge this could lead to
duplicated entries in the IPAM. Only reserve a new IP when the bridge
or MAC changes or the NIC is completely new.
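The condition reads roughly as below. The dict keys `bridge` and `hwaddr` and the function name are assumptions for illustration, not the actual SDN code.

```python
def needs_new_ipam_entry(old_nic, new_nic):
    """Return True if a new IP should be reserved for new_nic."""
    if old_nic is None:
        return True  # the NIC is completely new
    # Only a changed bridge or MAC warrants a new reservation; edits to
    # other properties must not create a duplicate IPAM entry.
    return any(old_nic.get(key) != new_nic.get(key) for key in ("bridge", "hwaddr"))
```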
Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
Filip Schauer [Fri, 17 Nov 2023 10:28:16 +0000 (11:28 +0100)]
Add device passthrough
Add a dev[n] argument to the container config to pass devices through to
a container. A device can be passed by its path. Additionally, the access
mode, uid and gid can be specified through their respective properties.
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
Thomas Lamprecht [Sun, 19 Nov 2023 18:10:34 +0000 (19:10 +0100)]
setup: handle getty services also via systemd-preset
Fixes an issue where the first boot of a Fedora 39 CT had no
container-getty due to the default presets enabling the getty@
service instead; only on second boot (where presets aren't applied
anymore) was our TTY handling actually in effect and working.
Note that presets aren't bothered by a service not existing. Still,
for older distro releases disabling getty@ could lead to problems; for
now we call this only for modern distro releases anyway, and it also
only affects newly created CTs.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 19 Nov 2023 16:42:38 +0000 (17:42 +0100)]
setup base: disable sysfs debug mounts via systemd presets
they will fail and are not really useful in the container, at least
not as default.
Just disable via the preset mechanism, so any user can easily start
that mount if it'd make sense for their use case.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Christoph Heiss [Mon, 25 Sep 2023 11:38:49 +0000 (13:38 +0200)]
setup: fix architecture detection for NixOS containers
NixOS is special and deviates in many places from a "standard" Linux
system. In this case, /bin/sh does not exist in the filesystem before
the initial activation (i.e. first boot), which creates a symlink at
/bin/sh.
Due to the currently existing fallback code, only an error message is
logged and the architecture is defaulted to x86_64. Still, this is not
something users might expect.
Thus try a bit harder to detect the architecture for NixOS containers by
inspecting the init script, which contains a shebang-line with the full
path to the system shell.
This moves the architecture detection code to the end of the container
creation lifecycle, so that it can be implemented as a plugin
subroutine. Therefore this mechanism is now generic enough that it can
be adapted to other container OSes in the future if needed. AFAICS
`arch` is only used when writing the actual LXC config, so determining
it later during creation does not change anything.
detect_architecture() has been made a bit more generic; the LXC-specific
error was moved out of this function, as well as the chroot(). Ensuring
that it is executed from the correct rootdir/chroot should be handled by
the caller.
Tested by creating a NixOS and a Debian container (to verify that
nothing regressed) and checking if the warning "Architecure detection
failed: [..]" no longer appears for the NixOS CT and if `arch` in the
CT config is correct. Also tested restoring both containers from a local
and a PBS backup, as well as migrating both containers.
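A rough sketch of the shebang-based idea, not the actual detect_architecture() code: read the interpreter path from the init script, then read `e_machine` from that binary's ELF header. The function names and the machine-name table are illustrative, and the caller is assumed to resolve paths relative to the container's root directory.

```python
import struct

# A few e_machine values from the ELF specification; names are illustrative.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def shell_from_shebang(script_path):
    """Return the interpreter path from the script's first line, or None."""
    with open(script_path, "rb") as fh:
        first = fh.readline()
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].split()
    return parts[0].decode() if parts else None


def elf_machine(binary_path):
    """Return a machine name read from the ELF header of binary_path."""
    with open(binary_path, "rb") as fh:
        header = fh.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{binary_path} is not an ELF file")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")
```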
Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
Leo Nunner [Thu, 15 Jun 2023 09:43:31 +0000 (11:43 +0200)]
api: network: get interfaces from containers
Adds an 'interfaces' endpoint in the API
('/nodes/{node}/lxc/{vmid}/interfaces'), which returns a list of
interface names, together with a MAC, IPv4 and IPv6 address. This list
may be expanded in the future. Note that this is only returned for
*running* containers; stopped containers simply return an empty list.
Stoiko Ivanov [Fri, 23 Jun 2023 17:19:37 +0000 (19:19 +0200)]
setup: fedora: fix wrong systemd-networkd preset
The refactoring of the systemd-preset handling inadvertently changed
the preset for Fedora >= 37 to disabled in e11806e ("add
setup_systemd_preset helper, disable networkd for debian 12+").
Reported in our community forum:
https://forum.proxmox.com/threads/129395/
Aaron Lauterer [Mon, 19 Jun 2023 09:29:36 +0000 (11:29 +0200)]
migration: fail when aliased volume is detected
Aliased volumes (referencing the same volume multiple times) can lead to
unexpected behavior in a migration.
Therefore, stop the migration in such a case.
The check works by comparing the path returned by the storage plugin.
This means that we should be able to catch the common situations where
it can happen:
* by referencing the same volid multiple times
* by having a different volid due to an aliased storage: a different
  storage name, but pointing to the same location.
We decided against checking the storages themselves being aliased. It is
not possible to infer that reliably from just the storage configuration
options alone.
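The path comparison can be sketched as follows, assuming the paths have already been resolved by the storage layer. `find_aliased_volumes` and the tuple layout are hypothetical, and the example volids and paths are made up.

```python
from collections import defaultdict


def find_aliased_volumes(volumes):
    """volumes: iterable of (config_key, volid, resolved_path) tuples.

    Returns a dict mapping each path that is referenced more than once
    to the list of (config_key, volid) pairs referencing it.
    """
    by_path = defaultdict(list)
    for key, volid, path in volumes:
        by_path[path].append((key, volid))
    return {path: refs for path, refs in by_path.items() if len(refs) > 1}


example = [
    ("rootfs", "local-lvm:vm-100-disk-0", "/dev/pve/vm-100-disk-0"),
    ("mp0", "lvm-alias:vm-100-disk-0", "/dev/pve/vm-100-disk-0"),
    ("mp1", "local-lvm:vm-100-disk-1", "/dev/pve/vm-100-disk-1"),
]
aliases = find_aliased_volumes(example)
```

In this example `aliases` contains one entry for `/dev/pve/vm-100-disk-0`; a migration would stop when the result is non-empty.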
Aaron Lauterer [Mon, 19 Jun 2023 09:29:35 +0000 (11:29 +0200)]
migration: only migrate volumes used by the guest
When scanning all configured storages for volumes belonging to the
container, the migration could easily fail if a storage is not
available, but enabled. That storage might not even be used by the
container at all.
By not doing that and only looking at the disk images referenced in the
config, we can avoid that.
We need to add additional steps for pending volumes with checks if they
actually exist. Changing an existing mountpoint to a new volume
will only create the volume on the next start of the container.
The big change regarding behavior is that volumes not referenced in the
container config will be ignored. They are already orphans that used to
be migrated as well, but are now left where they are.
setup: enable systemd-networkd via preset for archlinux
Note that this is now done in `setup_init` which is a
pre-start hook rather than a one time template fixup,
however, the presets are only applied on first boot or if
the user requests them explicitly, and the usual mechanisms
to prevent the file from being written can be used.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Stoiko Ivanov [Wed, 14 Jun 2023 12:33:24 +0000 (14:33 +0200)]
tests: fix small syntax glitch
An adaptation to adhere to perlcritic's recommendations caused the
snapshot tests to stop working:
```
Undefined subroutine &Test::MockModule called at snapshot-test.pm line 300.
```
With this the snapshot tests still run and perlcritic seems happy.
Stoiko Ivanov [Fri, 9 Jun 2023 13:05:51 +0000 (15:05 +0200)]
setup: systemd-network: use correct values for dhcp-modes
The change from v4->ipv4 happened in 2015 in systemd commit
cb9fc36a1211967e8c58b0502a26c42552ac8060, so by now it should be
safe to replace it for all containers relying on systemd-networkd.
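For illustration, choosing the value for systemd-networkd's `DHCP=` setting with the current names (`yes`, `no`, `ipv4`, `ipv6`) might look like this; the function and its boolean parameters are assumptions, not the setup code itself.

```python
def networkd_dhcp_value(ipv4_dhcp, ipv6_dhcp):
    """Map per-family DHCP flags to a value for the DHCP= setting."""
    if ipv4_dhcp and ipv6_dhcp:
        return "yes"
    if ipv4_dhcp:
        return "ipv4"  # formerly spelled "v4"
    if ipv6_dhcp:
        return "ipv6"  # formerly spelled "v6"
    return "no"
```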