prestart-hook: detect cgroupv2 incompatible systemd version
Some container OSes (e.g. CentOS 7, Ubuntu 16.04) boot with a systemd
version that is not able to run in a pure cgroupv2 (a.k.a. unified
hierarchy) environment.
Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.
This approach only leaves syslog/journal as the place for notifying the
user since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the prints to stdout and
stderr.
The alternative of briefly mounting all container mounts just to
obtain the systemd version before starting the container seems
prohibitively expensive.
The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates.
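The heuristic can be sketched roughly as follows. This is a minimal Python
illustration, not the hook's actual code: the function name and the rootfs
argument are hypothetical, and it assumes the container's mount points are
already set up below rootfs.

```python
import os


def init_is_systemd(rootfs: str) -> bool:
    """Return True if <rootfs>/sbin/init is a symlink to something ending in 'systemd'.

    Hypothetical helper: it only inspects the link target and does not try
    to resolve absolute symlinks relative to the container root.
    """
    init_path = os.path.join(rootfs, "sbin", "init")
    if not os.path.islink(init_path):
        # A regular file (e.g. sysvinit or busybox init) is not treated as systemd.
        return False
    # readlink returns the raw link target, e.g. "../lib/systemd/systemd".
    return os.readlink(init_path).endswith("systemd")
```

Only once this check succeeds does it make sense to look up the systemd
version inside the container.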
With cgroupv2 we lose the default devices entries, which in
cgroupv1 results in the default inherited 'a *:* rwm', so
let's have lxc's cgroupv2 default do the same (iow. turn it
into a "deny-list").
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fix #3421: allow custom storage plugins to support rootfs
it is now necessary for storages to support the 'rootdir' content in
order to start containers on them. all native storage plugins
already report the rootdir content correctly.
Fabian Ebner [Fri, 18 Jun 2021 10:59:32 +0000 (12:59 +0200)]
migrate: enforce that rootdir content type is available
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold container images.
Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.
Fabian Ebner [Fri, 18 Jun 2021 10:59:30 +0000 (12:59 +0200)]
prefer storage_check_enabled over storage_check_node
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.
While not strictly necessary for a second call where only the storage differs,
it is more future-proof: if support for a target storage is added at some point,
it might be easy to miss adapting the call.
For the migration checks, disabled storages are now always caught.
Thomas Lamprecht [Fri, 18 Jun 2021 16:15:45 +0000 (18:15 +0200)]
clear machine-id: only truncate machine-id file if either it exists or systemd managed
Not nice to create empty /etc/machine-id files in, e.g., Alpine Linux
CTs.
The adaptation of the else branch is not only an optimization to avoid
an unlink call on a non-existent file, but is also required, as the else
branch is no longer guaranteed to be the "no clone" case.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 18 Jun 2021 16:02:21 +0000 (18:02 +0200)]
setup: fix calling clone hook with weird params
$clone has no use, and the interface constraints on $conf vs.
$self->{conf} are not documented anywhere, so just use $conf for
now, to at least use only one thing (and avoid the highly confusing
case where the signature suggests that $conf is used, but passing a
$conf unrelated to $self->{conf} would not work)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we need to clone the firewall config before doing any actual work, else
we risk partially aborting and leaving a non-firewalled container
around. accordingly, we need to (attempt to) remove the cloned FW config
after successfully removing the guest config in error handling.
introduce a new helper handling
- obtaining the flock
- (re)loading the config
- checking that the 'create' lock is still there
before calling a passed-in sub with the current config, since this
pattern was used quite a lot here.
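The pattern such a helper wraps could look roughly like this. A minimal
Python sketch under assumptions: the function name, the lock file path
argument and the dict-shaped config are all hypothetical, and the real
helper is part of the Perl code base.

```python
import fcntl


def with_create_lock(lock_path, load_config, code):
    """Run code(conf) while holding an exclusive flock on lock_path.

    The config is (re)loaded only after the flock is held, and the call is
    refused if the 'create' lock has vanished from the config in the meantime.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # obtain the flock
        try:
            conf = load_config()  # (re)load the config under the flock
            if conf.get("lock") != "create":
                raise RuntimeError("config is no longer locked with 'create'")
            return code(conf)  # call the passed-in sub with the current config
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```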
intentionally changed behaviour:
- flock is now held for the post_clone hook call
- failure to remove the 'create' lock or to move the config to the
target node if applicable will not undo the clone, since either is
trivially fixable ('pct unlock' or a no-op migration), and copying all
those volumes might have been quite expensive..
set_lock already obtains the flock (since it does a read-modify-write
cycle), and the rest of this code does not touch the config file in any
fashion so no need to hold the flock either..
Oguz Bektas [Thu, 17 Jun 2021 10:51:59 +0000 (12:51 +0200)]
clone_vm: improve config locking
cleaned up the locking situation with config files as Fabian G.
suggested in the review.
use the 'create_and_lock_config' helper in the beginning to ensure that
the target CTID is available, and that the target config is locked from
the beginning. in case any error happens during the initial checks, we
unlink this config in error handling.
firewall config is also now cloned inside the worker instead of before
the worker, in case the clone fails.
also lock the config file when renaming the conf (for moving to a target
node when the option is passed).
Fabian Ebner [Tue, 1 Jun 2021 06:43:05 +0000 (08:43 +0200)]
vm status: force int where appropriate
In the case of a running container with cgroupv2, swap would be a string,
causing a
size.toFixed is not a function
error for the format_size call in the container's "Summary" page in the UI.
The vmids from config_list() are already integers as the return schema expects,
while the opt_vmid passed from the status/current API call needs to be
converted.
Thomas Lamprecht [Wed, 16 Jun 2021 14:12:39 +0000 (16:12 +0200)]
pct: exec, attach: drop "Error: " prefix from error message
we normally do not have that here, the load_config call (which
ensures that the CT exists) also errors without any "Error" like
prefix, so for consistency drop it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Tue, 6 Apr 2021 11:56:16 +0000 (13:56 +0200)]
pct: fix edge case for 'pct push' with root uid/gid
we should check whether the variable is defined in the end (because root
uid:gid is 0:0, this causes perl to get confused and die, even though the
uid:gid was obtained correctly)
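The underlying pitfall is that 0 is a valid id but evaluates to false. A
small Python analogue of the two kinds of check (the function names are
hypothetical, the actual fix is in the Perl code of 'pct push'):

```python
from typing import Optional


def id_ok_truthy(value: Optional[int]) -> bool:
    # Buggy variant: rejects 0, i.e. root's uid/gid, although it is valid.
    return bool(value)


def id_ok_defined(value: Optional[int]) -> bool:
    # Fixed variant: only a missing (undefined) value counts as an error.
    return value is not None
```

With the second variant, uid 0 and gid 0 pass, while a lookup that yielded
nothing is still rejected.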
Fabian Ebner [Thu, 11 Mar 2021 10:26:50 +0000 (11:26 +0100)]
vmstatus: make lock property optional again
Commit d02262048cbbe91ca8b12f98e3dc7bbab28e4c64 made the property de-facto
non-optional. Partially revert this and instead adapt the printing, making the
behavior match the API description again. The conditional assignment is
already there further down the vmstatus function.
Fabian Ebner [Thu, 11 Mar 2021 10:26:49 +0000 (11:26 +0100)]
config: parse: also allow empty values
because they are valid for '-list' formats and it makes the behavior match with
what we do for VM configs. The new pattern is the same that is used for VM
configs. Because it is a non-greedy pattern, trailing whitespace will not be
included in the value anymore. This /should/ cause no problems and the '\s*$'
at the end suggests that that is how it was intended in the first place.
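The effect of a non-greedy value group can be illustrated as follows. The
exact pattern is not quoted in this message, so the regex below is an
assumption chosen to show the behavior (empty values allowed, trailing
whitespace excluded), written in Python rather than Perl.

```python
import re

# Assumed pattern: key, colon, optional whitespace, non-greedy value,
# optional trailing whitespace up to the end of the line.
LINE_RE = re.compile(r"^([a-z][a-z_]*\d*):\s*(.*?)\s*$")


def parse_line(line):
    """Return (key, value) for a 'key: value' line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Because the value group is non-greedy, the final '\s*$' consumes trailing
whitespace, and a line such as "somelist:" parses to an empty value instead
of being rejected.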
Oguz Bektas [Thu, 25 Feb 2021 14:11:16 +0000 (15:11 +0100)]
fix #3313: restore: keep unprivileged status from archive config
Since pct defaults to privileged containers, it restores the
container as privileged when `--unprivileged 1` is not passed.
Instead we should check the old configuration and retrieve it from
there.
This way, when one creates an unprivileged container, it will
still be unprivileged after restore, if not overridden by API
arguments.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Making the CT and VM API more streamlined. But we do not use the
same dangerous default as the VM API does, as we only have it there
for backward compatibility.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 25 Jan 2021 19:15:24 +0000 (20:15 +0100)]
mkfs: make less noisy
Easiest and cleanest would be to pass the -q quiet parameter, but
that also drops possibly relevant information for rescuing such a
filesystem (superblock backup positions, UUID, ...)
Will let through something like:
> Creating filesystem with 262144 4k blocks and 65536 inodes
> Filesystem UUID: 3a6f3548-baf6-45fa-93d2-b61212668d23
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Fri, 20 Nov 2020 14:50:45 +0000 (15:50 +0100)]
vzdump: pass along exclude patterns to proxmox-backup-client
to make the behavior consistent across modes.
Previously vzdump's exclude-path option only had an effect for suspend mode
backups, as then the exclusion already happens when rsync copies the data
during an earlier stage in the backup.
Fabian Ebner [Fri, 20 Nov 2020 14:50:44 +0000 (15:50 +0100)]
vzdump: allow relative exclude patterns for snapshot and stop mode
to make the behavior consistent across modes.
For suspend mode, relative patterns worked for a long time, because the
exclusion already happens when rsync copies the data during an earlier stage of
the backup.
For the other two methods, the way the patterns are passed to tar (after the
'--anchored' option and prefixed with a dot) meant that relative patterns
had no effect previously.
Users who have a relative exclude path by accident (if it's not by accident,
then this fixes the behavior) and did not use suspend mode (if they did use
suspend mode, they hopefully would have noticed the unintended exclusion then)
will be affected by this change.
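One way to make relative patterns effective for tar is to keep absolute
patterns anchored (prefixed with a dot) and pass relative ones after
'--no-anchored'. The following Python sketch shows that idea only; the
function name is hypothetical and whether the actual change builds the
tar command line exactly this way is an assumption.

```python
def tar_exclude_args(patterns):
    """Build GNU tar exclude arguments from vzdump-style exclude patterns."""
    # Absolute patterns are matched from the archive root, which tar sees as ".".
    anchored = [f"--exclude=.{p}" for p in patterns if p.startswith("/")]
    # Relative patterns may match at any depth, so they must not be anchored.
    unanchored = [f"--exclude={p}" for p in patterns if not p.startswith("/")]

    args = []
    if unanchored:
        args += ["--no-anchored", *unanchored]
    if anchored:
        args += ["--anchored", *anchored]
    return args
```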
Stoiko Ivanov [Mon, 23 Nov 2020 10:12:29 +0000 (11:12 +0100)]
fix #3161: snapshot creation: only check volumes for fsfreeze
When considering mountpoints for running 'fsfreeze' before snapshot
creation, commit 8463099d99273561c46398bf02206b4d9d431bc5 did not
only consider volumes created by our storage-stack, but also
bindmounts and devmounts (directly mounting a blockdevice).
This led to PVE::Storage::parse_volume_id failing on those
mountpoints.
Since the fsfreeze call is best-effort and only run for specific
storageplugins, we can simply skip non-volume mountpoints, when
gathering the list of volumes to call fsfreeze on.
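Skipping non-volume mountpoints amounts to a filter like the one below. A
Python sketch with assumed data shapes: the 'type' and 'volume' keys and
the needs_fsfreeze callback are hypothetical stand-ins for the mountpoint
classification and the storage-side check.

```python
def volumes_to_freeze(mountpoints, needs_fsfreeze):
    """Return the volume IDs that should be frozen before a snapshot.

    Bind mounts and device mounts have no storage volume ID, so they are
    skipped before any volume ID parsing could fail on them.
    """
    return [
        mp["volume"]
        for mp in mountpoints
        if mp["type"] == "volume" and needs_fsfreeze(mp["volume"])
    ]
```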
Stoiko Ivanov [Fri, 6 Nov 2020 14:19:42 +0000 (15:19 +0100)]
snapshot creation: fsfreeze mountpoints, if needed
fixes #2991, #2528.
creating a snapshot with rbd, after the syncfs finished successfully does not
guarantee that the snapshot has the state of the filesystem after syncfs.
suggestion taken from #2528 (running fsfreeze -f/-u before snapshotting on
the mountpoints)
added helper PVE::Storage::volume_snapshot_needs_fsfreeze, to indicate
which volumes need to be frozen/thawed. (and mocked it in the tests here).
Added the freeze to sync_container_namespace, since it needs to run inside the
container's mount namespace.
unfreezing happens in a sub of its own.
tests in #2991 seem to indicate that this helps to successfully create backups.
Stoiko Ivanov [Fri, 6 Nov 2020 14:19:41 +0000 (15:19 +0100)]
add fsfreeze helper:
fsfreeze_mountpoint issues the same ioctls as fsfreeze(8) on the provided
directory (the $thaw parameter deciding between '--freeze' and '--unfreeze')
This is used for container backups on RBD, where snapshots on containers,
which are heavy on IO, are not mountable readonly, because the ext4 is not
consistent.
Needed to fix #2991 and #2528.
The ioctl numbers were found via strace -X verbose (and verified with the
kernel documentation).
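For reference, the two ioctls are FIFREEZE and FITHAW, defined in the
kernel headers as _IOWR('X', 119, int) and _IOWR('X', 120, int). A Python
sketch of the same idea follows; it assumes the generic ioctl number
encoding (as on x86 and ARM; some architectures encode differently), and
the function below is an illustration, not the Perl helper itself.

```python
import fcntl
import os


def _iowr(type_char: str, nr: int, size: int) -> int:
    # Generic Linux encoding: dir (read|write = 3) << 30 | size << 16 | type << 8 | nr
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FIFREEZE = _iowr("X", 119, 4)  # 0xC0045877
FITHAW = _iowr("X", 120, 4)    # 0xC0045878


def fsfreeze_mountpoint(path: str, thaw: bool) -> None:
    """Freeze (or thaw, if thaw is true) the filesystem mounted at path.

    Needs sufficient privileges; a frozen filesystem blocks writers until thawed.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fcntl.ioctl(fd, FITHAW if thaw else FIFREEZE, 0)
    finally:
        os.close(fd)
```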