Thomas Lamprecht [Wed, 29 Sep 2021 17:45:23 +0000 (19:45 +0200)]
setup: drop copying binfmt qemu-static executable
The binfmt-support and qemu-user-static package setup the
`/proc/sys/fs/binfmt_misc/' entry with the "fix binary" `F` flag:
> The usual behaviour of binfmt_misc is to spawn the binary lazily
> when the misc format file is invoked. However, this doesn't work
> very well in the face of mount namespaces and changeroots, so the F
> mode opens the binary as soon as the emulation is installed and
> uses the opened image to spawn the emulator, meaning it is always
> available once installed, regardless of how the environment
> changes.
--
https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
which seems to be enough to make it work. binfmt-support's changelog
has some indication that it can use the `F` flag since the version
shipped in Debian Buster (PVE 6), and this support was added before
that, which would explain the earlier need for it..
Drop it now and slowly roll it out, if somebody really is using this
obscure PVE feature and yells we can always revert/workaround it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
if a volume is only referenced in the pending section of a config it was
previously not removed when removing the CT, unless the non-default
'remove unreferenced disks' option was enabled.
keeping track of volume IDs which we attempt to remove gets rid of false
warnings in case a volume is referenced both in the config and the
pending section, or multiple times in the config for other reasons.
Thomas Lamprecht [Tue, 28 Sep 2021 12:52:26 +0000 (14:52 +0200)]
setup: avoid one-argument bless
> Normally, bless takes two arguments: a reference to the referent
> that is to become the object, and a string naming the desired class
> of that object. However, the second argument is actually optional,
> and defaults to the current package name.
-- page 365 of Perl Best Practice, Convay.
That means that a inheriting module would get the wrong class due to
that, we do not really have that issue with Setup now, but copy-is-my
hobby would allow that error to infect other code ;-)
If one wants the default behavior should say so explicitly, e.g.:
bless { ... }, __PACKAGE__
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Dominik Csapak [Wed, 4 Aug 2021 10:51:07 +0000 (12:51 +0200)]
add old config and unprivileged to check_ct_modify_config_perm
we'll need that for checking the features more granularly
for it to work correctly, we have to move the permission checks
into the 'lock_config' sub, since we now also need to check the current
config and it could change between the permission check and the lock
fix #3478: abort container creation on arch detection timeout
increased the timeout for detect_arch from 5 to 10 seconds.
until now, on any error detect_architecture would fall back to amd64.
to avoid falling back due to an timeout error this function now dies
on timeout errors.
additionally minor changes to the error messages have been made.
this is what's actually applied to the container (although
the container may be imposing an even stricter limit, but
that's not what we want to see...)
also, the v2 cpuset list may be empty (and often is for
unprivileged+nesting containers), which currently fails to
parse
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
prestart-hook: detect cgroupv2 incompatible systemd version
Some container OS (e.g. CentOS 7, Ubuntu 16.04) are booted with
systemd, in a version which is not able to run with a pure cgroupv2
(a.k.a unified hierarchy) environment.
Detect those in the lxc-pve-prestart-hook, because there we already
have all mount-points set up.
This approach only leaves syslog/journal as place for notifying the
user since starting a container eventually runs `systemctl start
pve-container@VMID.service`, where we lose the prints to stdout and
stderr.
The alternative of shortly mounting all container mounts just to
obtain the systemd-version, before starting the container seems
prohibitively expensive.
The heuristic of /sbin/init needing to be a link to something ending
in systemd is taken from the systemd documentation[0] and was verified
on a few of our container-templates.
With cgroupv2 we lose the default devices entries, which in
cgroupv1 results in the default inherited 'a *:* rwm', so
let's have lxc's cgroupv2 default do the same (iow. turn it
into a "deny-list").
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fix #3421: allow custom storage plugins to support rootfs
it is now necessary for storages to support the 'rootdir' content in
order to start containers on them. all native storage plugins
already report the rootdir content correctly.
Fabian Ebner [Fri, 18 Jun 2021 10:59:32 +0000 (12:59 +0200)]
migrate: enforce that rootdir content type is available
and use it for the vdisk_list call too. This avoids scanning (and picking up
volumes from!) storages that are not even configured to hold container images.
Also serves a bit as a preparation to enforce content type on guest startup,
because now migration failure happens early and not only when trying to start
the guest on the remote node.
Fabian Ebner [Fri, 18 Jun 2021 10:59:30 +0000 (12:59 +0200)]
prefer storage_check_enabled over storage_check_node
storage_check_enabled simply checks for the 'disable' option and then calls
storage_check_node.
While not strictly necessary for a second call where only the storage differs,
it is more future-proof: if support for a target storage is added at some point,
it might be easy to miss adapting the call.
For the migration checks, disabled storages are now always caught.
Thomas Lamprecht [Fri, 18 Jun 2021 16:15:45 +0000 (18:15 +0200)]
clear machine-id: only truncate machine-id file if either it exists or systemd managed
Not nice to create empty /etc/machine-id files in, e.g., Alpine Linux
CTs.
The adaption of the else branch is not only an optimization to avoid
unlink call of non-existent file, but required as it not guaranteed
to be in the "no clone" case else anymore.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Fri, 18 Jun 2021 16:02:21 +0000 (18:02 +0200)]
setup: fix calling clone hook with weird params
$clone has no use and what the interface constraints on $conf vs.
$self->{conf} really are is nowhere documented, so just use $conf for
now, to at least use only one thing (and avoid the highly confusing
case where the signature suggests that $conf is used, so when one
would pass a to $self->{conf} unrelated $conf it would not work)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we need to clone the firewall config before doing any actual work, else
we risk partially aborting and leaving a non-firewalled container
around. accordingly, we need to (attempt to) remove the cloned FW config
after successfully removing the guest config in error handling.
introduce a new helper handling
- obtaining the flock
- (re)loading the config
- checking that the 'create' lock is still there
before calling a passed-in sub with the current config, since this
pattern was used quite a lot here.
intentionally changed behaviour:
- flock is now held for the post_clone hook call
- failure to remove the 'create' lock or to move the config to the
target node if applicable will not undo the clone, since either is
trivially fixable ('pct unlock' or a no-op migration), and copying all
those volumes might have been quite expensive..
set_lock already obtains the flock (since it does a read-modify-write
cycle), and the rest of this code does not touch the config file in any
fashion so no need to hold the flock either..
Oguz Bektas [Thu, 17 Jun 2021 10:51:59 +0000 (12:51 +0200)]
clone_vm: improve config locking
cleaned up the locking situation with config files as Fabian G.
suggested in the review.
use the 'create_and_lock_config' helper in the beginning to ensure that
the target CTID is available, and that the target config is locked from
the beginning. in case any error happens during the initial checks, we
unlink this config in error handling.
firewall config is also now cloned inside the worker instead of before
the worker, in case the clone fails.
also lock the config file when renaming the conf (for moving to a target
node when the option is passed).
Fabian Ebner [Tue, 1 Jun 2021 06:43:05 +0000 (08:43 +0200)]
vm status: force int where appropriate
In the case of a running container with cgroupv2, swap would be a string,
causing a
size.toFixed is not a function
error for the format_size call in the containers's "Summary" page in the UI.
The vmids from config_list() are already integers as the return schema expects,
while the opt_vmid passed from the status/current API call needs to be
converted.
Thomas Lamprecht [Wed, 16 Jun 2021 14:12:39 +0000 (16:12 +0200)]
pct: exec, attach: drop "Error: " prefix from error message
we normally do not have that here, the load_config call (which
ensures that the CT exists) also errors without any "Error" like
prefix, so for consistency drop it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Tue, 6 Apr 2021 11:56:16 +0000 (13:56 +0200)]
pct: fix edge case for 'pct push' with root uid/gid
we should check if the variable is defined in the end (because root
uid:gid is 0:0, this causes perl to get confused and die, eventhough the
uid:gid was obtained correctly)