Thomas Lamprecht [Sun, 20 Nov 2022 15:32:04 +0000 (16:32 +0100)]
network: let the common tap-plug helper add fdb entries
Avoids trying to append some on OVS ports or the like, which won't
work with the bridge util, so let the common tap-plug helper add fdb
entries, if needed _and_ supported.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sat, 19 Nov 2022 17:12:29 +0000 (18:12 +0100)]
setup: fix using non-plugin methods
ct_is_symlink and ct_readlink_recursive are not defined in
PVE::LXC::Setup::Plugin and thus not available for calls in
PVE::LXC::Setup. This broke unmanaged CTs, whose plugin does not descend
from the Base module but from the abstract Plugin directly, to avoid
touching its CTs at all (well, it's unmanaged).
We'd either need to add those symlink helpers to the abstract plugin
or, like we do now, add a new more general get_ct_init_path which
unmanaged can truthfully implement.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sat, 19 Nov 2022 10:35:38 +0000 (11:35 +0100)]
fix #4355: d/control: depend on binutils to ensure objdump is available
Reported both in BZ and the forum; with the latter posting the
output of `pct start <vmid> --debug`, it quickly became obvious that
we miss the binutils dependency here. Maybe we can drop it again in the
future by simply parsing the ELF header in rust and using perlmod,
but as a stop-gap for now just ensure that the tools we want to use
are actually available.
While the template has systemd-networkd enabled, the lack of
/etc/machine-id causes systemd to revert to its "preset",
where now in
/usr/lib/systemd/system-preset/90-default.preset
fedora disables systemd-networkd in favor of NetworkManager.
Without this patch, the first boot of a fresh fedora 37
container would disable networking requiring a
`systemctl enable systemd-networkd` from within the
container once, after which it sticks around (until
/etc/machine-id is deleted).
This patch provides an
`/etc/systemd/system-preset/00-pve.preset` file to keep
systemd-networkd enabled via the `template_fixup` hook.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This patch reworks some mtu settings for LXC containers in the backend
Namely, introducing an absolute maximum for the MTU field of 65535 and
asserting that the MTU setting isn't bigger than the bridge's MTU size
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
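The two MTU checks described above can be sketched as follows. This is only an illustration in Python, not the actual Perl backend code: the function name `check_mtu` and the error messages are hypothetical, and only the 65535 limit and the bridge comparison come from the commit message.

```python
MAX_MTU = 65535  # absolute maximum named in the commit message


def check_mtu(mtu: int, bridge_mtu: int) -> int:
    """Return mtu unchanged if it passes both checks, else raise ValueError.

    Hypothetical helper: the name and messages are assumptions for illustration.
    """
    if mtu > MAX_MTU:
        raise ValueError(f"MTU {mtu} exceeds the absolute maximum of {MAX_MTU}")
    if mtu > bridge_mtu:
        raise ValueError(f"MTU {mtu} is bigger than the bridge's MTU {bridge_mtu}")
    return mtu
```

A value equal to the bridge's MTU passes; anything above it, or above 65535, is rejected.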
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
[ T: adapt to iface learning-disable being now auto-detected ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
for bullseye-based systems, the 'fs.protected_regular'[0] sysctl is set
to '2' by default[1] (as opposed to the old value of '0'). this breaks
rsync's `--inplace` mode for such protected files, since opening them
with O_CREAT is not even possible for the root user anymore.
one example in the wild are debian (-based) containers using PHP, where
the session dir '/var/lib/php/sessions' is sticky, world-writable, owned
by root and contains sessions files usually owned by www-data. if any of
these session files are modified between the first and second rsync run,
the second run and thus the backup will fail.
the downside of this change is that containers with large files that are
updated between the first and second run will now see more (temp) space
usage - but suspend mode is not space efficient anyway and such setups
should consider switching to snapshot mode anyway.
additionally, this commit drops the now no longer needed $first parameter
previously used to decide between different parameters for first and
second rsync run.
Leo Nunner [Thu, 15 Sep 2022 11:52:28 +0000 (13:52 +0200)]
fix #4192: revamp check for systemd version
Instead of iterating through several folders, it might just be easier to
check the objdump output of /sbin/init and get the version from there.
Resolving the /sbin/init symlink happens inside the chroot, but the
objdump from the host system is used, so as not to run any untrusted
executables.
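A minimal sketch of the parsing step, in Python for illustration only. It assumes that the dynamic section printed by `objdump -p` lists a `libsystemd-shared-<version>.so` dependency for the init binary; the function name and the sample text are hypothetical, and actually invoking objdump is left out.

```python
import re
from typing import Optional


def systemd_version_from_objdump(output: str) -> Optional[int]:
    """Extract the major systemd version from objdump output, if present.

    Assumption: the version is encoded in the name of the shared library
    that the init binary links against.
    """
    match = re.search(r"libsystemd-shared-(\d+)", output)
    return int(match.group(1)) if match else None


# Illustrative sample, not captured from a real container.
SAMPLE = """\
Dynamic Section:
  NEEDED               libsystemd-shared-252.so
  NEEDED               libc.so.6
"""
```

Because only text produced by the host's objdump is parsed, no binary from the container is ever executed.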
Fiona Ebner [Fri, 7 Oct 2022 12:41:47 +0000 (14:41 +0200)]
api: create/update vm: clamp cpu unit value
While the clamping already happens before setting the actual
cpu.weight lxc config key, it can be done here too, to avoid writing
new out-of-range values into the config.
Can't use a validator enforcing this, because existing out-of-range
values should not become errors on parsing the config.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fiona Ebner [Fri, 7 Oct 2022 12:41:44 +0000 (14:41 +0200)]
use helper from common for cpu units/shares
to make behavior more consistent with what we do for VMs. The helper
will clamp the value as needed, rather than dying.
Allows starting existing containers with an out-of-range (for the
relevant cgroup version) value. It's also possible to end up with
out-of-range values via update/create API.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
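Clamping instead of dying can be sketched like this. The Python below is an illustration, not the helper from the common package: the function name is hypothetical, and the ranges are assumed to be the kernel's documented limits for cgroup v1 `cpu.shares` (2 to 262144) and cgroup v2 `cpu.weight` (1 to 10000).

```python
# Assumed limits per cgroup version: (minimum, maximum).
CPU_UNIT_LIMITS = {
    1: (2, 262144),  # cgroup v1 cpu.shares
    2: (1, 10000),   # cgroup v2 cpu.weight
}


def clamp_cpu_units(value: int, cgroup_version: int) -> int:
    """Clamp value into the valid range for the given cgroup version.

    Out-of-range values are pulled to the nearest bound rather than
    raising, so existing configs with such values still start.
    """
    low, high = CPU_UNIT_LIMITS[cgroup_version]
    return max(low, min(high, value))
```

For example, a config value of 500000 would end up as 10000 on a cgroup v2 host and as 262144 on a cgroup v1 host.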
Fiona Ebner [Fri, 7 Oct 2022 12:41:42 +0000 (14:41 +0200)]
change cpu shares: hard-code cgroupv1 default parameter
so that the description of the default can be changed to reflect that
it depends on cgroup version.
Not strictly necessary, because the function currently will ignore the
value anyway. But certainly more future-proof than starting to pass
something invalid.
Until recently perl did not care, as those things are only checked
_somewhat_ at "compile" (module load) time, and the one (single?)
call site in PVE::LXC::Config missed the `use PVE::LXC` statement,
so the module load did not see the wrong prototype and thus did
not care; at runtime all is different anyway (what a mess).
The recent commit 11066f6bfdca5225a6f872d5664e6637ccb58dd6 added that
use statement and made package compilation implode, almost like
spooky actions in the time-space distance...
Fixes: b2de4c048ee50094593f4f8ffd18b6c346f7157a
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fiona Ebner [Mon, 8 Aug 2022 12:36:42 +0000 (14:36 +0200)]
apply pending mountpoint: also hotplug non-volume mount points
Previously, bind and device mount points were applied to the
configuration, but not actually hot-plugged/mounted, causing a
mismatch for running containers.
Reported in the community forum:
https://forum.proxmox.com/threads/113364/
Oguz Bektas [Tue, 19 Jul 2022 11:24:56 +0000 (13:24 +0200)]
fix #4164: use DHCP=yes instead of DHCP=both in systemd-networkd config
"both" option is deprecated, this gets rid of the warning in the journal
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
[Note: 'yes' was introduced with v219 in 2015, deprecated with v242]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
fix: cloning a locked container creates an empty config
When an attempt was made to clone a locked container the API would
correctly present the error 'CT is locked (disk)' but create the
config files for the new container anyway.
There was also a potential problem when the config of the new ct would
already be present and the creation of the container failed. In this
case the config of the new CT would be incorrectly removed.
The config locks for the new and the old configs should now be
correctly released depending on from which call a problem originates.
Furthermore, I moved some related function calls into the eval block to
avoid similar problems with leftover config files in the future.
Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
Dominik Csapak [Wed, 4 May 2022 08:15:02 +0000 (10:15 +0200)]
move_volume: call deactivate volume for the old volid in any case
not only when we want to remove it. Otherwise, if the old volume is
mapped (e.g. ceph krbd), we don't unmap it when we're finished.
We have to save whether we deactivated successfully before attempting to
remove it. If it was not removed (either because we could not
deactivate, or the remove failed), we add it back as unused.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Dominik Csapak [Tue, 3 May 2022 09:42:26 +0000 (11:42 +0200)]
prestart & poststop hook: init REST environment, e.g. for storage activation
Initialize the basic CLI REST environment which is expected on some
PVE methods we may rely on.
This became a specific problem recently when adding better support
for external and/or multiple ceph RBD clusters on a PVE system in
commit cfe46e2d4a97a83f1bbe6ad656e6416399309ba2 from pve-storage,
which added a PVE::Rados call to get the underlying cluster FSID
required to build the /dev-mapped RBD path, and PVE::Rados
requires an initialized RPC/REST environment.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
While NixOS generally overrides any static contents in /etc/hostname
with the hostname defined in `networking.hostname`, it can use the
contents of `/etc/hostname` provided by PVE if this option is not
set.
Signed-off-by: Harikrishnan R <rharikrishnan95@gmail.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stefan Sterz [Thu, 24 Feb 2022 14:21:50 +0000 (15:21 +0100)]
parse pct config: remove "\s*" from multi-line comment regex
To be consistent with PBS's implementation of multi-line comments
remove "\s*" here too. Since the regex isn't lazy .* matches
everything \s* would anyway. (Note that newlines occur after "$").
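The equivalence can be checked with a small experiment. The patterns below are simplified stand-ins written for Python's `re` module, not the actual regex from the config parser; they only illustrate that a greedy `.*` already consumes any trailing whitespace, so the capture is the same with or without `\s*`.

```python
import re
from typing import Optional

# Simplified stand-ins (assumptions), not the parser's real patterns.
WITH_WS = re.compile(r"^#(.*)\s*$")
WITHOUT_WS = re.compile(r"^#(.*)$")


def comment_text(pattern: "re.Pattern[str]", line: str) -> Optional[str]:
    """Return the captured comment text, or None if the line is no comment."""
    match = pattern.match(line)
    return match.group(1) if match else None


def same_capture(line: str) -> bool:
    """True if both patterns capture exactly the same text for this line."""
    return comment_text(WITH_WS, line) == comment_text(WITHOUT_WS, line)
```

Since `.` does not match a newline and `$` also matches just before a trailing newline, the line terminator stays outside the capture in both variants.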
there were two helpers that were not handling this correctly:
ct_make_path
since this never gets called with $opts, and there also is no 'owner'
and 'group' in $self, the previous logic could never work, sometimes
leaving nobody:nogroup files around for unprivileged containers.
since only the centos and suse plugins use this helper, the issue was
fairly limited.
ct_symlink
could create symlinks owned by nobody:nogroup. since symlinks are
created 777 by default, this just meant they were not modifiable inside
the container, but reading/dereferencing was no problem so it went
unnoticed so far.
Markus Frank [Fri, 11 Mar 2022 11:59:57 +0000 (12:59 +0100)]
fix #3917: Ignore fstrim failure in pct fstrim
With "noerr => 1" the function does not abort when one of the
mountpoints is not fstrim compatible, like ZFS (which has its own trim).
I do not think it is necessary to warn or error, because fstrim
tells when something is not trimmable and aborts.
make it more explicit (the whole call to the plugin's cleanup sub is
wrapped in an eval + warn anyway), so that future extensions can be
added after this point if they don't rely on snapshot removal being
successful.
Fabian Ebner [Wed, 23 Feb 2022 12:03:58 +0000 (13:03 +0100)]
fix #3424: api: snapshot delete: wait for active replication
A to-be-deleted snapshot might be actively used by replication,
resulting in a not (or only partially) removed snapshot and locked
(snapshot-delete) container. Simply wait a few seconds for any ongoing
replication.
Fabian Ebner [Wed, 23 Feb 2022 12:03:57 +0000 (13:03 +0100)]
partially fix #3424: vzdump: cleanup: wait for active replication
As replication and backup can happen at the same time, the vzdump
snapshot might be actively used by replication when backup tries
to cleanup, resulting in a not (or only partially) removed snapshot
and locked (snapshot-delete) container.
Wait up to 10 minutes for any ongoing replication. If replication
doesn't finish in time, the fact that there is no attempt to remove
the snapshot means that there's no risk for the container to end up in
a locked state. And the beginning of the next backup will force remove
the left-over snapshot, which will very likely succeed even at the
storage layer, because the replication really should be done by then
(subsequent replications shouldn't matter as they don't need to
re-transfer the vzdump snapshot).
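The bounded wait can be pictured as a loop with a deadline. This Python sketch is an assumption about the general shape only: the real code may well block on a lock instead of polling, and `is_running`, `clock` and `sleep` are hypothetical parameters introduced so the example is self-contained.

```python
import time
from typing import Callable


def wait_for_replication(
    is_running: Callable[[], bool],
    timeout: float = 600.0,  # 10 minutes, as in the commit message
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once replication is idle, False if the timeout expired."""
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            # Caller skips snapshot removal; the next backup cleans up.
            return False
        sleep(interval)
    return True
```

On a `False` result the cleanup would simply not attempt the removal, which is what keeps the container from ending up locked.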
Distro detection is done heuristically through the presence of a
`/nix/store` folder.
NixOS typically uses a script-based network configuration system that
isn't easy to configure from the outside: while the configuration
snippets would be simple to generate, bringing them into effect isn't.
LXC templates generated for proxmox are instead expected to use
systemd-networkd.
Signed-off-by: Harikrishnan R <rharikrishnan95@gmail.com>
[ Thomas: update/reword commit ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
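The detection heuristic mentioned above amounts to a single directory check inside the container's root file system. A minimal Python illustration, with a hypothetical function name and `rootfs` parameter (the actual plugin is written in Perl):

```python
import os


def looks_like_nixos(rootfs: str) -> bool:
    """Heuristic: treat the container as NixOS if <rootfs>/nix/store exists."""
    return os.path.isdir(os.path.join(rootfs, "nix", "store"))
```

Being a heuristic, this would also match any other distribution that merely has the Nix package manager installed under `/nix/store`.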
the config is now updated anyway because of target-storage support, so
volume renaming is both 'free' and improves the chances that migration,
with or without changing storages, actually succeeds.
Dominik Csapak [Fri, 22 Oct 2021 06:44:13 +0000 (08:44 +0200)]
fix #3635: fix overly-strict pool permission check on create
we do not need Permissions.Modify on the pool as the actual required
check for 'VM.Allocate' for that pool is already handled below, so
remove it like we did in qemu-server 4fc5242 ("fix pool permission
checks on create")
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Thu, 13 Jan 2022 11:04:04 +0000 (12:04 +0100)]
config: parse_volume: don't die when noerr is set
AFAICT, the only existing callers using noerr=1 are in
__snapshot_delete_remove_drive, and in AbstractConfig's
foreach_volume_full. The former should not be affected, as unknown
keys should never make their way in there. For the latter, it makes
iterating with
$opts = { extra_keys => ['vmstate'] }
possible while being agnostic of guest type. Previously, it would die
for LXC configs, but now the unknown key is simply skipped there.
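The effect of honoring noerr can be sketched as below. This is a Python illustration under assumptions, not the Perl implementation: the key pattern, the returned dictionary and the `foreach_volume` helper are hypothetical simplifications.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Assumed set of volume keys for an LXC config: rootfs, mpN, unusedN.
_VOLUME_KEY = re.compile(r"^(rootfs|mp\d+|unused\d+)$")


def parse_volume(key: str, value: str, noerr: bool = False) -> Optional[Dict[str, str]]:
    """Parse a volume entry; unknown keys raise unless noerr is set."""
    if not _VOLUME_KEY.match(key):
        if noerr:
            return None  # skip silently instead of dying
        raise ValueError(f"parse_volume - unknown type: {key}")
    return {"key": key, "volume": value.split(",")[0]}


def foreach_volume(conf: Dict[str, str], func: Callable[[str, Dict[str, str]], None],
                   extra_keys: Iterable[str] = ()) -> None:
    """Call func for every parsable volume, including any extra keys."""
    keys = [k for k in conf if _VOLUME_KEY.match(k)] + list(extra_keys)
    for key in keys:
        if key not in conf:
            continue
        volume = parse_volume(key, conf[key], noerr=True)
        if volume is not None:
            func(key, volume)
```

With this shape, a guest-agnostic caller can pass `extra_keys=['vmstate']` and the key is simply skipped for container configs.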
Oguz Bektas [Wed, 1 Dec 2021 15:17:56 +0000 (16:17 +0100)]
config: allow 'lazytime' mount option for containers
worked fine here in ubuntu container.
root@CT1022:/# mount | grep lazy
/var/lib/pve/local-btrfs/images/1022/vm-1022-disk-0/disk.raw on / type ext4 (rw,relatime,lazytime)
/var/lib/pve/local-btrfs/images/1022/vm-1022-disk-0/disk.raw on /snap type ext4 (rw,relatime,lazytime)
requested in community forum [0]
[0]: https://forum.proxmox.com/threads/100454/
Tested-by: Dylan Whyte <d.whyte@proxmox.com>
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
with `storage` being optional (and not allowed for reassign operations),
the ACL path in the schema can end up as `/storage/-`, which is wrong.
replace it with an explicit check:
- target `storage` for move mp
- storage from source disk for reassign mp (we only rename here, but
it's still a new volume on that storage after all)