Fabian Ebner [Tue, 5 May 2020 08:27:15 +0000 (10:27 +0200)]
create_vm: avoid premature write_config caused by update_pct_config
by moving the write_config calls from vmconfig_*_pending to their
call sites. The single other call site for update_pct_config in
update_vm is also adapted.
The update_pct_config call led to a write_config call and so the
configuration file was created before it was intended to be created.
When the CFS is updated in between the write_config call and the
PVE::Cluster::check_vmid_unused call in create_and_lock_config,
the container file would already exist and so creation would
fail after writing out a basically empty config.
Even worse, a race was possible for two containers created with the
same ID at the same time:
Assuming the initial PVE::Cluster::check_vmid_unused check in the
parameter verification passes for both create_vm calls, the later one
would potentially overwrite the earlier configuration file with its
update_pct_config call.
Additionally, the file read for $old_config was always the one written
by update_pct_config, meaning that for a create_vm call with force=1,
already existing old volumes were not removed.
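
The idea of moving the write out of the update helper can be sketched
roughly as follows. This is a simplified Python illustration, not the
actual Perl code: `update_config`, `create_container` and the dict used
as a stand-in for the cluster file system are all hypothetical names,
and the locking done by the real code is left out.

```python
def update_config(conf, params):
    """Merge params into conf in memory only; deliberately writes nothing."""
    conf.update(params)
    return conf


def create_container(store, vmid, params):
    """Create a config entry; `store` is a dict standing in for the CFS."""
    # The existence check runs before anything has been written, so a
    # failure here cannot leave a basically empty config behind.
    if vmid in store:
        raise FileExistsError(f"CT {vmid} already exists")
    conf = update_config({}, params)
    # Only the call site writes, once the config is complete.
    store[vmid] = dict(conf)
    return conf
```

With the write confined to the call site, a second creation attempt for
the same ID fails at the check and leaves the first config untouched.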
When creating an unprivileged container with CentOS 6 (which will be EOL in
Nov 2020 [0]) the console does not work.
The problem is mitigated by adding the --nohangup argument to the mingetty
invocations during bootup (in /etc/init/tty.conf).
The idea for the fix is based on the legacy template builder-scripts from
lxc:
https://github.com/lxc/lxc-templates/blob/master/templates/lxc-centos.in#L308
Since '/etc/init/tty.conf' is only written during container creation/restore
and since it is guarded to CentOS versions < 7, the potential for regression
should be rather small.
Tested by creating an unprivileged and a privileged CentOS 6 container,
each with nesting enabled and disabled - the console showed up in
all cases with this fix.
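
The --nohangup mitigation described above amounts to a small text change
in the mingetty invocation. A minimal Python sketch, assuming the file
starts mingetty with a line such as `exec /sbin/mingetty $TTY` (that
sample line and the helper name `add_nohangup` are assumptions, not
taken from the actual implementation):

```python
import re


def add_nohangup(tty_conf: str) -> str:
    """Insert --nohangup after each mingetty call that lacks it."""
    # The negative lookahead keeps the rewrite idempotent.
    return re.sub(
        r"(\bmingetty)(?!\s+--nohangup)(\s)",
        r"\1 --nohangup\2",
        tty_conf,
    )


sample = "exec /sbin/mingetty $TTY\n"
print(add_nohangup(sample), end="")
```

Running the helper a second time on its own output leaves the text
unchanged.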
And use StandardOutput/Error=null, so we can use
`Type=simple`, because using `Type=forking` has become more
difficult with systemd & upstream lxc's cgroup layout
changes. This seems to be the path of least resistance.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fabian Ebner [Mon, 23 Mar 2020 11:18:51 +0000 (12:18 +0100)]
For clone+copy features, make sure a valid format for the target is supported
using the new option valid_target_formats. This is
necessary, because clone_image can result in a qcow2 image
being created (on directory based storages) which is not
valid for LXC.
Thomas Lamprecht [Wed, 18 Mar 2020 09:46:17 +0000 (10:46 +0100)]
lxc_config: mount /sys as mixed for unprivileged by default
CONTAINER_INTERFACE[0] is something systemd people call their API and
we need to adapt to it a bit, even if it means doing stupid
unnecessary things, as otherwise systemd decides to regress and
suddenly breaks the network stack in the CT after an upgrade[1].
This mounts the parent /sys as mixed, which is:
> mount /sys as read-only but with /sys/devices/virtual/net writable.
-- man 5 lxc.container.conf
Allow users to override that with a features knob, as surely some will
run into other issues otherwise, and manually adding a "lxc.mount.auto"
entry in the container .conf is not a nice user experience for most.
Fixes the system regression in up to date Arch installations
introduced by[2].
Fabian Ebner [Tue, 18 Feb 2020 11:31:22 +0000 (12:31 +0100)]
Fix mounting ZFS snapshots whose dataset is not mounted below '/'
Trying to back up a container with a ZFS dataset with a non-standard
mount point would fail, see [0].
This also removes the near-dead code
$name .= "\@$snapname";
when snapname is false-y, but defined and turns
the check for $snapname into one for definedness.
Thomas Lamprecht [Wed, 19 Feb 2020 16:41:49 +0000 (17:41 +0100)]
backup prepare: remove useless "activate volumes"
As the actual stop of the CT happened after VZDump called the prepare
step, the volume activation was undone again.
commit 00cc04160351f0034c5349d208e59a5f46d8ee33 improved that by
doing the activation in the archive step when collecting the
mountpoints to back up, so drop it here for good.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Tue, 18 Feb 2020 13:38:52 +0000 (14:38 +0100)]
fix #2598: activate volumes before mounting in stop mode backup
'stop' mode deactivates the volumes (relevant for the LVM backend), and
they're not reactivated before trying to mount them for backup.
Reactivating the volumes before the mount in 'stop' mode backup solves
the issue.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Wed, 5 Feb 2020 14:03:29 +0000 (15:03 +0100)]
apply_pending: do cleanup pending between, not during, change/delete loop
Instead of calling it while iterating, in between the loops is a
better place in terms of similarity with the qemu-server side, while
also fixing the bug that Dominik found[0]:
> when setting a netX option that is semantically the same as the one
> already set but in a different order, e.g.:
>
> in config:
> net0: name=eth0,bridge=vmbr0,hwaddr=AA:AA:AA:AA:AA:AA,type=veth
> setting via api:
> net0: bridge=vmbr0,name=eth0,hwaddr=AA:AA:AA:AA:AA:AA,type=veth
>
> the code tries to 'hot-apply' the change (which is no change
> really) where the api line then gets parsed and printed which
> results in the same string already in the config
>
> then we do a 'cleanup_pending' which removes it from pending, since
> the config already contains the exact same options, but then we
> overwrite the config from pending (which is empty) resulting in an
> invalid config line:
> --8<--
> net0:
> -->8--
Avoid this by only calling the pending cleanup outside the loop; it
makes no sense to loop over the whole config on each pending
property change and pending delete.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Tested-By: Dominik Csapak <d.csapak@proxmox.com>
[ Thomas: adapted commit message with some extra info ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
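
The failure mode quoted above can be reproduced with a small model. The
following Python sketch is a simplification under stated assumptions,
not the actual Perl code: `parse_and_print`, the canonical option order
in `ORDER`, and the `cleanup_inside_loop` switch are hypothetical, and
the pending delete loop is omitted.

```python
ORDER = ["name", "bridge", "hwaddr", "type"]  # assumed canonical print order


def parse_and_print(value):
    """Stand-in for parsing a netX line and printing it in canonical order."""
    opts = dict(item.split("=", 1) for item in value.split(","))
    return ",".join(f"{key}={opts[key]}" for key in ORDER if key in opts)


def cleanup_pending(conf):
    """Drop pending entries that equal what the config already contains."""
    pending = conf["pending"]
    for key in list(pending):
        if conf.get(key) == pending[key]:
            del pending[key]


def apply_pending(conf, cleanup_inside_loop):
    for key in list(conf["pending"]):
        # "Hot-apply": the API value is parsed and printed again.
        conf["pending"][key] = parse_and_print(conf["pending"][key])
        if cleanup_inside_loop:
            cleanup_pending(conf)  # may remove the entry being processed
        # Move the value from pending into the config.
        conf[key] = conf["pending"].pop(key, "")
    if not cleanup_inside_loop:
        cleanup_pending(conf)  # run once, after the change loop
    return conf


def make_conf():
    return {
        "net0": "name=eth0,bridge=vmbr0,hwaddr=AA:AA:AA:AA:AA:AA,type=veth",
        "pending": {
            "net0": "bridge=vmbr0,name=eth0,hwaddr=AA:AA:AA:AA:AA:AA,type=veth",
        },
    }


buggy = apply_pending(make_conf(), cleanup_inside_loop=True)
fixed = apply_pending(make_conf(), cleanup_inside_loop=False)
print(repr(buggy["net0"]))  # empty value, i.e. the invalid "net0:" line
print(repr(fixed["net0"]))  # original option string is preserved
```

In this model the in-loop cleanup removes the pending entry right before
it is moved into the config, which is what leaves the empty value.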
Thomas Lamprecht [Fri, 31 Jan 2020 15:24:30 +0000 (16:24 +0100)]
d/control: depend on pve-lxc-syscalld
It's a really small daemon doing nothing if not in use, and only
requiring < 1M of disk space and ~2M of memory (and one can always
stop the service if not wanted)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[ Thomas: use new helper from common ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This causes char and blockdev mknod() and mknodat() calls to
be forwarded to the seccomp proxy, so unprivileged
containers can finally create /dev/null by themselves.
For now this is experimental and therefore added to
`features`. Ideally, if this works as intended, we can make
it the default in pve 7.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Thomas Lamprecht [Tue, 21 Jan 2020 07:55:04 +0000 (08:55 +0100)]
fsck: do is-CT-running check earlier
Besides the fact that it makes sense to check that early, it also
avoids side effects that are not cleaned up, like a mapped RBD volume
which did not get unmapped again due to this check dying.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
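
The ordering argument of the fsck change above can be sketched as
follows. This is a hypothetical Python illustration, not the actual
code: the `fsck` helper, the `mapped` list that records mapped volumes,
and the returned placeholder string are all assumptions.

```python
def fsck(volume, is_running, mapped):
    """Run a placeholder check; `mapped` records currently mapped volumes."""
    # Check first: nothing is mapped yet, so dying here leaves no residue.
    if is_running:
        raise RuntimeError("cannot run fsck on a running container")
    mapped.append(volume)  # stand-in for mapping the volume
    try:
        return f"fsck {volume}"  # placeholder for the real check
    finally:
        mapped.remove(volume)  # always unmap again
```

If the running check came after the mapping step and outside the
cleanup path, the volume would stay in `mapped` when the check dies.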
When starting via 'lxc-start' from the CLI the prestart hook
ended up mounting relative to the current working dir, so
the container refused to start and we created a bunch of
useless `var` directories.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>