Fabian Ebner [Tue, 3 Dec 2019 08:31:28 +0000 (09:31 +0100)]
Always determine the size of the volume in volume_rescan
Otherwise there is an issue when resizing a volume with pending changes:
1. Have a running container with a mount point
2. Edit the mount point and change the path
3. Resize the mount point
4. Reboot the container
Result: the old size is written to the config.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Tested-by: Oguz Bektas <o.bektas@proxmox.com>
fix #2512: post-stop: unmount stage mps before cleanup
With staged mount points, mount points are now also
mounted in our staging temp directory. We keep them
there to prevent hotplugged mounts (which can be
unmounted by the container) from disconnecting from their
loop devices, so we need to unmount those as well before we
can run any cleanups.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
We still passed the target mount path to bindmount() causing
bindmount_verify() to fail. Fix this by assuming '/' as the
in-container target mount path when staging, as we mount
onto the $rootdir instead.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Oguz Bektas [Thu, 21 Nov 2019 16:48:06 +0000 (17:48 +0100)]
apply pending changes in lxc poststop hook
apply pending changes after container is stopped (via API or systemctl), and
update lxc config.
also affects reboots from inside the container. (but in that case we don't try
to update_lxc_config again if pending changes were already applied and lxc config
was updated)
The prestart hook is executed by lxc, that is *after* it
loaded the config, so any pending changes which involve
updates to /var/lib/lxc/$vmid/config won't have any actual
effect: seccomp profile, apparmor profile changes, cgroup
related settings, newly added network devices, ...
prestart-hook: use staged mountpoints on newer kernels
This way we operate on defined paths in the monitor
namespace (/run/pve/mountpoint/{rootfs,mp0,mp1,...}) while
performing the mount, and can use `move_mount()` without
passing the MOVE_MOUNT_T_SYMLINKS flag when putting the
hierarchy in place.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Staging a mount point requires the new kernel mount API and
will mount the volume at a fixed path, then use open_tree()
to "pick it up" into a file descriptor.
For most of our volumes we wouldn't need the temp directory,
but some things cannot be handled with _only_ the new API
(like single-step read-only bind mounts). Additionally, the
'mount' command figures out file systems automatically and
has a bunch of helpers we'd need to reimplement, so instead,
go through our usual mount code and then pick up the result.
This can then be used to implement mount point hotplugging,
as with the open file descriptor we can move into the
container's namespace and issue a `move_mount()` to put the
mount point in place in the running container.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Oguz Bektas [Wed, 6 Nov 2019 14:58:55 +0000 (15:58 +0100)]
fix #2453: actually reflect random MAC address selection in config
When creating/changing the network interface of a container, the
parse_lxc_network can have side-effects, e.g., it adds a new random
MAC hwaddr if the netX format-string did not have one. Thus, we need
to call print_lxc_network again in order to have the correct,
up-to-date, property string in the config file.
Apparently this was a regression introduced with the pending changes
series.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
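The parse-then-print round trip described in the MAC address fix above can be sketched as follows. This is a minimal Python illustration, not the actual Perl code: `parse_network`, `print_network`, `random_mac` and `update_net_config` are hypothetical stand-ins for parse_lxc_network and print_lxc_network, and the property-string handling is deliberately simplified.

```python
import random

def random_mac(rng=random):
    """Return a random locally administered, unicast MAC address."""
    first = (rng.randrange(256) | 0x02) & 0xFE  # set "local" bit, clear multicast bit
    rest = [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{b:02X}" for b in [first] + rest)

def parse_network(value):
    """Parse 'key=value,...'. Side effect: adds a random hwaddr if none is given."""
    net = dict(item.split("=", 1) for item in value.split(",") if item)
    if "hwaddr" not in net:
        net["hwaddr"] = random_mac()
    return net

def print_network(net):
    """Serialize the parsed network back into a property string."""
    return ",".join(f"{key}={val}" for key, val in net.items())

def update_net_config(conf, key, value):
    """Store the re-serialized string, so the generated MAC ends up in the config."""
    net = parse_network(value)
    conf[key] = print_network(net)  # not the raw `value`, which lacks the MAC
    return conf
```

Storing the raw input instead of the re-printed string is exactly the kind of mistake that would leave the generated MAC out of the config file.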
We currently don't depend on a particular version, although
in the future we may want to enforce a minimum (at which
point we'll need more than just a whitelist entry for this,
but right now this will do...)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
We now get rid of all the PVE::CLIHandler baggage which
reduces the code a lot. It is also not compatible with the
new lxc.hook.version=1 method of hooks!
The new helper is specific to lxc hooks and supports both
current `lxc.hook.version`s.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Stefan Reiter [Mon, 28 Oct 2019 11:59:14 +0000 (12:59 +0100)]
setup: do host architecture translation ourself
This was done by the PVE::Tools-backed get_host_arch method, but as we
were the only user of that specific translation and it's quite LXC
related it makes more sense to do it here. This also allows reuse of
the PVE::Tools function.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
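As a rough illustration of such a translation, the sketch below maps the kernel's machine name to a Debian-style architecture name. The commit does not list the mapping, so the table entries and the name `lxc_host_arch` are assumptions made for this Python example.

```python
import platform

# Assumed mapping from `uname -m` style names to Debian-style arch names.
ARCH_MAP = {
    "x86_64": "amd64",
    "aarch64": "arm64",
}

def lxc_host_arch(machine=None):
    """Translate a machine name; unknown names are passed through unchanged."""
    machine = machine or platform.machine()
    return ARCH_MAP.get(machine, machine)
```

Keeping the table next to its only caller means the generic helper can return the untranslated value for other users.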
Christian Ebner [Tue, 15 Oct 2019 11:00:24 +0000 (13:00 +0200)]
fix #1291: add option purge for destroy_vm api call
When destroying a CT, we intentionally did not remove all related
configs such as backup or replication jobs.
The intention of this flag is to allow the removal of references to
the VM being removed from such configs on destroy.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Mon, 14 Oct 2019 08:28:51 +0000 (10:28 +0200)]
implement pending changes
previous behaviour directly applied the possible config changes, and
died when there was something which can't be applied while CT is
running.
instead, we now write all the changes directly into the config pending
section, and then apply or hotplug the changes depending on whether CT
is running. the non-hotpluggable changes are left as pending changes.
Oguz Bektas [Mon, 14 Oct 2019 08:28:49 +0000 (10:28 +0200)]
add vmconfig_hotplug_pending and vmconfig_apply_pending
vmconfig_hotplug_pending is responsible for checking whether a key/value
pair in the pending section can be hotplugged. if so, it performs a generic
replace, or specific actions for hotplugging the special cases.
vmconfig_apply_pending is only supposed to be called when the CT isn't live.
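The split between the two helpers can be sketched like this. It is a simplified Python model under stated assumptions: the set of hotpluggable keys is invented for illustration, the config is a plain dict, and `hotplug_pending`, `apply_pending` and `update_config` only mirror the roles of vmconfig_hotplug_pending and vmconfig_apply_pending.

```python
# Hypothetical set of keys that can be changed while the CT is running.
HOTPLUGGABLE = {"hostname", "memory", "swap"}

def write_pending(conf, changes):
    """All changes go into the pending section first."""
    conf.setdefault("pending", {}).update(changes)

def hotplug_pending(conf, running_ct):
    """Running CT: apply what can be hotplugged, leave the rest pending."""
    for key in list(conf.get("pending", {})):
        if key in HOTPLUGGABLE:
            value = conf["pending"].pop(key)
            running_ct[key] = value  # generic replace on the live container
            conf[key] = value

def apply_pending(conf):
    """Stopped CT: every pending change can simply be moved into the config."""
    for key in list(conf.get("pending", {})):
        conf[key] = conf["pending"].pop(key)

def update_config(conf, changes, running_ct=None):
    write_pending(conf, changes)
    if running_ct is not None:
        hotplug_pending(conf, running_ct)
    else:
        apply_pending(conf)
    return conf
```

With a running container, a change to a non-hotpluggable key stays in `conf["pending"]` until `apply_pending` runs after the next stop.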
Oguz Bektas [Mon, 14 Oct 2019 08:28:46 +0000 (10:28 +0200)]
skip pending changes while taking backup
we can only clone the current state of container (without pending
changes), as otherwise the on-disk state might not match the
configuration. this also makes it more consistent with qemu-server
behavior.
Oguz Bektas [Mon, 14 Oct 2019 08:28:44 +0000 (10:28 +0200)]
api: config: use shared guesthelpers in GET call
since containers can also have pending changes now, we need a method to
get the current applied config as well as the one with the pending
changes inside. this makes the GET config api more consistent with
qemu-server's by reusing load_current_config and load_snapshot_config from
AbstractConfig.
to decide which method to call, we look at the parameters.
Oguz Bektas [Mon, 14 Oct 2019 08:28:41 +0000 (10:28 +0200)]
adapt CT config parser for pending changes
config parser can now read/write [pve:pending] section. this was named
such, instead of [PENDING], after on- and offline discussion regarding
namespacing the pending section and snapshots.
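To make the namespacing point concrete, here is a minimal Python sketch of a section-aware parser. It assumes a simple `key: value` line format and that snapshot names cannot contain a colon, so `[pve:pending]` can never collide with a snapshot section; `parse_config` is a hypothetical name, not the real parser.

```python
import re

SECTION_RE = re.compile(r"^\[([A-Za-z][\w:.-]*)\]\s*$")

def parse_config(text):
    """Split a config into current values, pending changes and snapshots."""
    conf = {"pending": {}, "snapshots": {}}
    target = conf  # lines before any section header are the current config
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SECTION_RE.match(line)
        if match:
            name = match.group(1)
            if name == "pve:pending":
                target = conf["pending"]
            else:
                target = conf["snapshots"].setdefault(name, {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            target[key.strip()] = value.strip()
    return conf
```

A snapshot that happens to be called `PENDING` would land in `conf["snapshots"]` here, which is the ambiguity a bare `[PENDING]` section name could not avoid.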
Between calling destroy_lxc_container and removing the ID from
user.cfg (remove_vm_access) creating a new CT with this ID was
possible. CTs could go missing from pools as a consequence.
unlinking must happen at the very end of the deletion
process to prevent other nodes from using the ID in the meantime
Further, lock the config after the VM was destroyed with a config lock
named, well, destroyed. This way it's easy to know that the CT was
destroyed but still has the config skeleton and FW, access etc.
stuff possibly left over.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Oguz Bektas [Mon, 14 Oct 2019 08:28:47 +0000 (10:28 +0200)]
prepend underscores for is_volume_in_use private helper
this helper was defined twice, once as a 'my $is_volume_in_use' sub and
a second time as a helper sub. as with our other helpers with a similar
structure, it is better to prepend the variable sub with two underscores.
Oguz Bektas [Fri, 13 Sep 2019 10:35:57 +0000 (12:35 +0200)]
fix issue where ttys aren't correctly set after restore
restore from unpriv to priv causes a problem with the log-in from web
console, since the /etc/securetty file isn't modified after a restore to
reflect that change (/dev/lxc/tty1 and so on).
template_fixup is normally called in post_create_hook, but we have no
$password or $ssh_keys to call the hook with during the restore. instead
we call template_fixup by itself to fix the ttys on some distributions.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Oguz Bektas [Mon, 26 Aug 2019 14:06:32 +0000 (16:06 +0200)]
don't leave fstrim lock if mount_all fails
when a container has a mountpoint which can't be mounted for some
reason, mount_all dies and the fstrim lock stays. prevent this by
moving the call into eval, warn if any error occurs.
Still try to unmount all already mounted MPs so that no blocking
mounts are left behind.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
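The control flow of this fix, expressed with Python's try/except in place of Perl's eval, might look roughly like the sketch below. The function name and the callables passed in are hypothetical; only the ordering (warn on failure, still unmount, always drop the lock) is taken from the description above.

```python
import sys

def fstrim_container(conf, mount_all, trim, umount_all):
    """Run a trim under the 'fstrim' lock without ever leaving the lock behind."""
    conf["lock"] = "fstrim"
    try:
        try:
            mount_all()  # may raise if a mount point cannot be mounted
            trim()
        except Exception as err:
            print(f"warning: fstrim failed: {err}", file=sys.stderr)
        try:
            umount_all()  # unmount whatever did get mounted
        except Exception as err:
            print(f"warning: unmount failed: {err}", file=sys.stderr)
    finally:
        conf.pop("lock", None)  # the lock is removed on every path
    return conf
```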
Thomas Lamprecht [Tue, 27 Aug 2019 16:49:01 +0000 (18:49 +0200)]
setup: allow CentOS 5 and CentOS 8
One is in the extended support phase, it should not be used but
people report that the CentOS 6 code path works just fine, so why
not...
The other is for the upcoming CentOS 8, while not fully testable for
compatibility yet, CentOS 7 code path should do the trick, else
we'll need to adapt it anyway, so see this as experimental
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
mountpoints: create parent dirs with correct owner
otherwise unprivileged containers might end up with directories that
they cannot modify since they are owned by the user root in the host
namespace, instead of root inside the container.
note: the problematic behaviour is only exhibited when an intermediate
directory needs to be created, e.g. a mountpoint /test/mp gets mounted,
and /test does not yet exist.
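A minimal Python sketch of the idea: create each missing intermediate directory and hand it to the container's root right away. The function name is hypothetical, the `chown` parameter is injectable only so the example can be exercised without privileges, and the uid/gid would be the host-side IDs that map to root inside the container (100000 is used below as an assumed example value).

```python
import os

def mkdir_parents_owned(rootdir, mount_path, uid, gid, chown=os.chown):
    """Create missing directories of mount_path below rootdir.

    Every directory created here is chowned to uid/gid; directories
    that already exist are left untouched.
    """
    created = []
    current = rootdir
    for part in mount_path.strip("/").split("/"):
        current = os.path.join(current, part)
        if not os.path.isdir(current):
            os.mkdir(current)
            chown(current, uid, gid)
            created.append(current)
    return created
```

For a mount point `/test/mp` where `/test` does not exist yet, both `test` and `test/mp` are created and owned by the mapped root, so the container can modify them.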
Thomas Lamprecht [Fri, 19 Jul 2019 13:42:13 +0000 (15:42 +0200)]
debian: bump compat to 12 and don't restart container.slice
since compat 10 the restart is default, as I want to use
'dh_installsystemd' (vs 'dh_systemd_start') I need at least compat
level 11, so go for the now recommended compat level 12.
diffoscope tells me that the main change is the wanted one:
./postinst
> @@ -1,10 +1,15 @@
> #!/bin/sh
> set -e
> -# Automatically added by dh_systemd_start/12.1.1
> +# Automatically added by dh_installsystemd/12.1.1
> if [ "$1" = "configure" ] || [ "$1" = "abort-upgrade" ] || [ "$1" = "abort-deconfigure" ] || [ "$1" = "abort-remove" ] ; then
> if [ -d /run/systemd/system ]; then
> systemctl --system daemon-reload >/dev/null || true
> - if [ -n "$2" ]; then
> - _dh_action=restart
> - else
> - _dh_action=start
> - fi
> - deb-systemd-invoke $_dh_action 'system-pve\x2dcontainer.slice' >/dev/null || true
> + deb-systemd-invoke start 'system-pve\x2dcontainer.slice' >/dev/null || true
> fi
> fi
> # End automatically added section
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 17 Jul 2019 10:07:40 +0000 (12:07 +0200)]
setup getty: ensure the getty.target is not masked
some distro templates have this masked by default, it makes sense to
always ensure that it can work, a CT admin can still prevent this by
using the .pve-ignore.$file mechanism.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Thomas Lamprecht [Thu, 18 Jul 2019 15:17:17 +0000 (17:17 +0200)]
setup getty: drop now obsolete setup_systemd_console
The setup_container_getty_service can now also handle the old
getty@.service if the newer container-getty@.service is not
available. So drop it, and convert the two remaining users to calling
the now compatible setup_container_getty_service
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Some recent distributions running as an LXC container eat up the
relatively low default limits very fast. Thus increase all those
(semi-related) limits by a factor of 512. This was chosen by using
one of our bigger known CT setups (~1500 CTs per host) and the fact
that I can have only a very low count (circa 5 - 7) of running
"inotify watch hungry" CTs (e.g., ones with a recent systemd > 240).
So, as 5 * 512 = 2560 is well above 1500, we can confidently assume
that this allows most reasonable and existing setups by default.
Since, with the kernel commit d46eb14b735b11927d4bdc2d1854c311af19de6d
"fs: fsnotify: account fsnotify metadata to kmemcg" [0], the memory
usage from the watch and queue overhead is accounted to the user's
respective memory CGroup (i.e., for LXC containers their memory
limit), we can do this without too much fear of negative implications.
Don't change the hardcoded kernel default values directly though,
ship a sysctl.d configuration file, which is a bit more transparent
about what happens and can be shipped by the component needing this
(i.e., pve-container).
Follow the considerations of `man 5 sysctl.d` for shipping:
> Packages should install their configuration files in /lib/. Files
> in /etc/ are reserved for the local administrator, who may use this
> logic to override the configuration files installed by vendor
> packages. All configuration files are sorted by their filename in
> lexicographic order, regardless of which of the directories they
> reside in. If multiple files specify the same option, the entry in
> the file with the lexicographically latest name will take
> precedence. It is recommended to prefix all filenames with a
> two-digit number and a dash, to simplify the ordering of the files.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
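The ordering rule quoted from `man 5 sysctl.d` can be modelled with a short Python sketch. It only covers the "lexicographically latest file name wins" part and does not model a file in /etc replacing a same-named file in /lib. The file names and the kernel default of 8192 watches are assumptions made for the example.

```python
import os

def effective_settings(files):
    """files maps a full path to its {option: value} entries.

    Files are processed sorted by file name, regardless of directory,
    so the entry from the lexicographically latest name takes precedence.
    Returns {option: (value, source_path)}.
    """
    result = {}
    for path in sorted(files, key=os.path.basename):
        for option, value in files[path].items():
            result[option] = (value, path)
    return result

# Hypothetical vendor file shipping a limit raised by a factor of 512,
# and a local administrator file overriding it.
example = {
    "/lib/sysctl.d/10-example-ct.conf": {"fs.inotify.max_user_watches": 8192 * 512},
    "/etc/sysctl.d/99-local.conf": {"fs.inotify.max_user_watches": 1048576},
}
```

A low two-digit prefix on the vendor file leaves room for the local administrator to override it with a later-sorting file in /etc.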