ZFSPoolPlugin: fix #2662 get volume size correctly
Getting the volume sizes as byte values instead of converted to human
readable units helps to avoid rounding errors in the further processing
if the volume size is more on the odd side.
The `zfs list` command has supported the -p (parsable) flag for a few
years now.
When returning the size in bytes there is no calculation performed and
thus we need to explicitly cast the size to an integer before returning
it.
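
A minimal sketch of the idea in Python (the plugin itself is not written in Python). It assumes output requested with `zfs list -Hp -o name,volsize`, i.e. one tab-separated line per volume without a header; the function name and the sample line are made up for illustration.

```python
def parse_volsize(line: str) -> int:
    """Parse one line of `zfs list -Hp -o name,volsize` output into bytes."""
    # With -H the columns are separated by a single tab and no header is printed.
    _name, volsize = line.rstrip("\n").split("\t")
    # With -p the size is an exact byte count, but it arrives as a string,
    # so it has to be cast explicitly before it is returned.
    return int(volsize)


# Hypothetical sample line; the volume is one byte larger than 1.5 GiB.
sample = "rpool/data/vm-100-disk-0\t1610612737\n"
size = parse_volsize(sample)
```

A size rounded to a human readable value such as 1.5G would map back to 1610612736 bytes and silently lose the last byte of such an "odd" volume.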
to guess a valid volname for a target storage of a different type.
This makes it possible to migrate raw volumes between 'dir' and 'lvm'
storages.
It is only used when the storage type for the source storage X
and target storage Y differ and should work as long as Y uses
the standard naming scheme (VMID/vm-VMID-name.fmt respectively vm-VMID-name).
If it doesn't, we get an invalid name and fail, which is the old
behavior (except if X and Y have different types but the same
non-standard naming-scheme, where the old behavior did work).
The original name is preserved, except for a possible extension
and it's also checked if the format is valid for the target storage.
Example: mylvm:vm-123-disk-4 <-> mydir:123/vm-123-disk-4.raw
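
The mapping in the example can be sketched as follows. This is an illustration in Python, not the actual helper: the function name, its parameters and the restriction of 'lvm' to raw volumes are assumptions, and only the standard naming scheme described above is handled.

```python
import re


def guess_target_volname(volname: str, target_type: str, fmt: str = "raw") -> str:
    """Guess a volume name for a target storage of a different type."""
    # Accept both "VMID/vm-VMID-name.fmt" (dir) and "vm-VMID-name" (lvm).
    m = re.fullmatch(r"(?:(\d+)/)?(vm-(\d+)-[^/.]+)(?:\.(\w+))?", volname)
    if m is None:
        raise ValueError(f"non-standard volume name: {volname}")
    dir_vmid, base, vmid, _ext = m.groups()
    if dir_vmid is not None and dir_vmid != vmid:
        raise ValueError(f"VMID mismatch in volume name: {volname}")
    if target_type == "dir":
        # The original name is kept; only the directory and extension are added.
        return f"{vmid}/{base}.{fmt}"
    if target_type == "lvm":
        # Assumption for this sketch: the lvm target only takes raw volumes.
        if fmt != "raw":
            raise ValueError(f"format '{fmt}' is not valid for an lvm target")
        return base
    raise ValueError(f"unsupported target storage type: {target_type}")
```

With this sketch, `guess_target_volname("vm-123-disk-4", "dir")` gives `123/vm-123-disk-4.raw`, and the reverse direction strips the directory and extension again.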
Introduce allow_rename parameter for pvesm import and storage_migrate
and also return the ID of the allocated volume. This option
allows plugins to choose a new name if there is a collision.
In storage_migrate, the API version of the receiving side is checked.
In Storage.pm's volume_import, when a plugin returns 'undef',
it can be assumed that the import with the requested volid was
successful (it should've died otherwise) and so volid is returned.
This is done for backwards compatibility with foreign plugins.
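
The backwards-compatible handling of the return value can be sketched like this. It is a Python illustration with hypothetical names; `plugin_import` stands in for a plugin's import method and is not the real interface.

```python
from typing import Callable, Optional


def volume_import(
    plugin_import: Callable[[str, bool], Optional[str]],
    volid: str,
    allow_rename: bool = False,
) -> str:
    """Return the ID of the volume that was actually allocated."""
    new_volid = plugin_import(volid, allow_rename)
    # Older (foreign) plugins return nothing. Since they would have raised an
    # error on failure, the import under the requested volid must have worked.
    return volid if new_volid is None else new_volid


def legacy_plugin(volid: str, allow_rename: bool) -> Optional[str]:
    return None  # imports successfully but does not report a name


def renaming_plugin(volid: str, allow_rename: bool) -> Optional[str]:
    # Pretend the requested name collides and a new one is chosen.
    return "mylvm:vm-123-disk-5" if allow_rename else volid
```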
Instead of relying on list_volumes of Plugin.pm (which filters by
the content types set in the config), use our own to always
show the luns of an iscsi.
This makes sense here, since we need it to show the luns when using
it as base storage for LVM (where we have content type 'none' set).
It does not interfere with the rest of the GUI, since on e.g. disk
creation, we already filter the storages in the dropdown by content
type, iow. an iscsi storage used this way still does not show up
when trying to create a disk.
This also shows the luns now in the 'Content' tab, but this is also
OK, since the user cannot actually do anything there with the luns.
(Besides looking at them)
Fabian Ebner [Mon, 23 Mar 2020 11:18:50 +0000 (12:18 +0100)]
Allow passing options to volume_has_feature
With the option valid_target_formats it's possible
to let the caller specify possible formats for the target
of an operation.
[0]: If the option is not set, assume that every format is valid.
In most cases the format of the target and the format
of the source will agree (and therefore assumption [0] is
not actually assuming very much and ensures backwards
compatibility). But when cloning a volume on a storage
using Plugin.pm's implementation (e.g. directory based
storages), the result is always a qcow2 image.
When cloning containers, the new option can be used to detect
that qcow2 is not valid and hence the clone feature is not
available.
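
A rough sketch of the check for the clone case on a directory based storage, in Python. The function name is hypothetical, and the set of formats considered valid for containers ('raw', 'subvol') is an assumption of this example.

```python
from typing import Iterable, Optional


def dir_storage_has_clone_feature(
    valid_target_formats: Optional[Iterable[str]] = None,
) -> bool:
    """Tell whether 'clone' is available for the formats the caller accepts."""
    # Cloning with the generic implementation always produces a qcow2 image.
    clone_result_format = "qcow2"
    if valid_target_formats is None:
        # Assumption [0]: without the option, every format is valid.
        return True
    return clone_result_format in set(valid_target_formats)


# Assumed set of formats a container could use.
container_formats = ["raw", "subvol"]
```

Callers that do not pass the option keep the old behavior, while a container clone that passes `container_formats` learns that the feature is not available.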
Fabian Ebner [Wed, 19 Feb 2020 10:31:31 +0000 (11:31 +0100)]
volume_resize: align size to 1 KiB
1. Avoids the error
qemu-img: The new size must be a multiple of 512
for qcow2 disks.
2. Because volume_import expects disk sizes to be a multiple of 1 KiB.
Thomas Lamprecht [Wed, 19 Feb 2020 13:51:00 +0000 (14:51 +0100)]
namespace storage specific secret files to 'priv/storage' folder
As /etc/pve/priv is already pretty polluted, having a
"<storage-id>.pw" file there smells like it could cause problems in
the future.
So let the pbs pw file generator use /etc/pve/priv/storages as base
path.
Other storages should also move to that path in the future, if they
save such secrets anywhere in /etc/pve.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Tue, 18 Feb 2020 10:14:59 +0000 (11:14 +0100)]
Check whether 'zfs get mountpoint' returns a valid absolute path
The command 'zfs get mountpoint' can return 'none' and so 'mountpoint
none' was written to storage.cfg, which would block the fall-back to
using the default mount point when requesting a path, see [0].
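
The validity check amounts to something like the following Python sketch; the function name is made up, and the value is assumed to be the already extracted mountpoint property.

```python
from typing import Optional


def usable_mountpoint(value: str) -> Optional[str]:
    """Return the mount point only if it is an absolute path."""
    value = value.strip()
    # Values such as 'none' or 'legacy' are not paths and must not end up in
    # storage.cfg, otherwise the fall-back to the default mount point is blocked.
    return value if value.startswith("/") else None
```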
since we redirect the output to our (insecure) socket, logfunc is only
used for STDERR anyway, so we might as well make it explicit on the
caller side.
Thomas Lamprecht [Wed, 29 Jan 2020 18:48:49 +0000 (19:48 +0100)]
cephfs mount: reload systemd if existing unit gets regenerated
On the first write, which brings the unit file into existence, we can
just start it; after that we need to tell systemd explicitly to
reload it.
While this is slightly shaky due to the fact that we do not check all
paths where such a unit could reside, it is something we can do
because earlier one couldn't have a unit override anyway (the units
generated from procfs mountinfo do not support that), and adding
such overrides from now on should work.
Also note that we can only get here in the "user does no weird stuff"
case when "cephfs_is_mounted" actively tells that there is no cephfs
mounted at the $mountpoint - at which time we can safely re-write the
potentially updated unit file, reload and mount again.
So let's make our life a bit easier here until a user actually
complains about a rational issue for this, maybe we have PVE 7.0 then
and can get rid of that anyway :)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 29 Jan 2020 18:41:09 +0000 (19:41 +0100)]
cephfs: mount fuse through systemd with correct order dependencies
This fixes a potential race where FUSE gets unmounted too late in the
shutdown process, i.e., at a time where the network was down and it
could not talk to any MDS or monitor anymore.
We could fix it the same way we did once with the kernel based mount,
i.e., adding _netdev, but doing so would require switching over from
"ceph-fuse" to "mount.fuse.ceph" which has better compatibility with
the common mount tool API.
As that helper exists we can reuse the newer systemd_netmount
ephemeral unit generator, only some options differ in name between
fuse and kernel variant.
So besides solving a potential issue we get a more unified handling
of those two cases.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 29 Jan 2020 17:02:13 +0000 (18:02 +0100)]
fix random hangs on reboot with active CephFS mount ordering cycle
commit 54e0b0034bd6654c566cb4ae7d4a5953c48cd1ca introduced the
"_netdev" option, for PVE 5.3. The systemd generator then correctly
resolved that into the following order dependencies:
> Wants=network-online.target
> Before=umount.target remote-fs.target
> After=remote-fs-pre.target system.slice network.target network-online.target -.mount
This worked well and all were happy. With the current systemd in 6.0
we sometimes get the local-fs ones generated there too. This is
fallout from an attempt to better handle nested mount hierarchies, where
a .mount unit needs to be mounted or unmounted, before or after,
respectively, the parent mount was processed. It seems that this
sometimes glitches and thus a "RequiresMountsFor=/mnt/pve" gets thrown
in, which sometimes results in the local-fs order constraints being added.
The issue now is that one must not have ordering dependencies on all of
local-fs, local-fs-pre, remote-fs and remote-fs-pre, as that gets you an
ordering cycle. Systemd tries to solve that cycle by randomly
dropping one constraint and retrying. With luck this is a not so
important unit, and all goes on well. Most of the time one isn't that
lucky and something important gets dropped, for example:
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found ordering cycle on systemd-timesyncd.service/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on systemd-tmpfiles-setup.service/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on local-fs.target/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on mnt-pve-cephfs.mount/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on remote-fs-pre.target/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on rbdmap.service/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Found dependency on sysinit.target/stop
> Jan 24 18:43:05 prod1 systemd[1]: sysinit.target: Job remote-fs-pre.target/stop deleted to break ordering cycle starting with sysinit.target/stop
Then, most of the time the host reboot hangs for ~10 minutes, often
showing scapegoat units like the pve-ha-lrm being the cause of the
hang (even if no HA is configured >.<).
This behavior is fixed with newer systemd versions, e.g., the v244
from buster-backports, but that is not a real option for us for now.
So until 7.0 we generate the unit with the correct dependencies
directly in the ephemeral /run/ tmpfs backed systemd/system path and
start it.
While FUSE gets only the local-fs ordering constraint, it seems to cope
very well regarding such symptoms. But it _is_ racy and probably only
works due to systemd stopping it early, as it has few ordering
constraints at all. It should be moved in the future nonetheless; as
there's a mount.fuse.ceph helper, that should not be an issue.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Wed, 29 Jan 2020 17:00:34 +0000 (18:00 +0100)]
CephFSPlugin: copy over systemd_escape
This is but a hack, but we have no general helper/tools module here
and I do not want to do versioned dependencies for this fast-tracked
bugfix to pve-common, so I'll have to live with the shame for now.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
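
For context, the kind of escaping such a helper performs for mount unit names can be sketched as below. This is not the copied helper but a simplified Python illustration of systemd's path escaping rules as documented for unit names; it does not collapse repeated slashes, and the function name is made up.

```python
def systemd_escape_path(path: str) -> str:
    """Escape a file system path for use in a systemd unit name (simplified)."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory is represented by a single dash
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else, including '-' itself, becomes a \xNN escape.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


unit_name = systemd_escape_path("/mnt/pve/cephfs") + ".mount"
```

For the mount point /mnt/pve/cephfs this yields mnt-pve-cephfs.mount, the unit name that also shows up in the ordering cycle log of the reboot hang fix.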
when listing volumes, otherwise an empty hash can be persisted into the
current worker's $vmlist, which could cause issues at various other API
endpoints.
Fabian Ebner [Wed, 11 Dec 2019 09:25:49 +0000 (10:25 +0100)]
Use a common interface for find_free_diskname
We can use 'list_images' to get the desired volume IDs in
'find_free_diskname' for most plugins. For the two LVM plugins, 'list_images'
potentially skips untagged volumes, so we keep the custom version. For the
RBD plugin, 'list_images' is much more costly than the custom version, so we
keep the custom version.
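
The common approach can be sketched as follows, in Python and with hypothetical names: given the volume names reported by something like 'list_images', pick the first unused disk index. The 'vm-VMID-disk-N' scheme, the start at index 0 and the upper limit are assumptions of this example.

```python
import re
from typing import Iterable


def find_free_diskname(existing: Iterable[str], vmid: int, limit: int = 100) -> str:
    """Return the first 'vm-VMID-disk-N' name not present in `existing`."""
    pattern = re.compile(rf"vm-{vmid}-disk-(\d+)")
    used = set()
    for name in existing:
        m = pattern.search(name)
        if m:
            used.add(int(m.group(1)))
    for n in range(limit):
        if n not in used:
            return f"vm-{vmid}-disk-{n}"
    raise RuntimeError(f"unable to allocate an image name for VM {vmid}")


# Made-up listing for illustration.
volumes = ["123/vm-123-disk-0.raw", "123/vm-123-disk-1.qcow2", "124/vm-124-disk-2.raw"]
```

A listing that skips some volumes, as described for the LVM plugins, would make such a helper hand out a name that is already taken, which is why those keep their own version.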
we need to unprotect more snapshots than just the base one, since we
allow linked clones of regular VM snapshots. unprotection will only work
if no linked clones exist anymore.
Fabian Ebner [Mon, 9 Dec 2019 07:25:53 +0000 (08:25 +0100)]
When resizing a ZFS volume, align size to 1M
The size is required to be a multiple of volblocksize. Make sure
that the requirement is always met, so ZFS won't complain when we do
things like 'qm resize 102 scsi1 +0.01G'.
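
The arithmetic behind the alignment, as a small Python sketch. Rounding up (rather than down) is an assumption here, and the helper name is made up.

```python
MIB = 1024 * 1024


def align_up(size_bytes: int, alignment: int = MIB) -> int:
    """Round size_bytes up to the next multiple of alignment."""
    return -(-size_bytes // alignment) * alignment


# '+0.01G' is about 10.24 MiB, which is not a whole number of MiB.
increment = int(0.01 * 1024**3)
aligned = align_up(increment)
```

Here `increment` is 10737418 bytes and `aligned` is 11534336 bytes (11 MiB). A multiple of 1 MiB is also a multiple of every power-of-two volblocksize up to 1 MiB.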
Thomas Lamprecht [Fri, 29 Nov 2019 13:44:13 +0000 (14:44 +0100)]
LVM commands: ignore "No medium found" bogus warnings
Those normally come from virtual devices, like an IPMI disk, if no
media is attached. They spam the log really often on operations like
migrate, and are quite scare-mongering. So filter them out.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
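
The filtering boils down to something like this Python sketch; the function name and the sample lines are illustrative only.

```python
from typing import Iterable, List


def filter_lvm_warnings(lines: Iterable[str]) -> List[str]:
    """Drop the bogus 'No medium found' warnings, keep everything else."""
    return [line for line in lines if "No medium found" not in line]


# Illustrative stderr lines, not taken from a real log.
stderr_lines = [
    "/dev/sdb: open failed: No medium found",
    "Volume group \"pve\" not found",
]
remaining = filter_lvm_warnings(stderr_lines)
```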
rbd: don't attempt to update features of snapshots
it does not work:
disable RBD image features this kernel RBD drivers is not compatible with: fast-diff,object-map,deep-flatten
clone failed: could not disable krbd-incompatible image features 'fast-diff,object-map,deep-flatten' for rbd image: vm-123123123-disk-0@test: rbd: snapshot name specified for a command that doesn't use it
pvesm import: improve handling of interrupted export
since 'pvesm export' and 'pvesm import' are connected via a pipe and
SSH, a fatal error in the former can lead to no valid header being
written to the pipe. handle this more gracefully by printing an easier
to understand error message, instead of uninitialized warnings with no
context.
Thomas Lamprecht [Sat, 23 Nov 2019 14:37:46 +0000 (15:37 +0100)]
RBD: disable and enable features depending on kernel version
Modern kernels, like 5.3, support all those features ('fast-diff',
'object-map', 'deep-flatten'), so we do not want to disable them
there. 5.0 already supports exclusive-locks, so no need to disable
exclusive locking there.
Further, we also want to profit from new features available, so let's
enable those which can be enabled "live" (i.e., after image creation)
if they're available.
While we could also parse the kernel information directly from:
/sys/module/libceph/parameters/supported_features
there's not much advantage to that, as features cannot be disabled with
KConfig; they're also very dependent on the kernel version booted.
So for us it's enough to check that one.
This only affects containers and VMs backed by a storage with KRBD
explicitly enabled. But as the enabling and disabling happens
transparently, it has no effect on the running guest.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
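
The version dependent decision can be sketched in Python as below. The function name is hypothetical, and the handling of kernels older than 5.0 (additionally disabling exclusive-lock) is inferred from the description above rather than taken from the code.

```python
import re
from typing import List


def krbd_features_to_disable(kernel_release: str) -> List[str]:
    """Return the RBD image features to disable for the given kernel release."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {kernel_release}")
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (5, 3):
        return []  # modern kernels support all of these features
    features = ["fast-diff", "object-map", "deep-flatten"]
    if version < (5, 0):
        # Assumption: exclusive locking is only left enabled from 5.0 on.
        features.append("exclusive-lock")
    return features
```

The release string is expected in the form reported by `uname -r`, for example `5.3.10-1-pve`.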
Tim Marx [Thu, 21 Nov 2019 10:43:20 +0000 (11:43 +0100)]
fix #2467: avoid duplicate volumes & tag with correct content type
The bugfix for #2317 introduced a kind of odd API behavior, where
each volume was returned twice from our API if a storage has both
'rootdir' & 'images' content types enabled. To give the content type
of the volume an actual meaning, it is now inferred from the
associated guest; if there's no guest or we don't have an owner for
that volume, we default to 'images'.
At the volume level, there is no option to list volumes based on
content types, since the volumes do not know what type they are
actually used for.
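
The inference can be sketched like this in Python. The names are hypothetical, and the mapping of containers to 'rootdir' as well as the guest type labels 'qemu' and 'lxc' are assumptions of this example.

```python
from typing import Mapping, Optional


def volume_content_type(owner_vmid: Optional[int], guest_types: Mapping[int, str]) -> str:
    """Infer a volume's content type from the guest that owns it."""
    guest_type = guest_types.get(owner_vmid) if owner_vmid is not None else None
    # Assumption: container volumes are 'rootdir'. Everything else, including
    # volumes without an owner or with an unknown guest, defaults to 'images'.
    return "rootdir" if guest_type == "lxc" else "images"


# Made-up guest list for illustration.
guests = {100: "qemu", 101: "lxc"}
```

Each volume gets exactly one content type this way, so it is no longer listed twice on storages that enable both types.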
Fabian Ebner [Mon, 18 Nov 2019 10:45:38 +0000 (11:45 +0100)]
fix #2085: add mountpoint property for non-default ZFS pool MPs
When adding a zfspool storage with 'pvesm add' the mount point is now
added automatically to the storage configuration if it can be
determined. path() does not assume the default mountpoint anymore,
fixing #2085.