with more than a few images, 'rbd ls -l' gets rather slow
compared to a simple 'rbd ls'. since we only need to check
existing image names for finding a free one, the latter is
sufficient.
example with ~400 rbd images:
$ time rbd ls -p ceph-vm > /dev/null
real 0m0.027s
user 0m0.012s
sys 0m0.008s
$ time rbd ls -l -p ceph-vm > /dev/null
real 0m5.250s
user 0m1.632s
sys 0m0.584s
a linked clone of two disks on the same setup accordingly
also shows a massive speedup:
$ time qm clone 1000 10000 -snap test
create linked clone of drive scsi0 (ceph-vm:vm-1000-disk-2)
clone vm-1000-disk-2: vm-1000-disk-2 snapname test to
vm-10000-disk-1
create linked clone of drive scsi1 (ceph-vm:vm-1000-disk-1)
clone vm-1000-disk-1: vm-1000-disk-1 snapname test to
vm-10000-disk-2
real 0m11.157s
user 0m3.752s
sys 0m1.308s
$ time qm clone 1000 10000 -snap test
create linked clone of drive scsi1 (ceph-vm:vm-1000-disk-1)
clone vm-1000-disk-1: vm-1000-disk-1 snapname test to
vm-10000-disk-1
create linked clone of drive scsi0 (ceph-vm:vm-1000-disk-2)
clone vm-1000-disk-2: vm-1000-disk-2 snapname test to
vm-10000-disk-2
Dominik Csapak [Mon, 28 Nov 2016 12:34:19 +0000 (13:34 +0100)]
use qemu gluster blockdriver for linked clone creation
this works around a bug, where qemu does not align the qcow2 file
when using the filesystem directly, and the gluster blockdriver
refuses to read from it
the old code was way too broad here, this fixes at least the
following issues:
- importing of other/unconfigured zpools by "import -a"
- possible false positives if a pool name is a substring of
another pool name because of "list" without pool name,
potentially skipping activation for such pools
- not noticing failure to activate in activate_storage
because the success of "zpool import -a" does not tell us
anything about the pool we actually wanted to import
checking specifically for the pool to be activated when
calling "zpool list" gets rid of the second issue, and
trying to import only that pool fixes the other two.
Dominik Csapak [Thu, 6 Oct 2016 09:42:28 +0000 (11:42 +0200)]
allow rbd images < 1M to be detected
without this, having an efidisk on a ceph storage
prevents creating another disk on the same
ceph storage, because it will not be detected
and we try to allocate one with the same name
the smart checks are only needed for the API call(s) that
list all disks and their status, but get_disks is also used
in disk usage checks and in the Ceph code, where the smart
status is completely irrelevant.
drop the implicit skipping of smart checks if $disk is set,
since we have an explicit parameter for this now.
because we never ever want to die in get_disks because of a
single disk, but the nodes/xyz/disks/smart API path is
allowed to fail if a disk device is unsupported by smartctl
or something else goes wrong.
While the mkdir option deals with the case where we don't
want to clobber a mount point with directories (like ZFS,
gluster or NFS), putting a directory storage directly onto a
mount point is still risky:
If the path exists - which it usually does even if not
mounted - the storage will be considered successfully
activated, but empty (or with unexpected content). Some
operations will then lead to unexpected problems: the
free_disk operation for instance only warns if the disk does
not exist, but does not throw an error. In this case the
configuration might be updated without the real disk being
deleted. Once it's mounted back in, later operations which
check existing disks which are not part of the current VM
configuration (like migration) might error unexpectedly.
This adds an 'is_mountpoint' option to directory storages
which assumes the directory is an externally managed mount
point (eg. fstab or zfs) and changes activate_storage() to
throw an error if the path is not mounted.
So far this only prevented the creation of the toplevel
directory. This does not cover all problem cases,
particularly when said directory is supposed to be a mount
point, including NFS and glusterfs beside ZFS.
The directory based storages we have already use mkpath
whenever they need to create files, and for actions on files
which are supposed to exist it's fine if it errors out.
So it should also be safe to skip the creation of standard
subdirectories in activate_storage().
Additionally NFS and glusterfs storages should also accept
the mkdir option as they otherwise may exhibit similar
issues, eg. when an NFS storage is mounted onto a directory
inside a ZFS subvolume.
since the rbd images themselves are named differently than
the volumes in our config files, we need to recreate this
information from the parent relation in the ceph metadata,
otherwise list_images() might return wrong volume names/IDs
since list_images is used by PVE::Storage::vdisk_free() to
check for children still referencing a base image, because
of the wrong volume id RBDPlugin->parse_volname() does not
detect the base image of linked clones and the check fails.
this is thankfully mitigated by the protected status of the
base snapshot, but creates a rather confusing error message.
scenario (VM 701 is a linked clone of template VM 700):
before (pvesm list reports wrong volume ID, check fails):
$ pvesm list ceph_qemu
ceph_qemu:base-700-disk-1 raw 2147483648 700
ceph_qemu:vm-701-disk-1 raw 2147483648 701
$ pvesm free ceph_qemu:base-700-disk-1
snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
rbd unprotect base-700-disk-1 snap '__base__' error: snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
after (correct volume ID, check works as intended):
$ pvesm list ceph_qemu
ceph_qemu:base-700-disk-1 raw 2147483648 700
ceph_qemu:base-700-disk-1/vm-701-disk-1 raw 2147483648 701
$ pvesm free ceph_qemu:base-700-disk-1
base volume 'base-700-disk-1' is still in use (use by 'base-700-disk-1/vm-701-disk-1')