with more than a few images, 'rbd ls -l' gets rather slow
compared to a simple 'rbd ls'. since we only need to check
existing image names for finding a free one, the latter is
sufficient.
example with ~400 rbd images:
$ time rbd ls -p ceph-vm > /dev/null
real 0m0.027s
user 0m0.012s
sys 0m0.008s
$ time rbd ls -l -p ceph-vm > /dev/null
real 0m5.250s
user 0m1.632s
sys 0m0.584s
a linked clone of two disks on the same setup accordingly
also shows a massive speedup:
$ time qm clone 1000 10000 -snap test
create linked clone of drive scsi0 (ceph-vm:vm-1000-disk-2)
clone vm-1000-disk-2: vm-1000-disk-2 snapname test to
vm-10000-disk-1
create linked clone of drive scsi1 (ceph-vm:vm-1000-disk-1)
clone vm-1000-disk-1: vm-1000-disk-1 snapname test to
vm-10000-disk-2
real 0m11.157s
user 0m3.752s
sys 0m1.308s
$ time qm clone 1000 10000 -snap test
create linked clone of drive scsi1 (ceph-vm:vm-1000-disk-1)
clone vm-1000-disk-1: vm-1000-disk-1 snapname test to
vm-10000-disk-1
create linked clone of drive scsi0 (ceph-vm:vm-1000-disk-2)
clone vm-1000-disk-2: vm-1000-disk-2 snapname test to
vm-10000-disk-2
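
The name lookup described above only needs the plain image names, so it can be sketched as follows. This is a minimal illustration in Python rather than the actual plugin code; the function name find_free_disk_name and the max_index limit are hypothetical, and the 'vm-<vmid>-disk-<n>' / 'base-<vmid>-disk-<n>' naming scheme is assumed from the output shown above.

```python
import re


def find_free_disk_name(existing_names, vmid, max_index=99):
    """Return the first unused 'vm-<vmid>-disk-<n>' name.

    existing_names: iterable of image names, e.g. the lines printed by
    a plain 'rbd ls' (no sizes or parents needed).
    max_index: hypothetical upper bound, chosen only for this sketch.
    """
    pattern = re.compile(rf"^(?:vm|base)-{vmid}-disk-(\d+)$")
    used = set()
    for name in existing_names:
        match = pattern.match(name.strip())
        if match:
            used.add(int(match.group(1)))
    for index in range(1, max_index + 1):
        if index not in used:
            return f"vm-{vmid}-disk-{index}"
    raise RuntimeError(f"no free disk name found for VM {vmid}")


# 'rbd ls' prints one image name per line
rbd_ls_output = "vm-1000-disk-1\nvm-1000-disk-2\nvm-10000-disk-1\n"
print(find_free_disk_name(rbd_ls_output.splitlines(), 10000))
```

Anchoring the pattern on the full name keeps images of VM 1000 from being counted for VM 10000.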
Dominik Csapak [Mon, 28 Nov 2016 12:34:19 +0000 (13:34 +0100)]
use qemu gluster blockdriver for linked clone creation
this works around a bug, where qemu does not align the qcow2 file
when using the filesystem directly, and the gluster blockdriver
refuses to read from it
the old code was way too broad here, this fixes at least the
following issues:
- importing of other/unconfigured zpools by "import -a"
- possible false positives if a pool name is a substring of
another pool name because of "list" without pool name,
potentially skipping activation for such pools
- not noticing failure to activate in activate_storage
because the success of "zpool import -a" does not tell us
anything about the pool we actually wanted to import
checking specifically for the pool to be activated when
calling "zpool list" gets rid of the second issue, and
trying to import only that pool fixes the other two.
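
The substring false positive can be illustrated with a small Python sketch. It is not the plugin code: the commit passes the pool name to "zpool list" directly, while this sketch only contrasts a substring search with an exact match on a list of pool names (one name per line, as assumed here). Both function names are hypothetical.

```python
def pool_listed_substring(list_output, pool):
    """Old-style check: true if the name occurs anywhere in the output."""
    return pool in list_output


def pool_listed_exact(list_output, pool):
    """Stricter check: compare against each listed pool name as a whole."""
    names = {line.strip() for line in list_output.splitlines() if line.strip()}
    return pool in names


# only 'tank2' is imported, but we want to activate 'tank'
output = "tank2\n"
print(pool_listed_substring(output, "tank"))  # substring match: false positive
print(pool_listed_exact(output, "tank"))      # exact match: not imported
```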
Dominik Csapak [Thu, 6 Oct 2016 09:42:28 +0000 (11:42 +0200)]
allow rbd images < 1M to be detected
without this, having an efidisk on a ceph storage
prevents creating another disk on the same
ceph storage, because it will not be detected
and we try to allocate one with the same name
the smart checks are only needed for the API call(s) that
list all disks and their status, but get_disks is also used
in disk usage checks and in the Ceph code, where the smart
status is completely irrelevant.
drop the implicit skipping of smart checks if $disk is set,
since we have an explicit parameter for this now.
because we never ever want to die in get_disks because of a
single disk, but the nodes/xyz/disks/smart API path is
allowed to fail if a disk device is unsupported by smartctl
or something else goes wrong.
While the mkdir option deals with the case where we don't
want to clobber a mount point with directories (like ZFS,
gluster or NFS), putting a directory storage directly onto a
mount point is still risky:
If the path exists - which it usually does even if not
mounted - the storage will be considered successfully
activated, but empty (or with unexpected content). Some
operations will then lead to unexpected problems: the
free_disk operation for instance only warns if the disk does
not exist, but does not throw an error. In this case the
configuration might be updated without the real disk being
deleted. Once it's mounted back in, later operations which
check existing disks which are not part of the current VM
configuration (like migration) might error unexpectedly.
This adds an 'is_mountpoint' option to directory storages
which assumes the directory is an externally managed mount
point (eg. fstab or zfs) and changes activate_storage() to
throw an error if the path is not mounted.
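
A rough sketch of such a check, written in Python for illustration only. It assumes the mount table is available as /proc/mounts-style text (device, mount point, type, options, ...); the helper name path_is_mounted and the simplified activate_storage signature are hypothetical and do not mirror the real plugin API.

```python
import os


def path_is_mounted(path, mounts_text):
    """Return True if path is listed as a mount point in mounts_text.

    mounts_text is expected in /proc/mounts format. Mount points that
    contain escaped characters (e.g. spaces) are not handled here.
    """
    wanted = os.path.normpath(path)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and os.path.normpath(fields[1]) == wanted:
            return True
    return False


def activate_storage(path, mounts_text, is_mountpoint=False):
    """Fail early instead of treating an unmounted directory as empty."""
    if is_mountpoint and not path_is_mounted(path, mounts_text):
        raise RuntimeError(
            f"unable to activate storage: '{path}' is not mounted")
    return True
```

Without is_mountpoint the behaviour stays as before: an existing but unmounted directory still counts as activated.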
So far this only prevented the creation of the toplevel
directory. This does not cover all problem cases,
particularly when said directory is supposed to be a mount
point, including NFS and glusterfs beside ZFS.
The directory based storages we have already use mkpath
whenever they need to create files, and for actions on files
which are supposed to exist it's fine if it errors out.
So it should also be safe to skip the creation of standard
subdirectories in activate_storage().
Additionally NFS and glusterfs storages should also accept
the mkdir option as they otherwise may exhibit similar
issues, eg. when an NFS storage is mounted onto a directory
inside a ZFS subvolume.
since the rbd images themselves are named differently than
the volumes in our config files, we need to recreate this
information from the parent relation in the ceph metadata,
otherwise list_images() might return wrong volume names/IDs.
list_images() is used by PVE::Storage::vdisk_free() to check
for children still referencing a base image. with the wrong
volume ID, RBDPlugin->parse_volname() does not detect the
base image of linked clones, so the check fails.
this is thankfully mitigated by the protected status of the
base snapshot, but creates a rather confusing error message.
scenario (VM 701 is a linked clone of template VM 700):
before (pvesm list reports wrong volume ID, check fails):
$ pvesm list ceph_qemu
ceph_qemu:base-700-disk-1 raw 2147483648 700
ceph_qemu:vm-701-disk-1 raw 2147483648 701
$ pvesm free ceph_qemu:base-700-disk-1
snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
rbd unprotect base-700-disk-1 snap '__base__' error: snap_unprotect: can't unprotect; at least 1 child(ren) in pool rbd
after (correct volume ID, check works as intended):
$ pvesm list ceph_qemu
ceph_qemu:base-700-disk-1 raw 2147483648 700
ceph_qemu:base-700-disk-1/vm-701-disk-1 raw 2147483648 701
$ pvesm free ceph_qemu:base-700-disk-1
base volume 'base-700-disk-1' is still in use (use by 'base-700-disk-1/vm-701-disk-1')
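
How the volume name follows from the parent relation can be sketched like this. It is a simplified Python illustration, not the RBDPlugin code: it assumes the parent image name has already been extracted from the ceph metadata, and both function names are hypothetical.

```python
def volname_from_image(image, parent=None):
    """Build the volume name used in the config from the rbd image name.

    parent is the name of the base image the clone depends on, or
    None for images without a parent.
    """
    if parent:
        return f"{parent}/{image}"
    return image


def children_of(base, volnames):
    """Return all volume names that still reference the given base image."""
    prefix = base + "/"
    return [name for name in volnames if name.startswith(prefix)]


# image name -> parent image name, as in the scenario above
images = {"base-700-disk-1": None, "vm-701-disk-1": "base-700-disk-1"}
volnames = [volname_from_image(img, par) for img, par in images.items()]
print(volnames)
print(children_of("base-700-disk-1", volnames))
```

With the parent encoded in the volume name, the "still in use" check can find the linked clone before rbd is asked to unprotect the snapshot.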
Dominik Csapak [Tue, 23 Aug 2016 10:20:46 +0000 (12:20 +0200)]
add api entries for disk management
adds a new class (intended to be used under nodes in pve-manager)
which provides three api calls: list, smart and init
- list: a general list of the available disks with their info
- smart: get the smart data of a given device
- init: write a gpt header to an unused disk
Dominik Csapak [Tue, 23 Aug 2016 10:20:45 +0000 (12:20 +0200)]
add Diskmanage Utilities
this adds the functions for listing the disks (mostly copied from
the ceph code), checking if a disk is a valid blockdevice, if it
is used/in a zfs pool/as an lvm pv, and an init function (just to add a gpt header;
this is important if one wants to use a fresh disk for ceph journals)
Dmitry Petuhov [Fri, 26 Aug 2016 13:06:33 +0000 (16:06 +0300)]
Add support for custom storage plugins
The PVE team cannot support specialized vendor-specific storage
plugins because of a lack of hardware. But we can allow users to
add their own plugins for their storages without the need to rewrite
any PVE code, which eases PVE updates for them.
The idea of this patch is to add the folder /usr/share/perl5/PVE/Storage/Custom
where users can place their plugins. PVE will automatically load
them on start, or warn if it could not and continue. Maybe we could
even load all plugins (except PVE::Storage::Plugin itself) this way,
because the current storage plugins are not really plugins if they
need to be explicitly loaded in PVE code :-).
Custom plugins MUST have an api() method returning the API version for
which they were designed. If the API changes on the PVE side, the module
is simply not registered and a warning message is printed to the log, so
the user has to update the module. Until the module is updated, the
corresponding storage will just disappear from PVE, so an API change
should not cause any data damage.
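
The registration rule can be sketched as follows. This is an illustrative Python model, not the Perl loader from the patch: the version constant, the class names and register_custom_plugins are all hypothetical, and only the api() requirement is taken from the text above.

```python
STORAGE_API_VERSION = 1  # hypothetical value, for this sketch only


class GoodPlugin:
    @staticmethod
    def api():
        return 1


class OutdatedPlugin:
    @staticmethod
    def api():
        return 0


class NoApiPlugin:
    pass


def register_custom_plugins(plugins, expected_api=STORAGE_API_VERSION):
    """Register only plugins whose api() matches the expected version.

    Returns (registered, warnings). Rejected plugins are skipped with a
    warning instead of aborting the whole load.
    """
    registered, warnings = {}, []
    for name, plugin in plugins.items():
        api = getattr(plugin, "api", None)
        if not callable(api):
            warnings.append(f"plugin '{name}' has no api() method, skipping")
            continue
        version = api()
        if version != expected_api:
            warnings.append(
                f"plugin '{name}' is for API {version}, "
                f"expected {expected_api}, skipping")
            continue
        registered[name] = plugin
    return registered, warnings


registered, warnings = register_custom_plugins(
    {"good": GoodPlugin, "old": OutdatedPlugin, "noapi": NoApiPlugin})
print(sorted(registered))
print(warnings)
```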
This approach works (with some limitations) if the plugin works in the
generic PVE way: full control of the volume lifecycle. It will not
currently work for custom plugins like iSCSI, which need to select
pre-existing volumes. Maybe someone will add a more flexible way to
pve-manager to select input elements for storage plugins to address
this.