The import methods (volume_import, volume_import_formats):
These additionally get the '$snapshot' parameter which is
already present on the export side as an informational piece
to know which of the snapshots is the *current* one.
This parameter is inserted *in the middle* of the current
parameters, so the import & export format methods now have
the same signatures.
The current "disk" state will be set to this snapshot.
This, too, is required for our btrfs implementation.
`volume_import_formats` can obviously not make much
*use* of this parameter, but it'll still be useful to know
that the information is actually available in the import
call, so its presence will be checked in the btrfs
implementation.
Currently this is intended to be used for btrfs send/recv
support, which in theory could also get additional metadata
similar to how we do the "tar+size" format. However, we
currently only really use this within this repository in
storage_migrate(), which has this information readily
available anyway.
On the export side (volume_export, volume_export_formats):
The `$with_snapshots` option is now "defined" to be an
ordered array of snapshots to include, as a hint for
storages which need this. (As of the next commit this is
only btrfs, and only when also specifying a base snapshot,
which is a case we can currently not run into except on the
command line interface.)
The current providers of the `$with_snapshots` option will
still treat it as a boolean (since e.g. for ZFS you cannot
really "skip" snapshots AFAIK).
This is mainly intended for storages which do not have a
strong association between snapshots and the originals, or
an ordering (e.g. btrfs and lvm-thin allow creating
arbitrary snapshot trees, and with btrfs you can even
create a "circular" connection between subvolumes; we
could also consider treating reflink-based copies as
snapshots on xfs in the future).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This is mostly the same as a directory storage, with 2 major
differences:
* 'subvol' volumes are actual btrfs subvolumes and therefore
allow snapshots
* 'raw' files are placed *into* a subvolume and therefore
also allow snapshots; the raw file for volume
`btrstore:100/vm-100-disk-1.raw` can be found under
`$path/images/100/vm-100-disk-1/disk.raw`
* in both cases, snapshots add an '@name' suffix to the
subvolume's directory name, so snapshot 'foo' of the above
would be found under
`$path/images/100/vm-100-disk-1@foo/disk.raw`
or for format "subvol":
`$path/images/100/subvol-100-disk-1.subvol@foo`
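The path layout described above can be sketched as follows. This is only an
illustration in Python, not the plugin's Perl code; the function name and its
parameters are hypothetical, and only the resulting paths are taken from the
examples above.

```python
import posixpath
from typing import Optional


def btrfs_volume_path(base: str, vmid: int, volname: str,
                      snapshot: Optional[str] = None) -> str:
    """Map a volume name to its on-disk location (hypothetical helper)."""
    # Snapshots add an '@name' suffix to the subvolume's directory name.
    suffix = f"@{snapshot}" if snapshot else ""
    if volname.endswith(".raw"):
        # Raw images are placed *into* a subvolume named after the disk.
        subvol = volname[: -len(".raw")]
        return posixpath.join(base, "images", str(vmid), subvol + suffix, "disk.raw")
    if volname.endswith(".subvol"):
        # 'subvol' volumes are the btrfs subvolume itself.
        return posixpath.join(base, "images", str(vmid), volname + suffix)
    raise ValueError(f"unsupported volume name: {volname!r}")
```

For example, `btrfs_volume_path("/mnt/btr", 100, "vm-100-disk-1.raw", "foo")`
yields `/mnt/btr/images/100/vm-100-disk-1@foo/disk.raw`.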
Note that qgroups aren't included in btrfs-send streams,
therefore for now we will only be using *unsized* subvolumes
for containers and place a regular raw+ext4 file for sized
containers.
We could extend the import/export stream format to include
the information at the front (similar to how we do the
"tar+size" format), but we would need to include the size of
all the contained snapshots as well, since they can
technically change. (But before enabling quotas we should do
some performance testing on bigger file systems with multiple
snapshots, as there are quite a few reports of the fs slowing
down considerably in such scenarios.)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
One test had to be adapted because it tested obsolete code. Namely,
it expected vztmpl names to end only with .tar.gz, but the new regex also
includes .tar.xz. There is nothing against allowing .tar.xz files as
vztmpl files.
Stoiko Ivanov [Tue, 22 Jun 2021 16:39:54 +0000 (18:39 +0200)]
plugins: untaint volume_size_info returns
the size returned by volume_size_info is used for creating the new
destination image in PVE::QemuServer::clone_disk (and probably
elsewhere). In certain cases the return values are tainted - they are
obtained by a run_command call and depending on the format and length
of the parsed output can still have their tainted attribute.
One example of a tainted return has been reported in our
community-forum:
https://forum.proxmox.com/threads/cannot-clone-vm-or-move-disk-with-more-than-13-snapshots.89628/
A qcow2 image with 13 snapshots generates an output > 4k in length from
`qemu-img info --output=json`, which in turn causes the output to be
considered tainted.
This patch untaints the returns where applicable. The other
storage-plugins are not affected:
* LVMPlugin returns a single number and a newline (thus gets untainted
by run_command)
* RBDPlugin untaints the complete json before decoding
* ZFSPoolPlugin and ISCSIDirectPlugin explicitly untaint their
returns.
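Python has no taint mode, so the following is only an analogue of the Perl
idiom, where a value is untainted by matching it against a strict pattern and
keeping the captured part. The function name and the choice of checked fields
are assumptions made for this sketch; the keys are those found in the JSON
output of `qemu-img info`.

```python
import json
import re

# Strict patterns for the fields we care about (selection is an assumption).
_FIELD_PATTERNS = {
    "virtual-size": re.compile(r"\d+"),
    "actual-size": re.compile(r"\d+"),
    "format": re.compile(r"\S+"),
}


def checked_size_info(raw_json: str) -> dict:
    """Decode the JSON and accept each field only if it matches its pattern."""
    info = json.loads(raw_json)
    result = {}
    for key, pattern in _FIELD_PATTERNS.items():
        value = str(info.get(key, ""))
        if not pattern.fullmatch(value):
            raise ValueError(f"unexpected value for {key!r}: {value!r}")
        # Sizes are returned as integers, the format as a plain string.
        result[key] = int(value) if key.endswith("-size") else value
    return result
```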
Fabian Ebner [Fri, 18 Jun 2021 10:59:35 +0000 (12:59 +0200)]
vdisk_list: only scan storages with the correct content type(s)
The enabled check in the lower loop is now redundant and can be removed.
If storeid is provided, initialize the result hash accordingly, mainly for
backwards compatibility (needed by a caller in pve-manager's Ceph/Pools.pm and
the migration code in pve-container and qemu-server), but it also is less
surprising in general.
Remaining vdisk_list users that do not specify a content type are:
1. pve-manager's Ceph/Pools.pm, but the content type for RBD can only be
rootdir and images, so the storage is scanned (if enabled, same as
before).
2. pve-container migration
3. qemu-server migration
For the latter two, it's planned to enforce content type, so the change is fine
too.
This also means that iscsi(direct) plugins with content type 'none', i.e.
"use LUNs directly", do not return the list of images anymore, but that was
rather a bug anyway, as they are not virtual disks then:
0.0.0.scsi-36001405b8f2772e13a04b8e9390db13d
All of the remaining callers not using content types (see above) are fine with
that change too.
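A rough Python sketch of the selection logic described in this message. The
function name, the shape of the configuration dictionary and the default
content types ('images' and 'rootdir') are assumptions for illustration; the
actual implementation is Perl code in Storage.pm.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: callers that pass no content type want virtual disks only.
DEFAULT_CONTENT = ("images", "rootdir")


def vdisk_list_sketch(cfg: Dict[str, dict],
                      list_images: Callable[[str], Iterable[str]],
                      content: Iterable[str] = DEFAULT_CONTENT,
                      storeid: Optional[str] = None) -> Dict[str, List[str]]:
    wanted = set(content)
    # If a storeid is given, its key is always present in the result.
    result: Dict[str, List[str]] = {storeid: []} if storeid is not None else {}
    candidates = [storeid] if storeid is not None else sorted(cfg)
    for sid in candidates:
        scfg = cfg[sid]
        if scfg.get("disable"):
            continue  # disabled storages are never scanned
        if not wanted & set(scfg.get("content", ())):
            continue  # no matching content type, e.g. 'backup' or 'none'
        result[sid] = list(list_images(sid))
    return result
```

With this, a storage that only has the 'backup' or 'none' content type is
skipped, while a caller passing its storeid still gets an empty list back
instead of a missing key.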
Fabian Ebner [Tue, 4 May 2021 09:52:54 +0000 (11:52 +0200)]
lvm: volume import: handle worker returned by free_image
only affects LVM storages with 'saferemove 1' where the import fails at a rather
advanced stage. Previously in such cases, the renamed (by free_image) volume
del-vm-XYZ-disk-N would be left over.
Fabian Ebner [Tue, 4 May 2021 09:52:53 +0000 (11:52 +0200)]
pbs: free image: explicitly return undef
Storage.pm's vdisk_free interprets truthy return values as worker subs, so be
explicit about returning undef here. Not an issue at the moment, because
run_client_command already returns undef, but better be safe than sorry.
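To illustrate why the explicit undef matters, here is a simplified Python
model of a caller that treats any truthy return value as a worker to run. The
names are hypothetical and the real vdisk_free does considerably more.

```python
from typing import Callable, List


def vdisk_free_sketch(free_image: Callable[[], object], log: List[str]) -> None:
    """Run free_image and treat a truthy result as a cleanup worker."""
    cleanup = free_image()
    if cleanup:
        # Any truthy value ends up here and is called like a function.
        log.append("running cleanup worker")
        cleanup()
    else:
        log.append("nothing left to do")
```

A plugin that accidentally returned, say, the number 1 would make this caller
try to call an integer, which is the kind of surprise the explicit undef
guards against.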
The latter commit suggests switching to mount.fuse.ceph for the '_netdev'
option, but it doesn't seem to work:
root@pve701 / # mount -t fuse.ceph 10.10.10.11,10.10.10.12,10.10.10.13:/ /mnttest/fuse -o 'ceph.id=admin,ceph.keyfile=/etc/pve/priv/ceph/cephfs.secret,ceph.conf=/etc/pve/ceph.conf,_netdev'
ceph-fuse[20729]: starting ceph client
2021-06-15T14:22:00.631+0200 7f995f878080 -1 init, newargv = 0x55e09fc11a40 newargc=11
ceph-fuse[20729]: starting fuse
root@pve701 / # mount -t ceph 10.10.10.11,10.10.10.12,10.10.10.13:/ /mnttest/normal -o 'name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret,conf=/etc/pve/ceph.conf,_netdev'
root@pve701 / # mount | grep mnttest
ceph-fuse on /mnttest/fuse type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
10.10.10.11,10.10.10.12,10.10.10.13:/ on /mnttest/normal type ceph (rw,relatime,name=admin,secret=<hidden>,acl,_netdev)
Also, the return value is not propagated by mount.fuse.ceph, meaning the output
would need to be parsed...
root@pve701 ~ # mount -t fuse.ceph 10.10.10.11,10.10.10.12,10.10.10.13:/ /mnttest/fuse -o 'ceph.id=admin,ceph.keyfile=/etc/pve/priv/ceph/cephfs.secret,ceph.conf=/etc/pve/ceph.conf,_netdev'
2021-06-15T14:42:56.326+0200 7f634edae080 -1 init, newargv = 0x560cdb5e0a40 newargc=11
ceph-fuse[34480]: starting ceph client
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
ceph-fuse[34480]: fuse failed to start
2021-06-15T14:42:56.338+0200 7f634edae080 -1
fuse_mount(mountpoint=/mnttest/fuse) failed.
Mount failed with status code: 5
root@pve701 ~ # echo $?
0
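Parsing the output would look roughly like the Python sketch below. The
marker strings are copied from the transcript above and are not a stable
interface, which is exactly why this approach is unattractive; the function
name is hypothetical.

```python
# Failure markers as seen in the transcript above (not a stable interface).
FAILURE_MARKERS = ("fuse failed to start", "Mount failed with status code")


def fuse_mount_failed(output: str) -> bool:
    """Guess from the captured output whether the FUSE mount failed."""
    return any(marker in output for marker in FAILURE_MARKERS)
```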
Fabian Ebner [Wed, 16 Jun 2021 07:26:59 +0000 (09:26 +0200)]
cephfs: revert safe-guard check for Luminous
It's necessary to be on Nautilus before upgrading to 7.x, so the check is no
longer needed. See commit e54c3e334760491954bc42f3585a8b5b136d4b1d. It didn't
cleanly revert, because there were cleanups made afterwards.
Fabian Ebner [Wed, 16 Jun 2021 07:26:58 +0000 (09:26 +0200)]
config: add backup content type to default local storage
which is used if there is no ('dir'-type) 'local' entry. Storage configurations
made by the installer also support backups for the 'local' storage, and the
'prune-backups' parameter is not really useful otherwise.
Fabian Ebner [Wed, 16 Jun 2021 07:26:57 +0000 (09:26 +0200)]
config: mention that maxfiles is deprecated
Don't add an explicit deprecation warning on parsing (yet), this is already done in
the pve6to7 script. Also, automatic conversion to 'prune-backups' happens when
the section config is read, so over time fewer users should be affected.
Postpone explicit warning/dropping the parameter to a future major release.
Also switch the setting for the default 'local' storage to 'prune-backups'.
Try to detect active mounts and holders early, because it's cheap. The wipefs
command in the worker will detect even more situations where wiping alone is
not enough for the device to show up as unused, or could otherwise be
problematic.
based on the wipe_disks method from pve-manager's Ceph/Tools.pm with the
following main differences:
* use wipefs to wipe labels first (to avoid sgdisk complaining about the
backed up GPT structure on a subsequent GPT initialization)
* only take one device as an argument
* do not use an absolute path for 'dd'
* die if one of the commands fails
The wipefs command makes checks and complains about e.g. mounted or active
devices. One could supply --force to wipefs, but in many such situations it
does not work as expected, because the device would still be detected as in-use
afterwards, and further manual steps would be needed.
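The sequence described above (wipefs first, then overwriting the start of the
device, aborting on the first failure) could look like this in Python. The
function names and the 200 MiB limit are assumptions made for this sketch; the
real helper is Perl code.

```python
import subprocess
from typing import Callable, List

MIB = 1024 * 1024


def wipe_commands(devpath: str, size_bytes: int, wipe_mib: int = 200) -> List[List[str]]:
    """Build the command list for wiping a single device."""
    # Never try to write more than the device can hold.
    count = min(wipe_mib, max(size_bytes // MIB, 0))
    # Wipe labels first, so a later GPT initialization does not complain.
    cmds = [["wipefs", "--all", devpath]]
    if count > 0:
        # 'dd' is looked up via PATH, no absolute path is used.
        cmds.append(["dd", "if=/dev/zero", f"of={devpath}", "bs=1M",
                     f"count={count}", "conv=fdatasync"])
    return cmds


def wipe_device(devpath: str, size_bytes: int,
                run: Callable[..., object] = subprocess.run) -> None:
    for cmd in wipe_commands(devpath, size_bytes):
        # check=True raises CalledProcessError as soon as one command fails.
        run(cmd, check=True)
```

Note that no --force is passed to wipefs, so its checks for mounted or active
devices stay in effect.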
Fabian Ebner [Thu, 4 Feb 2021 10:26:06 +0000 (11:26 +0100)]
clone image: specify base format option with qemu-img
and avoid a warning. It is deprecated to auto-detect the format of the base
volume. See commit d9f059aa6cfccefaffa3532556e966df4a99ece2 in qemu for more
information.
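As an illustration, a linked clone command with an explicit base format might
be assembled like this. The function and parameter names are hypothetical;
`-b` and `-F` are the qemu-img options for the backing file and its format.

```python
from typing import List


def clone_image_cmd(base_path: str, base_format: str, target_path: str,
                    target_format: str = "qcow2") -> List[str]:
    """Build a 'qemu-img create' call that names the backing format explicitly."""
    return [
        "qemu-img", "create",
        "-b", base_path,      # backing (base) image
        "-F", base_format,    # explicit backing format, no auto-detection
        "-f", target_format,  # format of the new image
        target_path,
    ]
```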
instead of just the snapshot for consistency with other API endpoints,
and possible future extension to VMA backups (where 'snapshot' would be
a rather strange terminology).
add some additional checks (pbs storage type, backup volume type),
completion and magic (allow passing in either a full volume ID with
correct storage, or just the volume name, or just the snapshot for
easier API/CLI usage/convenience).
Not replacing it with return, because the current behavior is dying:
Can't "next" outside a loop block
and the single existing caller in pve-manager's API2/Ceph/OSD.pm does not check
the return value.
Also check for $st, which can be undefined in case a non-existing path was
provided. This also led to dying previously:
Can't call method "mode" on an undefined value
diskmanage: improve setting usage for whole disk with include-partitions
in case a disk with partitions also has an fstype set, which happens for our ZFS
boot disks. Do not change the behavior without include-partitions, as we
prefer(red) to be more specific than simply 'partitions' then.
Currently, it's possible to break replication by:
1. have an existing snapshot whose name contains an uppercase letter
2. set up a replication job and run it
3. rollback to the existing snapshot
4. replicate again -> fails
The failure occurs, because after step 3, the most recent common snapshot is the
previously existing one and currently no uppercase letters are allowed for
export/import.
The pve-snapshot-name option uses the CONFIGID_RE
qr/[a-z][a-z0-9_-]+/i
so it cannot be used here, because it would not allow for e.g. '__migrate__'.
Simply allow uppercase letters, to be backwards compatible and allow all
possible pve-snapshot-name values.
There is still an issue if there was also a state volume, but that's a
different bug[1].
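The mismatch between the patterns can be checked quickly in Python.
CONFIGID_RE below is the regex quoted above; the two export/import patterns
are hypothetical stand-ins for the old and new checks, not the exact
expressions used in the code.

```python
import re

# The pve-snapshot-name pattern quoted above.
CONFIGID_RE = re.compile(r"[a-z][a-z0-9_-]+", re.IGNORECASE)

# Hypothetical stand-ins for the export/import snapshot name check.
OLD_TRANSFER_RE = re.compile(r"[a-z0-9_-]+")     # lowercase only
NEW_TRANSFER_RE = re.compile(r"[a-zA-Z0-9_-]+")  # uppercase allowed as well


def accepted(pattern: "re.Pattern[str]", name: str) -> bool:
    return pattern.fullmatch(name) is not None
```

With these, '__migrate__' is rejected by CONFIGID_RE but accepted by both
transfer patterns, while a name such as 'BeforeUpgrade' is a valid
pve-snapshot-name that only the relaxed pattern accepts.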
fix #3345: zfs: restore container volume to ZFS with size 0
A restore to ZFS for a container which has a volume (rootfs / mount
point) of size 0 failed because the refquota property does not accept
'0k' but wants 'none' in that situation.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
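The fix boils down to a special case for size 0, sketched here in Python. The
helper names are hypothetical and the size is assumed to be given in KiB.

```python
from typing import List


def refquota_value(size_kib: int) -> str:
    """Return the refquota value: 'none' for size 0, '<size>k' otherwise."""
    return "none" if size_kib == 0 else f"{size_kib}k"


def zfs_set_refquota_cmd(dataset: str, size_kib: int) -> List[str]:
    return ["zfs", "set", f"refquota={refquota_value(size_kib)}", dataset]
```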
This patch introduces support for Ceph's RBD namespaces.
A new storage config parameter 'namespace' defines the namespace to be
used for the RBD storage.
The namespace must already exist in the Ceph cluster as it is not
automatically created.
The main intention is to use this for external Ceph clusters. With
namespaces, each PVE cluster can get its own namespace and will not
conflict with other PVE clusters.
The <pool>/<image> paths are needed in quite a lot of places. Having one
single place where they are created helps to reduce duplicate code and
makes it easier to introduce new features.
The 'add_pool_to_disk' sub was already doing that but the name was not
really fitting. This commit renames it to the more general
'get_rbd_path' and changes the second parameter to the more widely used
$volume instead of $disk.
Furthermore, all occurrences where "$pool/$volume" has been concatenated
have been replaced with a call to get_rbd_path.
Plus some minor code style cleanups for long function calls that were
touched.
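A simplified Python sketch of such a central helper, with the optional
namespace from the RBD namespace support placed between pool and image. The
signature is an assumption made for illustration; the actual Perl sub takes
the storage configuration and the volume name.

```python
from typing import Optional


def get_rbd_path(pool: str, volume: str, namespace: Optional[str] = None) -> str:
    """Build '<pool>/<image>' or '<pool>/<namespace>/<image>'."""
    if namespace:
        return f"{pool}/{namespace}/{volume}"
    return f"{pool}/{volume}"
```

Keeping this in one place means a later change to the path format only has to
be made once.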
by relying on archive_info's vmid first. archive_info is already used to
determine if it's a standard name, and in that case the vmid is certainly set.
Also add asserts to make sure we got what we expected.
The mentioned commit is actually a backwards-incompatible change that leads to
slightly different behavior when migrating a VM with volumes on a misconfigured
storage. For example, unreferenced volumes on a misconfigured storage won't be
picked up, even though they were before. And for referenced volumes on a
misconfigured storage, the disk size would not be updated on migration anymore.
We should wait until the next major release for this change and then also
re-evaluate the migration behavior with misconfigured disks.
Fabian Ebner [Fri, 12 Mar 2021 09:50:26 +0000 (10:50 +0100)]
vdisk list: only collect images from storages with an appropriate content type
Only these storages are activated in the first place, and it's bad behavior to
list images when no appropriate content type is set.
For example, on VM destruction, this prevents unreferenced images from being
deleted from a storage with only the 'backup' content type set, which is
supposedly what happened in this[0] forum thread.
(Some) callers expect all keys to be present and valid array references in the
result, so initialization is needed.
Now, the enabled check is already done by the preceding code for every element
that is iterated over, and thus isn't needed in the main loop anymore.
Fabian Ebner [Wed, 10 Mar 2021 09:26:27 +0000 (10:26 +0100)]
api: disk list: allow if an audit permission for the node is present
as that seems to be the more natural permission path for listing a node's local
disks. For backwards compatibility, the old permission check has to be kept
(relevant with propagate=0).
This API call was originally part of the Ceph API and got copied here later,
which might explain the current permission check.
In the UI, the Disk panel is visible with a node audit permission, but the API
call itself failed without the '/' audit permission.
Fabian Ebner [Thu, 11 Feb 2021 10:24:13 +0000 (11:24 +0100)]
storage migration: insecure: improve logging
by including the message/error from the remote side. Some people on the forum[0]
ran into 'no tunnel IP received', but without information from the remote side
it's hard to tell why.