X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=local-zfs.adoc;h=d3db1c5439f67a9f9c6bfcbe6bf638afb38e3bb1;hb=447596fd16c159593439f61e817450bc9aaf5ef2;hp=89ab8bd847d2bc5da9c5ee8c95ab3df6b0f3ca3a;hpb=f4abc68ab1f1c54c1ad115160494fc3993ca1df0;p=pve-docs.git diff --git a/local-zfs.adoc b/local-zfs.adoc index 89ab8bd..d3db1c5 100644 --- a/local-zfs.adoc +++ b/local-zfs.adoc @@ -32,7 +32,8 @@ management. * Copy-on-write clone -* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3 +* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2, RAIDZ-3, +dRAID, dRAID2, dRAID3 * Can use SSD for cache @@ -42,8 +43,6 @@ management. * Designed for high storage capacities -* Protection against data corruption - * Asynchronous replication over network * Open Source @@ -57,22 +56,22 @@ Hardware ~~~~~~~~ ZFS depends heavily on memory, so you need at least 8GB to start. In -practice, use as much you can get for your hardware/budget. To prevent +practice, use as much as you can get for your hardware/budget. To prevent data corruption, we recommend the use of high quality ECC RAM. If you use a dedicated cache and/or log disk, you should use an enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can increase the overall performance significantly. -IMPORTANT: Do not use ZFS on top of hardware controller which has its -own cache management. ZFS needs to directly communicate with disks. An -HBA adapter is the way to go, or something like LSI controller flashed -in ``IT'' mode. +IMPORTANT: Do not use ZFS on top of a hardware RAID controller which has its +own cache management. ZFS needs to communicate directly with the disks. An +HBA adapter or something like an LSI controller flashed in ``IT'' mode is more +appropriate. If you are experimenting with an installation of {pve} inside a VM (Nested Virtualization), don't use `virtio` for disks of that VM, -since they are not supported by ZFS. 
Use IDE or SCSI instead (works -also with `virtio` SCSI controller type). +as they are not supported by ZFS. Use IDE or SCSI instead (also works +with the `virtio` SCSI controller type). Installation as Root File System @@ -246,12 +245,53 @@ them, unless your environment has specific needs and characteristics where RAIDZ performance characteristics are acceptable. +ZFS dRAID +~~~~~~~~~ + +In a ZFS dRAID (declustered RAID) the hot spare drive(s) participate in the RAID. +Their spare capacity is reserved and used for rebuilding when one drive fails. +This provides, depending on the configuration, faster rebuilding compared to a +RAIDZ in case of drive failure. More information can be found in the official +OpenZFS documentation. footnote:[OpenZFS dRAID +https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html] + +NOTE: dRAID is intended for more than 10-15 disks in a dRAID. A RAIDZ +setup should be better for a lower amount of disks in most use cases. + +NOTE: The GUI requires one more disk than the minimum (i.e. dRAID1 needs 3). It +expects that a spare disk is added as well. + + * `dRAID1` or `dRAID`: requires at least 2 disks, one can fail before data is +lost + * `dRAID2`: requires at least 3 disks, two can fail before data is lost + * `dRAID3`: requires at least 4 disks, three can fail before data is lost + + +Additional information can be found on the manual page: + +---- +# man zpoolconcepts +---- + +Spares and Data +^^^^^^^^^^^^^^^ +The number of `spares` tells the system how many disks it should keep ready in +case of a disk failure. The default value is 0 `spares`. Without spares, +rebuilding won't get any speed benefits. + +`data` defines the number of devices in a redundancy group. The default value is +8. Except when `disks - parity - spares` equal something less than 8, the lower +number is used. 
In general, a smaller number of `data` devices leads to higher +IOPS, better compression ratios and faster resilvering, but defining fewer data +devices reduces the available storage capacity of the pool. + + Bootloader ~~~~~~~~~~ -Depending on whether the system is booted in EFI or legacy BIOS mode the -{pve} installer sets up either `grub` or `systemd-boot` as main bootloader. -See the chapter on xref:sysboot[{pve} host bootladers] for details. +{pve} uses xref:sysboot_proxmox_boot_tool[`proxmox-boot-tool`] to manage the +bootloader configuration. +See the chapter on xref:sysboot[{pve} host bootloaders] for details. ZFS Administration @@ -389,8 +429,15 @@ Changing a failed device .Changing a failed bootable device -Depending on how {pve} was installed it is either using `grub` or `systemd-boot` -as bootloader (see xref:sysboot[Host Bootloader]). +Depending on how {pve} was installed it is either using `systemd-boot` or `grub` +through `proxmox-boot-tool` +footnote:[Systems installed with {pve} 6.4 or later, EFI systems installed with +{pve} 5.4 or later] or plain `grub` as bootloader (see +xref:sysboot[Host Bootloader]). You can check by running: + +---- +# proxmox-boot-tool status +---- The first steps of copying the partition table, reissuing GUIDs and replacing the ZFS partition are the same. To make the system bootable from the new disk, @@ -405,37 +452,37 @@ different steps are needed which depend on the bootloader in use. NOTE: Use the `zpool status -v` command to monitor how far the resilvering process of the new disk has progressed. -.With `systemd-boot`: +.With `proxmox-boot-tool`: ---- -# pve-efiboot-tool format -# pve-efiboot-tool init +# proxmox-boot-tool format +# proxmox-boot-tool init ---- NOTE: `ESP` stands for EFI System Partition, which is setup as partition #2 on bootable disks setup by the {pve} installer since version 5.4. For details, see -xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP]. 
+xref:sysboot_proxmox_boot_setup[Setting up a new partition for use as synced ESP]. -.With `grub`: +.With plain `grub`: ---- # grub-install ---- +NOTE: plain `grub` is only used on systems installed with {pve} 6.3 or earlier, +which have not been manually migrated to using `proxmox-boot-tool` yet. -Activate E-Mail Notification -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -ZFS comes with an event daemon, which monitors events generated by the -ZFS kernel module. The daemon can also send emails on ZFS events like -pool errors. Newer ZFS packages ship the daemon in a separate package, -and you can install it using `apt-get`: +Configure E-Mail Notification +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ----- -# apt-get install zfs-zed ----- +ZFS comes with an event daemon `ZED`, which monitors events generated by the ZFS +kernel module. The daemon can also send emails on ZFS events like pool errors. +Newer ZFS packages ship the daemon in a separate `zfs-zed` package, which should +already be installed by default in {pve}. -To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your -favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting: +You can configure the daemon via the file `/etc/zfs/zed.d/zed.rc` with your +favorite editor. The required setting for email notification is +`ZED_EMAIL_ADDR`, which is set to `root` by default. -------- ZED_EMAIL_ADDR="root" @@ -444,33 +491,57 @@ ZED_EMAIL_ADDR="root" Please note {pve} forwards mails to `root` to the email address configured for the root user. -IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All -other settings are optional. - [[sysadmin_zfs_limit_memory_usage]] Limit ZFS Memory Usage ~~~~~~~~~~~~~~~~~~~~~~ -It is good to use at most 50 percent (which is the default) of the -system memory for ZFS ARC to prevent performance shortage of the -host. 
Use your preferred editor to change the configuration in -`/etc/modprobe.d/zfs.conf` and insert: +ZFS uses '50 %' of the host memory for the **A**daptive **R**eplacement +**C**ache (ARC) by default. Allocating enough memory for the ARC is crucial for +IO performance, so reduce it with caution. As a general rule of thumb, allocate +at least +2 GiB Base + 1 GiB/TiB-Storage+. For example, if you have a pool with ++8 TiB+ of available storage space then you should use +10 GiB+ of memory for +the ARC. + +You can change the ARC usage limit for the current boot (a reboot resets this +change again) by writing to the +zfs_arc_max+ module parameter directly: + +---- + echo "$[10 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max +---- + +To *permanently change* the ARC limits, add the following line to +`/etc/modprobe.d/zfs.conf`: -------- options zfs zfs_arc_max=8589934592 -------- -This example setting limits the usage to 8GB. +This example setting limits the usage to 8 GiB ('8 * 2^30^'). + +IMPORTANT: In case your desired +zfs_arc_max+ value is lower than or equal to ++zfs_arc_min+ (which defaults to 1/32 of the system memory), +zfs_arc_max+ will +be ignored unless you also set +zfs_arc_min+ to at most +zfs_arc_max - 1+. + +---- +echo "$[8 * 1024*1024*1024 - 1]" >/sys/module/zfs/parameters/zfs_arc_min +echo "$[8 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max +---- + +This example setting (temporarily) limits the usage to 8 GiB ('8 * 2^30^') on +systems with more than 256 GiB of total memory, where simply setting ++zfs_arc_max+ alone would not work. [IMPORTANT] ==== -If your root file system is ZFS you must update your initramfs every +If your root file system is ZFS, you must update your initramfs every time this value changes: ---- -# update-initramfs -u +# update-initramfs -u -k all ---- + +You *must reboot* to activate these changes. ==== @@ -484,7 +555,7 @@ to an external Storage. 
We strongly recommend to use enough memory, so that you normally do not run into low memory situations. Should you need or want to add swap, it is -preferred to create a partition on a physical disk and use it as swapdevice. +preferred to create a partition on a physical disk and use it as a swap device. You can leave some space free for this purpose in the advanced options of the installer. Additionally, you can lower the ``swappiness'' value. A good value for servers is 10: @@ -518,6 +589,12 @@ improve performance when sufficient memory exists in a system. Encrypted ZFS Datasets ~~~~~~~~~~~~~~~~~~~~~~ +WARNING: Native ZFS encryption in {pve} is experimental. Known limitations and +issues include Replication with encrypted datasets +footnote:[https://bugzilla.proxmox.com/show_bug.cgi?id=2350], +as well as checksum errors when using Snapshots or ZVOLs. +footnote:[https://github.com/openzfs/zfs/issues/11688] + ZFS on Linux version 0.8.0 introduced support for native encryption of datasets. After an upgrade from previous ZFS on Linux versions, the encryption feature can be enabled per pool: @@ -659,7 +736,7 @@ WARNING: Adding a `special` device to a pool cannot be undone! ZFS datasets expose the `special_small_blocks=` property. `size` can be `0` to disable storing small file blocks on the `special` device or a power of -two in the range between `512B` to `128K`. After setting the property new file +two in the range between `512B` to `1M`. After setting the property new file blocks smaller than `size` will be allocated on the `special` device. IMPORTANT: If the value for `special_small_blocks` is greater than or equal to @@ -687,3 +764,38 @@ in the pool will opt in for small file blocks). ---- # zfs set special_small_blocks=0 / ---- + +[[sysadmin_zfs_features]] +ZFS Pool Features +~~~~~~~~~~~~~~~~~ + +Changes to the on-disk format in ZFS are only made between major version changes +and are specified through *features*. 
All features, as well as the general +mechanism are well documented in the `zpool-features(5)` manpage. + +Since enabling new features can render a pool not importable by an older version +of ZFS, this needs to be done actively by the administrator, by running +`zpool upgrade` on the pool (see the `zpool-upgrade(8)` manpage). + +Unless you need to use one of the new features, there is no upside to enabling +them. + +In fact, there are some downsides to enabling new features: + +* A system with root on ZFS, that still boots using `grub` will become + unbootable if a new feature is active on the rpool, due to the incompatible + implementation of ZFS in grub. +* The system will not be able to import any upgraded pool when booted with an + older kernel, which still ships with the old ZFS modules. +* Booting an older {pve} ISO to repair a non-booting system will likewise not + work. + +IMPORTANT: Do *not* upgrade your rpool if your system is still booted with +`grub`, as this will render your system unbootable. This includes systems +installed before {pve} 5.4, and systems booting with legacy BIOS boot (see +xref:sysboot_determine_bootloader_used[how to determine the bootloader]). + +.Enable new features for a ZFS pool: +---- +# zpool upgrade +----
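
Before upgrading a pool, it can help to see which features are still disabled on it, since those are the ones `zpool upgrade` would enable. The following is a minimal sketch, not part of the {pve} tooling: the helper name `list_disabled_features` is hypothetical, and the pool name `rpool` in the usage comment is an assumption. It filters the tab-separated output of `zpool get -H -o property,value all <pool>`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ZFS or Proxmox VE):
# reads "property<TAB>value" lines on stdin and prints the names of
# all pool features whose value is "disabled", one per line.
list_disabled_features() {
    awk -F'\t' '
        $1 ~ /^feature@/ && $2 == "disabled" {
            sub(/^feature@/, "", $1)   # strip the "feature@" prefix
            print $1
        }'
}

# Usage on a live system (pool name "rpool" is an assumption):
#   zpool get -H -o property,value all rpool | list_disabled_features
```

If the helper prints nothing, all features supported by the running ZFS version are already enabled or active on that pool, and `zpool upgrade` would not change it.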