* Copy-on-write clone
-* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
+* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2, RAIDZ-3,
+dRAID, dRAID2, dRAID3
* Can use SSD for cache
~~~~~~~~
ZFS depends heavily on memory, so you need at least 8GB to start. In
-practice, use as much you can get for your hardware/budget. To prevent
+practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an
-enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
+enterprise class SSD. This can
increase the overall performance significantly.
-IMPORTANT: Do not use ZFS on top of hardware controller which has its
-own cache management. ZFS needs to directly communicate with disks. An
-HBA adapter is the way to go, or something like LSI controller flashed
-in ``IT'' mode.
+IMPORTANT: Do not use ZFS on top of a hardware RAID controller which has its
+own cache management. ZFS needs to communicate directly with the disks. An
+HBA adapter or something like an LSI controller flashed in ``IT'' mode is more
+appropriate.
If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use `virtio` for disks of that VM,
-since they are not supported by ZFS. Use IDE or SCSI instead (works
-also with `virtio` SCSI controller type).
+as they are not supported by ZFS. Use IDE or SCSI instead (also works
+with the `virtio` SCSI controller type).
Installation as Root File System
parameters of interest are the IOPS (Input/Output Operations per Second) and
the bandwidth with which data can be written or read.
-A 'mirror' vdev (RAID1) will approximately behave like a single disk in regards
-to both parameters when writing data. When reading data if will behave like the
-number of disks in the mirror.
+A 'mirror' vdev (RAID1) will approximately behave like a single disk in regard
+to both parameters when writing data. When reading data the performance will
+scale linearly with the number of disks in the mirror.
A common situation is to have 4 disks. When setting it up as 2 mirror vdevs
-(RAID10) the pool will have the write characteristics as two single disks in
-regard of IOPS and bandwidth. For read operations it will resemble 4 single
+(RAID10) the pool will have the write characteristics of two single disks in
+regard to IOPS and bandwidth. For read operations it will resemble 4 single
disks.
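+
+As an illustration, such a pool with two mirror vdevs could be created like
+this (disk names are placeholders):
+
+----
+# zpool create <pool> mirror <disk1> <disk2> mirror <disk3> <disk4>
+----
+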
A 'RAIDZ' of any redundancy level will approximately behave like a single disk
-in regard of IOPS with a lot of bandwidth. How much bandwidth depends on the
+in regard to IOPS with a lot of bandwidth. How much bandwidth depends on the
size of the RAIDZ vdev and the redundancy level.
For running VMs, IOPS is the more important metric in most situations.
The `volblocksize` property can only be set when creating a ZVOL. The default
value can be changed in the storage configuration. When doing this, the guest
needs to be tuned accordingly and depending on the use case, the problem of
-write amplification if just moved from the ZFS layer up to the guest.
+write amplification is just moved from the ZFS layer up to the guest.
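+
+For illustration, a ZVOL with a non-default `volblocksize` could also be
+created manually like this (pool name, ZVOL name and sizes are examples only):
+
+----
+# zfs create -V 32G -o volblocksize=16k <pool>/vm-disk
+----
+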
Using `ashift=9` when creating the pool can lead to bad
performance, depending on the disks underneath, and cannot be changed later on.
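+
+The `ashift` can be set explicitly when creating the pool, for example (disk
+names are placeholders):
+
+----
+# zpool create -o ashift=12 <pool> <disk1> <disk2>
+----
+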
RAIDZ performance characteristics are acceptable.
+ZFS dRAID
+~~~~~~~~~
+
+In a ZFS dRAID (declustered RAID) the hot spare drive(s) participate in the RAID.
+Their spare capacity is reserved and used for rebuilding when one drive fails.
+Depending on the configuration, this allows for faster rebuilding than a
+RAIDZ in case of a drive failure. More information can be found in the official
+OpenZFS documentation. footnote:[OpenZFS dRAID
+https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html]
+
+NOTE: dRAID is intended for setups with more than 10-15 disks. With fewer
+disks, a RAIDZ setup is the better choice in most use cases.
+
+NOTE: The GUI requires one more disk than the minimum (i.e. dRAID1 needs 3). It
+expects that a spare disk is added as well.
+
+ * `dRAID1` or `dRAID`: requires at least 2 disks, one can fail before data is
+lost
+ * `dRAID2`: requires at least 3 disks, two can fail before data is lost
+ * `dRAID3`: requires at least 4 disks, three can fail before data is lost
+
+
+Additional information can be found on the manual page:
+
+----
+# man zpoolconcepts
+----
+
+Spares and Data
+^^^^^^^^^^^^^^^
+The number of `spares` tells the system how many disks it should keep ready in
+case of a disk failure. The default value is 0 `spares`. Without spares,
+rebuilding won't get any speed benefits.
+
+`data` defines the number of devices in a redundancy group. The default value
+is 8, except when `disks - parity - spares` is less than 8, in which case that
+lower number is used. In general, a smaller number of `data` devices leads to
+higher IOPS, better compression ratios and faster resilvering, but defining
+fewer data devices reduces the available storage capacity of the pool.
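+
+For example, a dRAID2 vdev with 4 data devices per redundancy group and one
+distributed spare could be created like this, following the
+`draid[<parity>][:<data>d][:<children>c][:<spares>s]` syntax described in the
+`zpoolconcepts` manual page (disk names are placeholders):
+
+----
+# zpool create <pool> draid2:4d:1s <disk1> <disk2> <disk3> <disk4> <disk5> <disk6> <disk7>
+----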
+
+
Bootloader
~~~~~~~~~~
-Depending on whether the system is booted in EFI or legacy BIOS mode the
-{pve} installer sets up either `grub` or `systemd-boot` as main bootloader.
-See the chapter on xref:sysboot[{pve} host bootladers] for details.
+{pve} uses xref:sysboot_proxmox_boot_tool[`proxmox-boot-tool`] to manage the
+bootloader configuration.
+See the chapter on xref:sysboot[{pve} host bootloaders] for details.
ZFS Administration
Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If you have a pool without cache and log. First partition the SSD in
-2 partition with `parted` or `gdisk`
+If you have a pool without cache and log, first create 2 partitions on the SSD
+with `parted` or `gdisk`.
IMPORTANT: Always use GPT partition tables.
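+
+Afterwards, the two partitions can be added to an existing pool, for example
+(the pool name and partition names are placeholders):
+
+----
+# zpool add -f <pool> log <device-part1> cache <device-part2>
+----
+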
.Changing a failed bootable device
-Depending on how {pve} was installed it is either using `grub` or `systemd-boot`
-as bootloader (see xref:sysboot[Host Bootloader]).
+Depending on how {pve} was installed it is either using `systemd-boot` or `grub`
+through `proxmox-boot-tool`
+footnote:[Systems installed with {pve} 6.4 or later, EFI systems installed with
+{pve} 5.4 or later] or plain `grub` as bootloader (see
+xref:sysboot[Host Bootloader]). You can check by running:
+
+----
+# proxmox-boot-tool status
+----
The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed, which depend on the bootloader in use.
NOTE: Use the `zpool status -v` command to monitor how far the resilvering
process of the new disk has progressed.
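+
+As a sketch, those first steps could look like this (device names are
+placeholders):
+
+----
+# sgdisk <healthy bootable device> -R <new device>
+# sgdisk -G <new device>
+# zpool replace -f <pool> <old zfs partition> <new zfs partition>
+----
+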
-.With `systemd-boot`:
+.With `proxmox-boot-tool`:
----
-# pve-efiboot-tool format <new disk's ESP>
-# pve-efiboot-tool init <new disk's ESP>
+# proxmox-boot-tool format <new disk's ESP>
+# proxmox-boot-tool init <new disk's ESP>
----
-NOTE: `ESP` stands for EFI System Partition, which is setup as partition #2 on
-bootable disks setup by the {pve} installer since version 5.4. For details, see
-xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].
+NOTE: `ESP` stands for EFI System Partition, which is set up as partition #2 on
+bootable disks set up by the {pve} installer since version 5.4. For details, see
+xref:sysboot_proxmox_boot_setup[Setting up a new partition for use as synced ESP].
-.With `grub`:
+.With plain `grub`:
----
# grub-install <new disk>
----
+NOTE: plain `grub` is only used on systems installed with {pve} 6.3 or earlier,
+which have not been manually migrated to using `proxmox-boot-tool` yet.
-Activate E-Mail Notification
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-ZFS comes with an event daemon, which monitors events generated by the
-ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors. Newer ZFS packages ship the daemon in a separate package,
-and you can install it using `apt-get`:
+Configure E-Mail Notification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----
-# apt-get install zfs-zed
-----
+ZFS comes with an event daemon `ZED`, which monitors events generated by the ZFS
+kernel module. The daemon can also send emails on ZFS events like pool errors.
+Newer ZFS packages ship the daemon in a separate `zfs-zed` package, which should
+already be installed by default in {pve}.
-To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
-favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
+You can configure the daemon via the file `/etc/zfs/zed.d/zed.rc` with your
+favorite editor. The required setting for email notification is
+`ZED_EMAIL_ADDR`, which is set to `root` by default.
--------
ZED_EMAIL_ADDR="root"
-Please note {pve} forwards mails to `root` to the email address
-configured for the root user.
+Please note that {pve} forwards mails addressed to `root` to the email
+address configured for the root user.
-IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
-other settings are optional.
-
[[sysadmin_zfs_limit_memory_usage]]
Limit ZFS Memory Usage
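~~~~~~~~~~~~~~~~~~~~~~

To *permanently change* the ARC limits, add the following line to
`/etc/modprobe.d/zfs.conf`:

--------
options zfs zfs_arc_max=8589934592
--------
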
This example setting limits the usage to 8 GiB ('8 * 2^30^').
+IMPORTANT: In case your desired +zfs_arc_max+ value is lower than or equal to
++zfs_arc_min+ (which defaults to 1/32 of the system memory), +zfs_arc_max+ will
+be ignored unless you also set +zfs_arc_min+ to at most +zfs_arc_max - 1+.
+
+----
+# echo "$[8 * 1024*1024*1024 - 1]" >/sys/module/zfs/parameters/zfs_arc_min
+# echo "$[8 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max
+----
+
+This example setting (temporarily) limits the usage to 8 GiB ('8 * 2^30^') on
+systems with more than 256 GiB of total memory, where simply setting
++zfs_arc_max+ alone would not work.
+
[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:
----
-# update-initramfs -u
+# update-initramfs -u -k all
----
You *must reboot* to activate these changes.
-We strongly recommend to use enough memory, so that you normally do not
-run into low memory situations. Should you need or want to add swap, it is
-preferred to create a partition on a physical disk and use it as swapdevice.
+We strongly recommend using enough memory, so that you normally do not
+run into low memory situations. Should you need or want to add swap, it is
+preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:
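
----
# sysctl -w vm.swappiness=10
----

+Note that `sysctl -w` only changes the value for the running system; to keep
+it after a reboot, the setting can also be added to `/etc/sysctl.conf`.
+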
Encrypted ZFS Datasets
~~~~~~~~~~~~~~~~~~~~~~
+WARNING: Native ZFS encryption in {pve} is experimental. Known limitations and
+issues include Replication with encrypted datasets
+footnote:[https://bugzilla.proxmox.com/show_bug.cgi?id=2350],
+as well as checksum errors when using Snapshots or ZVOLs.
+footnote:[https://github.com/openzfs/zfs/issues/11688]
+
ZFS on Linux version 0.8.0 introduced support for native encryption of
datasets. After an upgrade from previous ZFS on Linux versions, the encryption
feature can be enabled per pool:
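
----
# zpool set feature@encryption=enabled <pool>
----
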
ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device or a power of
-two in the range between `512B` to `128K`. After setting the property new file
+two in the range from `512B` to `1M`. After setting the property, new file
blocks smaller than `size` will be allocated on the `special` device.
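+
+For example, to store all file blocks smaller than 4 KiB on the `special`
+device (pool and dataset names are placeholders):
+
+----
+# zfs set special_small_blocks=4K <pool>/<filesystem>
+----
+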
IMPORTANT: If the value for `special_small_blocks` is greater than or equal to