* Copy-on-write clone
-* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
+* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2, RAIDZ-3,
+dRAID, dRAID2, dRAID3
* Can use SSD for cache
data corruption, we recommend the use of high quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an
-enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
+enterprise class SSD. This can
increase the overall performance significantly.
IMPORTANT: Do not use ZFS on top of a hardware RAID controller which has its
parameters of interest are the IOPS (Input/Output Operations per Second) and
the bandwidth with which data can be written or read.
-A 'mirror' vdev (RAID1) will approximately behave like a single disk in regards
-to both parameters when writing data. When reading data if will behave like the
-number of disks in the mirror.
+A 'mirror' vdev (RAID1) will approximately behave like a single disk in regard
+to both parameters when writing data. When reading data the performance will
+scale linearly with the number of disks in the mirror.
A common situation is to have 4 disks. When setting it up as 2 mirror vdevs
(RAID10) the pool will have the write characteristics as two single disks in
-regard of IOPS and bandwidth. For read operations it will resemble 4 single
+regard to IOPS and bandwidth. For read operations it will resemble 4 single
disks.
A 'RAIDZ' of any redundancy level will approximately behave like a single disk
-in regard of IOPS with a lot of bandwidth. How much bandwidth depends on the
+in regard to IOPS with a lot of bandwidth. How much bandwidth depends on the
size of the RAIDZ vdev and the redundancy level.
For running VMs, IOPS is the more important metric in most situations.
The `volblocksize` property can only be set when creating a ZVOL. The default
value can be changed in the storage configuration. When doing this, the guest
needs to be tuned accordingly and depending on the use case, the problem of
-write amplification if just moved from the ZFS layer up to the guest.
+write amplification is just moved from the ZFS layer up to the guest.
Using `ashift=9` when creating the pool can lead to bad
performance, depending on the disks underneath, and cannot be changed later on.
RAIDZ performance characteristics are acceptable.
+ZFS dRAID
+~~~~~~~~~
+
+In a ZFS dRAID (declustered RAID) the hot spare drive(s) participate in the RAID.
+Their spare capacity is reserved and used for rebuilding when one drive fails.
+This provides, depending on the configuration, faster rebuilding compared to a
+RAIDZ in case of drive failure. More information can be found in the official
+OpenZFS documentation. footnote:[OpenZFS dRAID
+https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html]
+
+NOTE: dRAID is intended for more than 10-15 disks in a dRAID. A RAIDZ
+setup should be better for a lower amount of disks in most use cases.
+
+NOTE: The GUI requires one more disk than the minimum (i.e. dRAID1 needs 3). It
+expects that a spare disk is added as well.
+
+ * `dRAID1` or `dRAID`: requires at least 2 disks, one can fail before data is
+lost
+ * `dRAID2`: requires at least 3 disks, two can fail before data is lost
+ * `dRAID3`: requires at least 4 disks, three can fail before data is lost
+
+
+Additional information can be found on the manual page:
+
+----
+# man zpoolconcepts
+----
+
+Spares and Data
+^^^^^^^^^^^^^^^
+The number of `spares` tells the system how many disks it should keep ready in
+case of a disk failure. The default value is 0 `spares`. Without spares,
+rebuilding won't get any speed benefits.
+
+`data` defines the number of devices in a redundancy group. The default value is
+8. Except when `disks - parity - spares` equal something less than 8, the lower
+number is used. In general, a smaller number of `data` devices leads to higher
+IOPS, better compression ratios and faster resilvering, but defining fewer data
+devices reduces the available storage capacity of the pool.
+
+
Bootloader
~~~~~~~~~~
{pve} uses xref:sysboot_proxmox_boot_tool[`proxmox-boot-tool`] to manage the
bootloader configuration.
-See the chapter on xref:sysboot[{pve} host bootladers] for details.
+See the chapter on xref:sysboot[{pve} host bootloaders] for details.
ZFS Administration
Create a new zpool
^^^^^^^^^^^^^^^^^^
-To create a new pool, at least one disk is needed. The `ashift` should
-have the same sector-size (2 power of `ashift`) or larger as the
-underlying disk.
+To create a new pool, at least one disk is needed. The `ashift` should have the
+same sector-size (2 power of `ashift`) or larger as the underlying disk.
----
# zpool create -f -o ashift=12 <pool> <device>
----
+[TIP]
+====
+Pool names must adhere to the following rules:
+
+* begin with a letter (a-z or A-Z)
+* contain only alphanumeric, `-`, `_`, `.`, `:` or ` ` (space) characters
+* must *not begin* with one of `mirror`, `raidz`, `draid` or `spare`
+* must not be `log`
+====
+
To activate compression (see section <<zfs_compression,Compression in ZFS>>):
----
Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If you have a pool without cache and log. First partition the SSD in
-2 partition with `parted` or `gdisk`
+If you have a pool without cache and log, first create 2 partitions on the SSD
+with `parted` or `gdisk`.
IMPORTANT: Always use GPT partition tables.
.Changing a failed bootable device
-Depending on how {pve} was installed it is either using `proxmox-boot-tool`
+Depending on how {pve} was installed it is either using `systemd-boot` or `grub`
+through `proxmox-boot-tool`
footnote:[Systems installed with {pve} 6.4 or later, EFI systems installed with
{pve} 5.4 or later] or plain `grub` as bootloader (see
xref:sysboot[Host Bootloader]). You can check by running:
bootable disks setup by the {pve} installer since version 5.4. For details, see
xref:sysboot_proxmox_boot_setup[Setting up a new partition for use as synced ESP].
-.With `grub`:
+.With plain `grub`:
----
# grub-install <new disk>
----
+NOTE: plain `grub` is only used on systems installed with {pve} 6.3 or earlier,
+which have not been manually migrated to using `proxmox-boot-tool` yet.
-Activate E-Mail Notification
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-ZFS comes with an event daemon, which monitors events generated by the
-ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors. Newer ZFS packages ship the daemon in a separate package,
-and you can install it using `apt-get`:
+Configure E-Mail Notification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----
-# apt-get install zfs-zed
-----
+ZFS comes with an event daemon `ZED`, which monitors events generated by the ZFS
+kernel module. The daemon can also send emails on ZFS events like pool errors.
+Newer ZFS packages ship the daemon in a separate `zfs-zed` package, which should
+already be installed by default in {pve}.
-To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
-favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
+You can configure the daemon via the file `/etc/zfs/zed.d/zed.rc` with your
+favorite editor. The required setting for email notification is
+`ZED_EMAIL_ADDR`, which is set to `root` by default.
--------
ZED_EMAIL_ADDR="root"
Please note {pve} forwards mails to `root` to the email address
configured for the root user.
-IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
-other settings are optional.
-
[[sysadmin_zfs_limit_memory_usage]]
Limit ZFS Memory Usage
time this value changes:
----
-# update-initramfs -u
+# update-initramfs -u -k all
----
You *must reboot* to activate these changes.
We strongly recommend to use enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
-preferred to create a partition on a physical disk and use it as swapdevice.
+preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:
Encrypted ZFS Datasets
~~~~~~~~~~~~~~~~~~~~~~
+WARNING: Native ZFS encryption in {pve} is experimental. Known limitations and
+issues include Replication with encrypted datasets
+footnote:[https://bugzilla.proxmox.com/show_bug.cgi?id=2350],
+as well as checksum errors when using Snapshots or ZVOLs.
+footnote:[https://github.com/openzfs/zfs/issues/11688]
+
ZFS on Linux version 0.8.0 introduced support for native encryption of
datasets. After an upgrade from previous ZFS on Linux versions, the encryption
feature can be enabled per pool:
ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device or a power of
-two in the range between `512B` to `128K`. After setting the property new file
+two in the range between `512B` to `1M`. After setting the property new file
blocks smaller than `size` will be allocated on the `special` device.
IMPORTANT: If the value for `special_small_blocks` is greater than or equal to