* Copy-on-write clone
-* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
+* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2, RAIDZ-3,
+dRAID, dRAID2, dRAID3
* Can use SSD for cache
~~~~~~~~
ZFS depends heavily on memory, so you need at least 8GB to start. In
-practice, use as much you can get for your hardware/budget. To prevent
+practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an
-enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
+enterprise class SSD. This can
increase the overall performance significantly.
-IMPORTANT: Do not use ZFS on top of hardware controller which has its
-own cache management. ZFS needs to directly communicate with disks. An
-HBA adapter is the way to go, or something like LSI controller flashed
-in ``IT'' mode.
+IMPORTANT: Do not use ZFS on top of a hardware RAID controller which has its
+own cache management. ZFS needs to communicate directly with the disks. An
+HBA adapter or something like an LSI controller flashed in ``IT'' mode is more
+appropriate.
If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use `virtio` for disks of that VM,
-since they are not supported by ZFS. Use IDE or SCSI instead (works
-also with `virtio` SCSI controller type).
+as they are not supported by ZFS. Use IDE or SCSI instead (also works
+with the `virtio` SCSI controller type).
Installation as Root File System
parameters of interest are the IOPS (Input/Output Operations per Second) and
the bandwidth with which data can be written or read.
-A 'mirror' vdev (RAID1) will approximately behave like a single disk in regards
-to both parameters when writing data. When reading data if will behave like the
-number of disks in the mirror.
+A 'mirror' vdev (RAID1) will approximately behave like a single disk in regard
+to both parameters when writing data. When reading data the performance will
+scale linearly with the number of disks in the mirror.
A common situation is to have 4 disks. When setting it up as 2 mirror vdevs
-(RAID10) the pool will have the write characteristics as two single disks in
-regard of IOPS and bandwidth. For read operations it will resemble 4 single
+(RAID10) the pool will have the write characteristics of two single disks in
+regard to IOPS and bandwidth. For read operations it will resemble 4 single
disks.
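+
+As an illustration, such a pool with two mirror vdevs could be created like
+this (disk names are placeholders):
+
+----
+# zpool create <pool> mirror <disk1> <disk2> mirror <disk3> <disk4>
+----
+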
A 'RAIDZ' of any redundancy level will approximately behave like a single disk
-in regard of IOPS with a lot of bandwidth. How much bandwidth depends on the
+in regard to IOPS with a lot of bandwidth. How much bandwidth depends on the
size of the RAIDZ vdev and the redundancy level.
For running VMs, IOPS is the more important metric in most situations.
The `volblocksize` property can only be set when creating a ZVOL. The default
value can be changed in the storage configuration. When doing this, the guest
needs to be tuned accordingly and depending on the use case, the problem of
-write amplification if just moved from the ZFS layer up to the guest.
+write amplification is just moved from the ZFS layer up to the guest.
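+
+For illustration, a ZVOL with a non-default `volblocksize` could also be
+created manually like this (pool name, ZVOL name and sizes are examples only):
+
+----
+# zfs create -V 32G -o volblocksize=16k <pool>/vm-disk
+----
+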
Using `ashift=9` when creating the pool can lead to bad
performance, depending on the disks underneath, and cannot be changed later on.
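+
+The `ashift` can be set explicitly when creating the pool, for example (disk
+names are placeholders):
+
+----
+# zpool create -o ashift=12 <pool> <disk1> <disk2>
+----
+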
RAIDZ performance characteristics are acceptable.
+ZFS dRAID
+~~~~~~~~~
+
+In a ZFS dRAID (declustered RAID) the hot spare drive(s) participate in the RAID.
+Their spare capacity is reserved and used for rebuilding when one drive fails.
+Depending on the configuration, this allows for faster rebuilding than a
+RAIDZ in case of a drive failure. More information can be found in the official
+OpenZFS documentation. footnote:[OpenZFS dRAID
+https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html]
+
+NOTE: dRAID is intended for setups with more than 10-15 disks. With fewer
+disks, a RAIDZ setup is the better choice in most use cases.
+
+NOTE: The GUI requires one more disk than the minimum (i.e. dRAID1 needs 3). It
+expects that a spare disk is added as well.
+
+ * `dRAID1` or `dRAID`: requires at least 2 disks, one can fail before data is
+lost
+ * `dRAID2`: requires at least 3 disks, two can fail before data is lost
+ * `dRAID3`: requires at least 4 disks, three can fail before data is lost
+
+
+Additional information can be found on the manual page:
+
+----
+# man zpoolconcepts
+----
+
+Spares and Data
+^^^^^^^^^^^^^^^
+The number of `spares` tells the system how many disks it should keep ready in
+case of a disk failure. The default value is 0 `spares`. Without spares,
+rebuilding won't get any speed benefits.
+
+`data` defines the number of devices in a redundancy group. The default value
+is 8, except when `disks - parity - spares` is less than 8, in which case that
+lower number is used. In general, a smaller number of `data` devices leads to
+higher IOPS, better compression ratios and faster resilvering, but defining
+fewer data devices reduces the available storage capacity of the pool.
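+
+For example, a dRAID2 vdev with 4 data devices per redundancy group and one
+distributed spare could be created like this, following the
+`draid[<parity>][:<data>d][:<children>c][:<spares>s]` syntax described in the
+`zpoolconcepts` manual page (disk names are placeholders):
+
+----
+# zpool create <pool> draid2:4d:1s <disk1> <disk2> <disk3> <disk4> <disk5> <disk6> <disk7>
+----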
+
+
Bootloader
~~~~~~~~~~
-Depending on whether the system is booted in EFI or legacy BIOS mode the
-{pve} installer sets up either `grub` or `systemd-boot` as main bootloader.
-See the chapter on xref:sysboot[{pve} host bootladers] for details.
+{pve} uses xref:sysboot_proxmox_boot_tool[`proxmox-boot-tool`] to manage the
+bootloader configuration.
+See the chapter on xref:sysboot[{pve} host bootloaders] for details.
ZFS Administration
Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If you have a pool without cache and log. First partition the SSD in
-2 partition with `parted` or `gdisk`
+If you have a pool without cache and log, first create 2 partitions on the SSD
+with `parted` or `gdisk`.
IMPORTANT: Always use GPT partition tables.
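+
+Afterwards, the two partitions can be added to an existing pool, for example
+(the pool name and partition names are placeholders):
+
+----
+# zpool add -f <pool> log <device-part1> cache <device-part2>
+----
+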
.Changing a failed bootable device
-Depending on how {pve} was installed it is either using `grub` or `systemd-boot`
-as bootloader (see xref:sysboot[Host Bootloader]).
+Depending on how {pve} was installed it is either using `systemd-boot` or `grub`
+through `proxmox-boot-tool`
+footnote:[Systems installed with {pve} 6.4 or later, EFI systems installed with
+{pve} 5.4 or later] or plain `grub` as bootloader (see
+xref:sysboot[Host Bootloader]). You can check by running:
+
+----
+# proxmox-boot-tool status
+----
The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed, which depend on the bootloader in use.
NOTE: Use the `zpool status -v` command to monitor how far the resilvering
process of the new disk has progressed.
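+
+As a sketch, those first steps could look like this (device names are
+placeholders):
+
+----
+# sgdisk <healthy bootable device> -R <new device>
+# sgdisk -G <new device>
+# zpool replace -f <pool> <old zfs partition> <new zfs partition>
+----
+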
-.With `systemd-boot`:
+.With `proxmox-boot-tool`:
----
-# pve-efiboot-tool format <new disk's ESP>
-# pve-efiboot-tool init <new disk's ESP>
+# proxmox-boot-tool format <new disk's ESP>
+# proxmox-boot-tool init <new disk's ESP>
----
-NOTE: `ESP` stands for EFI System Partition, which is setup as partition #2 on
-bootable disks setup by the {pve} installer since version 5.4. For details, see
-xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].
+NOTE: `ESP` stands for EFI System Partition, which is set up as partition #2 on
+bootable disks set up by the {pve} installer since version 5.4. For details, see
+xref:sysboot_proxmox_boot_setup[Setting up a new partition for use as synced ESP].
-.With `grub`:
+.With plain `grub`:
----
# grub-install <new disk>
----
+NOTE: plain `grub` is only used on systems installed with {pve} 6.3 or earlier,
+which have not been manually migrated to using `proxmox-boot-tool` yet.
-Activate E-Mail Notification
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-ZFS comes with an event daemon, which monitors events generated by the
-ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors. Newer ZFS packages ship the daemon in a separate package,
-and you can install it using `apt-get`:
+Configure E-Mail Notification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----
-# apt-get install zfs-zed
-----
+ZFS comes with an event daemon `ZED`, which monitors events generated by the ZFS
+kernel module. The daemon can also send emails on ZFS events like pool errors.
+Newer ZFS packages ship the daemon in a separate `zfs-zed` package, which should
+already be installed by default in {pve}.
-To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
-favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
+You can configure the daemon via the file `/etc/zfs/zed.d/zed.rc` with your
+favorite editor. The required setting for email notification is
+`ZED_EMAIL_ADDR`, which is set to `root` by default.
--------
ZED_EMAIL_ADDR="root"
-Please note {pve} forwards mails to `root` to the email address
-configured for the root user.
+Please note that {pve} forwards mails addressed to `root` to the email
+address configured for the root user.
-IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
-other settings are optional.
-
[[sysadmin_zfs_limit_memory_usage]]
Limit ZFS Memory Usage
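~~~~~~~~~~~~~~~~~~~~~~

To *permanently change* the ARC limits, add the following line to
`/etc/modprobe.d/zfs.conf`:

--------
options zfs zfs_arc_max=8589934592
--------
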
This example setting limits the usage to 8 GiB ('8 * 2^30^').
+IMPORTANT: In case your desired +zfs_arc_max+ value is lower than or equal to
++zfs_arc_min+ (which defaults to 1/32 of the system memory), +zfs_arc_max+ will
+be ignored unless you also set +zfs_arc_min+ to at most +zfs_arc_max - 1+.
+
+----
+# echo "$[8 * 1024*1024*1024 - 1]" >/sys/module/zfs/parameters/zfs_arc_min
+# echo "$[8 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max
+----
+
+This example setting (temporarily) limits the usage to 8 GiB ('8 * 2^30^') on
+systems with more than 256 GiB of total memory, where simply setting
++zfs_arc_max+ alone would not work.
+
[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:
----
-# update-initramfs -u
+# update-initramfs -u -k all
----
You *must reboot* to activate these changes.
-We strongly recommend to use enough memory, so that you normally do not
-run into low memory situations. Should you need or want to add swap, it is
-preferred to create a partition on a physical disk and use it as swapdevice.
+We strongly recommend using enough memory, so that you normally do not
+run into low memory situations. Should you need or want to add swap, it is
+preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:
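
----
# sysctl -w vm.swappiness=10
----

+Note that `sysctl -w` only changes the value for the running system; to keep
+it after a reboot, the setting can also be added to `/etc/sysctl.conf`.
+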
Encrypted ZFS Datasets
~~~~~~~~~~~~~~~~~~~~~~
+WARNING: Native ZFS encryption in {pve} is experimental. Known limitations and
+issues include Replication with encrypted datasets
+footnote:[https://bugzilla.proxmox.com/show_bug.cgi?id=2350],
+as well as checksum errors when using Snapshots or ZVOLs.
+footnote:[https://github.com/openzfs/zfs/issues/11688]
+
ZFS on Linux version 0.8.0 introduced support for native encryption of
datasets. After an upgrade from previous ZFS on Linux versions, the encryption
feature can be enabled per pool:
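
----
# zpool set feature@encryption=enabled <pool>
----
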
ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device or a power of
-two in the range between `512B` to `128K`. After setting the property new file
+two in the range from `512B` to `1M`. After setting the property, new file
blocks smaller than `size` will be allocated on the `special` device.
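+
+For example, to store all file blocks smaller than 4 KiB on the `special`
+device (pool and dataset names are placeholders):
+
+----
+# zfs set special_small_blocks=4K <pool>/<filesystem>
+----
+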
IMPORTANT: If the value for `special_small_blocks` is greater than or equal to