X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=local-zfs.adoc;h=5cce6778bd81dafa800e456a9bb1e8613c78a338;hp=6aae81e4a275c729a86514fd2b2d1b5321dcb59b;hb=cc38b9254cf54f33b3cd4c247ee273c1ddad146f;hpb=b2f242abe4c50227f5610767e6fcaa40654c2b88

diff --git a/local-zfs.adoc b/local-zfs.adoc
index 6aae81e..5cce677 100644
--- a/local-zfs.adoc
+++ b/local-zfs.adoc
@@ -1,6 +1,6 @@
+[[chapter_zfs]]
 ZFS on Linux
 ------------
-include::attributes.txt[]
 ifdef::wiki[]
 :pve-toplevel:
 endif::wiki[]
@@ -60,7 +60,7 @@ ZFS depends heavily on memory, so you need at least 8GB to start. In
 practice, use as much you can get for your hardware/budget. To prevent
 data corruption, we recommend the use of high quality ECC RAM.
 
-If you use a dedicated cache and/or log disk, you should use a
+If you use a dedicated cache and/or log disk, you should use an
 enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
 increase the overall performance significantly.
 
@@ -154,15 +154,9 @@ rpool/swap        4.25G  7.69T    64K  -
 Bootloader
 ~~~~~~~~~~
 
-The default ZFS disk partitioning scheme does not use the first 2048
-sectors. This gives enough room to install a GRUB boot partition. The
-{pve} installer automatically allocates that space, and installs the
-GRUB boot loader there. If you use a redundant RAID setup, it installs
-the boot loader on all disk required for booting. So you can boot
-even if some disks fail.
-
-NOTE: It is not possible to use ZFS as root file system with UEFI
-boot.
+Depending on whether the system is booted in EFI or legacy BIOS mode the
+{pve} installer sets up either `grub` or `systemd-boot` as main bootloader.
+See the chapter on xref:sysboot[{pve} host bootladers] for details.
 
 
 ZFS Administration
@@ -184,41 +178,55 @@ To create a new pool, at least one disk is needed. The `ashift` should
 have the same sector-size (2 power of `ashift`) or larger as the
 underlying disk.
 
- zpool create -f -o ashift=12 <pool> <device>
+----
+# zpool create -f -o ashift=12 <pool> <device>
+----
 
-To activate compression
+To activate compression (see section <<zfs_compression,Compression in ZFS>>):
 
- zfs set compression=lz4 <pool>
+----
+# zfs set compression=lz4 <pool>
+----
 
 .Create a new pool with RAID-0
 
-Minimum 1 Disk
+Minimum 1 disk
 
- zpool create -f -o ashift=12 <pool> <device1> <device2>
+----
+# zpool create -f -o ashift=12 <pool> <device1> <device2>
+----
 
 .Create a new pool with RAID-1
 
-Minimum 2 Disks
+Minimum 2 disks
 
- zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
+----
+# zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
+----
 
 .Create a new pool with RAID-10
 
-Minimum 4 Disks
+Minimum 4 disks
 
- zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4> 
+----
+# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
+----
 
 .Create a new pool with RAIDZ-1
 
-Minimum 3 Disks
+Minimum 3 disks
 
- zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
+----
+# zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
+----
 
 .Create a new pool with RAIDZ-2
 
-Minimum 4 Disks
+Minimum 4 disks
 
- zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
+----
+# zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
+----
 
 .Create a new pool with cache (L2ARC)
 
@@ -228,7 +236,9 @@ the performance (use SSD).
 As `<device>` it is possible to use more devices, like it's shown in
 "Create a new pool with RAID*".
 
- zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
+----
+# zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
+----
 
 .Create a new pool with log (ZIL)
 
@@ -238,11 +248,13 @@ the performance(SSD).
 As `<device>` it is possible to use more devices, like it's shown in
 "Create a new pool with RAID*".
 
- zpool create -f -o ashift=12 <pool> <device> log <log_device>
+----
+# zpool create -f -o ashift=12 <pool> <device> log <log_device>
+----
 
 .Add cache and log to an existing pool
 
-If you have an pool without cache and log. First partition the SSD in
+If you have a pool without cache and log. First partition the SSD in
 2 partition with `parted` or `gdisk`
 
 IMPORTANT: Always use GPT partition tables.
@@ -251,11 +263,29 @@ The maximum size of a log device should be about half the size of
 physical memory, so this is usually quite small. The rest of the SSD
 can be used as cache.
 
- zpool add -f <pool> log <device-part1> cache <device-part2> 
+----
+# zpool add -f <pool> log <device-part1> cache <device-part2>
+----
 
 .Changing a failed device
 
- zpool replace -f <pool> <old device> <new-device>
+----
+# zpool replace -f <pool> <old device> <new device>
+----
+
+.Changing a failed bootable device when using systemd-boot
+
+----
+# sgdisk <healthy bootable device> -R <new device>
+# sgdisk -G <new device>
+# zpool replace -f <pool> <old zfs partition> <new zfs partition>
+# pve-efiboot-tool format <new disk's ESP>
+# pve-efiboot-tool init <new disk's ESP>
+----
+
+NOTE: `ESP` stands for EFI System Partition, which is setup as partition #2 on
+bootable disks setup by the {pve} installer since version 5.4. For details, see
+xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].
 
 
 Activate E-Mail Notification
@@ -263,7 +293,12 @@ Activate E-Mail Notification
 
 ZFS comes with an event daemon, which monitors events generated by the
 ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors.
+pool errors. Newer ZFS packages ship the daemon in a separate package,
+and you can install it using `apt-get`:
+
+----
+# apt-get install zfs-zed
+----
 
 To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
 favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
@@ -298,21 +333,30 @@ This example setting limits the usage to 8GB.
 If your root file system is ZFS you must update your initramfs every
 time this value changes:
 
- update-initramfs -u
+----
+# update-initramfs -u
+----
 ====
 
 
-.SWAP on ZFS
+[[zfs_swap]]
+SWAP on ZFS
+~~~~~~~~~~~
 
-SWAP on ZFS on Linux may generate some troubles, like blocking the
+Swap-space created on a zvol may generate some troubles, like blocking the
 server or generating a high IO load, often seen when starting a Backup
 to an external Storage.
 
 We strongly recommend to use enough memory, so that you normally do not
-run into low memory situations. Additionally, you can lower the
+run into low memory situations. Should you need or want to add swap, it is
+preferred to create a partition on a physical disk and use it as swapdevice.
+You can leave some space free for this purpose in the advanced options of the
+installer. Additionally, you can lower the
 ``swappiness'' value. A good value for servers is 10:
 
- sysctl -w vm.swappiness=10
+----
+# sysctl -w vm.swappiness=10
+----
 
 To make the swappiness persistent, open `/etc/sysctl.conf` with
 an editor of your choice and add the following line:
@@ -334,3 +378,176 @@ improve performance when sufficient memory exists in a system.
 | vm.swappiness = 60  | The default value.
 | vm.swappiness = 100 | The kernel will swap aggressively.
 |===========================================================
+
+[[zfs_encryption]]
+Encrypted ZFS Datasets
+~~~~~~~~~~~~~~~~~~~~~~
+
+ZFS on Linux version 0.8.0 introduced support for native encryption of
+datasets. After an upgrade from previous ZFS on Linux versions, the encryption
+feature can be enabled per pool:
+
+----
+# zpool get feature@encryption tank
+NAME  PROPERTY            VALUE            SOURCE
+tank  feature@encryption  disabled         local
+
+# zpool set feature@encryption=enabled
+
+# zpool get feature@encryption tank
+NAME  PROPERTY            VALUE            SOURCE
+tank  feature@encryption  enabled         local
+----
+
+WARNING: There is currently no support for booting from pools with encrypted
+datasets using Grub, and only limited support for automatically unlocking
+encrypted datasets on boot. Older versions of ZFS without encryption support
+will not be able to decrypt stored data.
+
+NOTE: It is recommended to either unlock storage datasets manually after
+booting, or to write a custom unit to pass the key material needed for
+unlocking on boot to `zfs load-key`.
+
+WARNING: Establish and test a backup procedure before enabling encryption of
+production data. If the associated key material/passphrase/keyfile has been
+lost, accessing the encrypted data is no longer possible.
+
+Encryption needs to be setup when creating datasets/zvols, and is inherited by
+default to child datasets. For example, to create an encrypted dataset
+`tank/encrypted_data` and configure it as storage in {pve}, run the following
+commands:
+
+----
+# zfs create -o encryption=on -o keyformat=passphrase tank/encrypted_data
+Enter passphrase:
+Re-enter passphrase:
+
+# pvesm add zfspool encrypted_zfs -pool tank/encrypted_data
+----
+
+All guest volumes/disks create on this storage will be encrypted with the
+shared key material of the parent dataset.
+
+To actually use the storage, the associated key material needs to be loaded
+with `zfs load-key`:
+
+----
+# zfs load-key tank/encrypted_data
+Enter passphrase for 'tank/encrypted_data':
+----
+
+It is also possible to use a (random) keyfile instead of prompting for a
+passphrase by setting the `keylocation` and `keyformat` properties, either at
+creation time or with `zfs change-key` on existing datasets:
+
+----
+# dd if=/dev/urandom of=/path/to/keyfile bs=32 count=1
+
+# zfs change-key -o keyformat=raw -o keylocation=file:///path/to/keyfile tank/encrypted_data
+----
+
+WARNING: When using a keyfile, special care needs to be taken to secure the
+keyfile against unauthorized access or accidental loss. Without the keyfile, it
+is not possible to access the plaintext data!
+
+A guest volume created underneath an encrypted dataset will have its
+`encryptionroot` property set accordingly. The key material only needs to be
+loaded once per encryptionroot to be available to all encrypted datasets
+underneath it.
+
+See the `encryptionroot`, `encryption`, `keylocation`, `keyformat` and
+`keystatus` properties, the `zfs load-key`, `zfs unload-key` and `zfs
+change-key` commands and the `Encryption` section from `man zfs` for more
+details and advanced usage.
+
+
+[[zfs_compression]]
+Compression in ZFS
+~~~~~~~~~~~~~~~~~~
+
+When compression is enabled on a dataset, ZFS tries to compress all *new*
+blocks before writing them and decompresses them on reading. Already
+existing data will not be compressed retroactively.
+
+You can enable compression with:
+
+----
+# zfs set compression=<algorithm> <dataset>
+----
+
+We recommend using the `lz4` algorithm, because it adds very little CPU
+overhead. Other algorithms like `lzjb` and `gzip-N`, where `N` is an
+integer from `1` (fastest) to `9` (best compression ratio), are also
+available. Depending on the algorithm and how compressible the data is,
+having compression enabled can even increase I/O performance.
+
+You can disable compression at any time with:
+
+----
+# zfs set compression=off <dataset>
+----
+
+Again, only new blocks will be affected by this change.
+
+
+ZFS Special Device
+~~~~~~~~~~~~~~~~~~
+
+Since version 0.8.0 ZFS supports `special` devices. A `special` device in a
+pool is used to store metadata, deduplication tables, and optionally small
+file blocks.
+
+A `special` device can improve the speed of a pool consisting of slow spinning
+hard disks with a lot of metadata changes. For example workloads that involve
+creating, updating or deleting a large number of files will benefit from the
+presence of a `special` device. ZFS datasets can also be configured to store
+whole small files on the `special` device which can further improve the
+performance. Use fast SSDs for the `special` device.
+
+IMPORTANT: The redundancy of the `special` device should match the one of the
+pool, since the `special` device is a point of failure for the whole pool.
+
+WARNING: Adding a `special` device to a pool cannot be undone!
+
+.Create a pool with `special` device and RAID-1:
+
+----
+# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>
+----
+
+.Add a `special` device to an existing pool with RAID-1:
+
+----
+# zpool add <pool> special mirror <device1> <device2>
+----
+
+ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
+`0` to disable storing small file blocks on the `special` device or a power of
+two in the range between `512B` to `128K`. After setting the property new file
+blocks smaller than `size` will be allocated on the `special` device.
+
+IMPORTANT: If the value for `special_small_blocks` is greater than or equal to
+the `recordsize` (default `128K`) of the dataset, *all* data will be written to
+the `special` device, so be careful!
+
+Setting the `special_small_blocks` property on a pool will change the default
+value of that property for all child ZFS datasets (for example all containers
+in the pool will opt in for small file blocks).
+
+.Opt in for all file smaller than 4K-blocks pool-wide:
+
+----
+# zfs set special_small_blocks=4K <pool>
+----
+
+.Opt in for small file blocks for a single dataset:
+
+----
+# zfs set special_small_blocks=4K <pool>/<filesystem>
+----
+
+.Opt out from small file blocks for a single dataset:
+
+----
+# zfs set special_small_blocks=0 <pool>/<filesystem>
+----