X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=local-zfs.adoc;h=5cce6778bd81dafa800e456a9bb1e8613c78a338;hp=fab009336b9b1bfb9667564505b0c88e07288456;hb=cc38b9254cf54f33b3cd4c247ee273c1ddad146f;hpb=5eba07434fd010e7b96459da2a5bb676a62fe8b1
diff --git a/local-zfs.adoc b/local-zfs.adoc
index fab0093..5cce677 100644
--- a/local-zfs.adoc
+++ b/local-zfs.adoc
@@ -1,6 +1,9 @@
+[[chapter_zfs]]
 ZFS on Linux
 ------------
-include::attributes.txt[]
+ifdef::wiki[]
+:pve-toplevel:
+endif::wiki[]
 
 ZFS is a combined file system and logical volume manager designed by
 Sun Microsystems. Starting with {pve} 3.4, the native Linux
@@ -57,7 +60,7 @@ ZFS depends heavily on memory, so you need at least 8GB to start. In
 practice, use as much you can get for your hardware/budget. To prevent
 data corruption, we recommend the use of high quality ECC RAM.
 
-If you use a dedicated cache and/or log disk, you should use a
+If you use a dedicated cache and/or log disk, you should use an
 enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
 increase the overall performance significantly.
 
@@ -151,15 +154,9 @@ rpool/swap 4.25G 7.69T 64K -
 Bootloader
 ~~~~~~~~~~
 
-The default ZFS disk partitioning scheme does not use the first 2048
-sectors. This gives enough room to install a GRUB boot partition. The
-{pve} installer automatically allocates that space, and installs the
-GRUB boot loader there. If you use a redundant RAID setup, it installs
-the boot loader on all disk required for booting. So you can boot
-even if some disks fail.
-
-NOTE: It is not possible to use ZFS as root partition with UEFI
-boot.
+Depending on whether the system is booted in EFI or legacy BIOS mode the
+{pve} installer sets up either `grub` or `systemd-boot` as main bootloader.
+See the chapter on xref:sysboot[{pve} host bootladers] for details.
 
 
 ZFS Administration
@@ -181,41 +178,55 @@ To create a new pool, at least one disk is needed.
 The `ashift` should have the same sector-size (2 power of `ashift`) or larger as the
 underlying disk.
 
- zpool create -f -o ashift=12
+----
+# zpool create -f -o ashift=12
+----
 
-To activate compression
+To activate compression (see section <>):
 
- zfs set compression=lz4
+----
+# zfs set compression=lz4
+----
 
 .Create a new pool with RAID-0
 
-Minimum 1 Disk
+Minimum 1 disk
 
- zpool create -f -o ashift=12
+----
+# zpool create -f -o ashift=12
+----
 
 .Create a new pool with RAID-1
 
-Minimum 2 Disks
+Minimum 2 disks
 
- zpool create -f -o ashift=12 mirror
+----
+# zpool create -f -o ashift=12 mirror
+----
 
 .Create a new pool with RAID-10
 
-Minimum 4 Disks
+Minimum 4 disks
 
- zpool create -f -o ashift=12 mirror mirror
+----
+# zpool create -f -o ashift=12 mirror mirror
+----
 
 .Create a new pool with RAIDZ-1
 
-Minimum 3 Disks
+Minimum 3 disks
 
- zpool create -f -o ashift=12 raidz1
+----
+# zpool create -f -o ashift=12 raidz1
+----
 
 .Create a new pool with RAIDZ-2
 
-Minimum 4 Disks
+Minimum 4 disks
 
- zpool create -f -o ashift=12 raidz2
+----
+# zpool create -f -o ashift=12 raidz2
+----
 
 .Create a new pool with cache (L2ARC)
 
@@ -225,7 +236,9 @@ the performance (use SSD).
 As `` it is possible to use more devices, like it's shown in
 "Create a new pool with RAID*".
 
- zpool create -f -o ashift=12 cache
+----
+# zpool create -f -o ashift=12 cache
+----
 
 .Create a new pool with log (ZIL)
 
@@ -235,24 +248,44 @@ the performance(SSD).
 As `` it is possible to use more devices, like it's shown in
 "Create a new pool with RAID*".
 
- zpool create -f -o ashift=12 log
+----
+# zpool create -f -o ashift=12 log
+----
 
 .Add cache and log to an existing pool
 
-If you have an pool without cache and log. First partition the SSD in
+If you have a pool without cache and log. First partition the SSD in
 2 partition with `parted` or `gdisk`
 
-IMPORTANT: Always use GPT partition tables (gdisk or parted).
+IMPORTANT: Always use GPT partition tables.
 
 The maximum size of a log device should be about half the size of
 physical memory, so this is usually quite small. The rest of the SSD
 can be used as cache.
 
- zpool add -f log cache
+----
+# zpool add -f log cache
+----
 
 .Changing a failed device
 
- zpool replace -f
+----
+# zpool replace -f
+----
+
+.Changing a failed bootable device when using systemd-boot
+
+----
+# sgdisk -R
+# sgdisk -G
+# zpool replace -f
+# pve-efiboot-tool format
+# pve-efiboot-tool init
+----
+
+NOTE: `ESP` stands for EFI System Partition, which is setup as partition #2 on
+bootable disks setup by the {pve} installer since version 5.4. For details, see
+xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].
 
 
 Activate E-Mail Notification
@@ -260,12 +293,19 @@ Activate E-Mail Notification
 
 ZFS comes with an event daemon, which monitors events generated by the
 ZFS kernel module. The daemon can also send emails on ZFS events like
-pool errors.
+pool errors. Newer ZFS packages ship the daemon in a separate package,
+and you can install it using `apt-get`:
+
+----
+# apt-get install zfs-zed
+----
 
 To activate the daemon it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
 favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:
 
+--------
 ZED_EMAIL_ADDR="root"
+--------
 
 Please note {pve} forwards mails to `root` to the email address
 configured for the root user.
@@ -293,26 +333,37 @@ This example setting limits the usage to 8GB.
 If your root file system is ZFS you must update your initramfs every
 time this value changes:
 
- update-initramfs -u
+----
+# update-initramfs -u
+----
 ====
 
 
-.SWAP on ZFS
+[[zfs_swap]]
+SWAP on ZFS
+~~~~~~~~~~~
 
-SWAP on ZFS on Linux may generate some troubles, like blocking the
+Swap-space created on a zvol may generate some troubles, like blocking the
 server or generating a high IO load, often seen when starting a Backup
 to an external Storage.
 
 We strongly recommend to use enough memory, so that you normally do not
-run into low memory situations. Additionally, you can lower the
+run into low memory situations. Should you need or want to add swap, it is
+preferred to create a partition on a physical disk and use it as swapdevice.
+You can leave some space free for this purpose in the advanced options of the
+installer. Additionally, you can lower the
 ``swappiness'' value. A good value for servers is 10:
 
- sysctl -w vm.swappiness=10
+----
+# sysctl -w vm.swappiness=10
+----
 
 To make the swappiness persistent, open `/etc/sysctl.conf` with
 an editor of your choice and add the following line:
 
- vm.swappiness = 10
+--------
+vm.swappiness = 10
+--------
 
 .Linux kernel `swappiness` parameter values
 [width="100%",cols="
+----
+
+We recommend using the `lz4` algorithm, because it adds very little CPU
+overhead. Other algorithms like `lzjb` and `gzip-N`, where `N` is an
+integer from `1` (fastest) to `9` (best compression ratio), are also
+available. Depending on the algorithm and how compressible the data is,
+having compression enabled can even increase I/O performance.
+
+You can disable compression at any time with:
+
+----
+# zfs set compression=off
+----
+
+Again, only new blocks will be affected by this change.
+
+
+ZFS Special Device
+~~~~~~~~~~~~~~~~~~
+
+Since version 0.8.0 ZFS supports `special` devices. A `special` device in a
+pool is used to store metadata, deduplication tables, and optionally small
+file blocks.
+
+A `special` device can improve the speed of a pool consisting of slow spinning
+hard disks with a lot of metadata changes. For example workloads that involve
+creating, updating or deleting a large number of files will benefit from the
+presence of a `special` device. ZFS datasets can also be configured to store
+whole small files on the `special` device which can further improve the
+performance. Use fast SSDs for the `special` device.
+
+IMPORTANT: The redundancy of the `special` device should match the one of the
+pool, since the `special` device is a point of failure for the whole pool.
+
+WARNING: Adding a `special` device to a pool cannot be undone!
+
+.Create a pool with `special` device and RAID-1:
+
+----
+# zpool create -f -o ashift=12 mirror special mirror
+----
+
+.Add a `special` device to an existing pool with RAID-1:
+
+----
+# zpool add special mirror
+----
+
+ZFS datasets expose the `special_small_blocks=` property. `size` can be
+`0` to disable storing small file blocks on the `special` device or a power of
+two in the range between `512B` to `128K`. After setting the property new file
+blocks smaller than `size` will be allocated on the `special` device.
+
+IMPORTANT: If the value for `special_small_blocks` is greater than or equal to
+the `recordsize` (default `128K`) of the dataset, *all* data will be written to
+the `special` device, so be careful!
+
+Setting the `special_small_blocks` property on a pool will change the default
+value of that property for all child ZFS datasets (for example all containers
+in the pool will opt in for small file blocks).
+
+.Opt in for all file smaller than 4K-blocks pool-wide:
+
+----
+# zfs set special_small_blocks=4K
+----
+
+.Opt in for small file blocks for a single dataset:
+
+----
+# zfs set special_small_blocks=4K /
+----
+
+.Opt out from small file blocks for a single dataset:
+
+----
+# zfs set special_small_blocks=0 /
+----
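The patch above states two numeric rules that are easy to misapply by hand: `ashift` should be chosen so that 2 to the power of `ashift` is at least the disk's sector size, and `special_small_blocks` must be `0` or a power of two between 512 bytes and 128K, with *all* data going to the `special` device once the value reaches the dataset's `recordsize` (default 128K). The Bash functions below are a minimal sketch that checks those rules before you type the real `zpool`/`zfs` commands. They are not part of Proxmox VE, pve-docs or the ZFS utilities; the names `ashift_for_sector_size`, `to_bytes` and `check_special_small_blocks` are hypothetical, and the size parser assumes plain byte counts or a `B`/`K` suffix only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Proxmox VE or the ZFS tools) that check the
# ashift and special_small_blocks rules described in local-zfs.adoc.
# They only do arithmetic; no zpool/zfs command is executed.

# Print the smallest ashift with 2^ashift >= the given sector size in bytes.
ashift_for_sector_size() {
    if ! [[ $1 =~ ^[0-9]+$ ]] || (( 10#$1 < 1 )); then
        echo "invalid sector size: $1" >&2
        return 1
    fi
    local sector_size=$(( 10#$1 )) ashift=0
    while (( (1 << ashift) < sector_size )); do
        ashift=$(( ashift + 1 ))
    done
    echo "$ashift"
}

# Convert "512", "512B" or "4K" into bytes (assumption: only B and K suffixes).
to_bytes() {
    local value=$1 multiplier=1
    case $value in
        *[Kk]) multiplier=1024; value=${value%[Kk]} ;;
        *[Bb]) value=${value%[Bb]} ;;
    esac
    if ! [[ $value =~ ^[0-9]+$ ]]; then
        echo "not a size: $1" >&2
        return 1
    fi
    echo $(( 10#$value * multiplier ))
}

# Check a special_small_blocks value against an optional recordsize (default 128K).
# Return codes: 0 = accepted, 1 = invalid value, 2 = value >= recordsize, so all
# data of the dataset would be written to the special device.
check_special_small_blocks() {
    local size recordsize
    size=$(to_bytes "$1") || return 1
    recordsize=$(to_bytes "${2:-128K}") || return 1
    if (( size == 0 )); then
        echo "ok: small file blocks are not stored on the special device"
        return 0
    fi
    # Must be a power of two (exactly one bit set) within 512B..128K.
    if (( size < 512 || size > 131072 || (size & (size - 1)) != 0 )); then
        echo "invalid: $1 is neither 0 nor a power of two from 512B to 128K" >&2
        return 1
    fi
    if (( size >= recordsize )); then
        echo "warning: $1 >= recordsize, all data would go to the special device"
        return 2
    fi
    echo "ok: blocks smaller than $1 are allocated on the special device"
}
```

For example, you could pass the physical sector size reported by `lsblk -o NAME,PHY-SEC` to `ashift_for_sector_size`: a disk with 4096-byte sectors yields `12`, which matches the `ashift=12` used throughout the examples. `check_special_small_blocks 4K` is accepted, while `check_special_small_blocks 128K` returns `2` because it equals the default `recordsize`. Keeping the helpers free of `zpool`/`zfs` calls means they can be tried on any machine without touching a pool.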