.. _chapter-zfs:

ZFS on Linux
------------

ZFS is a combined file system and logical volume manager, designed by
Sun Microsystems. There is no need to manually compile ZFS modules - all
packages are included.

By using ZFS, it's possible to achieve enterprise-grade features with
low-budget hardware, as well as high-performance systems by leveraging
SSD caching or even SSD-only setups. ZFS can replace expensive
hardware RAID cards with moderate CPU and memory load, combined with easy
management.

General advantages of ZFS:

* Easy configuration and management with GUI and CLI.
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption

Hardware
~~~~~~~~

ZFS depends heavily on memory, so it's recommended to have at least 8GB to
start. In practice, use as much as you can get for your hardware/budget. To
prevent data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (for example, Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

.. important:: Do not use ZFS on top of a hardware controller which has its
   own cache management. ZFS needs to communicate directly with the disks. An
   HBA adapter or something like an LSI controller flashed in ``IT`` mode is
   recommended.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with extensive
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs

Create a new zpool
^^^^^^^^^^^^^^^^^^

To create a new pool, at least one disk is needed. The `ashift` value should
match the sector size of the underlying disk (the block size used is 2 to the
power of `ashift` bytes), or be larger.

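To check a disk's sector size before picking `ashift`, you can, for example,
query it with `lsblk` (the device name below is just a placeholder). A
physical sector size of 4096 bytes corresponds to `ashift=12`, 512 bytes to
`ashift=9`:

.. code-block:: console

  # lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdX
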
.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device>

Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

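Regardless of the chosen layout, you can verify the resulting pool structure
after creation with:

.. code-block:: console

  # zpool status <pool>
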
Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

For `<device>`, you can use multiple devices, as is shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated drive partition as a log device (ZFS Intent
Log) to increase the performance (use SSD).

For `<device>`, you can use multiple devices, as is shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>

Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can add cache and log devices to a pool after its creation. In this
example, we will use a single drive for both cache and log. First, you need
to create two partitions on the SSD with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

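As a minimal sketch, assuming a hypothetical SSD at `/dev/sdX` on a host with
16 GiB of physical memory (hence an 8 GiB log partition), the two partitions
could be created with `sgdisk` like this (device name, partition names and
sizes are examples only):

.. code-block:: console

  # sgdisk -n 1:0:+8G -c 1:zfs-log /dev/sdX
  # sgdisk -n 2:0:0 -c 2:zfs-cache /dev/sdX

The partitions can then be attached to the pool: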

.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>


Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>


Changing a failed bootable device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Depending on how Proxmox Backup was installed, it is either using `grub` or
`systemd-boot` as a bootloader.

In either case, the first steps of copying the partition table, reissuing
GUIDs and replacing the ZFS partition are the same. To make the system
bootable from the new disk, different steps are needed, which depend on the
bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering
   process of the new disk has progressed.

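For example, to watch the resilver progress of the pool:

.. code-block:: console

  # zpool status -v <pool>
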
With `systemd-boot`:

.. code-block:: console

  # proxmox-boot-tool format <new ESP>
  # proxmox-boot-tool init <new ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition
   #2 on bootable disks set up by the `Proxmox Backup`_ installer. For details,
   see :ref:`Setting up a new partition for use as synced ESP
   <systembooting-proxmox-boot-setup>`.

With `grub`:

Usually, `grub.cfg` is located in `/boot/grub/grub.cfg`.

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg


7ccbce03 211Activate e-mail notification
24406ebc 212^^^^^^^^^^^^^^^^^^^^^^^^^^^^
859fe9c1 213
6481fd24
DW
214ZFS comes with an event daemon, ``ZED``, which monitors events generated by the
215ZFS kernel module. The daemon can also send emails upon ZFS events, such as pool
5225817d
SI
216errors. Newer ZFS packages ship the daemon in a separate package ``zfs-zed``,
217which should already be installed by default in `Proxmox Backup`_.
859fe9c1 218
6481fd24
DW
219You can configure the daemon via the file ``/etc/zfs/zed.d/zed.rc``, using your
220preferred editor. The required setting for email notfication is
5225817d 221``ZED_EMAIL_ADDR``, which is set to ``root`` by default.
859fe9c1
OB
222
223.. code-block:: console
24406ebc 224
859fe9c1
OB
225 ZED_EMAIL_ADDR="root"
226
5225817d 227Please note that `Proxmox Backup`_ forwards mails to `root` to the email address
859fe9c1
OB
228configured for the root user.
229
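Changes to ``zed.rc`` take effect once the daemon is restarted. Assuming the
standard service name shipped by the ``zfs-zed`` package, this could look
like:

.. code-block:: console

  # systemctl restart zfs-zed.service
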
859fe9c1 230
7ccbce03 231Limit ZFS memory usage
24406ebc 232^^^^^^^^^^^^^^^^^^^^^^
859fe9c1
OB
233
234It is good to use at most 50 percent (which is the default) of the
7ccbce03 235system memory for ZFS ARC, to prevent performance degradation of the
859fe9c1
OB
236host. Use your preferred editor to change the configuration in
237`/etc/modprobe.d/zfs.conf` and insert:
238
239.. code-block:: console
24406ebc 240
859fe9c1
OB
241 options zfs zfs_arc_max=8589934592
242
7ccbce03 243The above example limits the usage to 8 GiB ('8 * 2^30^').
859fe9c1 244
7ccbce03
DW
245.. IMPORTANT:: In case your desired `zfs_arc_max` value is lower than or equal
246 to `zfs_arc_min` (which defaults to 1/32 of the system memory), `zfs_arc_max`
247 will be ignored. Thus, for it to work in this case, you must set
248 `zfs_arc_min` to at most `zfs_arc_max - 1`. This would require updating the
249 configuration in `/etc/modprobe.d/zfs.conf`, with:
250
251.. code-block:: console
0ae5f762 252
7ccbce03
DW
253 options zfs zfs_arc_min=8589934591
254 options zfs zfs_arc_max=8589934592
255
256This example setting limits the usage to 8 GiB ('8 * 2^30^') on
257systems with more than 256 GiB of total memory, where simply setting
258`zfs_arc_max` alone would not work.
259
260.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
261 every time this value changes.
859fe9c1
OB
262
263.. code-block:: console
24406ebc 264
859fe9c1
OB
265 # update-initramfs -u
266
267
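The limit can usually also be applied to the running system, without a reboot,
by writing to the module parameter directly (shown here with the 8 GiB value
from the example above):

.. code-block:: console

  # echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
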
7ccbce03 268Swap on ZFS
24406ebc 269^^^^^^^^^^^
859fe9c1 270
7ccbce03 271Swap-space created on a zvol may cause some issues, such as blocking the
0ae5f762 272server or generating a high IO load.
859fe9c1 273
7ccbce03 274We strongly recommend using enough memory, so that you normally do not
859fe9c1 275run into low memory situations. Should you need or want to add swap, it is
7ccbce03 276preferred to create a partition on a physical disk and use it as a swap device.
859fe9c1 277You can leave some space free for this purpose in the advanced options of the
7ccbce03 278installer. Additionally, you can lower the `swappiness` value.
859fe9c1
OB
279A good value for servers is 10:
280
281.. code-block:: console
24406ebc 282
859fe9c1
OB
283 # sysctl -w vm.swappiness=10
284
285To make the swappiness persistent, open `/etc/sysctl.conf` with
286an editor of your choice and add the following line:
287
288.. code-block:: console
24406ebc 289
859fe9c1
OB
290 vm.swappiness = 10
291
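You can check the currently active value at any time with:

.. code-block:: console

  # sysctl vm.swappiness
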
.. table:: Linux kernel `swappiness` parameter values
   :widths: auto

   ==================== ===============================================================
   Value                Strategy
   ==================== ===============================================================
   vm.swappiness = 0    The kernel will swap only to avoid an 'out of memory' condition
   vm.swappiness = 1    Minimum amount of swapping without disabling it entirely.
   vm.swappiness = 10   Sometimes recommended to improve performance when sufficient memory exists in a system.
   vm.swappiness = 60   The default value.
   vm.swappiness = 100  The kernel will swap aggressively.
   ==================== ===============================================================

7ccbce03 305ZFS compression
24406ebc 306^^^^^^^^^^^^^^^
859fe9c1
OB
307
308To activate compression:
0ae5f762 309
859fe9c1 310.. code-block:: console
24406ebc 311
859fe9c1
OB
312 # zpool set compression=lz4 <pool>
313
314We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
0ae5f762 315Other algorithms such as `lzjb`, `zstd` and `gzip-N` (where `N` is an integer from `1-9`
7ccbce03
DW
316representing the compression ratio, where 1 is fastest and 9 is best
317compression) are also available. Depending on the algorithm and how
318compressible the data is, having compression enabled can even increase I/O
319performance.
859fe9c1
OB
320
321You can disable compression at any time with:
0ae5f762 322
859fe9c1 323.. code-block:: console
24406ebc 324
859fe9c1
OB
325 # zfs set compression=off <dataset>
326
327Only new blocks will be affected by this change.
328
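To judge how effective compression is for your data, you can, for example,
inspect the `compressratio` property:

.. code-block:: console

  # zfs get compression,compressratio <pool>
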
.. _local_zfs_special_device:

ZFS special device
^^^^^^^^^^^^^^^^^^

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match that of the
   pool, since the `special` device is a point of failure for the entire pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

To create a pool with a `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power
of two in the range between `512B` and `128K`. After setting this property,
new file blocks smaller than `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or
   equal to the `recordsize` (default `128K`) of the dataset, *all* data will
   be written to the `special` device, so be careful!

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all containers
in the pool will opt in for small file blocks).

Opt in, pool-wide, for all file blocks smaller than 4K:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>

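To review the values currently in effect across a pool and its datasets, you
can, for instance, query the property recursively:

.. code-block:: console

  # zfs get -r special_small_blocks <pool>
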
Troubleshooting
^^^^^^^^^^^^^^^

Corrupt cache file
""""""""""""""""""

`zfs-import-cache.service` imports ZFS pools using the ZFS cache file. If this
file becomes corrupted, the service won't be able to import the pools recorded
in it.

As a result, in case of a corrupted ZFS cache file, some volumes may not be
mounted during boot and must be mounted manually later.

To regenerate the cache file, run the following command for each pool:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

Then, update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

Finally, reboot the node.

Another workaround for this problem is enabling `zfs-import-scan.service`,
which searches for and imports pools via device scanning (usually slower).

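To switch to scan-based import, assuming the standard systemd unit name, you
could enable the service like this:

.. code-block:: console

  # systemctl enable zfs-import-scan.service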