.. _chapter-zfs:

ZFS on Linux
------------

ZFS is a combined file system and logical volume manager, designed by
Sun Microsystems. There is no need to manually compile ZFS modules - all
packages are included.

By using ZFS, it's possible to get enterprise-grade features with
low budget hardware, and also high performance systems by leveraging
SSD caching or even SSD only setups. ZFS can replace expensive
hardware RAID cards with moderate CPU and memory load, combined with easy
management.

General advantages of ZFS:

* Easy configuration and management with GUI and CLI
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption


Hardware
~~~~~~~~~

ZFS depends heavily on memory, so it's recommended to have at least 8 GB to
start. In practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (for example, Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

.. IMPORTANT:: Do not use ZFS on top of a hardware controller which has its
   own cache management. ZFS needs to directly communicate with disks. An
   HBA or something like an LSI controller flashed in ``IT`` mode is
   recommended.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with extensive
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs


Create a new zpool
^^^^^^^^^^^^^^^^^^

To create a new pool, at least one disk is needed. Choose `ashift` so that
the resulting sector size (2 to the power of `ashift`) is the same as or
larger than the sector size of the underlying disk.

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device>

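The relationship between sector size and `ashift` can be illustrated with a
little shell arithmetic. This is only a sketch: the function name
`ashift_for_sector_size` is hypothetical, and starting the search at 9
(512-byte sectors) is an assumption about the smallest value you would want
to use.

```shell
# Hypothetical helper: find the smallest ashift whose sector size
# (2^ashift bytes) is at least as large as the given disk sector size.
ashift_for_sector_size() {
    sector_size=$1
    ashift=9    # assumption: start at 2^9 = 512 bytes
    while [ $((1 << ashift)) -lt "$sector_size" ]; do
        ashift=$((ashift + 1))
    done
    echo "$ashift"
}

ashift_for_sector_size 512     # prints 9
ashift_for_sector_size 4096    # prints 12
```

On a live system, the physical sector size passed to the helper could be
read with a tool such as `blockdev --getpbsz <device>`.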

Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

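The minimum disk counts listed for each layout can be checked before running
`zpool create`. The following is a minimal sketch: the helper names
`min_disks` and `enough_disks` and the layout labels `raid0`, `raid1` and
`raid10` are hypothetical and only used for illustration. It does not call
`zpool` itself.

```shell
# Hypothetical helper: minimum number of disks per layout, as listed above.
min_disks() {
    case $1 in
        raid0)  echo 1 ;;
        raid1)  echo 2 ;;
        raid10) echo 4 ;;
        raidz1) echo 3 ;;
        raidz2) echo 4 ;;
        *)      echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

# Succeeds if enough devices were given for the layout named in $1.
enough_disks() {
    layout=$1
    shift
    required=$(min_disks "$layout") || return 1
    [ "$#" -ge "$required" ]
}

if enough_disks raidz2 sda sdb sdc; then
    echo "ok"
else
    echo "raidz2 needs more disks"
fi
```

With only three devices, the example reports that RAIDZ-2 needs more disks.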

Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

For `<device>`, you can use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated log drive partition to increase
the performance (use SSD).

For `<device>`, you can use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>


Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can add cache and log devices to a pool after its creation. In this example,
we will use a single drive for both cache and log. First, you need to create
2 partitions on the SSD with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>

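The "half of physical memory" rule of thumb for the log partition is easy to
turn into a number. The sketch below uses a hypothetical helper,
`log_size_mib`, which takes the amount of RAM in KiB and prints the suggested
maximum log partition size in MiB.

```shell
# Hypothetical helper: half of the given memory (in KiB), expressed in MiB.
log_size_mib() {
    mem_kib=$1
    echo $((mem_kib / 2 / 1024))
}

# 32 GiB of RAM = 33554432 KiB
log_size_mib 33554432    # prints 16384
```

On a Linux host, the input value could be taken from the `MemTotal` line of
`/proc/meminfo`, which is reported in KiB.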

Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>


Changing a failed bootable device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Depending on how Proxmox Backup was installed, it is either using `grub` or
`systemd-boot` as a bootloader.

In either case, the first steps of copying the partition table, reissuing GUIDs
and replacing the ZFS partition are the same. To make the system bootable from
the new disk, different steps are needed which depend on the bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering
   process of the new disk has progressed.

With `systemd-boot`:

.. code-block:: console

  # proxmox-boot-tool format <new ESP>
  # proxmox-boot-tool init <new ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition #2 on
   bootable disks set up by the `Proxmox Backup`_ installer. For details, see
   :ref:`Setting up a new partition for use as synced ESP <systembooting-proxmox-boot-setup>`.

With `grub`:

Usually `grub.cfg` is located in `/boot/grub/grub.cfg`

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg


Activate e-mail notification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ZFS comes with an event daemon ``ZED``, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like pool
errors. Newer ZFS packages ship the daemon in a separate package ``zfs-zed``,
which should already be installed by default in `Proxmox Backup`_.

You can configure the daemon via the file ``/etc/zfs/zed.d/zed.rc`` with your
favorite editor. The required setting for email notification is
``ZED_EMAIL_ADDR``, which is set to ``root`` by default.

.. code-block:: console

  ZED_EMAIL_ADDR="root"

Please note that `Proxmox Backup`_ forwards mails to `root` to the email address
configured for the root user.


Limit ZFS memory usage
^^^^^^^^^^^^^^^^^^^^^^

It is good to use at most 50 percent (which is the default) of the
system memory for ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

.. code-block:: console

  options zfs zfs_arc_max=8589934592

The above example limits the usage to 8 GiB (``8 * 2^30`` bytes).

.. IMPORTANT:: In case your desired `zfs_arc_max` value is lower than or equal
   to `zfs_arc_min` (which defaults to 1/32 of the system memory), `zfs_arc_max`
   will be ignored. Thus, for it to work in this case, you must set
   `zfs_arc_min` to at most `zfs_arc_max - 1`. This would require updating the
   configuration in `/etc/modprobe.d/zfs.conf`, with:

.. code-block:: console

  options zfs zfs_arc_min=8589934591
  options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8 GiB (``8 * 2^30`` bytes) on
systems with more than 256 GiB of total memory, where simply setting
`zfs_arc_max` alone would not work.

.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
   every time this value changes.

.. code-block:: console

  # update-initramfs -u

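The byte values used for `zfs_arc_max` and the condition under which
`zfs_arc_min` must be set as well can be worked out with shell arithmetic.
This is a sketch under the rules stated in this section (a limit that is
lower than or equal to 1/32 of the system memory is ignored); the helper
names `gib_to_bytes` and `needs_arc_min` are hypothetical.

```shell
# Convert GiB to bytes: 1 GiB = 2^30 bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print "yes" if the desired zfs_arc_max (bytes) is lower than or equal to
# the default zfs_arc_min (1/32 of total memory, in bytes), else "no".
needs_arc_min() {
    arc_max_bytes=$1
    mem_bytes=$2
    if [ "$arc_max_bytes" -le $((mem_bytes / 32)) ]; then
        echo yes
    else
        echo no
    fi
}

gib_to_bytes 8                                            # prints 8589934592
needs_arc_min "$(gib_to_bytes 8)" "$(gib_to_bytes 64)"    # prints no
needs_arc_min "$(gib_to_bytes 8)" "$(gib_to_bytes 512)"   # prints yes
```

On a host with 64 GiB of memory, the default minimum is 2 GiB, so an 8 GiB
limit works on its own. With 512 GiB of memory, the default minimum is
16 GiB, so `zfs_arc_min` has to be lowered too.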

Swap on ZFS
^^^^^^^^^^^

Swap-space created on a zvol may cause some issues, such as blocking the
server or generating a high IO load.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the `swappiness` value.
A good value for servers is 10:

.. code-block:: console

  # sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

.. code-block:: console

  vm.swappiness = 10

.. table:: Linux kernel `swappiness` parameter values
  :widths: auto

  ==================== ========================================================================================
  Value                Strategy
  ==================== ========================================================================================
  vm.swappiness = 0    The kernel will swap only to avoid an 'out of memory' condition.
  vm.swappiness = 1    Minimum amount of swapping without disabling it entirely.
  vm.swappiness = 10   Sometimes recommended to improve performance when sufficient memory exists in a system.
  vm.swappiness = 60   The default value.
  vm.swappiness = 100  The kernel will swap aggressively.
  ==================== ========================================================================================


ZFS compression
^^^^^^^^^^^^^^^

To activate compression:

.. code-block:: console

  # zfs set compression=lz4 <pool>

We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
Other algorithms such as `lzjb`, `zstd` and `gzip-N` (where `N` is an integer from `1-9`
representing the compression level, where 1 is fastest and 9 is best
compression) are also available. Depending on the algorithm and how
compressible the data is, having compression enabled can even increase I/O
performance.

You can disable compression at any time with:

.. code-block:: console

  # zfs set compression=off <dataset>

Only new blocks will be affected by this change.


.. _local_zfs_special_device:

ZFS special device
^^^^^^^^^^^^^^^^^^

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match the one of the
   pool, since the `special` device is a point of failure for the entire pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

To create a pool with `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power of
two in the range from `512B` to `128K`. After setting this property, new file
blocks smaller than or equal to `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal to
   the `recordsize` (default `128K`) of the dataset, *all* data will be written to
   the `special` device, so be careful!

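The interaction between `special_small_blocks` and `recordsize` described
above can be sanity-checked before changing the property. The following is a
minimal sketch: the helper names `to_bytes` and `check_small_blocks` are
hypothetical, and the size parser only understands plain numbers and the `B`
and `K` suffixes used in this section.

```shell
# Hypothetical helper: convert sizes such as 0, 512B, 4K or 128K to bytes.
to_bytes() {
    case $1 in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *B) echo "${1%B}" ;;
        *)  echo "$1" ;;
    esac
}

# Compare a special_small_blocks value ($1) with a recordsize ($2).
check_small_blocks() {
    small=$(to_bytes "$1")
    record=$(to_bytes "$2")
    if [ "$small" -eq 0 ]; then
        echo "small blocks are not stored on the special device"
    elif [ "$small" -ge "$record" ]; then
        echo "all data will be written to the special device"
    else
        echo "only small blocks go to the special device"
    fi
}

check_small_blocks 4K 128K      # only small blocks go to the special device
check_small_blocks 128K 128K    # all data will be written to the special device
```

On a live system, the current record size of a dataset could be read with
`zfs get -H -o value recordsize <dataset>` and passed as the second argument.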

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all containers
in the pool will opt in for small file blocks).

Opt in for small file blocks of up to 4K pool-wide:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>


Troubleshooting
^^^^^^^^^^^^^^^

Corrupt cache file
""""""""""""""""""

`zfs-import-cache.service` imports ZFS pools using the ZFS cache file. If this
file becomes corrupted, the service won't be able to import the pools that it's
unable to read from it.

As a result, in case of a corrupted ZFS cache file, some volumes may not be
mounted during boot and must be mounted manually later.

For each pool, run:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

Then, update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

Finally, reboot the node.

Another workaround to this problem is enabling the `zfs-import-scan.service`,
which searches and imports pools via device scanning (usually slower).

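When several pools are affected, the per-pool `cachefile` step from the
procedure above can be generated in a loop. The sketch below is a dry run: the
helper `print_cachefile_commands` is hypothetical and only prints the commands
for the pool names it is given, so they can be reviewed before being executed.
The pool names `rpool` and `tank` are examples.

```shell
# Hypothetical dry-run helper: print the cachefile command for each pool name.
print_cachefile_commands() {
    for pool in "$@"; do
        echo "zpool set cachefile=/etc/zfs/zpool.cache $pool"
    done
}

print_cachefile_commands rpool tank
```

On a live system, the list of imported pools could be obtained with
`zpool list -H -o name` instead of typing the names by hand.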