.. _chapter-zfs:

ZFS on Linux
------------

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. There is no need to compile ZFS modules manually - all
packages are included.

By using ZFS, it's possible to achieve maximum enterprise features on
low-budget hardware, and also high-performance systems by leveraging SSD
caching or even SSD-only setups. ZFS can replace costly hardware RAID cards
with moderate CPU and memory load, combined with easy management.

General ZFS advantages:

* Easy configuration and management with GUI and CLI.
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various RAID levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption

Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

.. IMPORTANT:: Do not use ZFS on top of a hardware RAID controller which has
   its own cache management. ZFS needs to communicate directly with the
   disks. An HBA adapter, or something like an LSI controller flashed in
   ``IT`` mode, is the way to go.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs

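Beyond the manual pages, `zpool status` and `zfs list` are useful for getting
a first overview of existing pools and datasets (run without arguments, they
report on everything):

.. code-block:: console

  # zpool status
  # zfs list
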
Create a new zpool
^^^^^^^^^^^^^^^^^^

To create a new pool, at least one disk is needed. The `ashift` value should
be chosen so that 2 to the power of `ashift` is at least as large as the
sector size of the underlying disk.

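Before picking `ashift`, you can look up the sector sizes a disk reports with
`lsblk` (a quick check; `/dev/sdX` is a placeholder for your disk):

.. code-block:: console

  # lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdX

With a suitable `ashift` chosen, create the pool:
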
.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device>

Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use an SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated drive partition as a log device (ZIL) to
increase the performance (use an SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>

Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a pool without cache and log, first partition the SSD into
two partitions with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

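As an illustration, the partitioning could look roughly like this with
`parted` (a sketch only; `/dev/sdX` and the 4GiB log size are placeholder
assumptions, adjust them to your disk and memory size):

.. code-block:: console

  # parted /dev/sdX mklabel gpt
  # parted /dev/sdX mkpart zfs-log 1MiB 4GiB
  # parted /dev/sdX mkpart zfs-cache 4GiB 100%

Then add both partitions to the pool:
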
.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>


Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>


Changing a failed bootable device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Depending on how Proxmox Backup was installed, it uses either `grub` or
`systemd-boot` as the bootloader.

The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed, depending on the bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering
   process of the new disk has progressed.

With `systemd-boot`:

.. code-block:: console

  # pve-efiboot-tool format <new disk's ESP>
  # pve-efiboot-tool init <new disk's ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition
   #2 on bootable disks set up by the Proxmox VE installer since version 5.4.
   For details, see the section "Setting up a new partition for use as synced
   ESP" in the Proxmox VE documentation.

With `grub`:

Usually, `grub.cfg` is located in `/boot/grub/grub.cfg`.

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg


Activate E-Mail Notification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
and you can install it using `apt-get`:

.. code-block:: console

  # apt-get install zfs-zed

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with
your favorite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

.. code-block:: console

  ZED_EMAIL_ADDR="root"

Please note that Proxmox Backup forwards mails sent to `root` to the email
address configured for the root user.

.. IMPORTANT:: The only setting that is required is `ZED_EMAIL_ADDR`. All
   other settings are optional.
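
After changing `zed.rc`, restart the daemon so that it picks up the new
configuration (assuming the standard systemd unit name `zfs-zed`, as used by
the Debian packaging):

.. code-block:: console

  # systemctl restart zfs-zed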

Limit ZFS Memory Usage
^^^^^^^^^^^^^^^^^^^^^^

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

.. code-block:: console

  options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8GB.
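
The value is given in bytes. As a quick sanity check, you can let the shell do
the multiplication for a different limit, for example 16GB (purely
illustrative):

.. code-block:: console

  # echo $((16 * 1024 * 1024 * 1024))
  17179869184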

.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
   every time this value changes:

.. code-block:: console

  # update-initramfs -u

SWAP on ZFS
^^^^^^^^^^^

Swap-space created on a zvol may cause some problems, such as blocking the
server or generating a high IO load, often seen when starting a backup
to external storage.

We strongly recommend using enough memory, so that you do not normally run
into low-memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap
device. You can leave some space free for this purpose in the advanced
options of the installer.
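
If you go that route, setting up the reserved partition as swap could look
like this (a minimal sketch; `/dev/sdXN` is a placeholder for the partition,
and you would also add an `/etc/fstab` entry to make it permanent):

.. code-block:: console

  # mkswap /dev/sdXN
  # swapon /dev/sdXN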

Additionally, you can lower the `swappiness` value. A good value for servers
is 10:

.. code-block:: console

  # sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

.. code-block:: console

  vm.swappiness = 10

.. table:: Linux kernel `swappiness` parameter values
   :widths: auto

   ==================== ===============================================================
   Value                Strategy
   ==================== ===============================================================
   vm.swappiness = 0    The kernel will swap only to avoid an 'out of memory' condition
   vm.swappiness = 1    Minimum amount of swapping without disabling it entirely.
   vm.swappiness = 10   Sometimes recommended to improve performance when sufficient memory exists in a system.
   vm.swappiness = 60   The default value.
   vm.swappiness = 100  The kernel will swap aggressively.
   ==================== ===============================================================

ZFS Compression
^^^^^^^^^^^^^^^

To activate compression:

.. code-block:: console

  # zfs set compression=lz4 <pool>

We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
Other algorithms such as `lzjb` and `gzip-N` (where `N` is an integer from `1`
to `9` representing the compression ratio, where `1` is fastest and `9`
compresses best) are also available. Depending on the algorithm and how
compressible the data is, having compression enabled can even increase I/O
performance.
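
To check which compression algorithm is currently in effect for a pool or
dataset, query the property (`<pool>` is a placeholder, as above):

.. code-block:: console

  # zfs get compression <pool>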

You can disable compression at any time with:

.. code-block:: console

  # zfs set compression=off <dataset>

Only new blocks will be affected by this change.

.. _local_zfs_special_device:

ZFS Special Device
^^^^^^^^^^^^^^^^^^

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
whole small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match that of the
   pool, since the `special` device is a point of failure for the whole pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

Create a pool with `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power of
two in the range from `512B` to `128K`. After setting this property, new file
blocks smaller than `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal
   to the `recordsize` (default `128K`) of the dataset, *all* data will be
   written to the `special` device, so be careful!
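
To look up the current `recordsize` of a dataset before choosing a value for
`special_small_blocks` (placeholders as in the previous examples):

.. code-block:: console

  # zfs get recordsize <pool>/<filesystem>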

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all datasets
in the pool will opt in for small file blocks).

Opt in for all files smaller than 4K, pool-wide:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>

Troubleshooting
^^^^^^^^^^^^^^^

Corrupted cachefile

The ZFS cachefile can become corrupted. In that case,
`zfs-import-cache.service` does not import pools that are not present in the
cachefile, and some volumes may not be mounted during boot until they are
mounted manually later.

To regenerate the cachefile, run the following command for each pool:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

and afterwards update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

and finally reboot your node.

Another workaround for this problem is enabling `zfs-import-scan.service`,
which searches for and imports pools via device scanning (usually slower).
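
This service can be enabled with a standard systemd command (the unit is
shipped with the ZFS packages):

.. code-block:: console

  # systemctl enable zfs-import-scan.service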