ZFS on Linux
------------

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. There is no need to manually compile ZFS modules - all
packages are included.

By using ZFS, it's possible to achieve maximum enterprise features with
low budget hardware, but also high performance systems by leveraging
SSD caching or even SSD-only setups. ZFS can replace costly hardware
RAID cards with moderate CPU and memory load, combined with easy
management.

General ZFS advantages

* Easy configuration and management with GUI and CLI
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various RAID levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption

Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware controller which has its
own cache management. ZFS needs to communicate directly with the disks. An
HBA adapter is the way to go, or something like an LSI controller flashed
in ``IT`` mode.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs
60 | ||
61 | Create a new zpool | |
24406ebc | 62 | ^^^^^^^^^^^^^^^^^^ |
859fe9c1 OB |
63 | |
64 | To create a new pool, at least one disk is needed. The `ashift` should | |
65 | have the same sector-size (2 power of `ashift`) or larger as the | |
66 | underlying disk. | |
67 | ||
68 | .. code-block:: console | |
24406ebc | 69 | |
859fe9c1 OB |
70 | # zpool create -f -o ashift=12 <pool> <device> |
71 | ||
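If you are unsure which `ashift` a disk needs, you can check its
physical and logical sector sizes, for example with `lsblk`:

.. code-block:: console

  # lsblk -o NAME,PHY-SEC,LOG-SEC

A physical sector size of 4096 bytes corresponds to `ashift=12`,
since 2^12 = 4096.
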
Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

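After creating a pool, you can verify the resulting layout and the
health of its devices with `zpool status`:

.. code-block:: console

  # zpool status <pool>
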
Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated log drive partition to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>

Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a pool without cache and log, you can add them later.
First, partition the SSD into 2 partitions with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so it is usually quite small. The rest of the SSD
can be used as cache.
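
A minimal sketch of such a partitioning with `sgdisk`, assuming a
hypothetical SSD `/dev/sdx` and 32GB of physical memory, so roughly
16GB for the log partition; the first command creates the log
partition, the second assigns the remaining space to the cache
partition:

.. code-block:: console

  # sgdisk -n 1:0:+16G /dev/sdx
  # sgdisk -n 2:0:0 /dev/sdx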

Afterwards, attach the two partitions to the pool as log and cache:

.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>

Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>

Changing a failed bootable device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Depending on how Proxmox Backup was installed, it uses either `grub` or
`systemd-boot` as bootloader.

The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed, depending on the bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering
   process of the new disk has progressed.

With `systemd-boot`:

.. code-block:: console

  # pve-efiboot-tool format <new disk's ESP>
  # pve-efiboot-tool init <new disk's ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition #2 on
   bootable disks set up by the {pve} installer since version 5.4. For details, see
   xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].

With `grub`:

Usually `grub.cfg` is located in `/boot/grub/grub.cfg`

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg


Activate E-Mail Notification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
and you can install it using `apt-get`:

.. code-block:: console

  # apt-get install zfs-zed

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

.. code-block:: console

  ZED_EMAIL_ADDR="root"

Please note that Proxmox Backup forwards mails to `root` to the email address
configured for the root user.

IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.
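
After changing the configuration, restart the daemon so it picks up the
new settings and verify that it is running. This is a sketch, assuming
the service is named `zfs-zed`, as in the Debian package:

.. code-block:: console

  # systemctl restart zfs-zed
  # systemctl status zfs-zed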

Limit ZFS Memory Usage
^^^^^^^^^^^^^^^^^^^^^^

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

.. code-block:: console

  options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8GiB (8589934592 = 8 * 1024^3 bytes).

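The current and maximum ARC sizes can be inspected at runtime via
`/proc/spl/kstat/zfs/arcstats`, and the limit can also be changed on
the fly through the module parameter (note that this does not persist
across reboots):

.. code-block:: console

  # grep -w -e c -e c_max /proc/spl/kstat/zfs/arcstats
  # echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
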
.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
   every time this value changes:

.. code-block:: console

  # update-initramfs -u

SWAP on ZFS
^^^^^^^^^^^

Swap-space created on a zvol may cause problems, such as blocking the
server or generating a high IO load, often seen when starting a backup
to an external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the `swappiness` value.
A good value for servers is 10:

.. code-block:: console

  # sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

.. code-block:: console

  vm.swappiness = 10

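The value from `/etc/sysctl.conf` can then be applied without a reboot:

.. code-block:: console

  # sysctl -p
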
.. table:: Linux kernel `swappiness` parameter values
   :widths: auto

   ====================  ========================================================================================
   Value                 Strategy
   ====================  ========================================================================================
   vm.swappiness = 0     The kernel will swap only to avoid an 'out of memory' condition
   vm.swappiness = 1     Minimum amount of swapping without disabling it entirely
   vm.swappiness = 10    Sometimes recommended to improve performance when sufficient memory exists in a system
   vm.swappiness = 60    The default value
   vm.swappiness = 100   The kernel will swap aggressively
   ====================  ========================================================================================

ZFS Compression
^^^^^^^^^^^^^^^

To activate compression:

.. code-block:: console

  # zfs set compression=lz4 <pool>

We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
Other algorithms such as `lzjb` and `gzip-N` (where `N` is an integer from `1-9`
representing the compression ratio, where 1 is fastest and 9 compresses best)
are also available. Depending on the algorithm and how compressible the data is,
having compression enabled can even increase I/O performance.

You can disable compression at any time with:

.. code-block:: console

  # zfs set compression=off <dataset>

Only new blocks will be affected by this change.

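To see how effective compression is on existing data, you can query the
read-only `compressratio` property:

.. code-block:: console

  # zfs get compressratio <pool>
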
ZFS Special Device
^^^^^^^^^^^^^^^^^^

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
whole small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match that of the
   pool, since the `special` device is a point of failure for the whole pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

Create a pool with `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power of
two in the range from `512B` to `128K`. After setting the property, new file
blocks smaller than `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal to
   the `recordsize` (default `128K`) of the dataset, *all* data will be written to
   the `special` device, so be careful!

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all containers
in the pool will opt in for small file blocks).

Opt in for all files smaller than 4K pool-wide:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>

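To check how much space is allocated on the `special` device, you can list
the pool's vdevs individually:

.. code-block:: console

  # zpool list -v <pool>
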
Troubleshooting
^^^^^^^^^^^^^^^

Corrupted cachefile

Sometimes the ZFS cachefile can get corrupted, and `zfs-import-cache.service`
doesn't import the pools that aren't present in the cachefile. In this case,
some volumes may not be mounted during boot until mounted manually later.

For each pool, run:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

and afterwards update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

and finally reboot your node.

Another workaround to this problem is enabling the `zfs-import-scan.service`,
which searches and imports pools via device scanning (usually slower).
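
A sketch of enabling that service with `systemctl`:

.. code-block:: console

  # systemctl enable zfs-import-scan.service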