ZFS on Linux
------------

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. There is no need to manually compile ZFS modules - all
packages are included.

By using ZFS, it is possible to achieve maximum enterprise features with
low-budget hardware, but also high-performance systems by leveraging
SSD caching or even SSD-only setups. ZFS can replace expensive hardware
RAID cards, at the cost of moderate CPU and memory load, combined with
easy management.

General ZFS advantages

* Easy configuration and management with GUI and CLI.
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption

Hardware
~~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

.. IMPORTANT:: Do not use ZFS on top of a hardware controller which has its
   own cache management. ZFS needs to communicate directly with the disks. An
   HBA adapter, or something like an LSI controller flashed in ``IT`` mode, is
   the way to go.

ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs

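For a quick overview of an existing setup, two read-only commands show the
health of all pools and the datasets they contain:

.. code-block:: console

  # zpool status
  # zfs list
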
Create a new zpool
^^^^^^^^^^^^^^^^^^

To create a new pool, at least one disk is needed. The sector size used by the
pool is 2 to the power of `ashift`, so `ashift` should be chosen such that this
is equal to or larger than the sector size of the underlying disk.

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device>

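To choose a suitable `ashift`, you can check the sector sizes that the disk
reports, for example with `lsblk`. A physical sector size of 4096 bytes
corresponds to `ashift=12`, since 2^12 = 4096:

.. code-block:: console

  # lsblk -o NAME,PHY-SEC,LOG-SEC <device>
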
Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

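After creating a pool, you can verify that the vdev layout matches what you
intended:

.. code-block:: console

  # zpool status <pool>
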
Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated drive partition as log device to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>

Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a pool without cache and log, first partition the SSD into
two partitions with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

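A minimal partitioning sketch with `sgdisk`, assuming 32GB for the log
partition (about half the physical memory of a hypothetical 64GB host); device
names and sizes are placeholders:

.. code-block:: console

  # sgdisk -n1:0:+32G <device>    # partition 1: log
  # sgdisk -n2:0:0 <device>       # partition 2: rest of the disk, cache

Afterwards, attach both partitions to the pool:
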
.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>

Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>

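To identify the failed device beforehand, and to monitor the resilvering that
follows the replacement, check the pool status:

.. code-block:: console

  # zpool status -v <pool>
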
Changing a failed bootable device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Depending on how Proxmox Backup was installed, it uses either `grub` or
`systemd-boot` as bootloader.

The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed, depending on the bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering
   process of the new disk has progressed.

With `systemd-boot`:

.. code-block:: console

  # pve-efiboot-tool format <new disk's ESP>
  # pve-efiboot-tool init <new disk's ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition #2 on
   bootable disks set up by the {pve} installer since version 5.4. For details, see
   xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].

With `grub`:

Usually `grub.cfg` is located in `/boot/grub/grub.cfg`.

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg

Activate E-Mail Notification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events, like
pool errors. Newer ZFS packages ship the daemon in a separate package,
which you can install using `apt-get`:

.. code-block:: console

  # apt-get install zfs-zed

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with
your favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

.. code-block:: console

  ZED_EMAIL_ADDR="root"

Please note that Proxmox Backup forwards mails to `root` to the email address
configured for the root user.

.. IMPORTANT:: The only setting that is required is `ZED_EMAIL_ADDR`. All
   other settings are optional.

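To check that the event daemon is actually running after installation, you can
query its systemd unit (unit name as shipped by the `zfs-zed` package):

.. code-block:: console

  # systemctl status zfs-zed
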
Limit ZFS Memory Usage
^^^^^^^^^^^^^^^^^^^^^^

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

.. code-block:: console

  options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8GB.

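The value is given in bytes (8GB = 8 * 1024^3 = 8589934592 bytes). On a running
system, you can inspect the current limit, and usually also change it at
runtime, through the module parameter; a sketch, assuming the ZFS module is
loaded:

.. code-block:: console

  # cat /sys/module/zfs/parameters/zfs_arc_max
  # echo "$((8 * 1024**3))" > /sys/module/zfs/parameters/zfs_arc_max
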
.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
   every time this value changes:

.. code-block:: console

  # update-initramfs -u

SWAP on ZFS
^^^^^^^^^^^

Swap space created on a zvol may cause some problems, such as blocking the
server or generating a high IO load, often seen when starting a backup
to an external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the `swappiness` value.
A good value for servers is 10:

.. code-block:: console

  # sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

.. code-block:: console

  vm.swappiness = 10

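To apply settings from `/etc/sysctl.conf` without rebooting, you can reload the
file:

.. code-block:: console

  # sysctl -p
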
.. table:: Linux kernel `swappiness` parameter values
   :widths: auto

   ==================== ===============================================================
   Value                Strategy
   ==================== ===============================================================
   vm.swappiness = 0    The kernel will swap only to avoid an 'out of memory' condition
   vm.swappiness = 1    Minimum amount of swapping without disabling it entirely.
   vm.swappiness = 10   Sometimes recommended to improve performance when sufficient memory exists in a system.
   vm.swappiness = 60   The default value.
   vm.swappiness = 100  The kernel will swap aggressively.
   ==================== ===============================================================

ZFS Compression
^^^^^^^^^^^^^^^

To activate compression:

.. code-block:: console

  # zpool set compression=lz4 <pool>

We recommend using the `lz4` algorithm, since it adds very little CPU overhead.
Other algorithms such as `lzjb` and `gzip-N` (where `N` is an integer from `1-9`
representing the compression ratio, where 1 is fastest and 9 compresses best)
are also available. Depending on the algorithm and how compressible the data
is, having compression enabled can even increase I/O performance.

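To see how effective compression is on a pool or dataset, you can query the
read-only `compressratio` property alongside the active algorithm:

.. code-block:: console

  # zfs get compression,compressratio <pool>
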
You can disable compression at any time with:

.. code-block:: console

  # zfs set compression=off <dataset>

Only new blocks will be affected by this change.

ZFS Special Device
^^^^^^^^^^^^^^^^^^

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
whole small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match that of the
   pool, since the `special` device is a point of failure for the whole pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

Create a pool with `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power of
two in the range from `512B` to `128K`. After setting the property, new file
blocks smaller than `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal to
   the `recordsize` (default `128K`) of the dataset, *all* data will be written to
   the `special` device, so be careful!

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all containers
in the pool will opt in for small file blocks).

Opt in for all files smaller than 4K blocks pool-wide:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>

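To double-check the effective values, including the dataset's `recordsize`, you
can query both properties:

.. code-block:: console

  # zfs get special_small_blocks,recordsize <pool>/<filesystem>
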
Troubleshooting
^^^^^^^^^^^^^^^

Corrupted cachefile

Sometimes the ZFS cachefile can get corrupted, and `zfs-import-cache.service`
then does not import the pools that are not present in the cachefile. As a
result, some volumes may not be mounted during boot until mounted manually
later.

For each pool, run:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

afterwards update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

and finally reboot your node.

Another workaround for this problem is enabling `zfs-import-scan.service`,
which searches for and imports pools via device scanning (usually slower).

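A sketch of switching to scan-based import; the unit name is the one shipped by
ZFS on Linux packages, so verify it exists on your system first:

.. code-block:: console

  # systemctl enable zfs-import-scan.service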