[[chapter_zfs]]
ZFS on Linux
------------

ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. Starting with {pve} 3.4, the native Linux
kernel port of the ZFS file system is introduced as an optional
file system and also as an additional selection for the root
file system. There is no need to manually compile ZFS modules - all
packages are included.

By using ZFS, it's possible to achieve maximum enterprise features with
low budget hardware, but also high performance systems by leveraging
SSD caching or even SSD only setups. ZFS can replace cost-intensive
hardware RAID cards with moderate CPU and memory load, combined with
easy management.

.General ZFS advantages

* Easy configuration and management with {pve} GUI and CLI.

* Reliable

* Protection against data corruption

* Data compression on file system level

* Snapshots

* Copy-on-write clone

* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3

* Can use SSD for cache

* Self healing

* Continuous integrity checking

* Designed for high storage capacities

* Asynchronous replication over network

* Open Source

* Encryption

* ...


Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware controller which has its
own cache management. ZFS needs to communicate directly with the disks. An
HBA adapter is the way to go, or something like an LSI controller flashed
in ``IT'' mode.

If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use `virtio` for the disks of that VM,
since they are not supported by ZFS. Use IDE or SCSI instead (this also
works with the `virtio` SCSI controller type).


Installation as Root File System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you install using the {pve} installer, you can choose ZFS for the
root file system. You need to select the RAID type at installation
time:

[horizontal]
RAID0:: Also called ``striping''. The capacity of such a volume is the sum
of the capacities of all disks. But RAID0 does not add any redundancy,
so the failure of a single drive makes the volume unusable.

RAID1:: Also called ``mirroring''. Data is written identically to all
disks. This mode requires at least 2 disks with the same size. The
resulting capacity is that of a single disk.

RAID10:: A combination of RAID0 and RAID1. Requires at least 4 disks.

RAIDZ-1:: A variation on RAID-5, single parity. Requires at least 3 disks.

RAIDZ-2:: A variation on RAID-5, double parity. Requires at least 4 disks.

RAIDZ-3:: A variation on RAID-5, triple parity. Requires at least 5 disks.

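
The usable capacity of each mode follows directly from these definitions. A rough sketch of the arithmetic, using a hypothetical pool of four 1000 GB disks (real pools lose a little extra space to padding and metadata):

```shell
# Rough usable-capacity arithmetic per RAID level, for a hypothetical
# pool of n equally sized disks of s GB each.
n=4       # number of disks (hypothetical example)
s=1000    # size of each disk in GB (hypothetical example)

raid0=$((n * s))           # striping: sum of all disk capacities
raid1=$s                   # mirroring: capacity of a single disk
raid10=$((n / 2 * s))      # striped mirrors: half of the disks
raidz1=$(((n - 1) * s))    # single parity: one disk's worth reserved
raidz2=$(((n - 2) * s))    # double parity: two disks' worth reserved

echo "RAID0=$raid0 RAID1=$raid1 RAID10=$raid10 RAIDZ-1=$raidz1 RAIDZ-2=$raidz2"
```
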
The installer automatically partitions the disks, creates a ZFS pool
called `rpool`, and installs the root file system on the ZFS subvolume
`rpool/ROOT/pve-1`.

Another subvolume called `rpool/data` is created to store VM
images. In order to use that with the {pve} tools, the installer
creates the following configuration entry in `/etc/pve/storage.cfg`:

----
zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir
----

After installation, you can view your ZFS pool status using the
`zpool` command:

----
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors
----

The `zfs` command is used to configure and manage your ZFS file
systems. The following command lists all file systems after
installation:

----
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             4.94G  7.68T    96K  /rpool
rpool/ROOT         702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1   702M  7.68T   702M  /
rpool/data          96K  7.68T    96K  /rpool/data
rpool/swap        4.25G  7.69T    64K  -
----


Bootloader
~~~~~~~~~~

Depending on whether the system is booted in EFI or legacy BIOS mode, the
{pve} installer sets up either `grub` or `systemd-boot` as the main
bootloader. See the chapter on xref:sysboot[{pve} host bootloaders] for
details.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

----
# man zpool
# man zfs
----

[[sysadmin_zfs_create_new_zpool]]
Create a new zpool
^^^^^^^^^^^^^^^^^^

To create a new pool, at least one disk is needed. The `ashift` property
should be chosen so that 2 to the power of `ashift` is at least the
sector size of the underlying disk.

----
# zpool create -f -o ashift=12 <pool> <device>
----
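
For example, a disk with 4096-byte physical sectors needs `ashift=12`, since 2^12 = 4096. A sketch of the derivation (the sector size here is a hypothetical example; on a real disk it can be queried with e.g. `lsblk -o NAME,PHY-SEC`):

```shell
# Derive the minimum ashift for a given physical sector size:
# 2^ashift must be at least the sector size. 4096 bytes is assumed here.
sector=4096
ashift=0
while [ $((1 << ashift)) -lt "$sector" ]; do
    ashift=$((ashift + 1))
done
echo "ashift=$ashift"   # 4096 = 2^12, so this prints ashift=12
```
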

To activate compression (see section <<zfs_compression,Compression in ZFS>>):

----
# zfs set compression=lz4 <pool>
----

[[sysadmin_zfs_create_new_zpool_raid0]]
Create a new pool with RAID-0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 1 disk

----
# zpool create -f -o ashift=12 <pool> <device1> <device2>
----

[[sysadmin_zfs_create_new_zpool_raid1]]
Create a new pool with RAID-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 2 disks

----
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
----

[[sysadmin_zfs_create_new_zpool_raid10]]
Create a new pool with RAID-10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

----
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
----

[[sysadmin_zfs_create_new_zpool_raidz1]]
Create a new pool with RAIDZ-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 3 disks

----
# zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
----

Create a new pool with RAIDZ-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Minimum 4 disks

----
# zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
----

[[sysadmin_zfs_create_new_zpool_with_cache]]
Create a new pool with cache (L2ARC)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

----
# zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
----

[[sysadmin_zfs_create_new_zpool_with_log]]
Create a new pool with log (ZIL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a dedicated log drive partition to increase
the performance (use SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

----
# zpool create -f -o ashift=12 <pool> <device> log <log_device>
----

[[sysadmin_zfs_add_cache_and_log_dev]]
Add cache and log to an existing pool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a pool without cache and log, first partition the SSD into
2 partitions with `parted` or `gdisk`.

IMPORTANT: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

----
# zpool add -f <pool> log <device-part1> cache <device-part2>
----

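
These rules of thumb can be turned into concrete partition sizes. A sketch for a hypothetical host with 64 GiB RAM and a 512 GiB SSD:

```shell
# Rule of thumb from above: the log partition needs at most about half
# of physical memory; the rest of the SSD can serve as cache.
ram_gib=64    # physical memory in GiB (hypothetical example)
ssd_gib=512   # SSD size in GiB (hypothetical example)

log_gib=$((ram_gib / 2))          # upper bound for the log partition
cache_gib=$((ssd_gib - log_gib))  # remainder usable as cache
echo "log=${log_gib}G cache=${cache_gib}G"
```
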
[[sysadmin_zfs_change_failed_dev]]
Changing a failed device
^^^^^^^^^^^^^^^^^^^^^^^^

----
# zpool replace -f <pool> <old device> <new device>
----

.Changing a failed bootable device

Depending on how {pve} was installed, it is either using `grub` or
`systemd-boot` as bootloader (see xref:sysboot[Host Bootloader]).

The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed which depend on the bootloader in use.

----
# sgdisk <healthy bootable device> -R <new device>
# sgdisk -G <new device>
# zpool replace -f <pool> <old zfs partition> <new zfs partition>
----

NOTE: Use the `zpool status -v` command to monitor how far the resilvering
process of the new disk has progressed.

.With `systemd-boot`:

----
# pve-efiboot-tool format <new disk's ESP>
# pve-efiboot-tool init <new disk's ESP>
----

NOTE: `ESP` stands for EFI System Partition, which is set up as partition #2 on
bootable disks set up by the {pve} installer since version 5.4. For details, see
xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].

.With `grub`:

----
# grub-install <new disk>
----

Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
which you can install using `apt-get`:

----
# apt-get install zfs-zed
----

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with
your favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

--------
ZED_EMAIL_ADDR="root"
--------

Please note that {pve} forwards mails to `root` to the email address
configured for the root user.

IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.

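
The edit can also be scripted. A sketch using `sed` on a temporary copy (the sample file content below is illustrative, not the complete shipped `zed.rc`):

```shell
# Uncomment ZED_EMAIL_ADDR in a zed.rc-style file. We work on a temporary
# copy here; on a real system the target is /etc/zfs/zed.d/zed.rc.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
##
# Email address of the zed notification recipient (sample content).
#ZED_EMAIL_ADDR="root"
EOF

# Strip the leading '#' from the ZED_EMAIL_ADDR line only.
sed -i 's/^#ZED_EMAIL_ADDR=/ZED_EMAIL_ADDR=/' "$tmp"
grep '^ZED_EMAIL_ADDR=' "$tmp"
```
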

[[sysadmin_zfs_limit_memory_usage]]
Limit ZFS Memory Usage
~~~~~~~~~~~~~~~~~~~~~~

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

--------
options zfs zfs_arc_max=8589934592
--------

This example setting limits the usage to 8GB.

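
The value is given in bytes; 8589934592 is exactly 8 GiB (8 * 1024^3), and the figure for other limits can be derived the same way:

```shell
# zfs_arc_max is specified in bytes: GiB * 1024^3.
gib=8
arc_max=$((gib * 1024 * 1024 * 1024))
echo "options zfs zfs_arc_max=$arc_max"
```
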
[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:

----
# update-initramfs -u
----
====


[[zfs_swap]]
SWAP on ZFS
~~~~~~~~~~~

Swap-space created on a zvol may cause some trouble, such as blocking the
server or generating a high IO load, often seen when starting a backup
to an external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap
device. You can leave some space free for this purpose in the advanced
options of the installer. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:

----
# sysctl -w vm.swappiness=10
----

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

--------
vm.swappiness = 10
--------

.Linux kernel `swappiness` parameter values
[width="100%",cols="<m,2d",options="header"]
|===========================================================
| Value               | Strategy
| vm.swappiness = 0   | The kernel will swap only to avoid
an 'out of memory' condition
| vm.swappiness = 1   | Minimum amount of swapping without
disabling it entirely.
| vm.swappiness = 10  | This value is sometimes recommended to
improve performance when sufficient memory exists in a system.
| vm.swappiness = 60  | The default value.
| vm.swappiness = 100 | The kernel will swap aggressively.
|===========================================================

[[zfs_encryption]]
Encrypted ZFS Datasets
~~~~~~~~~~~~~~~~~~~~~~

ZFS on Linux version 0.8.0 introduced support for native encryption of
datasets. After an upgrade from previous ZFS on Linux versions, the encryption
feature can be enabled per pool:

----
# zpool get feature@encryption tank
NAME  PROPERTY            VALUE            SOURCE
tank  feature@encryption  disabled         local

# zpool set feature@encryption=enabled tank

# zpool get feature@encryption tank
NAME  PROPERTY            VALUE            SOURCE
tank  feature@encryption  enabled          local
----

WARNING: There is currently no support for booting from pools with encrypted
datasets using Grub, and only limited support for automatically unlocking
encrypted datasets on boot. Older versions of ZFS without encryption support
will not be able to decrypt stored data.

NOTE: It is recommended to either unlock storage datasets manually after
booting, or to write a custom unit to pass the key material needed for
unlocking on boot to `zfs load-key`.

WARNING: Establish and test a backup procedure before enabling encryption of
production data. If the associated key material/passphrase/keyfile has been
lost, accessing the encrypted data is no longer possible.

Encryption needs to be set up when creating datasets/zvols, and is inherited by
default to child datasets. For example, to create an encrypted dataset
`tank/encrypted_data` and configure it as storage in {pve}, run the following
commands:

----
# zfs create -o encryption=on -o keyformat=passphrase tank/encrypted_data
Enter passphrase:
Re-enter passphrase:

# pvesm add zfspool encrypted_zfs -pool tank/encrypted_data
----

All guest volumes/disks created on this storage will be encrypted with the
shared key material of the parent dataset.

To actually use the storage, the associated key material needs to be loaded
with `zfs load-key`:

----
# zfs load-key tank/encrypted_data
Enter passphrase for 'tank/encrypted_data':
----

It is also possible to use a (random) keyfile instead of prompting for a
passphrase by setting the `keylocation` and `keyformat` properties, either at
creation time or with `zfs change-key` on existing datasets:

----
# dd if=/dev/urandom of=/path/to/keyfile bs=32 count=1

# zfs change-key -o keyformat=raw -o keylocation=file:///path/to/keyfile tank/encrypted_data
----
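
The `dd` invocation above produces 32 bytes (256 bits) of key material, which is the length `keyformat=raw` expects. A quick sanity check, using a temporary file instead of the placeholder `/path/to/keyfile`:

```shell
# Generate a 32-byte raw key into a temporary file and verify its size.
keyfile=$(mktemp)
dd if=/dev/urandom of="$keyfile" bs=32 count=1 2>/dev/null
size=$(stat -c %s "$keyfile")
echo "keyfile size: $size bytes"
```
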

WARNING: When using a keyfile, special care needs to be taken to secure the
keyfile against unauthorized access or accidental loss. Without the keyfile, it
is not possible to access the plaintext data!

A guest volume created underneath an encrypted dataset will have its
`encryptionroot` property set accordingly. The key material only needs to be
loaded once per encryptionroot to be available to all encrypted datasets
underneath it.

See the `encryptionroot`, `encryption`, `keylocation`, `keyformat` and
`keystatus` properties, the `zfs load-key`, `zfs unload-key` and `zfs
change-key` commands and the `Encryption` section from `man zfs` for more
details and advanced usage.



[[zfs_compression]]
Compression in ZFS
~~~~~~~~~~~~~~~~~~

When compression is enabled on a dataset, ZFS tries to compress all *new*
blocks before writing them, and decompresses them on reading. Already
existing data will not be compressed retroactively.

You can enable compression with:

----
# zfs set compression=<algorithm> <dataset>
----

We recommend using the `lz4` algorithm, because it adds very little CPU
overhead. Other algorithms like `lzjb` and `gzip-N`, where `N` is an
integer from `1` (fastest) to `9` (best compression ratio), are also
available. Depending on the algorithm and how compressible the data is,
having compression enabled can even increase I/O performance.

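
The trade-off between the `gzip-N` levels can be illustrated with the userspace `gzip` tool on highly compressible sample data (this only demonstrates the algorithm family, it does not touch ZFS):

```shell
# Compare gzip level 1 (fastest) and level 9 (best ratio) on repetitive
# sample data - analogous to ZFS's gzip-1 ... gzip-9 settings.
sample=$(mktemp)
for i in $(seq 1 2000); do
    echo "the quick brown fox jumps over the lazy dog"
done > "$sample"

orig=$(stat -c %s "$sample")
fast=$(gzip -1 -c "$sample" | wc -c)
best=$(gzip -9 -c "$sample" | wc -c)
echo "original=$orig gzip-1=$fast gzip-9=$best"
```

Both levels shrink the file dramatically here; on poorly compressible data (e.g. already-compressed media) the gain largely disappears, which is why the cheap `lz4` is usually the better default.
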
You can disable compression at any time with:

----
# zfs set compression=off <dataset>
----

Again, only new blocks will be affected by this change.


[[sysadmin_zfs_special_device]]
ZFS Special Device
~~~~~~~~~~~~~~~~~~

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
whole small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

IMPORTANT: The redundancy of the `special` device should match the one of the
pool, since the `special` device is a point of failure for the whole pool.

WARNING: Adding a `special` device to a pool cannot be undone!

.Create a pool with `special` device and RAID-1:

----
# zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>
----

.Add a `special` device to an existing pool with RAID-1:

----
# zpool add <pool> special mirror <device1> <device2>
----

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power
of two in the range from `512B` to `128K`. After setting the property, new
file blocks smaller than `size` will be allocated on the `special` device.

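
Valid values are therefore `0` or powers of two between 512 and 131072 bytes. A small sketch validating a candidate value before setting it:

```shell
# Check whether a special_small_blocks candidate value (in bytes) is
# valid: either 0, or a power of two between 512 and 131072 (512B..128K).
is_valid_small_blocks() {
    v=$1
    if [ "$v" -eq 0 ]; then
        return 0
    fi
    if [ "$v" -lt 512 ] || [ "$v" -gt 131072 ]; then
        return 1
    fi
    # v AND (v-1) is zero exactly for powers of two
    [ $((v & (v - 1))) -eq 0 ]
}

if is_valid_small_blocks 4096; then echo "4096 is a valid size"; fi
if ! is_valid_small_blocks 3000; then echo "3000 is not a valid size"; fi
```
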
IMPORTANT: If the value for `special_small_blocks` is greater than or equal to
the `recordsize` (default `128K`) of the dataset, *all* data will be written to
the `special` device, so be careful!

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all containers
in the pool will opt in for small file blocks).

.Opt in for all files smaller than 4K pool-wide:

----
# zfs set special_small_blocks=4K <pool>
----

.Opt in for small file blocks for a single dataset:

----
# zfs set special_small_blocks=4K <pool>/<filesystem>
----

.Opt out from small file blocks for a single dataset:

----
# zfs set special_small_blocks=0 <pool>/<filesystem>
----