ZFS on Linux
=============

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. There is no need to compile ZFS modules manually - all
required packages are included.

By using ZFS, it is possible to achieve maximum enterprise features with
low budget hardware, as well as high performance systems by leveraging
SSD caching or even SSD-only setups. ZFS can replace expensive hardware
RAID cards with moderate CPU and memory load, combined with easy
management.

General ZFS advantages:

* Easy configuration and management with GUI and CLI.
* Reliable
* Protection against data corruption
* Data compression on file system level
* Snapshots
* Copy-on-write clone
* Various RAID levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
* Can use SSD for cache
* Self healing
* Continuous integrity checking
* Designed for high storage capacities
* Asynchronous replication over network
* Open Source
* Encryption

Hardware
---------

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware RAID controller which has its
own cache management. ZFS needs to communicate directly with the disks. An
HBA adapter is the way to go, or something like an LSI controller flashed
in ``IT`` mode.

ZFS Administration
------------------

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

.. code-block:: console

  # man zpool
  # man zfs
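
For a first overview of the pools and datasets already present on a system,
the following read-only commands are commonly used:

.. code-block:: console

  # zpool status
  # zfs list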

Create a new zpool
~~~~~~~~~~~~~~~~~~

To create a new pool, at least one disk is needed. The `ashift` value
should correspond to the sector size of the underlying disk (the sector
size is 2 to the power of `ashift` bytes), or be larger.

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device>
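
To pick a suitable `ashift`, you can first check the sector sizes the disk
reports. For example, a physical sector size of 4096 bytes corresponds to
`ashift=12`, since 2^12 = 4096:

.. code-block:: console

  # lsblk -o NAME,PHY-SEC,LOG-SEC <device>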

Create a new pool with RAID-0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimum 1 disk

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device1> <device2>

Create a new pool with RAID-1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimum 2 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

Create a new pool with RAID-10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

Create a new pool with RAIDZ-1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimum 3 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

Create a new pool with RAIDZ-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Minimum 4 disks

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
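
The advantages list above also mentions RAIDZ-3. It follows the same pattern,
using the `raidz3` vdev type with triple parity; a sketch with five example
devices:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> raidz3 <device1> <device2> <device3> <device4> <device5>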

Create a new pool with cache (L2ARC)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to use a dedicated cache drive partition to increase
the performance (use an SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

Create a new pool with log (ZIL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to use a dedicated log drive partition to increase
the performance (use an SSD).

For `<device>`, it is possible to use multiple devices, as shown in
"Create a new pool with RAID*".

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> <device> log <log_device>

Add cache and log to an existing pool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have a pool without cache and log, first partition the SSD into
two partitions with `parted` or `gdisk`.

.. important:: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.
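
A possible way to create the two partitions with `sgdisk`, assuming a
hypothetical SSD `/dev/sdX` and an example log size of 4 GiB (adjust the
size to your system): the first command creates the log partition, the
second assigns the remaining space to the cache partition.

.. code-block:: console

  # sgdisk -n 1:0:+4G -t 1:BF01 /dev/sdX
  # sgdisk -n 2:0:0 -t 2:BF01 /dev/sdX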

.. code-block:: console

  # zpool add -f <pool> log <device-part1> cache <device-part2>

Changing a failed device
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: console

  # zpool replace -f <pool> <old device> <new device>
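
To avoid mixing up devices, it can help to refer to disks by their stable
names under `/dev/disk/by-id/` instead of `/dev/sdX` names, which may change
between reboots:

.. code-block:: console

  # ls -l /dev/disk/by-id/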

Changing a failed bootable device
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Depending on how Proxmox Backup was installed, it uses either `grub` or
`systemd-boot` as the bootloader.

The first steps of copying the partition table, reissuing GUIDs and replacing
the ZFS partition are the same. To make the system bootable from the new disk,
different steps are needed which depend on the bootloader in use.

.. code-block:: console

  # sgdisk <healthy bootable device> -R <new device>
  # sgdisk -G <new device>
  # zpool replace -f <pool> <old zfs partition> <new zfs partition>

.. NOTE:: Use the `zpool status -v` command to monitor how far the resilvering process of the new disk has progressed.

With `systemd-boot`:

.. code-block:: console

  # pve-efiboot-tool format <new disk's ESP>
  # pve-efiboot-tool init <new disk's ESP>

.. NOTE:: `ESP` stands for EFI System Partition, which is set up as partition #2 on
   bootable disks set up by the Proxmox VE installer since version 5.4. For details, see
   xref:sysboot_systemd_boot_setup[Setting up a new partition for use as synced ESP].

With `grub`:

Usually `grub.cfg` is located in `/boot/grub/grub.cfg`.

.. code-block:: console

  # grub-install <new disk>
  # grub-mkconfig -o /path/to/grub.cfg

Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
and you can install it using `apt-get`:

.. code-block:: console

  # apt-get install zfs-zed

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

.. code-block:: console

  ZED_EMAIL_ADDR="root"

Please note that Proxmox Backup forwards mails to `root` to the email address
configured for the root user.

IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.
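
After changing `zed.rc`, the daemon has to pick up the new configuration. A
minimal sketch, assuming the standard `zfs-zed` systemd service shipped with
the package:

.. code-block:: console

  # systemctl restart zfs-zed
  # systemctl status zfs-zed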

Limit ZFS Memory Usage
~~~~~~~~~~~~~~~~~~~~~~

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

.. code-block:: console

  options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8 GiB.
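
The value is given in bytes: 8 * 1024 * 1024 * 1024 = 8589934592, i.e. 8 GiB.
To try a new limit without rebooting, the module parameter can usually also be
changed at runtime:

.. code-block:: console

  # echo "$((8 * 1024 * 1024 * 1024))" > /sys/module/zfs/parameters/zfs_arc_max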

.. IMPORTANT:: If your root file system is ZFS, you must update your initramfs
   every time this value changes:

.. code-block:: console

  # update-initramfs -u

SWAP on ZFS
~~~~~~~~~~~

Swap space created on a zvol may cause problems, such as blocking the
server or generating a high IO load, often seen when starting a backup
to an external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer.
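
A minimal sketch for setting up such a swap partition, assuming a hypothetical
spare partition `/dev/sdX4`:

.. code-block:: console

  # mkswap /dev/sdX4
  # swapon /dev/sdX4

To keep it active across reboots, add a corresponding entry to `/etc/fstab`.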

Additionally, you can lower the `swappiness` value. A good value for
servers is 10:

.. code-block:: console

  # sysctl -w vm.swappiness=10

To make the swappiness setting persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

.. code-block:: console

  vm.swappiness = 10

.. table:: Linux kernel `swappiness` parameter values
   :widths: auto

   ====================  =====================================================================================================
   Value                 Strategy
   ====================  =====================================================================================================
   vm.swappiness = 0     The kernel will swap only to avoid an 'out of memory' condition
   vm.swappiness = 1     Minimum amount of swapping without disabling it entirely.
   vm.swappiness = 10    This value is sometimes recommended to improve performance when sufficient memory exists in a system.
   vm.swappiness = 60    The default value.
   vm.swappiness = 100   The kernel will swap aggressively.
   ====================  =====================================================================================================

ZFS Compression
~~~~~~~~~~~~~~~

To activate compression:

.. code-block:: console

  # zfs set compression=lz4 <pool>

We recommend using the `lz4` algorithm, because it adds very little CPU overhead.
Other algorithms, such as `lzjb` and `gzip-N` (where `N` is an integer from `1` to `9`
representing the compression level, 1 being fastest and 9 giving the best compression),
are also available. Depending on the algorithm and how compressible the data is,
having compression enabled can even increase I/O performance.
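
To check how well your data compresses, you can query the `compressratio`
property of a dataset:

.. code-block:: console

  # zfs get compressratio <dataset>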

You can disable compression at any time with:

.. code-block:: console

  # zfs set compression=off <dataset>

Only new blocks will be affected by this change.

ZFS Special Device
~~~~~~~~~~~~~~~~~~

Since version 0.8.0, ZFS supports `special` devices. A `special` device in a
pool is used to store metadata, deduplication tables, and optionally small
file blocks.

A `special` device can improve the speed of a pool consisting of slow spinning
hard disks with a lot of metadata changes. For example, workloads that involve
creating, updating or deleting a large number of files will benefit from the
presence of a `special` device. ZFS datasets can also be configured to store
whole small files on the `special` device, which can further improve the
performance. Use fast SSDs for the `special` device.

.. IMPORTANT:: The redundancy of the `special` device should match the one of the
   pool, since the `special` device is a point of failure for the whole pool.

.. WARNING:: Adding a `special` device to a pool cannot be undone!

Create a pool with `special` device and RAID-1:

.. code-block:: console

  # zpool create -f -o ashift=12 <pool> mirror <device1> <device2> special mirror <device3> <device4>

Adding a `special` device to an existing pool with RAID-1:

.. code-block:: console

  # zpool add <pool> special mirror <device1> <device2>

ZFS datasets expose the `special_small_blocks=<size>` property. `size` can be
`0` to disable storing small file blocks on the `special` device, or a power of
two in the range from `512B` to `128K`. After setting the property, new file
blocks smaller than `size` will be allocated on the `special` device.

.. IMPORTANT:: If the value for `special_small_blocks` is greater than or equal to
   the `recordsize` (default `128K`) of the dataset, *all* data will be written to
   the `special` device, so be careful!

Setting the `special_small_blocks` property on a pool will change the default
value of that property for all child ZFS datasets (for example, all datasets
in the pool will opt in for small file blocks).

Opt in for all file blocks smaller than 4K, pool-wide:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>

Opt in for small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=4K <pool>/<filesystem>

Opt out from small file blocks for a single dataset:

.. code-block:: console

  # zfs set special_small_blocks=0 <pool>/<filesystem>
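
To verify which value is in effect for a pool and its datasets, you can query
the property recursively:

.. code-block:: console

  # zfs get -r special_small_blocks <pool>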

Troubleshooting
~~~~~~~~~~~~~~~

Corrupted cachefile

In case of a corrupted ZFS cachefile, some volumes may not be mounted during
boot and will have to be mounted manually later.

For each pool, run:

.. code-block:: console

  # zpool set cachefile=/etc/zfs/zpool.cache POOLNAME

and afterwards update the `initramfs` by running:

.. code-block:: console

  # update-initramfs -u -k all

and finally reboot your node.

Sometimes the ZFS cachefile can become corrupted, and `zfs-import-cache.service`
then does not import the pools that are missing from it.

Another workaround for this problem is enabling `zfs-import-scan.service`,
which searches for and imports pools via device scanning (usually slower).
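
A minimal sketch for switching to scan-based import, assuming the unit names
shipped with the Debian ZFS packages:

.. code-block:: console

  # systemctl enable zfs-import-scan.service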