[[chapter_zfs]]
ZFS on Linux
------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. Starting with {pve} 3.4, the native Linux
kernel port of the ZFS file system is introduced as an optional
file system and also as an additional selection for the root
file system. There is no need to compile ZFS modules manually - all
packages are included.

By using ZFS, it is possible to achieve maximum enterprise features with
low budget hardware, but also high performance systems by leveraging
SSD caching or even SSD only setups. ZFS can replace cost intense
hardware RAID cards with moderate CPU and memory load, combined with
easy management.

.General ZFS advantages

* Easy configuration and management with {pve} GUI and CLI.

* Reliable

* Protection against data corruption

* Data compression on file system level

* Snapshots

* Copy-on-write clone

* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3

* Can use SSD for cache

* Self healing

* Continuous integrity checking

* Designed for high storage capacities

* Asynchronous replication over network

* Open Source

* Encryption

* ...


Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To prevent
data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware controller which has its
own cache management. ZFS needs to communicate directly with the disks. An
HBA adapter is the way to go, or something like an LSI controller flashed
in ``IT'' mode.

If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use `virtio` for the disks of that VM,
since they are not supported by ZFS. Use IDE or SCSI instead (this also
works with the `virtio` SCSI controller type).


Installation as Root File System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you install using the {pve} installer, you can choose ZFS for the
root file system. You need to select the RAID type at installation
time:
[horizontal]
RAID0:: Also called ``striping''. The capacity of such a volume is the sum
of the capacities of all disks. But RAID0 does not add any redundancy,
so the failure of a single drive makes the volume unusable.

RAID1:: Also called ``mirroring''. Data is written identically to all
disks. This mode requires at least 2 disks with the same size. The
resulting capacity is that of a single disk.

RAID10:: A combination of RAID0 and RAID1. Requires at least 4 disks.

RAIDZ-1:: A variation on RAID-5, single parity. Requires at least 3 disks.

RAIDZ-2:: A variation on RAID-5, double parity. Requires at least 4 disks.

RAIDZ-3:: A variation on RAID-5, triple parity. Requires at least 5 disks.
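
To illustrate the capacity trade-offs between these RAID types, here is a
quick sketch with shell arithmetic. The disk count and size are assumed
example values, not a recommendation:

```shell
# Assumed example: 4 disks of 2 TB each.
disks=4
size_tb=2
echo "RAID0:   $((disks * size_tb)) TB"        # striping: sum of all disks
echo "RAID10:  $((disks / 2 * size_tb)) TB"    # half the disks store mirror copies
echo "RAIDZ-1: $(((disks - 1) * size_tb)) TB"  # one disk's worth of parity
echo "RAIDZ-2: $(((disks - 2) * size_tb)) TB"  # two disks' worth of parity
```

With these example values, the usable capacities are 8, 4, 6 and 4 TB
respectively (ignoring metadata and padding overhead).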

The installer automatically partitions the disks, creates a ZFS pool
called `rpool`, and installs the root file system on the ZFS subvolume
`rpool/ROOT/pve-1`.

Another subvolume called `rpool/data` is created to store VM
images. In order to use that with the {pve} tools, the installer
creates the following configuration entry in `/etc/pve/storage.cfg`:

----
zfspool: local-zfs
	pool rpool/data
	sparse
	content images,rootdir
----

After installation, you can view your ZFS pool status using the
`zpool` command:

----
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sda2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0

errors: No known data errors
----

The `zfs` command is used to configure and manage your ZFS file
systems. The following command lists all file systems after
installation:

----
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             4.94G  7.68T    96K  /rpool
rpool/ROOT         702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1   702M  7.68T   702M  /
rpool/data          96K  7.68T    96K  /rpool/data
rpool/swap        4.25G  7.69T    64K  -
----


Bootloader
~~~~~~~~~~

Depending on whether the system is booted in EFI or legacy BIOS mode, the
{pve} installer sets up either `grub` or `systemd-boot` as the main bootloader.
See the chapter on xref:sysboot[{pve} host bootloaders] for details.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

----
# man zpool
# man zfs
----

.Create a new zpool

To create a new pool, at least one disk is needed. The `ashift` value
should match the sector size of the underlying disk (2 to the power of
`ashift` bytes), or be larger.

 zpool create -f -o ashift=12 <pool> <device>
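
The sector size implied by an `ashift` value is 2 to the power of `ashift`
bytes, which can be checked with shell arithmetic:

```shell
# ashift=9 corresponds to 512-byte sectors, ashift=12 to 4K sectors.
echo $((1 << 9))    # prints 512
echo $((1 << 12))   # prints 4096
```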

To activate compression:

 zfs set compression=lz4 <pool>

.Create a new pool with RAID-0

Minimum 1 Disk

 zpool create -f -o ashift=12 <pool> <device1> <device2>

.Create a new pool with RAID-1

Minimum 2 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

.Create a new pool with RAID-10

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

.Create a new pool with RAIDZ-1

Minimum 3 Disks

 zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

.Create a new pool with RAIDZ-2

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

.Create a new pool with cache (L2ARC)

It is possible to use a dedicated cache drive partition to increase
the performance (use SSD).

As `<device>`, it is possible to use multiple devices, as shown in
``Create a new pool with RAID*''.

 zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

.Create a new pool with log (ZIL)

It is possible to use a dedicated log drive partition to increase
the performance (use SSD).

As `<device>`, it is possible to use multiple devices, as shown in
``Create a new pool with RAID*''.

 zpool create -f -o ashift=12 <pool> <device> log <log_device>

.Add cache and log to an existing pool

If you have a pool without cache and log, first partition the SSD into
two partitions with `parted` or `gdisk`.

IMPORTANT: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

 zpool add -f <pool> log <device-part1> cache <device-part2>
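
As a sizing sketch for the two partitions, the half-of-memory guideline
works out as follows. The 16 GiB of RAM and 100 GiB SSD are assumed
example figures:

```shell
# Assumed example: host with 16 GiB RAM and a 100 GiB SSD.
ram_gib=16
ssd_gib=100
log_gib=$((ram_gib / 2))          # log (ZIL): about half of physical memory
cache_gib=$((ssd_gib - log_gib))  # remainder of the SSD used as cache (L2ARC)
echo "log=${log_gib}G cache=${cache_gib}G"   # prints log=8G cache=92G
```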

.Changing a failed device

 zpool replace -f <pool> <old device> <new device>

.Changing a failed bootable device when using systemd-boot

 sgdisk <healthy bootable device> -R <new device>
 sgdisk -G <new device>
 zpool replace -f <pool> <old zfs partition> <new zfs partition>


Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
and you can install it using `apt-get`:

----
# apt-get install zfs-zed
----

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

--------
ZED_EMAIL_ADDR="root"
--------

Please note that {pve} forwards mails sent to `root` to the email address
configured for the root user.

IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.


Limit ZFS Memory Usage
~~~~~~~~~~~~~~~~~~~~~~

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation of the
host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

--------
options zfs zfs_arc_max=8589934592
--------

This example setting limits the usage to 8GB.
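
The `zfs_arc_max` value is given in bytes; the figure above is exactly
8 GiB, as shell arithmetic confirms:

```shell
# 8 GiB expressed in bytes: 8 * 1024^3 = 8589934592.
echo $((8 * 1024 * 1024 * 1024))   # prints 8589934592
```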

[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:

 update-initramfs -u
====


[[zfs_swap]]
.SWAP on ZFS

Swap space created on a zvol may cause some problems, such as blocking the
server or generating a high IO load, often seen when starting a backup
to an external storage.

We strongly recommend to use enough memory, so that you normally do not
run into low memory situations. Should you need or want to add swap, it is
preferred to create a partition on a physical disk and use it as a swap device.
You can leave some space free for this purpose in the advanced options of the
installer. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:

 sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

--------
vm.swappiness = 10
--------

.Linux kernel `swappiness` parameter values
[width="100%",cols="<m,2d",options="header"]
|===========================================================
| Value               | Strategy
| vm.swappiness = 0   | The kernel will swap only to avoid
an 'out of memory' condition
| vm.swappiness = 1   | Minimum amount of swapping without
disabling it entirely.
| vm.swappiness = 10  | This value is sometimes recommended to
improve performance when sufficient memory exists in a system.
| vm.swappiness = 60  | The default value.
| vm.swappiness = 100 | The kernel will swap aggressively.
|===========================================================