ZFS on Linux
------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. Starting with {pve} 3.4, the native Linux kernel
port of the ZFS file system is introduced as an optional file system
and also as an additional selection for the root file system. There
is no need to manually compile ZFS modules - all packages are
included.

By using ZFS, it is possible to achieve maximum enterprise features
with low budget hardware, but also high performance systems by
leveraging SSD caching or even SSD only setups. ZFS can replace cost
intensive hardware RAID cards with moderate CPU and memory load
combined with easy management.

.General ZFS advantages

* Easy configuration and management with {pve} GUI and CLI.

* Reliable

* Protection against data corruption

* Data compression on file system level

* Snapshots

* Copy-on-write clone

* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3

* Can use SSD for cache

* Self healing

* Continuous integrity checking

* Designed for high storage capacities

* Asynchronous replication over network

* Open Source

* Encryption

* ...


Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To
prevent data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware controller which has
its own cache management. ZFS needs to communicate directly with the
disks. An HBA adapter, or something like an LSI controller flashed in
``IT'' mode, is the way to go.

If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use `virtio` for the disks of that VM,
since they are not supported by ZFS. Use IDE or SCSI instead (this
also works with the `virtio` SCSI controller type).


Installation as Root File System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you install using the {pve} installer, you can choose ZFS for the
root file system. You need to select the RAID type at installation
time:

[horizontal]
RAID0:: Also called ``striping''. The capacity of such a volume is the
sum of the capacities of all disks. But RAID0 does not add any
redundancy, so the failure of a single drive makes the volume unusable.

RAID1:: Also called ``mirroring''. Data is written identically to all
disks. This mode requires at least 2 disks with the same size. The
resulting capacity is that of a single disk.

RAID10:: A combination of RAID0 and RAID1. Requires at least 4 disks.

RAIDZ-1:: A variation on RAID-5, single parity. Requires at least 3 disks.

RAIDZ-2:: A variation on RAID-5, double parity. Requires at least 4 disks.

RAIDZ-3:: A variation on RAID-5, triple parity. Requires at least 5 disks.

The installer automatically partitions the disks, creates a ZFS pool
called `rpool`, and installs the root file system on the ZFS subvolume
`rpool/ROOT/pve-1`.

Another subvolume called `rpool/data` is created to store VM
images. In order to use that with the {pve} tools, the installer
creates the following configuration entry in `/etc/pve/storage.cfg`:

----
zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir
----
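
For a quick check from the {pve} side, the storage manager CLI can
show that the new `local-zfs` storage is active (a usage sketch; the
storage name is the installer default):

----
# pvesm status
----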

After installation, you can view your ZFS pool status using the
`zpool` command:

----
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors
----

The `zfs` command is used to configure and manage your ZFS file
systems. The following command lists all file systems after
installation:

----
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             4.94G  7.68T    96K  /rpool
rpool/ROOT         702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1   702M  7.68T   702M  /
rpool/data          96K  7.68T    96K  /rpool/data
rpool/swap        4.25G  7.69T    64K  -
----


Bootloader
~~~~~~~~~~

The default ZFS disk partitioning scheme does not use the first 2048
sectors. This gives enough room to install a GRUB boot partition. The
{pve} installer automatically allocates that space, and installs the
GRUB boot loader there. If you use a redundant RAID setup, it installs
the boot loader on all disks required for booting, so you can boot
even if some disks fail.

NOTE: It is not possible to use ZFS as the root file system with UEFI
boot.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are `zfs` and `zpool`. Both commands come with great
manual pages, which can be read with:

----
# man zpool
# man zfs
----

.Create a new zpool

To create a new pool, at least one disk is needed. The `ashift` value
should match the sector size of the underlying disk (2 to the power of
`ashift` bytes), or be larger.

 zpool create -f -o ashift=12 <pool> <device>
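
If you are unsure which value to use, the logical and physical sector
sizes of a disk can be queried before creating the pool. A short
sketch, assuming the disk shows up as `/dev/sdX` (hypothetical); a
physical sector size of 4096 bytes corresponds to `ashift=12`:

----
# lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX
----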

To activate compression:

 zfs set compression=lz4 <pool>
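
Whether compression is active, and how well the stored data
compresses, can be checked later via the `compression` and
`compressratio` properties:

----
# zfs get compression,compressratio <pool>
----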

.Create a new pool with RAID-0

Minimum 1 Disk

 zpool create -f -o ashift=12 <pool> <device1> <device2>

.Create a new pool with RAID-1

Minimum 2 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

.Create a new pool with RAID-10

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

.Create a new pool with RAIDZ-1

Minimum 3 Disks

 zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

.Create a new pool with RAIDZ-2

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

.Create a new pool with cache (L2ARC)

It is possible to use a dedicated cache drive partition to increase
the performance (use an SSD).

As `<device>`, it is possible to use more devices, as shown in
"Create a new pool with RAID*".

 zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

.Create a new pool with log (ZIL)

It is possible to use a dedicated drive partition as log device to
increase the performance (use an SSD).

As `<device>`, it is possible to use more devices, as shown in
"Create a new pool with RAID*".

 zpool create -f -o ashift=12 <pool> <device> log <log_device>

.Add cache and log to an existing pool

If you have a pool without cache and log, first partition the SSD into
2 partitions with `parted` or `gdisk`.

IMPORTANT: Always use GPT partition tables.

The maximum size of a log device should be about half the size of
physical memory, so this is usually quite small. The rest of the SSD
can be used as cache.

 zpool add -f <pool> log <device-part1> cache <device-part2>
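
For reference, a minimal sketch of the partitioning step described
above, assuming the SSD shows up as `/dev/sdf` (hypothetical) and the
host has 16GB of physical memory, so roughly an 8GB log partition:

----
# parted -s /dev/sdf mklabel gpt
# parted -s /dev/sdf mkpart log 1MiB 8GiB
# parted -s /dev/sdf mkpart cache 8GiB 100%
# zpool add -f <pool> log /dev/sdf1 cache /dev/sdf2
----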

.Changing a failed device

 zpool replace -f <pool> <old-device> <new-device>
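
A short usage sketch, assuming the pool is `rpool` and a failed disk
`sdb` is replaced by a new disk `sdf` (both hypothetical):

----
# zpool status -v rpool        # identify the failed device
# zpool replace -f rpool sdb sdf
# zpool status -v rpool        # watch the resilver progress
----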


Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send emails on ZFS events like
pool errors. Newer ZFS packages ship the daemon in a separate package,
and you can install it using `apt-get`:

----
# apt-get install zfs-zed
----

To activate the daemon, it is necessary to edit `/etc/zfs/zed.d/zed.rc` with your
favourite editor, and uncomment the `ZED_EMAIL_ADDR` setting:

--------
ZED_EMAIL_ADDR="root"
--------

Please note that {pve} forwards mails sent to `root` to the email
address configured for the root user.

IMPORTANT: The only setting that is required is `ZED_EMAIL_ADDR`. All
other settings are optional.
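
After changing `zed.rc`, the daemon needs to pick up the new settings.
On systemd based installations this can be done by restarting the
service (assuming the unit is named `zfs-zed`):

----
# systemctl restart zfs-zed
----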


Limit ZFS Memory Usage
~~~~~~~~~~~~~~~~~~~~~~

It is good to use at most 50 percent (which is the default) of the
system memory for the ZFS ARC, to prevent performance degradation on
the host. Use your preferred editor to change the configuration in
`/etc/modprobe.d/zfs.conf` and insert:

--------
options zfs zfs_arc_max=8589934592
--------

This example setting limits the usage to 8GB.
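
The value is given in bytes (8 * 1024^3 = 8589934592). For example, to
allow the ARC to use at most 4GB instead, the line would read:

--------
options zfs zfs_arc_max=4294967296
--------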

[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:

 update-initramfs -u
====


.SWAP on ZFS

SWAP on ZFS on Linux may cause problems, like blocking the server or
generating a high IO load, often seen when starting a backup to an
external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Additionally, you can lower the
``swappiness'' value. A good value for servers is 10:

 sysctl -w vm.swappiness=10

To make the swappiness persistent, open `/etc/sysctl.conf` with
an editor of your choice and add the following line:

--------
vm.swappiness = 10
--------
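
Settings in `/etc/sysctl.conf` are applied automatically at boot. To
reload the file immediately and confirm the current value, the
following can be used:

----
# sysctl -p
# sysctl vm.swappiness
----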

.Linux kernel `swappiness` parameter values
[width="100%",cols="<m,2d",options="header"]
|===========================================================
| Value               | Strategy
| vm.swappiness = 0   | The kernel will swap only to avoid
an 'out of memory' condition
| vm.swappiness = 1   | Minimum amount of swapping without
disabling it entirely.
| vm.swappiness = 10  | This value is sometimes recommended to
improve performance when sufficient memory exists in a system.
| vm.swappiness = 60  | The default value.
| vm.swappiness = 100 | The kernel will swap aggressively.
|===========================================================