ZFS on Linux
------------
include::attributes.txt[]

ZFS is a combined file system and logical volume manager designed by
Sun Microsystems. Starting with {pve} 3.4, the native Linux kernel port
of the ZFS file system is introduced as an optional file system, and
also as an additional selection for the root file system. There is no
need to compile ZFS modules manually - all packages are included.

By using ZFS, it is possible to achieve maximum enterprise features
with low budget hardware, but also to build high performance systems by
leveraging SSD caching or even SSD-only setups. ZFS can replace
expensive hardware RAID cards with moderate CPU and memory load,
combined with easy management.

.General ZFS advantages

* Easy configuration and management with {pve} GUI and CLI.

* Reliable

* Protection against data corruption

* Data compression on file-system level

* Snapshots

* Copy-on-write clone

* Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3

* Can use SSD for cache

* Self healing

* Continuous integrity checking

* Designed for high storage capacities

* Asynchronous replication over network

* Open Source

* Encryption

* ...


Hardware
~~~~~~~~

ZFS depends heavily on memory, so you need at least 8GB to start. In
practice, use as much as you can get for your hardware/budget. To
prevent data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an
enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can
increase the overall performance significantly.

IMPORTANT: Do not use ZFS on top of a hardware controller which has its
own cache management. ZFS needs to communicate directly with the disks.
An HBA adapter, or something like an LSI controller flashed in 'IT'
mode, is the way to go.

If you are experimenting with an installation of {pve} inside a VM
(Nested Virtualization), don't use 'virtio' for the disks of that VM,
as they are not supported by ZFS. Use IDE or SCSI instead (this also
works with the 'virtio' SCSI controller type).


Installation as root file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you install using the {pve} installer, you can choose ZFS for the
root file system. You need to select the RAID type at installation
time:

[horizontal]
RAID0:: Also called 'striping'. The capacity of such a volume is the sum
of the capacities of all disks. But RAID0 does not add any redundancy,
so the failure of a single drive makes the volume unusable.

RAID1:: Also called 'mirroring'. Data is written identically to all
disks. This mode requires at least 2 disks with the same size. The
resulting capacity is that of a single disk.

RAID10:: A combination of RAID0 and RAID1. Requires at least 4 disks.

RAIDZ-1:: A variation on RAID-5, single parity. Requires at least 3 disks.

RAIDZ-2:: A variation on RAID-5, double parity. Requires at least 4 disks.

RAIDZ-3:: A variation on RAID-5, triple parity. Requires at least 5 disks.

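As a rough usable-capacity example for the levels above (ignoring ZFS
metadata overhead): with four 4TB disks, RAID0 gives about 16TB, RAID1
(all four disks in one mirror) about 4TB, RAID10 about 8TB, and
RAIDZ-2 about (4 - 2) x 4TB = 8TB.
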
The installer automatically partitions the disks, creates a ZFS pool
called 'rpool', and installs the root file system on the ZFS subvolume
'rpool/ROOT/pve-1'.

Another subvolume called 'rpool/data' is created to store VM
images. In order to use that with the {pve} tools, the installer
creates the following configuration entry in '/etc/pve/storage.cfg':

----
zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir
----

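To check that {pve} picked up the new storage, you can for example use
the 'pvesm' storage manager tool:

 pvesm status
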
After installation, you can view your ZFS pool status using the
'zpool' command:

----
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors
----

The 'zfs' command is used to configure and manage your ZFS file
systems. The following command lists all file systems after
installation:

----
# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
rpool            4.94G  7.68T    96K  /rpool
rpool/ROOT        702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1  702M  7.68T   702M  /
rpool/data         96K  7.68T    96K  /rpool/data
rpool/swap       4.25G  7.69T    64K  -
----


Bootloader
~~~~~~~~~~

The default ZFS disk partitioning scheme does not use the first 2048
sectors. This gives enough room to install a GRUB boot partition. The
{pve} installer automatically allocates that space, and installs the
GRUB boot loader there. If you use a redundant RAID setup, it installs
the boot loader on all disks required for booting, so you can boot
even if some disks fail.

NOTE: It is not possible to use ZFS as the root partition with UEFI
boot.


ZFS Administration
~~~~~~~~~~~~~~~~~~

This section gives you some usage examples for common tasks. ZFS
itself is really powerful and provides many options. The main commands
to manage ZFS are 'zfs' and 'zpool'. Both commands come with great
manual pages, which are worth reading:

----
# man zpool
# man zfs
----

.Create a new ZPool

To create a new pool, at least one disk is needed. The 'ashift' value
should match the sector size of the underlying disk (2 to the power of
'ashift' is the sector size in bytes, so 'ashift=12' corresponds to 4K
sectors), or be larger.

 zpool create -f -o ashift=12 <pool> <device>

To activate compression:

 zfs set compression=lz4 <pool>

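To verify that the property is set, you can for example run:

 zfs get compression <pool>
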
.Create a new pool with RAID-0

Minimum 1 Disk

 zpool create -f -o ashift=12 <pool> <device1> <device2>

.Create a new pool with RAID-1

Minimum 2 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2>

.Create a new pool with RAID-10

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>

.Create a new pool with RAIDZ-1

Minimum 3 Disks

 zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>

.Create a new pool with RAIDZ-2

Minimum 4 Disks

 zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>

.Create a new pool with Cache (L2ARC)

It is possible to use a dedicated cache drive partition to increase
the performance (use an SSD).

For '<device>', it is possible to use more than one device, as shown in
"Create a new pool with RAID*".

 zpool create -f -o ashift=12 <pool> <device> cache <cache_device>

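The cache device is then listed in a separate 'cache' section of the
'zpool status' output:

 zpool status <pool>
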
.Create a new pool with Log (ZIL)

It is possible to use a dedicated drive partition as log device to
increase the performance (use an SSD).

For '<device>', it is possible to use more than one device, as shown in
"Create a new pool with RAID*".

 zpool create -f -o ashift=12 <pool> <device> log <log_device>

.Add Cache and Log to an existing pool

If you have a pool without cache and log, first partition the SSD into
2 partitions with parted or gdisk.

IMPORTANT: Always use GPT partition tables (gdisk or parted).

The maximum size of a log device should be about half the size of
physical memory, so it is usually quite small. The rest of the SSD
can be used as cache.

 zpool add -f <pool> log <device-part1> cache <device-part2>

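As a minimal sketch of these steps, assuming the SSD shows up as the
(hypothetical) device '/dev/sdf' and the host has 16GB of physical
memory, you could create an 8GB log partition and use the rest as
cache:

 parted -s /dev/sdf mklabel gpt
 parted -s /dev/sdf mkpart primary 1MiB 8GiB
 parted -s /dev/sdf mkpart primary 8GiB 100%
 zpool add -f <pool> log /dev/sdf1 cache /dev/sdf2
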
.Changing a failed Device

 zpool replace -f <pool> <old-device> <new-device>

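For example, with the 'rpool' layout shown earlier, a failed disk 'sdd'
could be replaced by a new (hypothetical) device '/dev/sde' like this:

 zpool replace -f rpool sdd /dev/sde

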
Activate E-Mail Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS comes with an event daemon, which monitors events generated by the
ZFS kernel module. The daemon can also send e-mails on ZFS events like
pool errors.

To activate the daemon, it is necessary to edit '/etc/zfs/zed.d/zed.rc'
with your favorite editor, and uncomment the 'ZED_EMAIL_ADDR' setting:

 ZED_EMAIL_ADDR="root"

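For the change to take effect, you may need to restart the event daemon
afterwards; on current Debian-based systems the service is usually
named 'zfs-zed':

 systemctl restart zfs-zed
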
Please note {pve} forwards mails sent to 'root' to the email address
configured for the root user.

IMPORTANT: The only setting that is required is 'ZED_EMAIL_ADDR'. All
other settings are optional.


Limit ZFS memory usage
~~~~~~~~~~~~~~~~~~~~~~

It is good to use at most 50 percent of the system memory for the ZFS
ARC, to prevent performance degradation of the host. Use your preferred
editor to change the configuration in '/etc/modprobe.d/zfs.conf' and
insert:

 options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8GB.

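The value is given in bytes; 8GB corresponds to 8 * 1024^3 =
8589934592 bytes. To limit the ARC to 4GB instead, you would set:

 options zfs zfs_arc_max=4294967296
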
[IMPORTANT]
====
If your root file system is ZFS, you must update your initramfs every
time this value changes:

 update-initramfs -u
====


.SWAP on ZFS

Swap on ZFS on Linux may cause problems, such as blocking the server or
generating a high IO load, often seen when starting a backup to an
external storage.

We strongly recommend using enough memory, so that you normally do not
run into low memory situations. Additionally, you can lower the
'swappiness' value. A good value for servers is 10:

 sysctl -w vm.swappiness=10

To make the swappiness value persistent, open '/etc/sysctl.conf' with
an editor of your choice and add the following line:

 vm.swappiness = 10

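You can check the currently active value at any time with:

 sysctl vm.swappiness
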
.Linux Kernel 'swappiness' parameter values
[width="100%",cols="<m,2d",options="header"]
|===========================================================
| Value               | Strategy
| vm.swappiness = 0   | The kernel will swap only to avoid
an 'out of memory' condition
| vm.swappiness = 1   | Minimum amount of swapping without
disabling it entirely.
| vm.swappiness = 10  | This value is sometimes recommended to
improve performance when sufficient memory exists in a system.
| vm.swappiness = 60  | The default value.
| vm.swappiness = 100 | The kernel will swap aggressively.
|===========================================================