[[chapter_pct]]
ifdef::manvolnum[]
pct(1)
======
:pve-toplevel:

NAME
----

pct - Tool to manage Linux Containers (LXC) on Proxmox VE


SYNOPSIS
--------

include::pct.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Container Toolkit
=========================
:pve-toplevel:
endif::manvolnum[]
ifdef::wiki[]
:title: Linux Container
endif::wiki[]

Containers are a lightweight alternative to fully virtualized machines (VMs).
They use the kernel of the host system that they run on, instead of emulating a
full operating system (OS). This means that containers can access resources on
the host system directly.

The runtime cost for containers is low, usually negligible. However, there are
some drawbacks that need to be considered:

* Only Linux distributions can be run in Proxmox Containers. It is not possible
to run other operating systems such as FreeBSD or Microsoft Windows inside a
container.

* For security reasons, access to host resources needs to be restricted.
Therefore, containers run in their own separate namespaces. Additionally, some
syscalls (user space requests to the Linux kernel) are not allowed within
containers.

{pve} uses https://linuxcontainers.org/lxc/introduction/[Linux Containers (LXC)] as its underlying
container technology. The ``Proxmox Container Toolkit'' (`pct`) simplifies the
usage and management of LXC by providing an interface that abstracts
complex tasks.

Containers are tightly integrated with {pve}. This means that they are aware of
the cluster setup, and they can use the same network and storage resources as
virtual machines. You can also use the {pve} firewall, or manage containers
using the HA framework.

Our primary goal is to offer an environment that provides the benefits of using a
VM, but without the additional overhead. This means that Proxmox Containers can
be categorized as ``System Containers'', rather than ``Application Containers''.

NOTE: If you want to run application containers, for example, 'Docker' images, it
is recommended that you run them inside a Proxmox Qemu VM. This will give you
all the advantages of application containerization, while also providing the
benefits that VMs offer, such as strong isolation from the host and the ability
to live-migrate, which otherwise isn't possible with containers.


Technology Overview
-------------------

* LXC (https://linuxcontainers.org/)

* Integrated into {pve} graphical web user interface (GUI)

* Easy to use command line tool `pct`

* Access via {pve} REST API

* 'lxcfs' to provide containerized /proc file system

* Control groups ('cgroups') for resource isolation and limitation

* 'AppArmor' and 'seccomp' to improve security

* Modern Linux kernels

* Image based deployment (templates)

* Uses {pve} xref:chapter_storage[storage library]

* Container setup from host (network, DNS, storage, etc.)


[[pct_container_images]]
Container Images
----------------

Container images, sometimes also referred to as ``templates'' or
``appliances'', are `tar` archives which contain everything to run a container.

{pve} itself provides a variety of basic templates for the most common Linux
distributions. They can be downloaded using the GUI or the `pveam` (short for
{pve} Appliance Manager) command line utility.
Additionally, https://www.turnkeylinux.org/[TurnKey Linux] container templates
are also available to download.

The list of available templates is updated daily through the 'pve-daily-update'
timer. You can also trigger an update manually by executing:

----
# pveam update
----

To view the list of available images run:

----
# pveam available
----

You can restrict this large list by specifying the `section` you are
interested in, for example basic `system` images:

.List available system images
----
# pveam available --section system
system          alpine-3.12-default_20200823_amd64.tar.xz
system          alpine-3.13-default_20210419_amd64.tar.xz
system          alpine-3.14-default_20210623_amd64.tar.xz
system          archlinux-base_20210420-1_amd64.tar.gz
system          centos-7-default_20190926_amd64.tar.xz
system          centos-8-default_20201210_amd64.tar.xz
system          debian-9.0-standard_9.7-1_amd64.tar.gz
system          debian-10-standard_10.7-1_amd64.tar.gz
system          devuan-3.0-standard_3.0_amd64.tar.gz
system          fedora-33-default_20201115_amd64.tar.xz
system          fedora-34-default_20210427_amd64.tar.xz
system          gentoo-current-default_20200310_amd64.tar.xz
system          opensuse-15.2-default_20200824_amd64.tar.xz
system          ubuntu-16.04-standard_16.04.5-1_amd64.tar.gz
system          ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz
system          ubuntu-20.04-standard_20.04-1_amd64.tar.gz
system          ubuntu-20.10-standard_20.10-1_amd64.tar.gz
system          ubuntu-21.04-standard_21.04-1_amd64.tar.gz
----

Before you can use such a template, you need to download it into one of your
storages. If you're unsure which one to use, you can simply use the storage
named `local` for that purpose. For clustered installations, it is preferable
to use a shared storage so that all nodes can access those images.

----
# pveam download local debian-10.0-standard_10.0-1_amd64.tar.gz
----

You are now ready to create containers using that image, and you can list all
downloaded images on storage `local` with:

----
# pveam list local
local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz  219.95MB
----

TIP: You can also use the {pve} web interface GUI to download, list and delete
container templates.

`pct` uses them to create a new container, for example:

----
# pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
----

The above command shows you the full {pve} volume identifiers. They include the
storage name, and most other {pve} commands can use them. For example, you can
delete that image later with:

----
# pveam remove local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
----


[[pct_settings]]
Container Settings
------------------

[[pct_general]]
General Settings
~~~~~~~~~~~~~~~~

[thumbnail="screenshot/gui-create-ct-general.png"]

General settings of a container include

* the *Node*: the physical server on which the container will run
* the *CT ID*: a unique number in this {pve} installation used to identify your
container
* *Hostname*: the hostname of the container
* *Resource Pool*: a logical group of containers and VMs
* *Password*: the root password of the container
* *SSH Public Key*: a public key for connecting to the root account over SSH
* *Unprivileged container*: this option allows you to choose at creation time
whether to create a privileged or unprivileged container.

Unprivileged Containers
^^^^^^^^^^^^^^^^^^^^^^^

Unprivileged containers use a new kernel feature called user namespaces.
The root UID 0 inside the container is mapped to an unprivileged user outside
the container. This means that most security issues (container escape, resource
abuse, etc.) in these containers will affect a random unprivileged user, and
would be a generic kernel security bug rather than an LXC issue. The LXC team
thinks unprivileged containers are safe by design.

This is the default option when creating a new container.

NOTE: If the container uses systemd as an init system, please be aware that the
systemd version running inside the container should be equal to or greater than
220.


Privileged Containers
^^^^^^^^^^^^^^^^^^^^^

Security in containers is achieved by using mandatory access control
('AppArmor') restrictions, 'seccomp' filters and Linux kernel namespaces. The
LXC team considers this kind of container unsafe, and they will not consider
new container escape exploits to be security issues worthy of a CVE and quick
fix. That's why privileged containers should only be used in trusted
environments.


[[pct_cpu]]
CPU
~~~

[thumbnail="screenshot/gui-create-ct-cpu.png"]

You can restrict the number of visible CPUs inside the container using the
`cores` option. This is implemented using the Linux 'cpuset' cgroup
(**c**ontrol *group*).
A special task inside `pvestatd` periodically tries to distribute running
containers among available CPUs.
To view the assigned CPUs run the following command:

----
# pct cpusets
 ---------------------
 102:              6 7
 105:      2 3 4 5
 108:  0 1
 ---------------------
----

Containers use the host kernel directly. All tasks inside a container are
handled by the host CPU scheduler. {pve} uses the Linux 'CFS' (**C**ompletely
**F**air **S**cheduler) scheduler by default, which has additional bandwidth
control options.

[horizontal]

`cpulimit`: :: You can use this option to further limit assigned CPU time.
Please note that this is a floating point number, so it is perfectly valid to
assign two cores to a container, but restrict overall CPU consumption to half a
core.
+
----
cores: 2
cpulimit: 0.5
----

`cpuunits`: :: This is a relative weight passed to the kernel scheduler. The
larger the number is, the more CPU time this container gets. The number is
relative to the weights of all the other running containers. The default is
1024. You can use this setting to prioritize some containers.
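
For example, a minimal sketch (assuming an existing container with ID `100`)
that doubles the container's relative weight compared to the default of 1024:

----
# pct set 100 -cpuunits 2048
----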


[[pct_memory]]
Memory
~~~~~~

[thumbnail="screenshot/gui-create-ct-memory.png"]

Container memory is controlled using the cgroup memory controller.

[horizontal]

`memory`: :: Limit overall memory usage. This corresponds to the
`memory.limit_in_bytes` cgroup setting.

`swap`: :: Allows the container to use additional swap memory from the host
swap space. This corresponds to the `memory.memsw.limit_in_bytes` cgroup
setting, which is set to the sum of both values (`memory + swap`).
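
As a quick sketch (the container ID `100` is an assumption), both limits can be
set at once on the command line:

----
# pct set 100 -memory 512 -swap 512
----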


[[pct_mount_points]]
Mount Points
~~~~~~~~~~~~

[thumbnail="screenshot/gui-create-ct-root-disk.png"]

The root mount point is configured with the `rootfs` property. You can
configure up to 256 additional mount points. The corresponding options are
called `mp0` to `mp255`. They can contain the following settings:

include::pct-mountpoint-opts.adoc[]

Currently there are three types of mount points: storage backed mount points,
bind mounts, and device mounts.

.Typical container `rootfs` configuration
----
rootfs: thin1:base-100-disk-1,size=8G
----


Storage Backed Mount Points
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Storage backed mount points are managed by the {pve} storage subsystem and come
in three different flavors:

- Image based: these are raw images containing a single ext4 formatted file
  system.
- ZFS subvolumes: these are technically bind mounts, but with managed storage,
  and thus allow resizing and snapshotting.
- Directories: passing `size=0` triggers a special case where instead of a raw
  image a directory is created.

NOTE: The special option syntax `STORAGE_ID:SIZE_IN_GB` for storage backed
mount point volumes will automatically allocate a volume of the specified size
on the specified storage. For example, calling

----
pct set 100 -mp0 thin1:10,mp=/path/in/container
----

will allocate a 10GB volume on the storage `thin1`, replace the volume ID
placeholder `10` with the allocated volume ID, and set up the mount point in
the container at `/path/in/container`.
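
Since such volumes are managed by the storage subsystem, they can also be
resized later. A minimal sketch, again assuming container `100`, that grows the
`mp0` volume by 2GB:

----
# pct resize 100 mp0 +2G
----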


Bind Mount Points
^^^^^^^^^^^^^^^^^

Bind mounts allow you to access arbitrary directories from your Proxmox VE host
inside a container. Some potential use cases are:

- Accessing your home directory in the guest
- Accessing a USB device directory in the guest
- Accessing an NFS mount from the host in the guest

Bind mounts are not managed by the storage subsystem, so you cannot make
snapshots or deal with quotas from inside the container. With unprivileged
containers, you might run into permission problems caused by the user mapping,
and you cannot use ACLs.

NOTE: The contents of bind mount points are not backed up when using `vzdump`.

WARNING: For security reasons, bind mounts should only be established using
source directories especially reserved for this purpose, e.g., a directory
hierarchy under `/mnt/bindmounts`. Never bind mount system directories like
`/`, `/var` or `/etc` into a container - this poses a great security risk.

NOTE: The bind mount source path must not contain any symlinks.

For example, to make the directory `/mnt/bindmounts/shared` accessible in the
container with ID `100` under the path `/shared`, use a configuration line like
`mp0: /mnt/bindmounts/shared,mp=/shared` in `/etc/pve/lxc/100.conf`.
Alternatively, use `pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared` to
achieve the same result.


Device Mount Points
^^^^^^^^^^^^^^^^^^^

Device mount points allow you to mount block devices of the host directly into
the container. Similar to bind mounts, device mounts are not managed by
{PVE}'s storage subsystem, but the `quota` and `acl` options will be honored.

NOTE: Device mount points should only be used under special circumstances. In
most cases a storage backed mount point offers the same performance and a lot
more features.

NOTE: The contents of device mount points are not backed up when using
`vzdump`.


[[pct_container_network]]
Network
~~~~~~~

[thumbnail="screenshot/gui-create-ct-network.png"]

You can configure up to 10 network interfaces for a single container.
The corresponding options are called `net0` to `net9`, and they can contain the
following settings:

include::pct-network-opts.adoc[]
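
As a sketch of what such a setting looks like in practice (the bridge name and
addressing are assumptions), a typical `net0` line in a container configuration
could be:

----
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
----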


[[pct_startup_and_shutdown]]
Automatic Start and Shutdown of Containers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To automatically start a container when the host system boots, select the
option 'Start at boot' in the 'Options' panel of the container in the web
interface or run the following command:

----
# pct set CTID -onboot 1
----

.Start and Shutdown Order
// use the screenshot from qemu - it's the same
[thumbnail="screenshot/gui-qemu-edit-start-order.png"]

If you want to fine-tune the boot order of your containers, you can use the
following parameters:

* *Start/Shutdown order*: Defines the start order priority. For example, set it
to 1 if you want the CT to be the first to be started. (We use the reverse
startup order for shutdown, so a container with a start order of 1 would be the
last to be shut down.)
* *Startup delay*: Defines the interval in seconds between this container's
start and the start of subsequent containers. For example, set it to 240 if you
want to wait 240 seconds before starting other containers.
* *Shutdown timeout*: Defines the duration in seconds {pve} should wait
for the container to be offline after issuing a shutdown command.
By default this value is set to 60, which means that {pve} will issue a
shutdown request, wait 60s for the machine to be offline, and if the machine is
still online after 60s, it will notify that the shutdown action failed.

Please note that containers without a Start/Shutdown order parameter will
always start after those where the parameter is set, and this parameter only
makes sense between machines running locally on a host, not cluster-wide.
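
All three parameters are stored in the single `startup` config property. As a
hedged example (the values are arbitrary), the following sets a start order of
1, a 30 second startup delay and a 60 second shutdown timeout:

----
# pct set 100 -startup order=1,up=30,down=60
----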

If you require a delay between the host boot and the booting of the first
container, see the section on
xref:first_guest_boot_delay[Proxmox VE Node Management].


Hookscripts
~~~~~~~~~~~

You can add a hook script to CTs with the config property `hookscript`.

----
# pct set 100 -hookscript local:snippets/hookscript.pl
----

It will be called during various phases of the guest's lifetime. For an example
and documentation see the example script under
`/usr/share/pve-docs/examples/guest-example-hookscript.pl`.
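
A minimal sketch of such a script, assuming it should only log lifecycle
events (the phase names follow the shipped example script):

----
#!/bin/sh
# called as: <script> <CTID> <phase>
ctid="$1"
phase="$2"

case "$phase" in
    pre-start|post-start|pre-stop|post-stop)
        echo "guest $ctid entered phase $phase" >> /var/log/guest-hooks.log
        ;;
esac
exit 0
----

Note that the script has to be executable and must reside on a storage that
allows the `snippets` content type.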

Security Considerations
-----------------------

Containers use the kernel of the host system. This exposes an attack surface
for malicious users. In general, full virtual machines provide better
isolation. This should be considered if containers are provided to unknown or
untrusted people.

To reduce the attack surface, LXC uses many security features like AppArmor,
CGroups and kernel namespaces.

AppArmor
~~~~~~~~

AppArmor profiles are used to restrict access to possibly dangerous actions.
Some system calls, e.g. `mount`, are prohibited from execution.

To trace AppArmor activity, use:

----
# dmesg | grep apparmor
----

Although it is not recommended, AppArmor can be disabled for a container. This
brings security risks with it. Some syscalls can lead to privilege escalation
when executed within a container if the system is misconfigured or if an LXC or
Linux kernel vulnerability exists.

To disable AppArmor for a container, add the following line to the container
configuration file located at `/etc/pve/lxc/CTID.conf`:

----
lxc.apparmor.profile = unconfined
----

WARNING: Please note that this is not recommended for production use.


[[pct_cgroup]]
Control Groups ('cgroup')
~~~~~~~~~~~~~~~~~~~~~~~~~

'cgroup' is a kernel mechanism used to hierarchically organize processes and
distribute system resources.

The main resources controlled via 'cgroups' are CPU time, memory and swap
limits, and access to device nodes. 'cgroups' are also used to "freeze" a
container before taking snapshots.

There are 2 versions of 'cgroups' currently available,
https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v1/index.html[legacy]
and
https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v2.html['cgroupv2'].

Since {pve} 7.0, the default is a pure 'cgroupv2' environment. Previously a
"hybrid" setup was used, where resource control was mainly done in 'cgroupv1'
with an additional 'cgroupv2' controller which could take over some subsystems
via the 'cgroup_no_v1' kernel command line parameter. (See the
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html[kernel
parameter documentation] for details.)

[[pct_cgroup_compat]]
CGroup Version Compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The main difference between pure 'cgroupv2' and the old hybrid environments
regarding {pve} is that with 'cgroupv2' memory and swap are now controlled
independently. The memory and swap settings for containers can map directly to
these values, whereas previously only the memory limit and the limit of the
*sum* of memory and swap could be set.

Another important difference is that the 'devices' controller is configured in a
completely different way. Because of this, file system quotas are currently not
supported in a pure 'cgroupv2' environment.

'cgroupv2' support by the container's OS is needed to run in a pure 'cgroupv2'
environment. Containers running 'systemd' version 231 or newer support
'cgroupv2' footnote:[this includes all newest major versions of container
templates shipped by {pve}], as do containers not using 'systemd' as init
system footnote:[for example Alpine Linux].

[NOTE]
====
CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases whose
'systemd' version is too old to run in a 'cgroupv2' environment. You can
either:

* Upgrade the whole distribution to a newer release. For the examples above,
that could be Ubuntu 18.04 or 20.04, and CentOS 8 (or RHEL/CentOS derivatives
like AlmaLinux or Rocky Linux). This has the benefit of getting the newest bug
and security fixes, often also new features, and of moving the EOL date into
the future.

* Upgrade the container's systemd version. If the distribution provides a
backports repository, this can be an easy and quick stop-gap measure.

* Move the container, or its services, to a virtual machine. Virtual machines
interact much less with the host, which is why decades-old OS versions can run
there just fine.

* Switch back to the legacy 'cgroup' controller. Note that while it can be a
valid solution, it's not a permanent one. There's a high likelihood that a
future {pve} major release, for example 8.0, cannot support the legacy
controller anymore.
====

[[pct_cgroup_change_version]]
Changing CGroup Version
^^^^^^^^^^^^^^^^^^^^^^^

TIP: If file system quotas are not required and all containers support 'cgroupv2',
it is recommended to stick to the new default.

To switch back to the previous version the following kernel command line
parameter can be used:

----
systemd.unified_cgroup_hierarchy=0
----

See xref:sysboot_edit_kernel_cmdline[this section] on editing the kernel boot
command line for where to add the parameter.
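
To verify which version is currently active on a host, you can check the file
system type mounted at `/sys/fs/cgroup`; on a pure 'cgroupv2' system this
reports `cgroup2fs` (a quick sketch using GNU coreutils):

----
# stat -fc %T /sys/fs/cgroup/
cgroup2fs
----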

// TODO: seccomp a bit more.
// TODO: pve-lxc-syscalld


Guest Operating System Configuration
------------------------------------

{pve} tries to detect the Linux distribution in the container, and modifies
some files. Here is a short list of things done at container startup:

set /etc/hostname:: to set the container name

modify /etc/hosts:: to allow lookup of the local hostname

network setup:: pass the complete network setup to the container

configure DNS:: pass information about DNS servers

adapt the init system:: for example, fix the number of spawned getty processes

set the root password:: when creating a new container

rewrite ssh_host_keys:: so that each container has unique keys

randomize crontab:: so that cron does not start at the same time on all containers

Changes made by {PVE} are enclosed by comment markers:

----
# --- BEGIN PVE ---
<data>
# --- END PVE ---
----

Those markers will be inserted at a reasonable location in the file. If such a
section already exists, it will be updated in place and will not be moved.

Modification of a file can be prevented by adding a `.pve-ignore.` file for it.
For instance, if the file `/etc/.pve-ignore.hosts` exists then the `/etc/hosts`
file will not be touched. This can be a simple empty file created via:

----
# touch /etc/.pve-ignore.hosts
----

Most modifications are OS dependent, so they differ between different
distributions and versions. You can completely disable modifications by
manually setting the `ostype` to `unmanaged`.
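
For example, to disable them for container `100` (a sketch, any valid CTID
works the same way):

----
# pct set 100 -ostype unmanaged
----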

OS type detection is done by testing for certain files inside the
container. {pve} first checks the `/etc/os-release` file
footnote:[/etc/os-release replaces the multitude of per-distribution
release files https://manpages.debian.org/stable/systemd/os-release.5.en.html].
If that file is not present, or it does not contain a clearly recognizable
distribution identifier, the following distribution specific release files are
checked:

Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)

Debian:: test /etc/debian_version

Fedora:: test /etc/fedora-release

RedHat or CentOS:: test /etc/redhat-release

ArchLinux:: test /etc/arch-release

Alpine:: test /etc/alpine-release

Gentoo:: test /etc/gentoo-release

NOTE: Container start fails if the configured `ostype` differs from the auto
detected type.


[[pct_container_storage]]
Container Storage
-----------------

The {pve} LXC container storage model is more flexible than traditional
container storage models. A container can have multiple mount points. This
makes it possible to use the best suited storage for each application.

For example, the root file system of the container can be on slow and cheap
storage while the database can be on fast and distributed storage via a second
mount point. See section <<pct_mount_points, Mount Points>> for further
details.

Any storage type supported by the {pve} storage library can be used. This means
that containers can be stored on local (for example `lvm`, `zfs` or directory),
shared external (like `iSCSI`, `NFS`) or even distributed storage systems like
Ceph. Advanced storage features like snapshots or clones can be used if the
underlying storage supports them. The `vzdump` backup tool can use snapshots to
provide consistent container backups.

Furthermore, local devices or local directories can be mounted directly using
'bind mounts'. This gives access to local resources inside a container with
practically zero overhead. Bind mounts can be used as an easy way to share data
between containers.


FUSE Mounts
~~~~~~~~~~~

WARNING: Because of existing issues in the Linux kernel's freezer subsystem,
the usage of FUSE mounts inside a container is strongly advised against, as
containers need to be frozen for suspend or snapshot mode backups.

If FUSE mounts cannot be replaced by other mounting mechanisms or storage
technologies, it is possible to establish the FUSE mount on the Proxmox host
and use a bind mount point to make it accessible inside the container.


Using Quotas Inside Containers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Quotas allow you to set limits inside a container on the amount of disk space
that each user can use.

NOTE: This currently requires the use of legacy 'cgroups'.

NOTE: This only works on ext4 image based storage types and currently only
works with privileged containers.

Activating the `quota` option causes the following mount options to be used for
a mount point:
`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`

This allows quotas to be used like on any other system. You can initialize the
`/aquota.user` and `/aquota.group` files by running:

----
# quotacheck -cmug /
# quotaon /
----

Then edit the quotas using the `edquota` command. Refer to the documentation of
the distribution running inside the container for details.

NOTE: You need to run the above commands for every mount point by passing the
mount point's path instead of just `/`.


Using ACLs Inside Containers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The standard POSIX **A**ccess **C**ontrol **L**ists are also available inside
containers. ACLs allow you to set more detailed file ownership than the
traditional user/group/others model.
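
ACLs are enabled per mount point through its `acl` option. A hedged sketch
(the volume name and path are assumptions) of a configuration line with ACLs
turned on:

----
mp0: local:107/vm-107-disk-1.raw,mp=/data,acl=1
----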


Backup of Container mount points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To include a mount point in backups, enable the `backup` option for it in the
container configuration. For an existing mount point `mp0`

----
mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G
----

add `backup=1` to enable it.

----
mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=1
----

NOTE: When creating a new mount point in the GUI, this option is enabled by
default.

To disable backups for a mount point, add `backup=0` in the way described
above, or uncheck the *Backup* checkbox on the GUI.

Replication of Container mount points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, additional mount points are replicated when the Root Disk is
replicated. If you want the {pve} storage replication mechanism to skip a mount
point, you can set the *Skip replication* option for that mount point.
As of {pve} 5.0, replication requires a storage of type `zfspool`. Adding a
mount point to a different type of storage when the container has replication
configured requires *Skip replication* to be enabled for that mount point.
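
In the configuration file, *Skip replication* corresponds to the mount point's
`replicate` option. A sketch (the volume name is an assumption) that excludes
`mp0` from replication:

----
mp0: local-lvm:vm-100-disk-1,mp=/data,replicate=0
----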


Backup and Restore
------------------


Container Backup
~~~~~~~~~~~~~~~~

It is possible to use the `vzdump` tool for container backup. Please refer to
the `vzdump` manual page for details.


Restoring Container Backups
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Restoring container backups made with `vzdump` is possible using the `pct
restore` command. By default, `pct restore` will attempt to restore as much of
the backed up container configuration as possible. It is possible to override
the backed up configuration by manually setting container options on the
command line (see the `pct` manual page for details).

NOTE: `pvesm extractconfig` can be used to view the backed up configuration
contained in a vzdump archive.

There are two basic restore modes, only differing by their handling of mount
points:


``Simple'' Restore Mode
^^^^^^^^^^^^^^^^^^^^^^^

If neither the `rootfs` parameter nor any of the optional `mpX` parameters are
explicitly set, the mount point configuration from the backed up configuration
file is restored using the following steps:

. Extract mount points and their options from backup
. Create volumes for storage backed mount points (on storage provided with the
`storage` parameter, or default local storage if unset)
. Extract files from backup archive
. Add bind and device mount points to restored configuration (limited to root
user)

NOTE: Since bind and device mount points are never backed up, no files are
restored in the last step, but only the configuration options. The assumption
is that such mount points are either backed up with another mechanism (e.g.,
NFS space that is bind mounted into many containers), or not intended to be
backed up at all.

This simple mode is also used by the container restore operations in the web
interface.


``Advanced'' Restore Mode
^^^^^^^^^^^^^^^^^^^^^^^^^

By setting the `rootfs` parameter (and optionally, any combination of `mpX`
parameters), the `pct restore` command is automatically switched into an
advanced mode. This advanced mode completely ignores the `rootfs` and `mpX`
configuration options contained in the backup archive, and instead only uses
the options explicitly provided as parameters.

This mode allows flexible configuration of mount point settings at restore
time, for example:

* Set target storages, volume sizes and other options for each mount point
individually
* Redistribute backed up files according to new mount point scheme
* Restore to device and/or bind mount points (limited to root user)


Managing Containers with `pct`
------------------------------

The ``Proxmox Container Toolkit'' (`pct`) is the command line tool to manage
{pve} containers. It enables you to create or destroy containers, as well as
control the container execution (start, stop, reboot, migrate, etc.). It can be
used to set parameters in the config file of a container, for example the
network configuration or memory limits.

CLI Usage Examples
~~~~~~~~~~~~~~~~~~

Create a container based on a Debian template (provided you have already
downloaded the template via the web interface)

----
# pct create 100 /var/lib/vz/template/cache/debian-10.0-standard_10.0-1_amd64.tar.gz
----

Start container 100

----
# pct start 100
----

Start a login session via getty

----
# pct console 100
----

Enter the LXC namespace and run a shell as root user

----
# pct enter 100
----

Display the configuration

----
# pct config 100
----

Add a network interface called `eth0`, bridged to the host bridge `vmbr0`, and
set the address and gateway, while the container is running

----
# pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
----

Reduce the memory of the container to 512MB

----
# pct set 100 -memory 512
----

Destroying a container always removes it from Access Control Lists and it always
removes the firewall configuration of the container. You have to activate
`--purge` if you want to additionally remove the container from replication
jobs, backup jobs and HA resource configurations.

----
# pct destroy 100 --purge
----



Obtaining Debugging Logs
~~~~~~~~~~~~~~~~~~~~~~~~

In case `pct start` is unable to start a specific container, it might be
helpful to collect debugging output by passing the `--debug` flag (replace `CTID` with
the container's CTID):

----
# pct start CTID --debug
----

Alternatively, you can use the following `lxc-start` command, which will save
the debug log to the file specified by the `-o` output option:

----
# lxc-start -n CTID -F -l DEBUG -o /tmp/lxc-CTID.log
----

This command will attempt to start the container in foreground mode. To stop
the container, run `pct shutdown CTID` or `pct stop CTID` in a second terminal.

The collected debug log is written to `/tmp/lxc-CTID.log`.

NOTE: If you have changed the container's configuration since the last start
attempt with `pct start`, you need to run `pct start` at least once to also
update the configuration used by `lxc-start`.

[[pct_migration]]
Migration
---------

If you have a cluster, you can migrate your Containers with

----
# pct migrate <ctid> <target>
----

This works as long as your Container is offline. If it has local volumes or
mount points defined, the migration will copy the content over the network to
the target host if the same storage is defined there.

Running containers cannot be live-migrated due to technical limitations. You
can do a restart migration, which shuts down, moves and then starts a container
again on the target node. As containers are very lightweight, this normally
results in a downtime of only some hundred milliseconds.

A restart migration can be done through the web interface or by using the
`--restart` flag with the `pct migrate` command.

A restart migration will shut down the Container and kill it after the
specified timeout (the default is 180 seconds). Then it will migrate the
Container like an offline migration and when finished, it starts the Container
on the target node.
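
For example, the following sketch (the container ID and node name are
assumptions) does a restart migration of container `100` to the node `node2`
with a shortened timeout of 120 seconds:

----
# pct migrate 100 node2 --restart --timeout 120
----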

[[pct_configuration]]
Configuration
-------------

The `/etc/pve/lxc/<CTID>.conf` file stores container configuration, where
`<CTID>` is the numeric ID of the given container. Like all other files stored
inside `/etc/pve/`, they get automatically replicated to all other cluster
nodes.

NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
unique cluster wide.

.Example Container Configuration
----
ostype: debian
arch: amd64
hostname: www
memory: 512
swap: 512
net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
rootfs: local:107/vm-107-disk-1.raw,size=7G
----

The configuration files are simple text files. You can edit them using a normal
text editor, for example, `vi` or `nano`.
This is sometimes useful to do small corrections, but keep in mind that you
need to restart the container to apply such changes.

For that reason, it is usually better to use the `pct` command to generate and
modify those files, or do the whole thing using the GUI.
Our toolkit is smart enough to instantaneously apply most changes to running
containers. This feature is called ``hot plug'', and there is no need to restart
the container in that case.

In cases where a change cannot be hot-plugged, it will be registered as a
pending change (shown in red color in the GUI). Pending changes will only be
applied after rebooting the container.


File Format
~~~~~~~~~~~

The container configuration file uses a simple colon separated key/value
format. Each line has the following format:

-----
# this is a comment
OPTION: value
-----

Blank lines in those files are ignored, and lines starting with a `#` character
are treated as comments and are also ignored.

It is possible to add low-level, LXC style configuration directly, for example:

----
lxc.init_cmd: /sbin/my_own_init
----

or

----
lxc.init_cmd = /sbin/my_own_init
----

The settings are passed directly to the LXC low-level tools.


[[pct_snapshots]]
Snapshots
~~~~~~~~~

When you create a snapshot, `pct` stores the configuration at snapshot time
into a separate snapshot section within the same configuration file. For
example, after creating a snapshot called ``testsnapshot'', your configuration
file will look like this:

.Container configuration with snapshot
----
memory: 512
swap: 512
parent: testsnapshot
...

[testsnapshot]
memory: 512
swap: 512
snaptime: 1457170803
...
----

There are a few snapshot related properties like `parent` and `snaptime`. The
`parent` property is used to store the parent/child relationship between
snapshots. `snaptime` is the snapshot creation time stamp (Unix epoch).
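
Snapshots are usually managed through `pct` rather than by editing the
configuration file directly. A short sketch of the typical lifecycle (the
snapshot name is an example):

----
# pct snapshot 100 testsnapshot
# pct rollback 100 testsnapshot
# pct delsnapshot 100 testsnapshot
----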


[[pct_options]]
Options
~~~~~~~

include::pct.conf.5-opts.adoc[]


Locks
-----

Container migrations, snapshots and backups (`vzdump`) set a lock to prevent
incompatible concurrent actions on the affected container. Sometimes you need
to remove such a lock manually (e.g., after a power failure).

----
# pct unlock <CTID>
----

CAUTION: Only do this if you are sure the action which set the lock is no
longer running.


ifdef::manvolnum[]

Files
------

`/etc/pve/lxc/<CTID>.conf`::

Configuration file for the container '<CTID>'.


include::pve-copyright.adoc[]
endif::manvolnum[]