Commit | Line | Data |
---|---|---|
0c6b782f DM |
1 | ifdef::manvolnum[] |
2 | PVE({manvolnum}) | |
3 | ================ | |
38fd0958 | 4 | include::attributes.txt[] |
0c6b782f DM |
5 | |
6 | NAME | |
7 | ---- | |
8 | ||
9 | pct - Tool to manage Linux Containers (LXC) on Proxmox VE | |
10 | ||
11 | ||
12 | SYNOPSIS | |
13 | -------- | |
14 | ||
15 | include::pct.1-synopsis.adoc[] | |
16 | ||
17 | DESCRIPTION | |
18 | ----------- | |
19 | endif::manvolnum[] | |
20 | ||
21 | ifndef::manvolnum[] | |
22 | Proxmox Container Toolkit | |
23 | ========================= | |
38fd0958 | 24 | include::attributes.txt[] |
0c6b782f DM |
25 | endif::manvolnum[] |
26 | ||
4a2ae9ed DM |
27 | |
28 | Containers are a lightweight alternative to fully virtualized | |
29 | VMs. Instead of emulating a complete Operating System (OS), containers | |
30 | simply use the OS of the host they run on. This implies that all | |
31 | containers use the same kernel, and that they can access resources | |
32 | from the host directly. | |
33 | ||
34 | This is great because containers do not waste CPU power or memory due | |
35 | to kernel emulation. Container run-time costs are close to zero and | |
36 | usually negligible. But there are also some drawbacks you need to | |
37 | consider: | |
38 | ||
39 | * You can only run Linux based OS inside containers, i.e. it is not | |
a8e99754 | 40 | possible to run FreeBSD or MS Windows inside. |
4a2ae9ed | 41 | |
a8e99754 | 42 | * For security reasons, access to host resources needs to be |
4a2ae9ed | 43 | restricted. This is done with AppArmor, SecComp filters and other |
a8e99754 | 44 | kernel features. Be prepared that some syscalls are not allowed |
4a2ae9ed DM |
45 | inside containers. |
46 | ||
47 | {pve} uses https://linuxcontainers.org/[LXC] as underlying container | |
48 | technology. We consider LXC a low-level library, which provides | |
a8e99754 | 49 | countless options. It would be too difficult to use those tools |
4a2ae9ed DM |
50 | directly. Instead, we provide a small wrapper called `pct`, the |
51 | "Proxmox Container Toolkit". | |
52 | ||
a8e99754 | 53 | The toolkit is tightly coupled with {pve}. That means that it is aware |
4a2ae9ed DM |
54 | of the cluster setup, and it can use the same network and storage |
55 | resources as fully virtualized VMs. You can even use the {pve} | |
56 | firewall, or manage containers using the HA framework. | |
57 | ||
58 | Our primary goal is to offer an environment as one would get from a | |
59 | VM, but without the additional overhead. We call this "System | |
60 | Containers". | |
61 | ||
70a42028 DM |
62 | NOTE: If you want to run micro-containers (with docker, rkt, ...), it |
63 | is best to run them inside a VM. | |
4a2ae9ed DM |
64 | |
65 | ||
66 | Security Considerations | |
67 | ----------------------- | |
68 | ||
69 | Containers use the same kernel as the host, so there is a big attack | |
70 | surface for malicious users. You should consider this fact if you | |
71 | provide containers to totally untrusted people. In general, fully | |
a8e99754 | 72 | virtualized VMs provide better isolation. |
4a2ae9ed DM |
73 | |
74 | The good news is that LXC uses many kernel security features like | |
75 | AppArmor, CGroups and PID and user namespaces, which makes container | |
76 | usage quite secure. We distinguish two types of containers: | |
77 | ||
78 | Privileged containers | |
79 | ~~~~~~~~~~~~~~~~~~~~~ | |
80 | ||
81 | Security is done by dropping capabilities, using mandatory access | |
82 | control (AppArmor), SecComp filters and namespaces. The LXC team | |
83 | considers this kind of container as unsafe, and they will not consider | |
84 | new container escape exploits to be security issues worthy of a CVE | |
85 | and quick fix. So you should use this kind of container only inside a | |
86 | trusted environment, or when no untrusted task is running as root in | |
87 | the container. | |
88 | ||
89 | Unprivileged containers | |
90 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
91 | ||
a8e99754 | 92 | This kind of container uses a new kernel feature called user |
4a2ae9ed DM |
93 | namespaces. The root uid 0 inside the container is mapped to an |
94 | unprivileged user outside the container. This means that most security | |
95 | issues (container escape, resource abuse, ...) in those containers | |
96 | will affect a random unprivileged user, and so would be a generic | |
a8e99754 | 97 | kernel security bug rather than an LXC issue. The LXC team thinks |
4a2ae9ed DM |
98 | unprivileged containers are safe by design. |
99 | ||
7fc230db DM |
100 | |
101 | Configuration | |
102 | ------------- | |
103 | ||
166e63d6 FG |
104 | The '/etc/pve/lxc/<CTID>.conf' file stores container configuration, |
105 | where '<CTID>' is the numeric ID of the given container. Like all | |
106 | other files stored inside '/etc/pve/', it gets automatically | |
107 | replicated to all other cluster nodes. | |
108 | ||
109 | NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be | |
110 | unique cluster wide. | |
7fc230db | 111 | |
105bc8f1 DM |
112 | .Example Container Configuration |
113 | ---- | |
114 | ostype: debian | |
115 | arch: amd64 | |
116 | hostname: www | |
117 | memory: 512 | |
118 | swap: 512 | |
119 | net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth | |
120 | rootfs: local:107/vm-107-disk-1.raw,size=7G | |
121 | ---- | |
122 | ||
7fc230db | 123 | Those configuration files are simple text files, and you can edit them |
55fb2a21 DM |
124 | using a normal text editor ('vi', 'nano', ...). This is sometimes |
125 | useful to do small corrections, but keep in mind that you need to | |
126 | restart the container to apply such changes. | |
127 | ||
128 | For that reason, it is usually better to use the 'pct' command to | |
129 | generate and modify those files, or do the whole thing using the GUI. | |
130 | Our toolkit is smart enough to instantaneously apply most changes to | |
105bc8f1 DM |
131 | running containers. This feature is called "hot plug", and there is no |
132 | need to restart the container in that case. | |
7fc230db DM |
133 | |
134 | File Format | |
135 | ~~~~~~~~~~~ | |
136 | ||
137 | Container configuration files use a simple colon separated key/value | |
138 | format. Each line has the following format: | |
139 | ||
140 | # this is a comment | |
141 | OPTION: value | |
142 | ||
143 | Blank lines in those files are ignored, and lines starting with a '#' | |
144 | character are treated as comments and are also ignored. | |
145 | ||
146 | It is possible to add low-level, LXC style configuration directly, for | |
147 | example: | |
148 | ||
149 | lxc.init_cmd: /sbin/my_own_init | |
150 | ||
151 | or | |
152 | ||
153 | lxc.init_cmd = /sbin/my_own_init | |
154 | ||
155 | Those settings are directly passed to the LXC low-level tools. | |
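
The file format described above can be illustrated with a small parser. This is only a sketch under stated assumptions, not the parser used by 'pct': the function name `parse_conf` and the regular expression are hypothetical, and it assumes that the `key = value` form is accepted only for `lxc.` keys, as in the examples above.

```python
import re

# Assumed grammar (hypothetical, for illustration only):
#   "key: value" for regular options, "lxc.key = value" also allowed for lxc.* keys.
LINE_RE = re.compile(r"^([A-Za-z][\w.-]*)\s*(:|=)\s*(.*?)\s*$")


def parse_conf(text):
    """Return a list of (key, value) pairs found in container config text."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        match = LINE_RE.match(line)
        if not match:
            raise ValueError(f"cannot parse line: {raw!r}")
        key, sep, value = match.groups()
        if sep == "=" and not key.startswith("lxc."):
            raise ValueError(f"'=' is only accepted for lxc.* keys: {raw!r}")
        pairs.append((key, value))
    return pairs
```

Values are returned as plain strings; interpreting them (for example the comma separated `net0` value) is left to the caller.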
156 | ||
105bc8f1 DM |
157 | Snapshots |
158 | ~~~~~~~~~ | |
159 | ||
160 | When you create a snapshot, 'pct' stores the configuration at snapshot | |
161 | time into a separate snapshot section within the same configuration | |
162 | file. For example, after creating a snapshot called 'testsnapshot', | |
163 | your configuration file will look like this: | |
164 | ||
165 | .Container Configuration with Snapshot | |
166 | ---- | |
167 | memory: 512 | |
168 | swap: 512 | |
169 | parent: testsnapshot | |
170 | ... | |
171 | ||
172 | [testsnapshot] | |
173 | memory: 512 | |
174 | swap: 512 | |
175 | snaptime: 1457170803 | |
176 | ... | |
177 | ---- | |
178 | ||
a8e99754 FG |
179 | There are a few snapshot related properties like 'parent' and |
180 | 'snaptime'. The 'parent' property is used to store the parent/child | |
105bc8f1 DM |
181 | relationship between snapshots. 'snaptime' is the snapshot creation |
182 | time stamp (unix epoch). | |
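
To make the snapshot layout more concrete, the following sketch splits a configuration text into its current section and its snapshot sections and converts 'snaptime' into a readable time. The helper names `split_sections` and `snapshot_times` are hypothetical, and the code assumes that a snapshot section starts with a `[name]` line and extends to the next such line or the end of the file. It is an illustration, not the 'pct' implementation.

```python
from datetime import datetime, timezone


def split_sections(text):
    """Map section name -> {key: value}; the current config is stored under ''."""
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # a "[name]" line opens a snapshot section
            sections[current] = {}
            continue
        key, _, value = line.partition(":")
        sections[current][key.strip()] = value.strip()
    return sections


def snapshot_times(text):
    """Return {snapshot name: creation time in UTC} from the 'snaptime' values."""
    return {
        name: datetime.fromtimestamp(int(opts["snaptime"]), tz=timezone.utc)
        for name, opts in split_sections(text).items()
        if name and "snaptime" in opts
    }
```

The 'parent' value of the current section (key `''`) names the snapshot that the running configuration is based on.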
7fc230db | 183 | |
3f13c1c3 DM |
184 | Guest Operating System Configuration |
185 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
186 | ||
187 | We normally try to detect the operating system type inside the | |
188 | container, and then modify some files inside the container to make | |
189 | them work as expected. Here is a short list of things we do at | |
190 | container startup: | |
191 | ||
192 | set /etc/hostname:: to set the container name | |
193 | ||
a8e99754 | 194 | modify /etc/hosts:: to allow lookup of the local hostname |
3f13c1c3 DM |
195 | |
196 | network setup:: pass the complete network setup to the container | |
197 | ||
198 | configure DNS:: pass information about DNS servers | |
199 | ||
a8e99754 | 200 | adapt the init system:: for example, fix the number of spawned getty processes |
3f13c1c3 DM |
201 | |
202 | set the root password:: when creating a new container | |
203 | ||
204 | rewrite ssh_host_keys:: so that each container has unique keys | |
205 | ||
a8e99754 | 206 | randomize crontab:: so that cron does not start at the same time on all containers |
3f13c1c3 | 207 | |
25535d34 WB |
208 | Changes made by {PVE} are enclosed by comment markers: |
209 | ||
37638f59 DM |
210 | ---- |
211 | # --- BEGIN PVE --- | |
212 | <data> | |
213 | # --- END PVE --- | |
214 | ---- | |
25535d34 | 215 | |
37638f59 DM |
216 | Those markers will be inserted at a reasonable location in the |
217 | file. If such a section already exists, it will be updated in place | |
218 | and will not be moved. | |
25535d34 | 219 | |
37638f59 DM |
220 | Modification of a file can be prevented by adding a `.pve-ignore.` |
221 | file for it. For instance, if the file `/etc/.pve-ignore.hosts` | |
222 | exists then the `/etc/hosts` file will not be touched. This can be a | |
223 | simple empty file created via: | |
25535d34 WB |
224 | |
225 | # touch /etc/.pve-ignore.hosts | |
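
The naming rule for these marker files can be sketched as follows. Both helper names are hypothetical, and the `root` parameter (the directory where the container file system is assumed to be reachable) exists only to keep the example testable; the actual check performed by {pve} may differ.

```python
import os
from pathlib import PurePosixPath


def ignore_marker(path):
    """Return the marker path for a file, e.g. /etc/hosts -> /etc/.pve-ignore.hosts."""
    p = PurePosixPath(path)
    return str(p.with_name(".pve-ignore." + p.name))


def may_modify(path, root="/"):
    """True if no marker file exists for 'path' below the container root."""
    marker = os.path.join(root, ignore_marker(path).lstrip("/"))
    return not os.path.exists(marker)
```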
226 | ||
37638f59 DM |
227 | Most modifications are OS dependent, so they differ between different |
228 | distributions and versions. You can completely disable modifications | |
229 | by manually setting the 'ostype' to 'unmanaged'. | |
3f13c1c3 DM |
230 | |
231 | OS type detection is done by testing for certain files inside the | |
232 | container: | |
233 | ||
234 | Ubuntu:: inspect /etc/lsb-release ('DISTRIB_ID=Ubuntu') | |
235 | ||
236 | Debian:: test /etc/debian_version | |
237 | ||
238 | Fedora:: test /etc/fedora-release | |
239 | ||
240 | RedHat or CentOS:: test /etc/redhat-release | |
241 | ||
242 | ArchLinux:: test /etc/arch-release | |
243 | ||
244 | Alpine:: test /etc/alpine-release | |
245 | ||
a8e99754 | 246 | NOTE: Container start fails if the configured 'ostype' differs from the auto |
3f13c1c3 DM |
247 | detected type. |
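
The detection list above can be read as an ordered series of file tests. The sketch below follows that order; it is an assumption that the order matters, based on the fact that Ubuntu also ships '/etc/debian_version' and Fedora also ships '/etc/redhat-release'. The function name `detect_ostype`, the `root` parameter and the returned labels are hypothetical and not necessarily the exact 'ostype' values used by {pve}.

```python
import os

# Marker files from the list above, checked in order. The labels are
# illustrative only (assumption), not guaranteed 'ostype' values.
MARKER_FILES = [
    ("etc/debian_version", "debian"),
    ("etc/fedora-release", "fedora"),
    ("etc/redhat-release", "redhat-or-centos"),
    ("etc/arch-release", "archlinux"),
    ("etc/alpine-release", "alpine"),
]


def detect_ostype(root):
    """Guess the OS type of a container file system rooted at 'root'."""
    # Ubuntu is identified by the content of lsb-release, so test it first.
    lsb = os.path.join(root, "etc/lsb-release")
    if os.path.isfile(lsb):
        with open(lsb) as f:
            if any(line.strip() == "DISTRIB_ID=Ubuntu" for line in f):
                return "ubuntu"
    # All other types are identified by the mere existence of a file.
    for rel, label in MARKER_FILES:
        if os.path.isfile(os.path.join(root, rel)):
            return label
    return None  # nothing matched
```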
248 | ||
a7f36905 DM |
249 | Options |
250 | ~~~~~~~ | |
251 | ||
252 | include::pct.conf.5-opts.adoc[] | |
253 | ||
d61bab51 DM |
254 | |
255 | Container Images | |
256 | ---------------- | |
257 | ||
a8e99754 FG |
258 | Container Images, sometimes also referred to as "templates" or |
259 | "appliances", are 'tar' archives which contain everything to run a | |
d61bab51 DM |
260 | container. You can think of it as a tidy container backup. Like most |
261 | modern container toolkits, 'pct' uses those images when you create a | |
262 | new container, for example: | |
263 | ||
264 | pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz | |
265 | ||
266 | Proxmox itself ships a set of basic templates for most common | |
267 | operating systems, and you can download them using the 'pveam' (short | |
268 | for {pve} Appliance Manager) command line utility. You can also | |
269 | download https://www.turnkeylinux.org/[TurnKey Linux] containers using | |
270 | that tool (or the graphical user interface). | |
271 | ||
3a6fa247 DM |
272 | Our image repositories contain a list of available images, and there |
273 | is a cron job run each day to download that list. You can trigger that | |
274 | update manually with: | |
275 | ||
276 | pveam update | |
277 | ||
278 | After that you can view the list of available images using: | |
279 | ||
280 | pveam available | |
281 | ||
282 | You can restrict this large list by specifying the 'section' you are | |
283 | interested in, for example basic 'system' images: | |
284 | ||
285 | .List available system images | |
286 | ---- | |
287 | # pveam available --section system | |
288 | system archlinux-base_2015-24-29-1_x86_64.tar.gz | |
289 | system centos-7-default_20160205_amd64.tar.xz | |
290 | system debian-6.0-standard_6.0-7_amd64.tar.gz | |
291 | system debian-7.0-standard_7.0-3_amd64.tar.gz | |
292 | system debian-8.0-standard_8.0-1_amd64.tar.gz | |
293 | system ubuntu-12.04-standard_12.04-1_amd64.tar.gz | |
294 | system ubuntu-14.04-standard_14.04-1_amd64.tar.gz | |
295 | system ubuntu-15.04-standard_15.04-1_amd64.tar.gz | |
296 | system ubuntu-15.10-standard_15.10-1_amd64.tar.gz | |
297 | ---- | |
298 | ||
a8e99754 | 299 | Before you can use such a template, you need to download it into one |
3a6fa247 DM |
300 | of your storages. You can simply use storage 'local' for that |
301 | purpose. For clustered installations, it is preferred to use a shared | |
302 | storage so that all nodes can access those images. | |
303 | ||
304 | pveam download local debian-8.0-standard_8.0-1_amd64.tar.gz | |
305 | ||
24f73a63 DM |
306 | You are now ready to create containers using that image, and you can |
307 | list all downloaded images on storage 'local' with: | |
308 | ||
309 | ---- | |
310 | # pveam list local | |
311 | local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz 190.20MB | |
312 | ---- | |
313 | ||
a8e99754 | 314 | The above command shows you the full {pve} volume identifiers. They include |
24f73a63 DM |
315 | the storage name, and most other {pve} commands can use them. For |
316 | example, you can delete that image later with: | |
317 | ||
318 | pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz | |
3a6fa247 | 319 | |
d61bab51 | 320 | |
70a42028 DM |
321 | Container Storage |
322 | ----------------- | |
323 | ||
324 | Traditional containers use a very simple storage model, only allowing | |
325 | a single mount point, the root file system. This was further | |
326 | restricted to specific file system types like 'ext4' and 'nfs'. | |
327 | Additional mounts are often done by user provided scripts. This turned | |
a8e99754 | 328 | out to be complex and error prone, so we try to avoid that now. |
70a42028 DM |
329 | |
330 | Our new LXC based container model is more flexible regarding | |
331 | storage. First, you can have more than a single mount point. This | |
332 | allows you to choose a suitable storage for each application. For | |
333 | example, you can use a relatively slow (and thus cheap) storage for | |
334 | the container root file system. Then you can use a second mount point | |
335 | to mount a very fast, distributed storage for your database | |
336 | application. | |
337 | ||
338 | The second big improvement is that you can use any storage type | |
339 | supported by the {pve} storage library. That means that you can store | |
340 | your containers on local 'lvmthin' or 'zfs', shared 'iSCSI' storage, | |
a8e99754 | 341 | or even on distributed storage systems like 'ceph'. It also enables us |
70a42028 | 342 | to use advanced storage features like snapshots and clones. 'vzdump' |
a8e99754 | 343 | can also use the snapshot feature to provide consistent container |
70a42028 DM |
344 | backups. |
345 | ||
346 | Last but not least, you can also mount local devices directly, or | |
347 | mount local directories using bind mounts. That way you can access | |
348 | local storage inside containers with zero overhead. Such bind mounts | |
a8e99754 | 349 | also provide an easy way to share data between different containers. |
70a42028 | 350 | |
eeecce95 | 351 | |
9e44e493 DM |
352 | Mount Points |
353 | ~~~~~~~~~~~~ | |
eeecce95 | 354 | |
9e44e493 DM |
355 | Besides the root directory, the container can also have additional mount points. |
356 | Currently there are basically three types of mount points: storage backed | |
357 | mount points, bind mounts and device mounts. | |
358 | ||
359 | Storage backed mount points are managed by the {pve} storage subsystem and come | |
eeecce95 WB |
360 | in three different flavors: |
361 | ||
362 | - Image based: These are raw images containing a single ext4 formatted file | |
363 | system. | |
364 | - ZFS Subvolumes: These are technically bind mounts, but with managed storage, | |
365 | and thus allow resizing and snapshotting. | |
366 | - Directories: passing `size=0` triggers a special case where instead of a raw | |
367 | image a directory is created. | |
368 | ||
369 | Bind mounts are not managed by the storage subsystem, so you cannot make | |
370 | snapshots or deal with quotas from inside the container. With unprivileged | |
371 | containers you might also run into permission problems caused by the user | |
372 | mapping, and you cannot use ACLs. | |
373 | ||
374 | Similarly, device mounts are not managed by the storage subsystem, but for these the | |
375 | `quota` and `acl` options will be honored. | |
376 | ||
22a74065 FG |
377 | WARNING: Because of existing issues in the Linux kernel's freezer |
378 | subsystem the usage of FUSE mounts inside a container is strongly | |
379 | advised against, as containers need to be frozen for suspend or | |
380 | snapshot mode backups. If FUSE mounts cannot be replaced by other | |
381 | mounting mechanisms or storage technologies, it is possible to | |
382 | establish the FUSE mount on the Proxmox host and use a bind | |
9e44e493 | 383 | mount point to make it accessible inside the container. |
eeecce95 | 384 | |
6b707f2c FG |
385 | WARNING: For security reasons, bind mounts should only be established |
386 | using source directories especially reserved for this purpose, e.g., a | |
387 | directory hierarchy under `/mnt/bindmounts`. Never bind mount system | |
388 | directories like `/`, `/var` or `/etc` into a container - this poses a | |
389 | great security risk. The bind mount source path must not contain any symlinks. | |
390 | ||
9e44e493 | 391 | The root mount point is configured with the 'rootfs' property, and you can |
fe154a4f DM |
392 | configure up to 10 additional mount points. The corresponding options |
393 | are called 'mp0' to 'mp9', and they can contain the following settings: | |
394 | ||
395 | include::pct-mountpoint-opts.adoc[] | |
396 | ||
397 | .Typical Container 'rootfs' configuration | |
398 | ---- | |
399 | rootfs: thin1:base-100-disk-1,size=8G | |
400 | ---- | |
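
The value of 'rootfs' in the example above is a comma separated list. The following sketch splits such a value, assuming that the first item is the volume and that every further item has the form `key=value`. The function name `parse_mountpoint` and the dictionary key `volume` are hypothetical, and no validation against the real option list is attempted.

```python
def parse_mountpoint(value):
    """Split a 'rootfs'/'mpN' style value into a dict of its parts."""
    volume, *rest = value.split(",")
    opts = {"volume": volume}  # assumption: the first item names the volume
    for item in rest:
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        opts[key] = val
    return opts
```

For the example above this yields the volume `thin1:base-100-disk-1` and a `size` of `8G`.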
401 | ||
d6ed3622 | 402 | Using quotas inside containers |
04c569f6 | 403 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
d6ed3622 | 404 | |
9e44e493 DM |
405 | Quotas allow you to set limits inside a container for the amount of disk |
406 | space that each user can use. This only works on ext4 image based | |
407 | storage types and currently does not work with unprivileged | |
408 | containers. | |
d6ed3622 | 409 | |
9e44e493 DM |
410 | Activating the `quota` option causes the following mount options to be |
411 | used for a mount point: | |
412 | `usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0` | |
d6ed3622 | 413 | |
9e44e493 DM |
414 | This allows quotas to be used like you would on any other system. You |
415 | can initialize the `/aquota.user` and `/aquota.group` files by running | |
d6ed3622 | 416 | |
9e44e493 DM |
417 | ---- |
418 | quotacheck -cmug / | |
419 | quotaon / | |
420 | ---- | |
d6ed3622 | 421 | |
166e63d6 FG |
422 | and edit the quotas via the `edquota` command. Refer to the documentation |
423 | of the distribution running inside the container for details. | |
424 | ||
9e44e493 DM |
425 | NOTE: You need to run the above commands for every mount point by passing |
426 | the mount point's path instead of just `/`. | |
427 | ||
d6ed3622 | 428 | |
6c60aebf | 429 | Using ACLs inside containers |
04c569f6 | 430 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
6c60aebf EK |
431 | |
432 | The standard POSIX Access Control Lists are also available inside containers. | |
433 | ACLs allow you to set more detailed file ownership than the traditional user/ | |
434 | group/others model. | |
d6ed3622 | 435 | |
04c569f6 DM |
436 | |
437 | Container Network | |
438 | ----------------- | |
439 | ||
bac8c385 DM |
440 | You can configure up to 10 network interfaces for a single |
441 | container. The corresponding options are called 'net0' to 'net9', and | |
442 | they can contain the following settings: | |
443 | ||
444 | include::pct-network-opts.adoc[] | |
04c569f6 DM |
445 | |
446 | ||
447 | Managing Containers with 'pct' | |
448 | ------------------------------ | |
449 | ||
450 | 'pct' is the tool to manage Linux Containers on {pve}. You can create | |
451 | and destroy containers, and control execution (start, stop, migrate, | |
452 | ...). You can use pct to set parameters in the associated config file, | |
453 | like network configuration or memory limits. | |
454 | ||
455 | CLI Usage Examples | |
456 | ~~~~~~~~~~~~~~~~~~ | |
457 | ||
458 | Create a container based on a Debian template (provided you have | |
459 | already downloaded the template via the webgui) | |
460 | ||
461 | pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz | |
462 | ||
463 | Start container 100 | |
464 | ||
465 | pct start 100 | |
466 | ||
467 | Start a login session via getty | |
468 | ||
469 | pct console 100 | |
470 | ||
471 | Enter the LXC namespace and run a shell as root user | |
472 | ||
473 | pct enter 100 | |
474 | ||
475 | Display the configuration | |
476 | ||
477 | pct config 100 | |
478 | ||
479 | Add a network interface called eth0, bridged to the host bridge vmbr0, | |
480 | set the address and gateway, while it's running | |
481 | ||
482 | pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1 | |
483 | ||
484 | Reduce the memory of the container to 512MB | |
485 | ||
0585f29a DM |
486 | pct set 100 -memory 512 |
487 | ||
04c569f6 DM |
488 | |
489 | Files | |
490 | ----- | |
491 | ||
492 | '/etc/pve/lxc/<CTID>.conf':: | |
493 | ||
494 | Configuration file for the container '<CTID>'. | |
495 | ||
496 | ||
0c6b782f DM |
497 | Container Advantages |
498 | -------------------- | |
499 | ||
500 | - Simple, and fully integrated into {pve}. Setup looks similar to a normal | |
501 | VM setup. | |
502 | ||
503 | * Storage (ZFS, LVM, NFS, Ceph, ...) | |
504 | ||
505 | * Network | |
506 | ||
507 | * Authentication | |
508 | ||
509 | * Cluster | |
510 | ||
511 | - Fast: minimal overhead, as fast as bare metal | |
512 | ||
513 | - High density (perfect for idle workloads) | |
514 | ||
515 | - REST API | |
516 | ||
517 | - Direct hardware access | |
518 | ||
519 | ||
520 | Technology Overview | |
521 | ------------------- | |
522 | ||
523 | - Integrated into {pve} graphical user interface (GUI) | |
524 | ||
525 | - LXC (https://linuxcontainers.org/) | |
526 | ||
527 | - cgmanager for cgroup management | |
528 | ||
529 | - lxcfs to provide a containerized /proc file system | |
530 | ||
531 | - apparmor | |
532 | ||
533 | - CRIU: for live migration (planned) | |
534 | ||
11f340ff | 535 | - We use the latest available kernels (4.4.X) |
0c6b782f | 536 | |
a8e99754 | 537 | - Image based deployment (templates) |
0c6b782f DM |
538 | |
539 | - Container setup from host (Network, DNS, Storage, ...) | |
540 | ||
541 | ||
542 | ifdef::manvolnum[] | |
543 | include::pve-copyright.adoc[] | |
544 | endif::manvolnum[] | |
545 | ||
546 | ||
547 | ||
548 | ||
549 | ||
550 | ||
551 |