diff --git a/pct.adoc b/pct.adoc
index 77c33e1..c5ed243 100644
--- a/pct.adoc
+++ b/pct.adoc
@@ -1,7 +1,8 @@
+[[chapter_pct]]
ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
+pct(1)
+======
+:pve-toplevel:

NAME
----
@@ -9,7 +10,7 @@ NAME

pct - Tool to manage Linux Containers (LXC) on Proxmox VE

-SYNOPSYS
+SYNOPSIS
--------

include::pct.1-synopsis.adoc[]

@@ -21,196 +22,615 @@ endif::manvolnum[]
ifndef::manvolnum[]
Proxmox Container Toolkit
=========================
-include::attributes.txt[]
+:pve-toplevel:
endif::manvolnum[]

+ifdef::wiki[]
+:title: Linux Container
+endif::wiki[]

+Containers are a lightweight alternative to fully virtualized machines (VMs).
+They use the kernel of the host system that they run on, instead of emulating a
+full operating system (OS). This means that containers can access resources on
+the host system directly.

-Containers are a lightweight alternative to fully virtualized
-VMs. Instead of emulating a complete Operating System (OS), containers
-simply use the OS of the host they run on. This implies that all
-containers use the same kernel, and that they can access resources
-from the host directly.

+The runtime costs for containers are low, usually negligible. However, there
+are some drawbacks that need to be considered:

-This is great because containers do not waste CPU power nor memory due
-to kernel emulation. Container run-time costs are close to zero and
-usually negligible. But there are also some drawbacks you need to
-consider:

+* Only Linux distributions can be run in Proxmox Containers. It is not possible
+  to run other operating systems like, for example, FreeBSD or Microsoft
+  Windows inside a container.

-* You can only run Linux based OS inside containers, i.e. it is not
-  possible to run Free BSD or MS Windows inside.

+* For security reasons, access to host resources needs to be restricted.
+  Therefore, containers run in their own separate namespaces. Additionally,
+  some syscalls (user space requests to the Linux kernel) are not allowed
+  within containers.

-* For security reasons, access to host resources need to be
-  restricted. This is done with AppArmor, SecComp filters and other
-  kernel feature. Be prepared that some syscalls are not allowed
-  inside containers.

+{pve} uses https://linuxcontainers.org/lxc/introduction/[Linux Containers (LXC)] as its underlying
+container technology. The ``Proxmox Container Toolkit'' (`pct`) simplifies the
+usage and management of LXC by providing an interface that abstracts
+complex tasks.

-{pve} uses https://linuxcontainers.org/[LXC] as underlying container
-technology. We consider LXC as low-level library, which provides
-countless options. It would be to difficult to use those tools
-directly. Instead, we provide a small wrapper called `pct`, the
-"Proxmox Container Toolkit".

+Containers are tightly integrated with {pve}. This means that they are aware of
+the cluster setup, and they can use the same network and storage resources as
+virtual machines. You can also use the {pve} firewall, or manage containers
+using the HA framework.

-The toolkit it tightly coupled with {pve}. That means that it is aware
-of the cluster setup, and it can use the same network and storage
-resources as fully virtualized VMs.
-You can even use the {pve}
-firewall, or manage containers using the HA framework.

+Our primary goal is to offer an environment that provides the benefits of using a
+VM, but without the additional overhead. This means that Proxmox Containers can
+be categorized as ``System Containers'', rather than ``Application Containers''.

-Our primary goal is to offer an environment as one would get from a
-VM, but without the additional overhead. We call this "System
-Containers".

+NOTE: If you want to run application containers, for example, 'Docker' images, it
+is recommended that you run them inside a Proxmox Qemu VM. This will give you
+all the advantages of application containerization, while also providing the
+benefits that VMs offer, such as strong isolation from the host and the ability
+to live-migrate, which otherwise isn't possible with containers.

-NOTE: If you want to run micro-containers (with docker, rct, ...), it
-is best to run them inside a VM.

+Technology Overview
+-------------------

-Security Considerations
------------------------

+* LXC (https://linuxcontainers.org/)

-Containers use the same kernel as the host, so there is a big attack
-surface for malicious users. You should consider this fact if you
-provide containers to totally untrusted people. In general, fully
-virtualized VM provides better isolation.

+* Integrated into {pve} graphical web user interface (GUI)

-The good news is that LXC uses many kernel security features like
-AppArmor, CGroups and PID and user namespaces, which makes containers
-usage quite secure. We distinguish two types of containers:

+* Easy to use command line tool `pct`

-Privileged containers
-~~~~~~~~~~~~~~~~~~~~~

+* Access via {pve} REST API

-Security is done by dropping capabilities, using mandatory access
-control (AppArmor), SecComp filters and namespaces. The LXC team
-considers this kind of container as unsafe, and they will not consider
-new container escape exploits to be security issues worthy of a CVE
-and quick fix. So you should use this kind of containers only inside a
-trusted environment, or when no untrusted task is running as root in
-the container.

+* 'lxcfs' to provide containerized /proc file system

-Unprivileged containers
-~~~~~~~~~~~~~~~~~~~~~~~

+* Control groups ('cgroups') for resource isolation and limitation

-This kind of containers use a new kernel feature, called user
-namespaces. The root uid 0 inside the container is mapped to an
-unprivileged user outside the container. This means that most security
-issues (container escape, resource abuse, ...) in those containers
-will affect a random unprivileged user, and so would be a generic
-kernel security bug rather than a LXC issue. LXC people think
-unprivileged containers are safe by design.

+* 'AppArmor' and 'seccomp' to improve security

+* Modern Linux kernels

-Configuration
-------------

+* Image based deployment (templates)

-The '/etc/pve/lxc/<CTID>.conf' files stores container configuration,
-where '<CTID>' is the numeric ID of the given container. Note that
-CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
-cluster wide unique. Files are stored inside '/etc/pve/', so they get
-automatically replicated to all other cluster nodes.

+* Uses {pve} xref:chapter_storage[storage library]
+
+* Container setup from host (network, DNS, storage, etc.)
+
+
+[[pct_container_images]]
+Container Images
+----------------
+
+Container images, sometimes also referred to as ``templates'' or
+``appliances'', are `tar` archives which contain everything to run a container.
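+Since a template is just a (compressed) `tar` archive of a root file system,
+you can peek inside one with standard tools before using it. The file name
+below is only an example; substitute a template that is actually present in
+your template cache:
+
+----
+# tar -tf /var/lib/vz/template/cache/debian-10.0-standard_10.0-1_amd64.tar.gz | head
+----
+
+This lists the first few paths contained in the archive, which is a quick way
+to verify that it really holds a complete root file system.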
+
+{pve} itself provides a variety of basic templates for the most common Linux
+distributions. They can be downloaded using the GUI or the `pveam` (short for
+{pve} Appliance Manager) command line utility.
+Additionally, https://www.turnkeylinux.org/[TurnKey Linux] container templates
+are also available to download.
+
+The list of available templates is updated daily through the 'pve-daily-update'
+timer. You can also trigger an update manually by executing:

-.Example Container Configuration
----
-ostype: debian
-arch: amd64
-hostname: www
-memory: 512
-swap: 512
-net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
-rootfs: local:107/vm-107-disk-1.raw,size=7G
+----
+# pveam update
----

-Those configuration files are simple text files, and you can edit them
-using a normal text editor ('vi', 'nano', ...). This is sometimes
-useful to do small corrections, but keep in mind that you need to
-restart the container to apply such changes.

+To view the list of available images run:

-For that reason, it is usually better to use the 'pct' command to
-generate and modify those files, or do the whole thing using the GUI.
-Our toolkit is smart enough to instantaneously apply most changes to
-running containers. This feature is called "hot plug", and there is no
-need to restart the container in that case.

+----
+# pveam available
+----

-File Format

+You can restrict this large list by specifying the `section` you are
+interested in, for example basic `system` images:
+
+.List available system images
+----
+# pveam available --section system
+system  alpine-3.12-default_20200823_amd64.tar.xz
+system  alpine-3.13-default_20210419_amd64.tar.xz
+system  alpine-3.14-default_20210623_amd64.tar.xz
+system  archlinux-base_20210420-1_amd64.tar.gz
+system  centos-7-default_20190926_amd64.tar.xz
+system  centos-8-default_20201210_amd64.tar.xz
+system  debian-9.0-standard_9.7-1_amd64.tar.gz
+system  debian-10-standard_10.7-1_amd64.tar.gz
+system  devuan-3.0-standard_3.0_amd64.tar.gz
+system  fedora-33-default_20201115_amd64.tar.xz
+system  fedora-34-default_20210427_amd64.tar.xz
+system  gentoo-current-default_20200310_amd64.tar.xz
+system  opensuse-15.2-default_20200824_amd64.tar.xz
+system  ubuntu-16.04-standard_16.04.5-1_amd64.tar.gz
+system  ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz
+system  ubuntu-20.04-standard_20.04-1_amd64.tar.gz
+system  ubuntu-20.10-standard_20.10-1_amd64.tar.gz
+system  ubuntu-21.04-standard_21.04-1_amd64.tar.gz
+----
+
+Before you can use such a template, you need to download it into one of your
+storages. If you are not sure which one, you can simply use the `local` named
+storage for that purpose. For clustered installations, it is preferred to use a
+shared storage so that all nodes can access those images.
+
+----
+# pveam download local debian-10.0-standard_10.0-1_amd64.tar.gz
+----
+
+You are now ready to create containers using that image, and you can list all
+downloaded images on storage `local` with:
+
+----
+# pveam list local
+local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz  219.95MB
+----
+
+TIP: You can also use the {pve} web interface GUI to download, list and delete
+container templates.
+
+`pct` uses them to create a new container, for example:
+
+----
+# pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
+----
+
+The above command shows you the full {pve} volume identifiers. They include the
+storage name, and most other {pve} commands can use them.
+For example, you can
+delete that image later with:
+
+----
+# pveam remove local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
+----
+
+
+[[pct_settings]]
+Container Settings
+------------------
+
+[[pct_general]]
+General Settings
+~~~~~~~~~~~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-general.png"]
+
+General settings of a container include
+
+* the *Node*: the physical server on which the container will run
+* the *CT ID*: a unique number in this {pve} installation used to identify your
+  container
+* *Hostname*: the hostname of the container
+* *Resource Pool*: a logical group of containers and VMs
+* *Password*: the root password of the container
+* *SSH Public Key*: a public key for connecting to the root account over SSH
+* *Unprivileged container*: this option allows you to choose at creation time
+  whether you want to create a privileged or unprivileged container.
+
+Unprivileged Containers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Unprivileged containers use a new kernel feature called user namespaces.
+The root UID 0 inside the container is mapped to an unprivileged user outside
+the container. This means that most security issues (container escape, resource
+abuse, etc.) in these containers will affect a random unprivileged user, and
+would be a generic kernel security bug rather than an LXC issue. The LXC team
+thinks unprivileged containers are safe by design.
+
+This is the default option when creating a new container.
+
+NOTE: If the container uses systemd as an init system, please be aware that the
+systemd version running inside the container should be equal to or greater than
+220.
+
+
+Privileged Containers
+^^^^^^^^^^^^^^^^^^^^^
+
+Security in containers is achieved by using mandatory access control 'AppArmor'
+restrictions, 'seccomp' filters and Linux kernel namespaces. The LXC team
+considers this kind of container as unsafe, and they will not consider new
+container escape exploits to be security issues worthy of a CVE and quick fix.
+That's why privileged containers should only be used in trusted environments.
+
+
+[[pct_cpu]]
+CPU
+~~~
+
+[thumbnail="screenshot/gui-create-ct-cpu.png"]
+
+You can restrict the number of visible CPUs inside the container using the
+`cores` option. This is implemented using the Linux 'cpuset' cgroup
+(**c**ontrol *group*).
+A special task inside `pvestatd` tries to distribute running containers among
+available CPUs periodically.
+To view the assigned CPUs run the following command:
+
+----
+# pct cpusets
+ ---------------------
+ 102:              6 7
+ 105:      2 3 4 5
+ 108:  0 1
+ ---------------------
+----
+
+Containers use the host kernel directly. All tasks inside a container are
+handled by the host CPU scheduler. {pve} uses the Linux 'CFS'
+(**C**ompletely **F**air **S**cheduler) by default, which has additional
+bandwidth control options.
+
+[horizontal]
+
+`cpulimit`: :: You can use this option to further limit assigned CPU time.
+Please note that this is a floating point number, so it is perfectly valid to
+assign two cores to a container, but restrict overall CPU consumption to half a
+core.
++
+----
+cores: 2
+cpulimit: 0.5
+----
+
+`cpuunits`: :: This is a relative weight passed to the kernel scheduler. The
+larger the number is, the more CPU time this container gets. The number is
+relative to the weights of all the other running containers. The default is
+1024. You can use this setting to prioritize some containers.
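+For example, to apply the `cores`, `cpulimit` and `cpuunits` settings shown
+above to an existing container, you could use `pct set` (the container ID `100`
+and the values here are purely illustrative):
+
+----
+# pct set 100 -cores 2 -cpulimit 0.5
+# pct set 100 -cpuunits 2048
+----
+
+Like most `pct` settings, these changes are applied instantly to a running
+container where possible.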
+
+
+[[pct_memory]]
+Memory
+~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-memory.png"]
+
+Container memory is controlled using the cgroup memory controller.
+
+[horizontal]
+
+`memory`: :: Limit overall memory usage. This corresponds to the
+`memory.limit_in_bytes` cgroup setting.
+
+`swap`: :: Allows the container to use additional swap memory from the host
+swap space. This corresponds to the `memory.memsw.limit_in_bytes` cgroup
+setting, which is set to the sum of both values (`memory + swap`).
+
+
+[[pct_mount_points]]
+Mount Points
+~~~~~~~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-root-disk.png"]
+
+The root mount point is configured with the `rootfs` property. You can
+configure up to 256 additional mount points. The corresponding options are
+called `mp0` to `mp255`. They can contain the following settings:
+
+include::pct-mountpoint-opts.adoc[]
+
+Currently there are three types of mount points: storage backed mount points,
+bind mounts, and device mounts.
+
+.Typical container `rootfs` configuration
+----
+rootfs: thin1:base-100-disk-1,size=8G
+----
+
+
+Storage Backed Mount Points
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Storage backed mount points are managed by the {pve} storage subsystem and come
+in three different flavors:
+
+- Image based: these are raw images containing a single ext4 formatted file
+  system.
+- ZFS subvolumes: these are technically bind mounts, but with managed storage,
+  and thus allow resizing and snapshotting.
+- Directories: passing `size=0` triggers a special case where instead of a raw
+  image a directory is created.
+
+NOTE: The special option syntax `STORAGE_ID:SIZE_IN_GB` for storage backed
+mount point volumes will automatically allocate a volume of the specified size
+on the specified storage. For example, calling
+
+----
+pct set 100 -mp0 thin1:10,mp=/path/in/container
+----
+
+will allocate a 10GB volume on the storage `thin1`, replace the volume ID
+placeholder `10` with the allocated volume ID, and set up the mount point in
+the container at `/path/in/container`.
+
+
+Bind Mount Points
+^^^^^^^^^^^^^^^^^
+
+Bind mounts allow you to access arbitrary directories from your Proxmox VE host
+inside a container. Some potential use cases are:
+
+- Accessing your home directory in the guest
+- Accessing a USB device directory in the guest
+- Accessing an NFS mount from the host in the guest
+
+Bind mounts are not managed by the storage subsystem, so you cannot make
+snapshots or deal with quotas from inside the container. With unprivileged
+containers you might run into permission problems caused by the user mapping,
+and you cannot use ACLs.
+
+NOTE: The contents of bind mount points are not backed up when using `vzdump`.
+
+WARNING: For security reasons, bind mounts should only be established using
+source directories especially reserved for this purpose, e.g., a directory
+hierarchy under `/mnt/bindmounts`. Never bind mount system directories like
+`/`, `/var` or `/etc` into a container - this poses a great security risk.
+
+NOTE: The bind mount source path must not contain any symlinks.
+
+For example, to make the directory `/mnt/bindmounts/shared` accessible in the
+container with ID `100` under the path `/shared`, use a configuration line like
+`mp0: /mnt/bindmounts/shared,mp=/shared` in `/etc/pve/lxc/100.conf`.
+Alternatively, use `pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared` to
+achieve the same result.
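+Building on that example, the same source directory can be bind mounted into
+several containers to share data between them (the IDs `100` and `101` are
+again just examples):
+
+----
+# pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared
+# pct set 101 -mp0 /mnt/bindmounts/shared,mp=/shared
+----
+
+If the containers are unprivileged, remember the user mapping mentioned above:
+with the default {pve} ID mapping, container root is shifted to host UID/GID
+100000, so the host-side files must be owned accordingly for the containers to
+get write access. A minimal sketch, assuming that default mapping:
+
+----
+# chown -R 100000:100000 /mnt/bindmounts/shared
+----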
+
+
+Device Mount Points
+^^^^^^^^^^^^^^^^^^^
+
+Device mount points allow you to mount block devices of the host directly into
+the container. Similar to bind mounts, device mounts are not managed by
+{PVE}'s storage subsystem, but the `quota` and `acl` options will be honored.
+
+NOTE: Device mount points should only be used under special circumstances. In
+most cases a storage backed mount point offers the same performance and a lot
+more features.
+
+NOTE: The contents of device mount points are not backed up when using
+`vzdump`.
+
+
+[[pct_container_network]]
+Network
+~~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-network.png"]
+
+You can configure up to 10 network interfaces for a single container.
+The corresponding options are called `net0` to `net9`, and they can contain the
+following settings:
+
+include::pct-network-opts.adoc[]
+
+
+[[pct_startup_and_shutdown]]
+Automatic Start and Shutdown of Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To automatically start a container when the host system boots, select the
+option 'Start at boot' in the 'Options' panel of the container in the web
+interface or run the following command:
+
+----
+# pct set CTID -onboot 1
+----
+
+.Start and Shutdown Order
+// use the screenshot from qemu - its the same
+[thumbnail="screenshot/gui-qemu-edit-start-order.png"]
+
+If you want to fine-tune the boot order of your containers, you can use the
+following parameters:
+
+* *Start/Shutdown order*: Defines the start order priority. For example, set it
+  to 1 if you want the CT to be the first to be started. (We use the reverse
+  startup order for shutdown, so a container with a start order of 1 would be
+  the last to be shut down.)
+* *Startup delay*: Defines the interval between the start of this container and
+  the start of subsequent containers. For example, set it to 240 if you want to
+  wait 240 seconds before starting other containers.
+* *Shutdown timeout*: Defines the duration in seconds {pve} should wait
+  for the container to be offline after issuing a shutdown command.
+  By default this value is set to 60, which means that {pve} will issue a
+  shutdown request, wait 60s for the machine to be offline, and if the machine
+  is still online after 60s, it will report that the shutdown action failed.
+
+Please note that containers without a Start/Shutdown order parameter will
+always start after those where the parameter is set, and this parameter only
+applies to machines running locally on a host, not cluster-wide.
+
+If you require a delay between the host boot and the booting of the first
+container, see the section on
+xref:first_guest_boot_delay[Proxmox VE Node Management].
+
+
+Hookscripts
~~~~~~~~~~~

-Container configuration files use a simple colon separated key/value
-format. Each line has the following format:

+You can add a hook script to CTs with the config property `hookscript`.

- # this is a comment
- OPTION: value

+----
+# pct set 100 -hookscript local:snippets/hookscript.pl
+----

-Blank lines in those files are ignored, and lines starting with a '#'
-character are treated as comments and are also ignored.

+It will be called during various phases of the guest's lifetime. For an example
+and documentation see the example script under
+`/usr/share/pve-docs/examples/guest-example-hookscript.pl`.

-It is possible to add low-level, LXC style configuration directly, for
-example:

+Security Considerations
+-----------------------

- lxc.init_cmd: /sbin/my_own_init

+Containers use the kernel of the host system.
+This exposes an attack surface
+for malicious users. In general, full virtual machines provide better
+isolation. This should be considered if containers are provided to unknown or
+untrusted people.
+
+To reduce the attack surface, LXC uses many security features like AppArmor,
+CGroups and kernel namespaces.

-or

+AppArmor
+~~~~~~~~

- lxc.init_cmd = /sbin/my_own_init

+AppArmor profiles are used to restrict access to possibly dangerous actions.
+Some system calls, e.g. `mount`, are prohibited from execution.

-Those settings are directly passed to the LXC low-level tools.

+To trace AppArmor activity, use:
+
+----
+# dmesg | grep apparmor
+----

-Snapshots
-~~~~~~~~~

+Although it is not recommended, AppArmor can be disabled for a container. This
+carries security risks. Some syscalls can lead to privilege escalation when
+executed within a container if the system is misconfigured or if an LXC or
+Linux kernel vulnerability exists.
+
+To disable AppArmor for a container, add the following line to the container
+configuration file located at `/etc/pve/lxc/CTID.conf`:

-When you create a snapshot, 'pct' stores the configuration at snapshot
-time into a separate snapshot section within the same configuration
-file. For example, after creating a snapshot called 'testsnapshot',
-your configuration file will look like this:

+----
+lxc.apparmor.profile = unconfined
+----

-.Container Configuration with Snapshot
----
-memory: 512
-swap: 512
-parent: testsnaphot
-...

-[testsnaphot]
-memory: 512
-swap: 512
-snaptime: 1457170803
-...

+WARNING: Please note that this is not recommended for production use.
+
+
+[[pct_cgroup]]
+Control Groups ('cgroup')
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'cgroup' is a kernel
+mechanism used to hierarchically organize processes and distribute system
+resources.
+
+The main resources controlled via 'cgroups' are CPU time, memory and swap
+limits, and access to device nodes. 'cgroups' are also used to "freeze" a
+container before taking snapshots.
+
+There are two versions of 'cgroups' currently available,
+https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v1/index.html[legacy]
+and
+https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v2.html['cgroupv2'].
+
+Since {pve} 7.0, the default is a pure 'cgroupv2' environment. Previously a
+"hybrid" setup was used, where resource control was mainly done in 'cgroupv1'
+with an additional 'cgroupv2' controller which could take over some subsystems
+via the 'cgroup_no_v1' kernel command line parameter. (See the
+https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html[kernel
+parameter documentation] for details.)
+
+[[pct_cgroup_compat]]
+CGroup Version Compatibility
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The main difference between pure 'cgroupv2' and the old hybrid environments
+regarding {pve} is that with 'cgroupv2' memory and swap are now controlled
+independently. The memory and swap settings for containers can map directly to
+these values, whereas previously only the memory limit and the limit of the
+*sum* of memory and swap could be limited.
+
+Another important difference is that the 'devices' controller is configured in
+a completely different way. Because of this, file system quotas are currently
+not supported in a pure 'cgroupv2' environment.
+
+'cgroupv2' support by the container's OS is needed to run in a pure 'cgroupv2'
+environment.
+Containers running 'systemd' version 231 or newer support 'cgroupv2'
+footnote:[this includes all newest major versions of container templates
+shipped by {pve}], as do containers not using 'systemd' as their init system
+footnote:[for example Alpine Linux].
+
+[NOTE]
+====
+CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases with a
+'systemd' version that is too old to run in a 'cgroupv2' environment. You can
+either:
+
+* Upgrade the whole distribution to a newer release. For the examples above,
+  that could be Ubuntu 18.04 or 20.04, and CentOS 8 (or RHEL/CentOS derivatives
+  like AlmaLinux or Rocky Linux). This has the benefit of getting the newest
+  bug and security fixes, often also new features, and of moving the EOL date
+  into the future.
+
+* Upgrade the container's systemd version. If the distribution provides a
+  backports repository, this can be an easy and quick stop-gap measure.
+
+* Move the container, or its services, to a virtual machine. Virtual machines
+  interact far less with the host, which is why decades-old OS versions can
+  run there just fine.
+
+* Switch back to the legacy 'cgroup' controller. Note that while it can be a
+  valid solution, it is not a permanent one. There is a high likelihood that a
+  future {pve} major release, for example 8.0, will not be able to support the
+  legacy controller anymore.
+====
+
+[[pct_cgroup_change_version]]
+Changing CGroup Version
+^^^^^^^^^^^^^^^^^^^^^^^
+
+TIP: If file system quotas are not required and all containers support
+'cgroupv2', it is recommended to stick to the new default.
+
+To switch back to the previous version the following kernel command line
+parameter can be used:
+
+----
+systemd.unified_cgroup_hierarchy=0
+----

-There are a view snapshot related properties like 'parent' and
-'snaptime'. They 'parent' property is used to store the parent/child
-relationship between snapshots. 'snaptime' is the snapshot creation
-time stamp (unix epoch).

+See xref:sysboot_edit_kernel_cmdline[this section] on editing the kernel boot
+command line for where to add the parameter.
+
+// TODO: seccomp a bit more.
+// TODO: pve-lxc-syscalld
+

Guest Operating System Configuration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------------

-We normally try to detect the operating system type inside the
-container, and then modify some files inside the container to make
-them work as expected. Here is a short list of things we do at
-container startup:

+{pve} tries to detect the Linux distribution in the container, and modifies
+some files. Here is a short list of things done at container startup:

set /etc/hostname:: to set the container name

-modify /etc/hosts:: allow to lookup the local hostname
+modify /etc/hosts:: to allow lookup of the local hostname

network setup:: pass the complete network setup to the container

configure DNS:: pass information about DNS servers

-adopt the init system:: for example, fix the number os spawned getty processes
+adapt the init system:: for example, fix the number of spawned getty processes

set the root password:: when creating a new container

rewrite ssh_host_keys:: so that each container has unique keys

-randomize crontab:: so that cron does not start at same time on all containers
+randomize crontab:: so that cron does not start at the same time on all containers
+
+Changes made by {PVE} are enclosed by comment markers:
+
+----
+# --- BEGIN PVE ---
+
+# --- END PVE ---
+----
+
+Those markers will be inserted at a reasonable location in the file.
+If such a
+section already exists, it will be updated in place and will not be moved.
+
+Modification of a file can be prevented by adding a `.pve-ignore.` file for it.
+For instance, if the file `/etc/.pve-ignore.hosts` exists then the `/etc/hosts`
+file will not be touched. This can be a simple empty file created via:
+
+----
+# touch /etc/.pve-ignore.hosts
+----

-Above task depends on the OS type, so the implementation is different
-for each OS type. You can also disable any modifications by manually
-setting the 'ostype' to 'unmanaged'.

+Most modifications are OS dependent, so they differ between different
+distributions and versions. You can completely disable modifications by
+manually setting the `ostype` to `unmanaged`.

OS type detection is done by testing for certain files inside the
-container:
+container. {pve} first checks the `/etc/os-release` file
+footnote:[/etc/os-release replaces the multitude of per-distribution
+release files https://manpages.debian.org/stable/systemd/os-release.5.en.html].
+If that file is not present, or it does not contain a clearly recognizable
+distribution identifier, the following distribution specific release files are
+checked.

-Ubuntu:: inspect /etc/lsb-release ('DISTRIB_ID=Ubuntu')
+Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)

Debian:: test /etc/debian_version

@@ -222,160 +642,438 @@ ArchLinux:: test /etc/arch-release

Alpine:: test /etc/alpine-release

-NOTE: Container start fails is configured 'ostype' differs from auto
+Gentoo:: test /etc/gentoo-release
+
+NOTE: Container start fails if the configured `ostype` differs from the auto
detected type.

-Container Images
-----------------
+[[pct_container_storage]]
+Container Storage
+-----------------

-Container Images, somtimes also referred as "templates" or
-"appliances", are 'tar' archives which contains everything to run a
-container. You can think of it as a tidy container backup. Like most
-modern container toolkits, 'pct' uses those images when you create a
-new container, for example:

+The {pve} LXC container storage model is more flexible than traditional
+container storage models. A container can have multiple mount points. This
+makes it possible to use the best suited storage for each application.

- pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz

+For example, the root file system of the container can be on slow and cheap
+storage while the database can be on fast and distributed storage via a second
+mount point. See section <<pct_mount_points,Mount Points>> for further
+details.

-Proxmox itself ships a set of basic templates for most common
-operating systems, and you can download them using the 'pveam' (short
-for {pve} Appliance Manager) command line utility. You can also
-download https://www.turnkeylinux.org/[TurnKey Linux] containers using
-that tool (or the graphical user interface).

+Any storage type supported by the {pve} storage library can be used. This means
+that containers can be stored on local (for example `lvm`, `zfs` or directory),
+shared external (like `iSCSI`, `NFS`) or even distributed storage systems like
+Ceph. Advanced storage features like snapshots or clones can be used if the
+underlying storage supports them. The `vzdump` backup tool can use snapshots to
+provide consistent container backups.
+
+Furthermore, local devices or local directories can be mounted directly using
+'bind mounts'. This gives access to local resources inside a container with
+practically zero overhead. Bind mounts can be used as an easy way to share data
+between containers.

-Container Storage
------------------

-Traditional containers use a very simple storage model, only allowing
-a single mount point, the root file system. This was further
-restricted to specific file system types like 'ext4' and 'nfs'.
-Additional mounts are often done by user provided scripts. This turend
-out to be complex and error prone, so we trie to avoid that now.

-Our new LXC based container model is more flexible regarding
-storage. First, you can have more than a single mount point. This
-allows you to choose a suitable storage for each application. For
-example, you can use a relatively slow (and thus cheap) storage for
-the container root file system. Then you can use a second mount point
-to mount a very fast, distributed storage for your database
-application.

-The second big improvement is that you can use any storage type
-supported by the {pve} storage library. That means that you can store
-your containers on local 'lvmthin' or 'zfs', shared 'iSCSI' storage,
-or even on distributed storage systems like 'ceph'. And it enables us
-to use advanced storage features like snapshots and clones. 'vzdump'
-can also use the snapshots feature to provide consistent container
-backups.

-Last but not least, you can also mount local devices directly, or
-mount local directories using bind mounts. That way you can access
-local storage inside containers with zero overhead. Such bind mounts
-also provides an easy way to share data between different containers.

-Managing Containers with 'pct'

+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer subsystem,
+the usage of FUSE mounts inside a container is strongly advised against, as
+containers need to be frozen for suspend or snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow you to set limits inside a container for the amount of disk space
+that each user can use.
+
+NOTE: This currently requires the use of legacy 'cgroups'.
+
+NOTE: This only works on ext4 image based storage types and currently only
+works with privileged containers.
+
+Activating the `quota` option causes the following mount options to be used for
+a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used like on any other system. You can initialize the
+`/aquota.user` and `/aquota.group` files by running:
+
+----
+# quotacheck -cmug /
+# quotaon /
+----
+
+Then edit the quotas using the `edquota` command. Refer to the documentation of
+the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing the
+mount point's path instead of just `/`.
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard POSIX **A**ccess **C**ontrol **L**ists are also available inside
+containers. ACLs allow you to set more detailed file ownership than the
+traditional user/group/others model.
+
+
+Backup of Container Mount Points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To include a mount point in backups, enable the `backup` option for it in the
+container configuration. For an existing mount point `mp0`
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G
+----
+
+add `backup=1` to enable it.
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=1
+----
+
+NOTE: When creating a new mount point in the GUI, this option is enabled by
+default.
+
+To disable backups for a mount point, add `backup=0` in the way described
+above, or uncheck the *Backup* checkbox on the GUI.
+
+
+Replication of Container Mount Points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, additional mount points are replicated when the Root Disk is
+replicated. If you want the {pve} storage replication mechanism to skip a mount
+point, you can set the *Skip replication* option for that mount point.
+As of {pve} 5.0, replication requires a storage of type `zfspool`. Adding a
+mount point to a different type of storage when the container has replication
+configured requires *Skip replication* to be enabled for that mount point.
+
+
+Backup and Restore
+------------------
+
+
+Container Backup
+~~~~~~~~~~~~~~~~
+
+It is possible to use the `vzdump` tool for container backup. Please refer to
+the `vzdump` manual page for details.
+
+
+Restoring Container Backups
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Restoring container backups made with `vzdump` is possible using the `pct
+restore` command. By default, `pct restore` will attempt to restore as much of
+the backed up container configuration as possible. It is possible to override
+the backed up configuration by manually setting container options on the
+command line (see the `pct` manual page for details).
+
+NOTE: `pvesm extractconfig` can be used to view the backed up configuration
+contained in a vzdump archive.
+
+There are two basic restore modes, differing only in their handling of mount
+points:
+
+
+``Simple'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If neither the `rootfs` parameter nor any of the optional `mpX` parameters are
+explicitly set, the mount point configuration from the backed up configuration
+file is restored using the following steps:
+
+. Extract mount points and their options from backup
+. Create volumes for storage backed mount points (on storage provided with the
+  `storage` parameter, or default local storage if unset)
+. Extract files from backup archive
+. Add bind and device mount points to restored configuration (limited to root
+  user)
+
+NOTE: Since bind and device mount points are never backed up, no files are
+restored in the last step, but only the configuration options. The assumption
+is that such mount points are either backed up with another mechanism (e.g.,
+NFS space that is bind mounted into many containers), or not intended to be
+backed up at all.
+
+This simple mode is also used by the container restore operations in the web
+interface.
+
+
+``Advanced'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By setting the `rootfs` parameter (and optionally, any combination of `mpX`
+parameters), the `pct restore` command is automatically switched into an
+advanced mode. This advanced mode completely ignores the `rootfs` and `mpX`
+configuration options contained in the backup archive, and instead only uses
+the options explicitly provided as parameters.
+
+This mode allows flexible configuration of mount point settings at restore
+time, for example:
+
+* Set target storages, volume sizes and other options for each mount point
+  individually
+* Redistribute backed up files according to new mount point scheme
+* Restore to device and/or bind mount points (limited to root user)
+
+
+Managing Containers with `pct`
------------------------------

-'pct' is the tool to manage Linux Containers on {pve}.
-You can create
-and destroy containers, and control execution (start, stop, migrate,
-...). You can use pct to set parameters in the associated config file,
-like network configuration or memory.

+The ``Proxmox Container Toolkit'' (`pct`) is the command line tool to manage
+{pve} containers. It enables you to create or destroy containers, as well as
+control the container execution (start, stop, reboot, migrate, etc.). It can be
+used to set parameters in the config file of a container, for example the
+network configuration or memory limits.

CLI Usage Examples
-------------------
+~~~~~~~~~~~~~~~~~~

-Create a container based on a Debian template (provided you downloaded
-the template via the webgui before)
+Create a container based on a Debian template (provided you have already
+downloaded the template via the web interface)

- pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz
+----
+# pct create 100 /var/lib/vz/template/cache/debian-10.0-standard_10.0-1_amd64.tar.gz
+----

Start container 100

- pct start 100
+----
+# pct start 100
+----

Start a login session via getty

- pct console 100
+----
+# pct console 100
+----

Enter the LXC namespace and run a shell as root user

- pct enter 100
+----
+# pct enter 100
+----

Display the configuration

- pct config 100
+----
+# pct config 100
+----

-Add a network interface called eth0, bridged to the host bridge vmbr0,
-set the address and gateway, while it's running
+Add a network interface called `eth0`, bridged to the host bridge `vmbr0`, set
+the address and gateway, while it's running

- pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
+----
+# pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
+----

Reduce the memory of the container to 512 MB

- pct set -memory 512 100
+----
+# pct set 100 -memory 512
+----

-Files
------

+Destroying a container always removes it from Access Control Lists and it
+always removes the firewall configuration of the container. You have to
+activate '--purge' if you want to additionally remove the container from
+replication jobs, backup jobs and HA resource configurations.

-'/etc/pve/lxc/<CTID>.conf'::

+----
+# pct destroy 100 --purge
+----

-Configuration file for the container '<CTID>'.

-Container Advantages
--------------------

+Obtaining Debugging Logs
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+In case `pct start` is unable to start a specific container, it might be
+helpful to collect debugging output by passing the `--debug` flag (replace
+`CTID` with the container's CTID):
+
+----
+# pct start CTID --debug
+----

-- Simple, and fully integrated into {pve}. Setup looks similar to a normal
-  VM setup.

+Alternatively, you can use the following `lxc-start` command, which will save
+the debug log to the file specified by the `-o` output option:

- * Storage (ZFS, LVM, NFS, Ceph, ...)

+----
+# lxc-start -n CTID -F -l DEBUG -o /tmp/lxc-CTID.log
+----

- * Network

+This command will attempt to start the container in foreground mode. To stop
+the container, run `pct shutdown CTID` or `pct stop CTID` in a second terminal.

- * Authentification

+The collected debug log is written to `/tmp/lxc-CTID.log`.

- * Cluster

+NOTE: If you have changed the container's configuration since the last start
+attempt with `pct start`, you need to run `pct start` at least once to also
+update the configuration used by `lxc-start`.
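+
+The debug log can get long; a plain `grep` (standard shell, nothing
+`pct`-specific) is a quick way to surface the most relevant lines, for example:
+
+----
+# grep -e ERROR -e WARN /tmp/lxc-CTID.log
+----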
- * Authentification +[[pct_migration]] +Migration +--------- - * Cluster +If you have a cluster, you can migrate your Containers with -- Fast: minimal overhead, as fast as bare metal +---- +# pct migrate +---- -- High density (perfect for idle workloads) +This works as long as your Container is offline. If it has local volumes or +mount points defined, the migration will copy the content over the network to +the target host if the same storage is defined there. -- REST API +Running containers cannot live-migrated due to technical limitations. You can +do a restart migration, which shuts down, moves and then starts a container +again on the target node. As containers are very lightweight, this results +normally only in a downtime of some hundreds of milliseconds. -- Direct hardware access +A restart migration can be done through the web interface or by using the +`--restart` flag with the `pct migrate` command. +A restart migration will shut down the Container and kill it after the +specified timeout (the default is 180 seconds). Then it will migrate the +Container like an offline migration and when finished, it starts the Container +on the target node. -Technology Overview -------------------- +[[pct_configuration]] +Configuration +------------- -- Integrated into {pve} graphical user interface (GUI) +The `/etc/pve/lxc/.conf` file stores container configuration, where +`` is the numeric ID of the given container. Like all other files stored +inside `/etc/pve/`, they get automatically replicated to all other cluster +nodes. -- LXC (https://linuxcontainers.org/) +NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be +unique cluster wide. -- cgmanager for cgroup management +.Example Container Configuration +---- +ostype: debian +arch: amd64 +hostname: www +memory: 512 +swap: 512 +net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth +rootfs: local:107/vm-107-disk-1.raw,size=7G +---- -- lxcfs to provive containerized /proc file system +The configuration files are simple text files. You can edit them using a normal +text editor, for example, `vi` or `nano`. +This is sometimes useful to do small corrections, but keep in mind that you +need to restart the container to apply such changes. -- apparmor +For that reason, it is usually better to use the `pct` command to generate and +modify those files, or do the whole thing using the GUI. +Our toolkit is smart enough to instantaneously apply most changes to running +containers. This feature is called ``hot plug'', and there is no need to restart +the container in that case. -- CRIU: for live migration (planned) +In cases where a change cannot be hot-plugged, it will be registered as a +pending change (shown in red color in the GUI). +They will only be applied after rebooting the container. -- We use latest available kernels (4.2.X) -- image based deployment (templates) +File Format +~~~~~~~~~~~ -- Container setup from host (Network, DNS, Storage, ...) +The container configuration file uses a simple colon separated key/value +format. Each line has the following format: +----- +# this is a comment +OPTION: value +----- -ifdef::manvolnum[] -include::pve-copyright.adoc[] -endif::manvolnum[] +Blank lines in those files are ignored, and lines starting with a `#` character +are treated as comments and are also ignored. 
It is possible to add low-level, LXC style configuration directly, for example:

----
lxc.init_cmd: /sbin/my_own_init
----

or

----
lxc.init_cmd = /sbin/my_own_init
----

The settings are passed directly to the LXC low-level tools.


[[pct_snapshots]]
Snapshots
~~~~~~~~~

When you create a snapshot, `pct` stores the configuration at snapshot time
into a separate snapshot section within the same configuration file. For
example, after creating a snapshot called ``testsnapshot'', your configuration
file will look like this:

.Container configuration with snapshot
----
memory: 512
swap: 512
parent: testsnapshot
...

[testsnapshot]
memory: 512
swap: 512
snaptime: 1457170803
...
----

There are a few snapshot related properties like `parent` and `snaptime`. The
`parent` property is used to store the parent/child relationship between
snapshots. `snaptime` is the snapshot creation time stamp (Unix epoch).


[[pct_options]]
Options
~~~~~~~

include::pct.conf.5-opts.adoc[]


Locks
-----

Container migrations, snapshots and backups (`vzdump`) set a lock to prevent
incompatible concurrent actions on the affected container. Sometimes you need
to remove such a lock manually (e.g., after a power failure).

----
# pct unlock <CTID>
----

CAUTION: Only do this if you are sure the action which set the lock is no
longer running.


ifdef::manvolnum[]

Files
------

`/etc/pve/lxc/<CTID>.conf`::

Configuration file for the container '<CTID>'.


include::pve-copyright.adoc[]
endif::manvolnum[]