X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=pct.adoc;h=0be738b59b89c33a443de654910109200cc4ac22;hb=0a1739bd15ba88b0a384366994f1b8afe5073676;hp=6735406b86be901dded26a0bb063344b83cbaa44;hpb=037822517265cc453044d64f032ca8c3c872d321;p=pve-docs.git
diff --git a/pct.adoc b/pct.adoc
index 6735406..0be738b 100644
--- a/pct.adoc
+++ b/pct.adoc
@@ -28,148 +28,62 @@ ifdef::wiki[]
:title: Linux Container
endif::wiki[]

-Containers are a lightweight alternative to fully virtualized
-VMs. Instead of emulating a complete Operating System (OS), containers
-simply use the OS of the host they run on. This implies that all
-containers use the same kernel, and that they can access resources
-from the host directly.
-
-This is great because containers do not waste CPU power nor memory due
-to kernel emulation. Container run-time costs are close to zero and
-usually negligible. But there are also some drawbacks you need to
-consider:
-
-* You can only run Linux based OS inside containers, i.e. it is not
-  possible to run FreeBSD or MS Windows inside.
-
-* For security reasons, access to host resources needs to be
-  restricted. This is done with AppArmor, SecComp filters and other
-  kernel features. Be prepared that some syscalls are not allowed
-  inside containers.
-
-{pve} uses https://linuxcontainers.org/[LXC] as underlying container
-technology. We consider LXC as low-level library, which provides
-countless options. It would be too difficult to use those tools
-directly. Instead, we provide a small wrapper called `pct`, the
-"Proxmox Container Toolkit".
-
-The toolkit is tightly coupled with {pve}. That means that it is aware
-of the cluster setup, and it can use the same network and storage
-resources as fully virtualized VMs. You can even use the {pve}
-firewall, or manage containers using the HA framework.
-
-Our primary goal is to offer an environment as one would get from a
-VM, but without the additional overhead. We call this "System
-Containers".
-
-NOTE: If you want to run micro-containers (with docker, rkt, ...), it
-is best to run them inside a VM.
-
-
-Technology Overview
--------------------
-
-* LXC (https://linuxcontainers.org/)
-
-* Integrated into {pve} graphical user interface (GUI)
-
-* Easy to use command line tool `pct`
-
-* Access via {pve} REST API
-
-* lxcfs to provide containerized /proc file system
-
-* AppArmor/Seccomp to improve security
-
-* CRIU: for live migration (planned)
-
-* Use latest available kernels (4.4.X)
-
-* Image based deployment (templates)
-
-* Use {pve} storage library
+Containers are a lightweight alternative to fully virtualized machines (VMs).
+They use the kernel of the host system that they run on, instead of emulating a
+full operating system (OS). This means that containers can access resources on
+the host system directly.

-* Container setup from host (network, DNS, storage, ...)
+The runtime costs for containers are low, usually negligible. However, there
+are some drawbacks that need to be considered:

+* Only Linux distributions can be run in containers. It is not possible to run
+  other operating systems, such as FreeBSD or Microsoft Windows, inside a
+  container.

-Security Considerations
------------------------
-
-Containers use the same kernel as the host, so there is a big attack
-surface for malicious users. You should consider this fact if you
-provide containers to totally untrusted people. In general, fully
-virtualized VMs provide better isolation.
-
-The good news is that LXC uses many kernel security features like
-AppArmor, CGroups and PID and user namespaces, which makes containers
-usage quite secure.
-
-Guest Operating System Configuration
-------------------------------------
+* For security reasons, access to host resources needs to be restricted.
+  Containers run in their own separate namespaces. Additionally, some syscalls
+  are not allowed within containers.

-We normally try to detect the operating system type inside the
-container, and then modify some files inside the container to make
-them work as expected. Here is a short list of things we do at
-container startup:
+{pve} uses https://linuxcontainers.org/[Linux Containers (LXC)] as underlying
+container technology. The ``Proxmox Container Toolkit'' (`pct`) simplifies the
+usage and management of LXC containers.

-set /etc/hostname:: to set the container name
-
-modify /etc/hosts:: to allow lookup of the local hostname
+Containers are tightly integrated with {pve}. This means that they are aware of
+the cluster setup, and they can use the same network and storage resources as
+virtual machines. You can also use the {pve} firewall, or manage containers
+using the HA framework.

-network setup:: pass the complete network setup to the container
-
-configure DNS:: pass information about DNS servers
-
-adapt the init system:: for example, fix the number of spawned getty processes
-
-set the root password:: when creating a new container
-
-rewrite ssh_host_keys:: so that each container has unique keys
+Our primary goal is to offer an environment as one would get from a VM, but
+without the additional overhead. We call this ``System Containers''.

-randomize crontab:: so that cron does not start at the same time on all containers
+NOTE: If you want to run micro-containers, for example with 'Docker' or 'rkt',
+it is best to run them inside a VM.

-Changes made by {PVE} are enclosed by comment markers:

----
-# --- BEGIN PVE ---
-
-# --- END PVE ---
----
+Technology Overview
+-------------------

-Those markers will be inserted at a reasonable location in the
-file. If such a section already exists, it will be updated in place
-and will not be moved.
+* LXC (https://linuxcontainers.org/)

-Modification of a file can be prevented by adding a `.pve-ignore.`
-file for it. For instance, if the file `/etc/.pve-ignore.hosts`
-exists then the `/etc/hosts` file will not be touched. This can be a
-simple empty file creatd via:
+* Integrated into {pve} graphical web user interface (GUI)

- # touch /etc/.pve-ignore.hosts
+* Easy to use command line tool `pct`

-Most modifications are OS dependent, so they differ between different
-distributions and versions. You can completely disable modifications
-by manually setting the `ostype` to `unmanaged`.
+* Access via {pve} REST API

-OS type detection is done by testing for certain files inside the
-container:
+* 'lxcfs' to provide containerized /proc file system

-Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)
+* Control groups ('cgroups') for resource isolation and limitation

-Debian:: test /etc/debian_version
+* 'AppArmor' and 'seccomp' to improve security

-Fedora:: test /etc/fedora-release
+* Modern Linux kernels

-RedHat or CentOS:: test /etc/redhat-release
+* Image based deployment (templates)

-ArchLinux:: test /etc/arch-release
+* Uses {pve} xref:chapter_storage[storage library]

-Alpine:: test /etc/alpine-release
-
-Gentoo:: test /etc/gentoo-release
-
-NOTE: Container start fails if the configured `ostype` differs from the auto
-detected type.
+* Container setup from host (network, DNS, storage, etc.)


[[pct_container_images]]
@@ -177,28 +91,26 @@ Container Images
----------------

Container images, sometimes also referred to as ``templates'' or
-``appliances'', are `tar` archives which contain everything to run a
-container. You can think of it as a tidy container backup. Like most
-modern container toolkits, `pct` uses those images when you create a
-new container, for example:
-
- pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+``appliances'', are `tar` archives which contain everything to run a container.

-{pve} itself ships a set of basic templates for most common
-operating systems, and you can download them using the `pveam` (short
-for {pve} Appliance Manager) command line utility. You can also
-download https://www.turnkeylinux.org/[TurnKey Linux] containers using
-that tool (or the graphical user interface).
+{pve} itself provides a variety of basic templates for the most common Linux
+distributions. They can be downloaded using the GUI or the `pveam` (short for
+{pve} Appliance Manager) command line utility.
+Additionally, https://www.turnkeylinux.org/[TurnKey Linux] container templates
+are available to download.

-Our image repositories contain a list of available images, and there
-is a cron job run each day to download that list. You can trigger that
-update manually with:
+The list of available templates is updated daily through the 'pve-daily-update'
+timer. You can also trigger an update manually by executing:

- pveam update
+----
+# pveam update
+----

-After that you can view the list of available images using:
+To view the list of available images, run:

- pveam available
+----
+# pveam available
+----

You can restrict this large list by specifying the `section` you are
interested in, for example basic `system` images:
@@ -206,138 +118,59 @@ interested in, for example basic `system` images:
.List available system images
----
# pveam available --section system
-system archlinux-base_2015-24-29-1_x86_64.tar.gz
-system centos-7-default_20160205_amd64.tar.xz
-system debian-6.0-standard_6.0-7_amd64.tar.gz
-system debian-7.0-standard_7.0-3_amd64.tar.gz
-system debian-8.0-standard_8.0-1_amd64.tar.gz
-system ubuntu-12.04-standard_12.04-1_amd64.tar.gz
-system ubuntu-14.04-standard_14.04-1_amd64.tar.gz
-system ubuntu-15.04-standard_15.04-1_amd64.tar.gz
-system ubuntu-15.10-standard_15.10-1_amd64.tar.gz
+system alpine-3.10-default_20190626_amd64.tar.xz
+system alpine-3.9-default_20190224_amd64.tar.xz
+system archlinux-base_20190924-1_amd64.tar.gz
+system centos-6-default_20191016_amd64.tar.xz
+system centos-7-default_20190926_amd64.tar.xz
+system centos-8-default_20191016_amd64.tar.xz
+system debian-10.0-standard_10.0-1_amd64.tar.gz
+system debian-8.0-standard_8.11-1_amd64.tar.gz
+system debian-9.0-standard_9.7-1_amd64.tar.gz
+system fedora-30-default_20190718_amd64.tar.xz
+system fedora-31-default_20191029_amd64.tar.xz
+system gentoo-current-default_20190718_amd64.tar.xz
+system opensuse-15.0-default_20180907_amd64.tar.xz
+system opensuse-15.1-default_20190719_amd64.tar.xz
+system ubuntu-16.04-standard_16.04.5-1_amd64.tar.gz
+system ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz
+system ubuntu-19.04-standard_19.04-1_amd64.tar.gz
+system ubuntu-19.10-standard_19.10-1_amd64.tar.gz
----

-Before you can use such a template, you need to download them into one
-of your storages. You can simply use storage `local` for that
-purpose. For clustered installations, it is preferred to use a shared
-storage so that all nodes can access those images.
+Before you can use such a template, you need to download it into one of your
+storages. If you are unsure which storage to choose, you can simply use the one
+named `local` for that purpose. For clustered installations, it is preferred to
+use a shared storage so that all nodes can access those images.

- pveam download local debian-8.0-standard_8.0-1_amd64.tar.gz
+----
+# pveam download local debian-10.0-standard_10.0-1_amd64.tar.gz
+----

-You are now ready to create containers using that image, and you can
-list all downloaded images on storage `local` with:
+You are now ready to create containers using that image, and you can list all
+downloaded images on storage `local` with:

----
# pveam list local
-local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz 190.20MB
+local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz 219.95MB
----

-The above command shows you the full {pve} volume identifiers. They include
-the storage name, and most other {pve} commands can use them. For
-example you can delete that image later with:
-
- pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+TIP: You can also use the {pve} web interface to download, list and delete
+container templates.

-
-[[pct_container_storage]]
-Container Storage
------------------
-
-Traditional containers use a very simple storage model, only allowing
-a single mount point, the root file system. This was further
-restricted to specific file system types like `ext4` and `nfs`.
-Additional mounts are often done by user provided scripts. This turned
-out to be complex and error prone, so we try to avoid that now.
-
-Our new LXC based container model is more flexible regarding
-storage. First, you can have more than a single mount point. This
-allows you to choose a suitable storage for each application. For
-example, you can use a relatively slow (and thus cheap) storage for
-the container root file system. Then you can use a second mount point
-to mount a very fast, distributed storage for your database
-application. See section <<pct_mount_points, Mount Points>> for further
-details.
-
-The second big improvement is that you can use any storage type
-supported by the {pve} storage library. That means that you can store
-your containers on local `lvmthin` or `zfs`, shared `iSCSI` storage,
-or even on distributed storage systems like `ceph`. It also enables us
-to use advanced storage features like snapshots and clones. `vzdump`
-can also use the snapshot feature to provide consistent container
-backups.
-
-Last but not least, you can also mount local devices directly, or
-mount local directories using bind mounts. That way you can access
-local storage inside containers with zero overhead. Such bind mounts
-also provide an easy way to share data between different containers.
-
-
-FUSE Mounts
-~~~~~~~~~~~
-
-WARNING: Because of existing issues in the Linux kernel's freezer
-subsystem the usage of FUSE mounts inside a container is strongly
-advised against, as containers need to be frozen for suspend or
-snapshot mode backups.
-
-If FUSE mounts cannot be replaced by other mounting mechanisms or storage
-technologies, it is possible to establish the FUSE mount on the Proxmox host
-and use a bind mount point to make it accessible inside the container.
-
-
-Using Quotas Inside Containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Quotas allow to set limits inside a container for the amount of disk
-space that each user can use. This only works on ext4 image based
-storage types and currently does not work with unprivileged
-containers.
-
-Activating the `quota` option causes the following mount options to be
-used for a mount point:
-`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
-
-This allows quotas to be used like you would on any other system. You
-can initialize the `/aquota.user` and `/aquota.group` files by running
+`pct` uses them to create a new container, for example:

----
-quotacheck -cmug /
-quotaon /
+# pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
----

-and edit the quotas via the `edquota` command. Refer to the documentation
-of the distribution running inside the container for details.
-
-NOTE: You need to run the above commands for every mount point by passing
-the mount point's path instead of just `/`.
-
-
-Using ACLs Inside Containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The standard Posix **A**ccess **C**ontrol **L**ists are also available inside containers.
-ACLs allow you to set more detailed file ownership than the traditional user/
-group/others model.
-
-
-Backup of Containers mount points
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default additional mount points besides the RootDisk mount point are not
-included in backups. You can reverse this default behavior by setting the
-* Backup* option on a mount point.
-// see PVE::VZDump::LXC::prepare()
+The above command shows you the full {pve} volume identifiers. They include the
+storage name, and most other {pve} commands can use them. For example, you can
+delete that image later with:

-Replication of Containers mount points
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default additional mount points are replicated when the RootDisk
-is replicated. If you want the {pve} storage replication mechanism to skip a
- mount point when starting a replication job, you can set the
-*Skip replication* option on that mount point. +
-As of {pve} 5.0, replication requires a storage of type `zfspool`, so adding a
- mount point to a different type of storage when the container has replication
- configured requires to *Skip replication* for that mount point.
+----
+# pveam remove local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
+----


[[pct_settings]]
@@ -348,58 +181,59 @@ Container Settings
General Settings
~~~~~~~~~~~~~~~~

-[thumbnail="gui-create-ct-general.png"]
+[thumbnail="screenshot/gui-create-ct-general.png"]

General settings of a container include

* the *Node* : the physical server on which the container will run
-* the *CT ID*: a unique number in this {pve} installation used to identify your container
+* the *CT ID*: a unique number in this {pve} installation used to identify your
+  container
* *Hostname*: the hostname of the container
* *Resource Pool*: a logical group of containers and VMs
* *Password*: the root password of the container
* *SSH Public Key*: a public key for connecting to the root account over SSH
* *Unprivileged container*: this option allows to choose at creation time
-if you want to create a privileged or unprivileged container.
+  if you want to create a privileged or unprivileged container.

+Unprivileged Containers
+^^^^^^^^^^^^^^^^^^^^^^^

-Privileged Containers
-^^^^^^^^^^^^^^^^^^^^^
+Unprivileged containers use a new kernel feature called user namespaces.
+The root UID 0 inside the container is mapped to an unprivileged user outside
+the container. This means that most security issues (container escape, resource
+abuse, etc.) in these containers will affect a random unprivileged user, and
+would be a generic kernel security bug rather than an LXC issue. The LXC team
+thinks unprivileged containers are safe by design.

-Security is done by dropping capabilities, using mandatory access
-control (AppArmor), SecComp filters and namespaces. The LXC team
-considers this kind of container as unsafe, and they will not consider
-new container escape exploits to be security issues worthy of a CVE
-and quick fix. So you should use this kind of containers only inside a
-trusted environment, or when no untrusted task is running as root in
-the container.
+This is the default option when creating a new container.

+NOTE: If the container uses systemd as an init system, please be aware the
+systemd version running inside the container should be equal to or greater than
+220.

-Unprivileged Containers
-^^^^^^^^^^^^^^^^^^^^^^^

-This kind of containers use a new kernel feature called user
-namespaces. The root UID 0 inside the container is mapped to an
-unprivileged user outside the container. This means that most security
-issues (container escape, resource abuse, ...) in those containers
-will affect a random unprivileged user, and so would be a generic
-kernel security bug rather than an LXC issue. The LXC team thinks
-unprivileged containers are safe by design.
+Privileged Containers
+^^^^^^^^^^^^^^^^^^^^^
+
+Security in containers is achieved by using mandatory access control
+('AppArmor') restrictions, 'seccomp' filters and Linux kernel namespaces. The
+LXC team considers this kind of container as unsafe, and they will not consider
+new container escape exploits to be security issues worthy of a CVE and quick
+fix. That's why privileged containers should only be used in trusted
+environments.

-NOTE: If the container uses systemd as an init system, please be
-aware the systemd version running inside the container should be equal
-or greater than 220.

[[pct_cpu]]
CPU
~~~

-[thumbnail="gui-create-ct-cpu.png"]
+[thumbnail="screenshot/gui-create-ct-cpu.png"]

-You can restrict the number of visible CPUs inside the container using
-the `cores` option. This is implemented using the Linux 'cpuset'
-cgroup (**c**ontrol *group*). A special task inside `pvestatd` tries
-to distribute running containers among available CPUs. You can view
-the assigned CPUs using the following command:
+You can restrict the number of visible CPUs inside the container using the
+`cores` option. This is implemented using the Linux 'cpuset' cgroup
+(**c**ontrol *group*).
+A special task inside `pvestatd` tries to distribute running containers among
+available CPUs periodically.
+To view the assigned CPUs, run the following command:

----
# pct cpusets
@@ -410,63 +244,61 @@ the assigned CPUs using the following command:
---------------------
----

-Containers use the host kernel directly, so all task inside a
-container are handled by the host CPU scheduler. {pve} uses the Linux
-'CFS' (**C**ompletely **F**air **S**cheduler) scheduler by default,
-which has additional bandwidth control options.
+Containers use the host kernel directly. All tasks inside a container are
+handled by the host CPU scheduler. {pve} uses the Linux 'CFS' (**C**ompletely
+**F**air **S**cheduler) scheduler by default, which has additional bandwidth
+control options.

[horizontal]
-`cpulimit`: :: You can use this option to further limit assigned CPU
-time. Please note that this is a floating point number, so it is
-perfectly valid to assign two cores to a container, but restrict
-overall CPU consumption to half a core.
+`cpulimit`: :: You can use this option to further limit assigned CPU time.
+Please note that this is a floating point number, so it is perfectly valid to
+assign two cores to a container, but restrict overall CPU consumption to half a
+core.
+
----
cores: 2
cpulimit: 0.5
----

-`cpuunits`: :: This is a relative weight passed to the kernel
-scheduler. The larger the number is, the more CPU time this container
-gets. Number is relative to the weights of all the other running
-containers. The default is 1024. You can use this setting to
-prioritize some containers.
+`cpuunits`: :: This is a relative weight passed to the kernel scheduler. The
+larger the number is, the more CPU time this container gets. The number is
+relative to the weights of all the other running containers. The default is
+1024. You can use this setting to prioritize some containers.


[[pct_memory]]
Memory
~~~~~~

-[thumbnail="gui-create-ct-memory.png"]
+[thumbnail="screenshot/gui-create-ct-memory.png"]

Container memory is controlled using the cgroup memory controller.

[horizontal]
-`memory`: :: Limit overall memory usage. This corresponds
-to the `memory.limit_in_bytes` cgroup setting.
+`memory`: :: Limit overall memory usage. This corresponds to the
+`memory.limit_in_bytes` cgroup setting.

-`swap`: :: Allows the container to use additional swap memory from the
-host swap space. This corresponds to the `memory.memsw.limit_in_bytes`
-cgroup setting, which is set to the sum of both value (`memory +
-swap`).
+`swap`: :: Allows the container to use additional swap memory from the host
+swap space. This corresponds to the `memory.memsw.limit_in_bytes` cgroup
+setting, which is set to the sum of both values (`memory + swap`).


[[pct_mount_points]]
Mount Points
~~~~~~~~~~~~

-[thumbnail="gui-create-ct-root-disk.png"]
+[thumbnail="screenshot/gui-create-ct-root-disk.png"]

-The root mount point is configured with the `rootfs` property, and you can
-configure up to 10 additional mount points. The corresponding options
-are called `mp0` to `mp9`, and they can contain the following setting:
+The root mount point is configured with the `rootfs` property. You can
+configure up to 256 additional mount points. The corresponding options are
+called `mp0` to `mp255`. They can contain the following settings:

include::pct-mountpoint-opts.adoc[]

-Currently there are basically three types of mount points: storage backed
-mount points, bind mounts and device mounts.
+Currently there are three types of mount points: storage backed mount points,
+bind mounts, and device mounts.

.Typical container `rootfs` configuration
----
@@ -489,10 +321,15 @@ in three different flavors:

NOTE: The special option syntax `STORAGE_ID:SIZE_IN_GB` for storage backed
mount point volumes will automatically allocate a volume of the specified size
-on the specified storage. E.g., calling
-`pct set 100 -mp0 thin1:10,mp=/path/in/container` will allocate a 10GB volume
-on the storage `thin1` and replace the volume ID place holder `10` with the
-allocated volume ID.
+on the specified storage. For example, calling
+
+----
+pct set 100 -mp0 thin1:10,mp=/path/in/container
+----
+
+will allocate a 10GB volume on the storage `thin1` and replace the volume ID
+placeholder `10` with the allocated volume ID, and set up the mount point in
+the container at `/path/in/container`.


Bind Mount Points
^^^^^^^^^^^^^^^^^
@@ -512,11 +349,10 @@ user mapping and cannot use ACLs.

NOTE: The contents of bind mount points are not backed up when using `vzdump`.

-WARNING: For security reasons, bind mounts should only be established
-using source directories especially reserved for this purpose, e.g., a
-directory hierarchy under `/mnt/bindmounts`. Never bind mount system
-directories like `/`, `/var` or `/etc` into a container - this poses a
-great security risk.
+WARNING: For security reasons, bind mounts should only be established using
+source directories especially reserved for this purpose, e.g., a directory
+hierarchy under `/mnt/bindmounts`. Never bind mount system directories like
+`/`, `/var` or `/etc` into a container - this poses a great security risk.

NOTE: The bind mount source path must not contain any symlinks.

@@ -538,18 +374,19 @@ NOTE: Device mount points should only be used under special circumstances. In
most cases a storage backed mount point offers the same performance and a lot
more features.

-NOTE: The contents of device mount points are not backed up when using `vzdump`.
+NOTE: The contents of device mount points are not backed up when using
+`vzdump`.


[[pct_container_network]]
Network
~~~~~~~

-[thumbnail="gui-create-ct-network.png"]
+[thumbnail="screenshot/gui-create-ct-network.png"]

-You can configure up to 10 network interfaces for a single
-container. The corresponding options are called `net0` to `net9`, and
-they can contain the following setting:
+You can configure up to 10 network interfaces for a single container.
+The corresponding options are called `net0` to `net9`, and they can contain the
+following settings:

include::pct-network-opts.adoc[]

@@ -558,38 +395,271 @@ include::pct-network-opts.adoc[]

[[pct_startup_and_shutdown]]
Automatic Start and Shutdown of Containers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-After creating your containers, you probably want them to start automatically
-when the host system boots. For this you need to select the option 'Start at
-boot' from the 'Options' Tab of your container in the web interface, or set it with
-the following command:
+To automatically start a container when the host system boots, select the
+option 'Start at boot' in the 'Options' panel of the container in the web
+interface or run the following command:

- pct set <ctid> -onboot 1
+----
+# pct set CTID -onboot 1
+----

.Start and Shutdown Order
// use the screenshot from qemu - its the same
-[thumbnail="gui-qemu-edit-start-order.png"]
-
-If you want to fine tune the boot order of your containers, you can use the following
-parameters :
-
-* *Start/Shutdown order*: Defines the start order priority. E.g. set it to 1 if
-you want the CT to be the first to be started. (We use the reverse startup
-order for shutdown, so a container with a start order of 1 would be the last to
-be shut down)
-* *Startup delay*: Defines the interval between this container start and subsequent
-containers starts . E.g. set it to 240 if you want to wait 240 seconds before starting
-other containers.
+[thumbnail="screenshot/gui-qemu-edit-start-order.png"]
+
+If you want to fine-tune the boot order of your containers, you can use the
+following parameters (a combined example follows the list):
+
+* *Start/Shutdown order*: Defines the start order priority. For example, set it
+  to 1 if you want the CT to be the first to be started. (We use the reverse
+  startup order for shutdown, so a container with a start order of 1 would be
+  the last to be shut down)
+* *Startup delay*: Defines the interval between this container start and
+  subsequent container starts. For example, set it to 240 if you want to wait
+  240 seconds before starting other containers.
* *Shutdown timeout*: Defines the duration in seconds {pve} should wait
-for the container to be offline after issuing a shutdown command.
-By default this value is set to 60, which means that {pve} will issue a
-shutdown request, wait 60s for the machine to be offline, and if after 60s
-the machine is still online will notify that the shutdown action failed.
+ for the container to be offline after issuing a shutdown command.
+ By default this value is set to 60, which means that {pve} will issue a
+ shutdown request, wait 60s for the machine to be offline, and if after 60s
+ the machine is still online will notify that the shutdown action failed.
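+
+On the command line, the three parameters above are combined in the single
+`startup` property. The following is only a sketch; the container ID 100 and
+the chosen values are assumptions:
+
+----
+# pct set 100 -startup order=1,up=240,down=120
+----
+
+Here CT 100 is started first, {pve} waits 240 seconds before starting further
+guests, and the container is given up to 120 seconds to shut down.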

-Please note that containers without a Start/Shutdown order parameter will always
-start after those where the parameter is set, and this parameter only
+Please note that containers without a Start/Shutdown order parameter will
+always start after those where the parameter is set, and this parameter only
makes sense between the machines running locally on a host, and not
cluster-wide.

+Hookscripts
+~~~~~~~~~~~
+
+You can add a hook script to CTs with the config property `hookscript`.
+
+----
+# pct set 100 -hookscript local:snippets/hookscript.pl
+----
+
+It will be called during various phases of the guest's lifetime. For an example
+and documentation see the example script under
+`/usr/share/pve-docs/examples/guest-example-hookscript.pl`.
+
+Security Considerations
+-----------------------
+
+Containers use the kernel of the host system. This exposes an attack surface
+for malicious users. In general, full virtual machines provide better
+isolation. This should be considered if containers are provided to unknown or
+untrusted people.
+
+To reduce the attack surface, LXC uses many security features like AppArmor,
+CGroups and kernel namespaces.
+
+AppArmor
+~~~~~~~~
+
+AppArmor profiles are used to restrict access to possibly dangerous actions.
+Some system calls, for example `mount`, are prohibited from execution.
+
+To trace AppArmor activity, use:
+
+----
+# dmesg | grep apparmor
+----
+
+Although it is not recommended, AppArmor can be disabled for a container. This
+brings security risks with it. Some syscalls can lead to privilege escalation
+when executed within a container if the system is misconfigured or if an LXC or
+Linux kernel vulnerability exists.
+
+To disable AppArmor for a container, add the following line to the container
+configuration file located at `/etc/pve/lxc/CTID.conf`:
+
+----
+lxc.apparmor.profile = unconfined
+----
+
+WARNING: Please note that this is not recommended for production use.
+
+
+// TODO: describe cgroups + seccomp a bit more.
+// TODO: pve-lxc-syscalld
+
+
+Guest Operating System Configuration
+------------------------------------
+
+{pve} tries to detect the Linux distribution in the container, and modifies
+some files. Here is a short list of things done at container startup:
+
+set /etc/hostname:: to set the container name
+
+modify /etc/hosts:: to allow lookup of the local hostname
+
+network setup:: pass the complete network setup to the container
+
+configure DNS:: pass information about DNS servers
+
+adapt the init system:: for example, fix the number of spawned getty processes
+
+set the root password:: when creating a new container
+
+rewrite ssh_host_keys:: so that each container has unique keys
+
+randomize crontab:: so that cron does not start at the same time on all containers
+
+Changes made by {PVE} are enclosed by comment markers:
+
+----
+# --- BEGIN PVE ---
+
+# --- END PVE ---
+----
+
+Those markers will be inserted at a reasonable location in the file. If such a
+section already exists, it will be updated in place and will not be moved.
+
+Modification of a file can be prevented by adding a `.pve-ignore.` file for it.
+For instance, if the file `/etc/.pve-ignore.hosts` exists then the `/etc/hosts`
+file will not be touched. This can be a simple empty file created via:
+
+----
+# touch /etc/.pve-ignore.hosts
+----
+
+Most modifications are OS dependent, so they differ between different
+distributions and versions. You can completely disable modifications by
+manually setting the `ostype` to `unmanaged`.
+
+OS type detection is done by testing for certain files inside the
+container. {pve} first checks the `/etc/os-release` file
+footnote:[/etc/os-release replaces the multitude of per-distribution
+release files https://manpages.debian.org/stable/systemd/os-release.5.en.html].
+If that file is not present, or it does not contain a clearly recognizable
+distribution identifier, the following distribution-specific release files are
+checked.
+
+Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)
+
+Debian:: test /etc/debian_version
+
+Fedora:: test /etc/fedora-release
+
+RedHat or CentOS:: test /etc/redhat-release
+
+ArchLinux:: test /etc/arch-release
+
+Alpine:: test /etc/alpine-release
+
+Gentoo:: test /etc/gentoo-release
+
+NOTE: Container start fails if the configured `ostype` differs from the auto
+detected type.
+
+
+[[pct_container_storage]]
+Container Storage
+-----------------
+
+The {pve} LXC container storage model is more flexible than traditional
+container storage models. A container can have multiple mount points. This
+makes it possible to use the best-suited storage for each application.
+
+For example, the root file system of the container can be on slow and cheap
+storage while the database can be on fast and distributed storage via a second
+mount point. See section <<pct_mount_points, Mount Points>> for further
+details.
+
+Any storage type supported by the {pve} storage library can be used. This means
+that containers can be stored on local (for example `lvm`, `zfs` or directory),
+shared external (like `iSCSI`, `NFS`) or even distributed storage systems like
+Ceph. Advanced storage features like snapshots or clones can be used if the
+underlying storage supports them. The `vzdump` backup tool can use snapshots to
+provide consistent container backups.
+
+Furthermore, local devices or local directories can be mounted directly using
+'bind mounts'. This gives access to local resources inside a container with
+practically zero overhead. Bind mounts can be used as an easy way to share data
+between containers.
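+
+For example, a second, storage backed mount point and a bind mount can both be
+added with `pct`. The following is only a sketch; the storage name `local-lvm`,
+the paths and the container ID 100 are assumptions:
+
+----
+# pct set 100 -mp0 local-lvm:8,mp=/var/lib/app
+# pct set 100 -mp1 /mnt/bindmounts/shared,mp=/shared
+----
+
+The first command allocates a new 8GB volume on the storage `local-lvm` and
+mounts it at `/var/lib/app` inside the container. The second command makes the
+host directory `/mnt/bindmounts/shared` available at `/shared` without
+allocating any volume.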
+
+
+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer subsystem the
+usage of FUSE mounts inside a container is strongly advised against, as
+containers need to be frozen for suspend or snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow you to set limits inside a container for the amount of disk space
+that each user can use.
+
+NOTE: This only works on ext4 image based storage types and currently only
+works with privileged containers.
+
+Activating the `quota` option causes the following mount options to be used for
+a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used as on any other system. You can initialize the
+`/aquota.user` and `/aquota.group` files by running:
+
+----
+# quotacheck -cmug /
+# quotaon /
+----
+
+Then edit the quotas using the `edquota` command. Refer to the documentation of
+the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing the
+mount point's path instead of just `/`.
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard Posix **A**ccess **C**ontrol **L**ists are also available inside
+containers. ACLs allow you to set more detailed file ownership than the
+traditional user/group/others model.
+
+
+Backup of Container mount points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To include a mount point in backups, enable the `backup` option for it in the
+container configuration. For an existing mount point `mp0`
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G
+----
+
+add `backup=1` to enable it.
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=1
+----
+
+NOTE: When creating a new mount point in the GUI, this option is enabled by
+default.
+
+To disable backups for a mount point, add `backup=0` in the way described
+above, or uncheck the *Backup* checkbox on the GUI.
+
+Replication of Container mount points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, additional mount points are replicated when the Root Disk is
+replicated. If you want the {pve} storage replication mechanism to skip a mount
+point, you can set the *Skip replication* option for that mount point.
+As of {pve} 5.0, replication requires a storage of type `zfspool`. Adding a
+mount point to a different type of storage when the container has replication
+configured requires *Skip replication* to be enabled for that mount point.
+

Backup and Restore
------------------

@@ -598,18 +668,18 @@ Container Backup
~~~~~~~~~~~~~~~~

-It is possible to use the `vzdump` tool for container backup. Please
-refer to the `vzdump` manual page for details.
+It is possible to use the `vzdump` tool for container backup. Please refer to
+the `vzdump` manual page for details.


Restoring Container Backups
~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Restoring container backups made with `vzdump` is possible using the
-`pct restore` command. By default, `pct restore` will attempt to restore as much
-of the backed up container configuration as possible. It is possible to override
-the backed up configuration by manually setting container options on the command
-line (see the `pct` manual page for details).
+Restoring container backups made with `vzdump` is possible using the `pct +restore` command. By default, `pct restore` will attempt to restore as much of +the backed up container configuration as possible. It is possible to override +the backed up configuration by manually setting container options on the +command line (see the `pct` manual page for details). NOTE: `pvesm extractconfig` can be used to view the backed up configuration contained in a vzdump archive. @@ -621,15 +691,16 @@ points: ``Simple'' Restore Mode ^^^^^^^^^^^^^^^^^^^^^^^ -If neither the `rootfs` parameter nor any of the optional `mpX` parameters -are explicitly set, the mount point configuration from the backed up -configuration file is restored using the following steps: +If neither the `rootfs` parameter nor any of the optional `mpX` parameters are +explicitly set, the mount point configuration from the backed up configuration +file is restored using the following steps: . Extract mount points and their options from backup . Create volumes for storage backed mount points (on storage provided with the -`storage` parameter, or default local storage if unset) + `storage` parameter, or default local storage if unset) . Extract files from backup archive -. Add bind and device mount points to restored configuration (limited to root user) +. Add bind and device mount points to restored configuration (limited to root + user) NOTE: Since bind and device mount points are never backed up, no files are restored in the last step, but only the configuration options. The assumption @@ -647,14 +718,14 @@ interface. By setting the `rootfs` parameter (and optionally, any combination of `mpX` parameters), the `pct restore` command is automatically switched into an advanced mode. This advanced mode completely ignores the `rootfs` and `mpX` -configuration options contained in the backup archive, and instead only -uses the options explicitly provided as parameters. +configuration options contained in the backup archive, and instead only uses +the options explicitly provided as parameters. -This mode allows flexible configuration of mount point settings at restore time, -for example: +This mode allows flexible configuration of mount point settings at restore +time, for example: * Set target storages, volume sizes and other options for each mount point -individually + individually * Redistribute backed up files according to new mount point scheme * Restore to device and/or bind mount points (limited to root user) @@ -662,44 +733,58 @@ individually Managing Containers with `pct` ------------------------------ -`pct` is the tool to manage Linux Containers on {pve}. You can create -and destroy containers, and control execution (start, stop, migrate, -...). You can use pct to set parameters in the associated config file, -like network configuration or memory limits. - +The ``Proxmox Container Toolkit'' (`pct`) is the command line tool to manage +{pve} containers. It enables you to create or destroy containers, as well as +control the container execution (start, stop, reboot, migrate, etc.). It can be +used to set parameters in the config file of a container, for example the +network configuration or memory limits. 

CLI Usage Examples
~~~~~~~~~~~~~~~~~~

-Create a container based on a Debian template (provided you have
-already downloaded the template via the web interface)
+Create a container based on a Debian template (provided you have already
+downloaded the template via the web interface)

- pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz
+----
+# pct create 100 /var/lib/vz/template/cache/debian-10.0-standard_10.0-1_amd64.tar.gz
+----

Start container 100

- pct start 100
+----
+# pct start 100
+----

Start a login session via getty

- pct console 100
+----
+# pct console 100
+----

Enter the LXC namespace and run a shell as root user

- pct enter 100
+----
+# pct enter 100
+----

Display the configuration

- pct config 100
+----
+# pct config 100
+----

-Add a network interface called `eth0`, bridged to the host bridge `vmbr0`,
-set the address and gateway, while it's running
+Add a network interface called `eth0`, bridged to the host bridge `vmbr0`, set
+the address and gateway, while it's running

- pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
+----
+# pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
+----

Reduce the memory of the container to 512MB

- pct set 100 -memory 512
+----
+# pct set 100 -memory 512
+----


Obtaining Debugging Logs
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -709,9 +794,12 @@ In case `pct start` is unable to start a specific container, it might be
helpful to collect debugging output by running `lxc-start` (replace `ID` with
the container's ID):

- lxc-start -n ID -F -l DEBUG -o /tmp/lxc-ID.log
+----
+# lxc-start -n ID -F -l DEBUG -o /tmp/lxc-ID.log
+----

-This command will attempt to start the container in foreground mode, to stop the container run `pct shutdown ID` or `pct stop ID` in a second terminal.
+This command will attempt to start the container in foreground mode. To stop
+the container, run `pct shutdown ID` or `pct stop ID` in a second terminal.

The collected debug log is written to `/tmp/lxc-ID.log`.

@@ -725,29 +813,35 @@ Migration

If you have a cluster, you can migrate your Containers with

- pct migrate <ctid> <target>
+----
+# pct migrate <ctid> <target>
+----

This works as long as your Container is offline. If it has local volumes or
-mountpoints defined, the migration will copy the content over the network to
-the target host if there is the same storage defined.
+mount points defined, the migration will copy the content over the network to
+the target host if the same storage is defined there.

-If you want to migrate online Containers, the only way is to use
-restart migration. This can be initiated with the -restart flag and the optional
--timeout parameter.
+Running containers cannot be live-migrated due to technical limitations. You
+can do a restart migration, which shuts down, moves and then starts a container
+again on the target node. As containers are very lightweight, this normally
+results in a downtime of only a few hundred milliseconds.

-A restart migration will shut down the Container and kill it after the specified
-timeout (the default is 180 seconds). Then it will migrate the Container
-like an offline migration and when finished, it starts the Container on the
-target node.
+A restart migration can be done through the web interface or by using the
+`--restart` flag with the `pct migrate` command.
+
+A restart migration will shut down the Container and kill it after the
+specified timeout (the default is 180 seconds). Then it will migrate the
+Container like an offline migration and when finished, it starts the Container
+on the target node.
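+
+For example, a restart migration of CT 100 to a node named `target-node`, with
+a shutdown timeout of 120 seconds, could be started like this (a sketch; the
+node name is an assumption):
+
+----
+# pct migrate 100 target-node --restart --timeout 120
+----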


[[pct_configuration]]
Configuration
-------------

-The `/etc/pve/lxc/<CTID>.conf` file stores container configuration,
-where `<CTID>` is the numeric ID of the given container. Like all
-other files stored inside `/etc/pve/`, they get automatically
-replicated to all other cluster nodes.
+The `/etc/pve/lxc/<CTID>.conf` file stores container configuration, where
+`<CTID>` is the numeric ID of the given container. Like all other files stored
+inside `/etc/pve/`, they get automatically replicated to all other cluster
+nodes.

NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
unique cluster wide.

@@ -763,22 +857,26 @@ net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
rootfs: local:107/vm-107-disk-1.raw,size=7G
----

-Those configuration files are simple text files, and you can edit them
-using a normal text editor (`vi`, `nano`, ...). This is sometimes
-useful to do small corrections, but keep in mind that you need to
-restart the container to apply such changes.
+The configuration files are simple text files. You can edit them using a normal
+text editor, for example, `vi` or `nano`.
+This is sometimes useful to do small corrections, but keep in mind that you
+need to restart the container to apply such changes.
+
+For that reason, it is usually better to use the `pct` command to generate and
+modify those files, or do the whole thing using the GUI.
+Our toolkit is smart enough to instantaneously apply most changes to running
+containers. This feature is called ``hot plug'', and there is no need to restart
+the container in that case.

-For that reason, it is usually better to use the `pct` command to
-generate and modify those files, or do the whole thing using the GUI.
-Our toolkit is smart enough to instantaneously apply most changes to
-running containers. This feature is called "hot plug", and there is no
-need to restart the container in that case.
+In cases where a change cannot be hot-plugged, it will be registered as a
+pending change (shown in red color in the GUI).
+It will only be applied after rebooting the container.


File Format
~~~~~~~~~~~

-Container configuration files use a simple colon separated key/value
+The container configuration file uses a simple colon separated key/value
@@ -786,29 +884,32 @@ format. Each line has the following format:

-----
# this is a comment
OPTION: value
-----

-Blank lines in those files are ignored, and lines starting with a `#`
-character are treated as comments and are also ignored.
+Blank lines in those files are ignored, and lines starting with a `#` character
+are treated as comments and are also ignored.

-It is possible to add low-level, LXC style configuration directly, for
-example:
+It is possible to add low-level, LXC style configuration directly, for example:

- lxc.init_cmd: /sbin/my_own_init
+----
+lxc.init_cmd: /sbin/my_own_init
+----

or

- lxc.init_cmd = /sbin/my_own_init
+----
+lxc.init_cmd = /sbin/my_own_init
+----

-Those settings are directly passed to the LXC low-level tools.
+The settings are passed directly to the LXC low-level tools.


[[pct_snapshots]]
Snapshots
~~~~~~~~~

-When you create a snapshot, `pct` stores the configuration at snapshot
-time into a separate snapshot section within the same configuration
-file. For example, after creating a snapshot called ``testsnapshot'',
-your configuration file will look like this:
+When you create a snapshot, `pct` stores the configuration at snapshot time
+into a separate snapshot section within the same configuration file. For
+example, after creating a snapshot called ``testsnapshot'', your configuration
+file will look like this:

.Container configuration with snapshot
----
@@ -824,10 +925,9 @@ snaptime: 1457170803
...
----

-There are a few snapshot related properties like `parent` and
-`snaptime`. The `parent` property is used to store the parent/child
-relationship between snapshots. `snaptime` is the snapshot creation
-time stamp (Unix epoch).
+There are a few snapshot-related properties like `parent` and `snaptime`. The
+`parent` property is used to store the parent/child relationship between
+snapshots. `snaptime` is the snapshot creation time stamp (Unix epoch).


[[pct_options]]
@@ -840,14 +940,16 @@ include::pct.conf.5-opts.adoc[]
Locks
-----

-Container migrations, snapshots and backups (`vzdump`) set a lock to
-prevent incompatible concurrent actions on the affected container. Sometimes
-you need to remove such a lock manually (e.g., after a power failure).
+Container migrations, snapshots and backups (`vzdump`) set a lock to prevent
+incompatible concurrent actions on the affected container. Sometimes you need
+to remove such a lock manually (e.g., after a power failure).

- pct unlock <CTID>
+----
+# pct unlock <CTID>
+----

-CAUTION: Only do that if you are sure the action which set the lock is
-no longer running.
+CAUTION: Only do this if you are sure the action which set the lock is no
+longer running.


ifdef::manvolnum[]
@@ -862,10 +964,3 @@ Configuration file for the container '<CTID>'.

include::pve-copyright.adoc[]

endif::manvolnum[]
-
-
-
-
-
-
-