The runtime costs for containers are low, usually negligible. However, there are
some drawbacks that need to be considered:
-* Only Linux distributions can be run in containers.It is not possible to run
- other Operating Systems like, for example, FreeBSD or Microsoft Windows
+* Only Linux distributions can be run in Proxmox Containers. It is not possible
+  to run other operating systems, such as FreeBSD or Microsoft Windows,
inside a container.
* For security reasons, access to host resources needs to be restricted.
- Containers run in their own separate namespaces. Additionally some syscalls
- are not allowed within containers.
+ Therefore, containers run in their own separate namespaces. Additionally, some
+ syscalls (user space requests to the Linux kernel) are not allowed within
+ containers.
-{pve} uses https://linuxcontainers.org/[Linux Containers (LXC)] as underlying
+{pve} uses https://linuxcontainers.org/lxc/introduction/[Linux Containers (LXC)] as its underlying
container technology. The ``Proxmox Container Toolkit'' (`pct`) simplifies the
-usage and management of LXC containers.
+usage and management of LXC, by providing an interface that abstracts
+complex tasks.
Containers are tightly integrated with {pve}. This means that they are aware of
the cluster setup, and they can use the same network and storage resources as
virtual machines. You can also use the {pve} firewall, or manage containers
using the HA framework.
-Our primary goal is to offer an environment as one would get from a VM, but
-without the additional overhead. We call this ``System Containers''.
+Our primary goal is to offer an environment that provides the benefits of using a
+VM, but without the additional overhead. This means that Proxmox Containers can
+be categorized as ``System Containers'', rather than ``Application Containers''.
-NOTE: If you want to run micro-containers, for example, 'Docker' or 'rkt', it
-is best to run them inside a VM.
+NOTE: If you want to run application containers, for example, 'Docker' images, it
+is recommended that you run them inside a Proxmox Qemu VM. This will give you
+all the advantages of application containerization, while also providing the
+benefits that VMs offer, such as strong isolation from the host and the ability
+to live-migrate, which otherwise isn't possible with containers.
Technology Overview
Container images, sometimes also referred to as ``templates'' or
``appliances'', are `tar` archives which contain everything to run a container.
-`pct` uses them to create a new container, for example:
-
-----
-# pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
-----
{pve} itself provides a variety of basic templates for the most common Linux
distributions. They can be downloaded using the GUI or the `pveam` (short for
{pve} Appliance Manager) command line utility.
Additionally, https://www.turnkeylinux.org/[TurnKey Linux] container templates
are also available to download.
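+
+For example, you can list the available TurnKey Linux templates with the
+following (a sketch, assuming they are grouped under the `turnkeylinux`
+section):
+
+----
+# pveam available --section turnkeylinux
+----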
-The list of available templates is updated daily via cron. To trigger it
-manually:
+The list of available templates is updated daily through the 'pve-daily-update'
+timer. You can also trigger an update manually by executing:
----
# pveam update
----

.List available system images
----
# pveam available --section system
-system alpine-3.10-default_20190626_amd64.tar.xz
-system alpine-3.9-default_20190224_amd64.tar.xz
-system archlinux-base_20190924-1_amd64.tar.gz
-system centos-6-default_20191016_amd64.tar.xz
+system alpine-3.12-default_20200823_amd64.tar.xz
+system alpine-3.13-default_20210419_amd64.tar.xz
+system alpine-3.14-default_20210623_amd64.tar.xz
+system archlinux-base_20210420-1_amd64.tar.gz
system centos-7-default_20190926_amd64.tar.xz
-system centos-8-default_20191016_amd64.tar.xz
-system debian-10.0-standard_10.0-1_amd64.tar.gz
-system debian-8.0-standard_8.11-1_amd64.tar.gz
+system centos-8-default_20201210_amd64.tar.xz
system debian-9.0-standard_9.7-1_amd64.tar.gz
-system fedora-30-default_20190718_amd64.tar.xz
-system fedora-31-default_20191029_amd64.tar.xz
-system gentoo-current-default_20190718_amd64.tar.xz
-system opensuse-15.0-default_20180907_amd64.tar.xz
-system opensuse-15.1-default_20190719_amd64.tar.xz
+system debian-10-standard_10.7-1_amd64.tar.gz
+system devuan-3.0-standard_3.0_amd64.tar.gz
+system fedora-33-default_20201115_amd64.tar.xz
+system fedora-34-default_20210427_amd64.tar.xz
+system gentoo-current-default_20200310_amd64.tar.xz
+system opensuse-15.2-default_20200824_amd64.tar.xz
system ubuntu-16.04-standard_16.04.5-1_amd64.tar.gz
system ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz
-system ubuntu-19.04-standard_19.04-1_amd64.tar.gz
-system ubuntu-19.10-standard_19.10-1_amd64.tar.gz
+system ubuntu-20.04-standard_20.04-1_amd64.tar.gz
+system ubuntu-20.10-standard_20.10-1_amd64.tar.gz
+system ubuntu-21.04-standard_21.04-1_amd64.tar.gz
----
Before you can use such a template, you need to download it into one of your
-storages. You can simply use storage `local` for that purpose. For clustered
-installations, it is preferred to use a shared storage so that all nodes can
-access those images.
+storages. If you are unsure which storage to choose, you can simply use the
+`local` storage for that purpose. For clustered installations, it is preferred
+to use a shared storage so that all nodes can access those images.
----
# pveam download local debian-10.0-standard_10.0-1_amd64.tar.gz
local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz 219.95MB
----
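+
+You are now ready to create containers using that image, and you can list all
+downloaded images on the `local` storage with:
+
+----
+# pveam list local
+local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz  219.95MB
+----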
-The above command shows you the full {pve} volume identifiers. They include the
-storage name, and most other {pve} commands can use them. For example you can
-delete that image later with:
-
-----
-# pveam remove local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
-----
-
-[[pct_container_storage]]
-Container Storage
------------------
-
-The {pve} LXC container storage model is more flexible than traditional
-container storage models. A container can have multiple mount points. This
-makes it possible to use the best suited storage for each application.
-
-For example the root file system of the container can be on slow and cheap
-storage while the database can be on fast and distributed storage via a second
-mount point. See section <<pct_mount_points, Mount Points>> for further
-details.
-
-Any storage type supported by the {pve} storage library can be used. This means
-that containers can be stored on local (for example `lvm`, `zfs` or directory),
-shared external (like `iSCSI`, `NFS`) or even distributed storage systems like
-Ceph. Advanced storage features like snapshots or clones can be used if the
-underlying storage supports them. The `vzdump` backup tool can use snapshots to
-provide consistent container backups.
-
-Furthermore, local devices or local directories can be mounted directly using
-'bind mounts'. This gives access to local resources inside a container with
-practically zero overhead. Bind mounts can be used as an easy way to share data
-between containers.
-
-
-FUSE Mounts
-~~~~~~~~~~~
-
-WARNING: Because of existing issues in the Linux kernel's freezer subsystem the
-usage of FUSE mounts inside a container is strongly advised against, as
-containers need to be frozen for suspend or snapshot mode backups.
-
-If FUSE mounts cannot be replaced by other mounting mechanisms or storage
-technologies, it is possible to establish the FUSE mount on the Proxmox host
-and use a bind mount point to make it accessible inside the container.
-
-
-Using Quotas Inside Containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Quotas allow to set limits inside a container for the amount of disk space that
-each user can use.
-
-NOTE: This only works on ext4 image based storage types and currently only
-works with privileged containers.
-
-Activating the `quota` option causes the following mount options to be used for
-a mount point:
-`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
-
-This allows quotas to be used like on any other system. You can initialize the
-`/aquota.user` and `/aquota.group` files by running:
-
-----
-# quotacheck -cmug /
-# quotaon /
-----
-
-Then edit the quotas using the `edquota` command. Refer to the documentation of
-the distribution running inside the container for details.
-
-NOTE: You need to run the above commands for every mount point by passing the
-mount point's path instead of just `/`.
-
-
-Using ACLs Inside Containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+TIP: You can also use the {pve} web interface GUI to download, list and delete
+container templates.
-The standard Posix **A**ccess **C**ontrol **L**ists are also available inside
-containers. ACLs allow you to set more detailed file ownership than the
-traditional user/group/others model.
-
-
-Backup of Container mount points
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To include a mount point in backups, enable the `backup` option for it in the
-container configuration. For an existing mount point `mp0`
+`pct` uses them to create a new container, for example:
----
-mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G
+# pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
----
-add `backup=1` to enable it.
+The above command shows you the full {pve} volume identifiers. They include the
+storage name, and most other {pve} commands can use them. For example, you can
+delete that image later with:
----
-mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=1
+# pveam remove local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz
----
-NOTE: When creating a new mount point in the GUI, this option is enabled by
-default.
-
-To disable backups for a mount point, add `backup=0` in the way described
-above, or uncheck the *Backup* checkbox on the GUI.
-
-Replication of Containers mount points
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default, additional mount points are replicated when the Root Disk is
-replicated. If you want the {pve} storage replication mechanism to skip a mount
-point, you can set the *Skip replication* option for that mount point.
-As of {pve} 5.0, replication requires a storage of type `zfspool`. Adding a
-mount point to a different type of storage when the container has replication
-configured requires to have *Skip replication* enabled for that mount point.
[[pct_settings]]
Container Settings
Containers use the kernel of the host system. This exposes an attack surface
for malicious users. In general, full virtual machines provide better
-isolation. This should be considered if containers are provided to unkown or
+isolation. This should be considered if containers are provided to unknown or
untrusted people.
To reduce the attack surface, LXC uses many security features like AppArmor,
CGroups and kernel namespaces. If, for example, AppArmor needs to be disabled
for a single container, you can add the following line to the container
configuration file located at `/etc/pve/lxc/CTID.conf`:
----
-lxc.apparmor_profile = unconfined
+lxc.apparmor.profile = unconfined
----
WARNING: Please note that this is not recommended for production use.
-// TODO: describe cgroups + seccomp a bit more.
+[[pct_cgroup]]
+Control Groups ('cgroup')
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'cgroup' is a kernel
+mechanism used to hierarchically organize processes and distribute system
+resources.
+
+The main resources controlled via 'cgroups' are CPU time, memory and swap
+limits, and access to device nodes. 'cgroups' are also used to "freeze" a
+container before taking snapshots.
+
+There are 2 versions of 'cgroups' currently available,
+https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v1/index.html[legacy]
+and
+https://www.kernel.org/doc/html/v5.11/admin-guide/cgroup-v2.html['cgroupv2'].
+
+Since {pve} 7.0, the default is a pure 'cgroupv2' environment. Previously a
+"hybrid" setup was used, where resource control was mainly done in 'cgroupv1'
+with an additional 'cgroupv2' controller which could take over some subsystems
+via the 'cgroup_no_v1' kernel command line parameter. (See the
+https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html[kernel
+parameter documentation] for details.)
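+
+To check which 'cgroup' version a host currently uses, you can, as a quick
+sketch, query the file system type mounted at `/sys/fs/cgroup`. A pure
+'cgroupv2' host reports `cgroup2fs`, while legacy and hybrid setups report
+`tmpfs`:
+
+----
+# stat -fc %T /sys/fs/cgroup/
+cgroup2fs
+----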
+
+[[pct_cgroup_compat]]
+CGroup Version Compatibility
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The main difference between pure 'cgroupv2' and the old hybrid environments
+regarding {pve} is that with 'cgroupv2' memory and swap are now controlled
+independently. The memory and swap settings for containers can map directly to
+these values, whereas previously only the memory limit and the limit of the
+*sum* of memory and swap could be set.
+
+Another important difference is that the 'devices' controller is configured in a
+completely different way. Because of this, file system quotas are currently not
+supported in a pure 'cgroupv2' environment.
+
+'cgroupv2' support by the container's OS is needed to run in a pure 'cgroupv2'
+environment. Containers running 'systemd' version 231 or newer support
+'cgroupv2' footnote:[this includes all newest major versions of container
+templates shipped by {pve}], as do containers not using 'systemd' as init
+system footnote:[for example Alpine Linux].
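+
+To verify the 'systemd' version inside a container, you can, for example, run
+`systemctl --version` through `pct exec` (a sketch, using the hypothetical
+container ID `100`):
+
+----
+# pct exec 100 -- systemctl --version
+----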
+
+[NOTE]
+====
+CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases
+that ship a 'systemd' version which is too old to run in a 'cgroupv2'
+environment. There are several ways to work around this issue:
+
+* Upgrade the whole distribution to a newer release. For the examples above, that
+  could be Ubuntu 18.04 or 20.04, and CentOS 8 (or RHEL/CentOS derivatives like
+  AlmaLinux or Rocky Linux). This has the benefit of getting the newest bug and
+  security fixes, often also new features, and of moving the EOL date into the future.
+
+* Upgrade the container's 'systemd' version. If the distribution provides a
+  backports repository, this can be an easy and quick stop-gap measure.
+
+* Move the container, or its services, to a virtual machine. Virtual machines
+  interact much less with the host, which is why decades-old OS versions can be
+  installed there just fine.
+
+* Switch back to the legacy 'cgroup' controller. Note that while this can be a
+  valid solution, it is not a permanent one. There is a high likelihood that a
+  future {pve} major release, for example 8.0, will no longer support the
+  legacy controller.
+====
+
+[[pct_cgroup_change_version]]
+Changing CGroup Version
+^^^^^^^^^^^^^^^^^^^^^^^
+
+TIP: If file system quotas are not required and all containers support 'cgroupv2',
+it is recommended to stick to the new default.
+
+To switch back to the previous version the following kernel command line
+parameter can be used:
+
+----
+systemd.unified_cgroup_hierarchy=0
+----
+
+See xref:sysboot_edit_kernel_cmdline[this section] on editing the kernel boot
+command line for information on where to add the parameter.
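+
+For example, on a host booting via GRUB, this means appending the parameter to
+the `GRUB_CMDLINE_LINUX_DEFAULT` variable in `/etc/default/grub` (a sketch;
+hosts using 'systemd-boot' edit `/etc/kernel/cmdline` and run
+`proxmox-boot-tool refresh` instead):
+
+----
+GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
+----
+
+followed by running `update-grub` and rebooting the host.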
+
+// TODO: seccomp a bit more.
// TODO: pve-lxc-syscalld
detected type.
+[[pct_container_storage]]
+Container Storage
+-----------------
+
+The {pve} LXC container storage model is more flexible than traditional
+container storage models. A container can have multiple mount points. This
+makes it possible to use the best suited storage for each application.
+
+For example, the root file system of the container can be on slow and cheap
+storage while the database can be on fast and distributed storage via a second
+mount point. See section <<pct_mount_points, Mount Points>> for further
+details.
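+
+As a sketch, a second mount point backed by a (hypothetical) fast storage named
+`fast-ceph` could be added to container `100` like this, allocating a new 8 GiB
+volume:
+
+----
+# pct set 100 -mp0 fast-ceph:8,mp=/var/lib/mysql
+----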
+
+Any storage type supported by the {pve} storage library can be used. This means
+that containers can be stored on local (for example `lvm`, `zfs` or directory),
+shared external (like `iSCSI`, `NFS`) or even distributed storage systems like
+Ceph. Advanced storage features like snapshots or clones can be used if the
+underlying storage supports them. The `vzdump` backup tool can use snapshots to
+provide consistent container backups.
+
+Furthermore, local devices or local directories can be mounted directly using
+'bind mounts'. This gives access to local resources inside a container with
+practically zero overhead. Bind mounts can be used as an easy way to share data
+between containers.
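+
+A bind mount is configured in the same way, by passing a host path instead of a
+storage-backed volume (a sketch, assuming the host directory
+`/mnt/bindmounts/shared` exists):
+
+----
+# pct set 100 -mp1 /mnt/bindmounts/shared,mp=/shared
+----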
+
+
+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer subsystem, the
+use of FUSE mounts inside a container is strongly advised against, as containers
+need to be frozen for suspend or snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
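+
+As a sketch, assuming an 'sshfs' share should be available in container `100`,
+the FUSE mount could be established on the host and then passed in through a
+bind mount point:
+
+----
+# sshfs user@fileserver:/srv/data /mnt/fuse-share
+# pct set 100 -mp2 /mnt/fuse-share,mp=/mnt/share
+----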
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow you to set limits inside a container for the amount of disk space
+that each user can use.
+
+NOTE: This currently requires the use of legacy 'cgroups'.
+
+NOTE: This only works on ext4 image based storage types and currently only
+works with privileged containers.
+
+Activating the `quota` option causes the following mount options to be used for
+a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used like on any other system. You can initialize the
+`/aquota.user` and `/aquota.group` files by running:
+
+----
+# quotacheck -cmug /
+# quotaon /
+----
+
+Then edit the quotas using the `edquota` command. Refer to the documentation of
+the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing the
+mount point's path instead of just `/`.
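+
+For example, to edit the disk quota of a hypothetical user `alice` from inside
+the container:
+
+----
+# edquota -u alice
+----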
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard POSIX **A**ccess **C**ontrol **L**ists are also available inside
+containers. ACLs allow you to set more detailed file ownership than the
+traditional user/group/others model.
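+
+For example, inside the container, a hypothetical user `alice` could be granted
+read/write access to a file she does not own, and the result inspected with
+`getfacl` (assuming the container's distribution ships the ACL tools):
+
+----
+# setfacl -m u:alice:rw /srv/data/report.txt
+# getfacl /srv/data/report.txt
+----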
+
+
+Backup of Container mount points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To include a mount point in backups, enable the `backup` option for it in the
+container configuration. For an existing mount point `mp0`
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G
+----
+
+add `backup=1` to enable it.
+
+----
+mp0: guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=1
+----
+
+NOTE: When creating a new mount point in the GUI, this option is enabled by
+default.
+
+To disable backups for a mount point, add `backup=0` in the way described
+above, or uncheck the *Backup* checkbox on the GUI.
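+
+The same option can also be set on the command line, for example (a sketch;
+note that the full mount point definition must be restated):
+
+----
+# pct set 100 -mp0 guests:subvol-100-disk-1,mp=/root/files,size=8G,backup=0
+----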
+
+Replication of Container mount points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, additional mount points are replicated when the Root Disk is
+replicated. If you want the {pve} storage replication mechanism to skip a mount
+point, you can set the *Skip replication* option for that mount point.
+As of {pve} 5.0, replication requires a storage of type `zfspool`. Adding a
+mount point to a different type of storage when the container has replication
+configured requires *Skip replication* to be enabled for that mount point.
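+
+On the command line, this corresponds to the mount point's `replicate` flag,
+for example (a sketch, assuming a ZFS-backed volume):
+
+----
+# pct set 100 -mp0 local-zfs:subvol-100-disk-1,mp=/data,replicate=0
+----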
+
+
Backup and Restore
------------------
# pct set 100 -memory 512
----
+Destroying a container always removes it from Access Control Lists and it always
+removes the firewall configuration of the container. You have to activate
+'--purge' if you want to additionally remove the container from replication
+jobs, backup jobs and HA resource configurations.
+
+----
+# pct destroy 100 --purge
+----
+
+
Obtaining Debugging Logs
~~~~~~~~~~~~~~~~~~~~~~~~
In case `pct start` is unable to start a specific container, it might be
-helpful to collect debugging output by running `lxc-start` (replace `ID` with
-the container's ID):
+helpful to collect debugging output by passing the `--debug` flag (replace
+`CTID` with the container's ID):
+
+----
+# pct start CTID --debug
+----
+
+Alternatively, you can use the following `lxc-start` command, which will save
+the debug log to the file specified by the `-o` output option:
----
-# lxc-start -n ID -F -l DEBUG -o /tmp/lxc-ID.log
+# lxc-start -n CTID -F -l DEBUG -o /tmp/lxc-CTID.log
----
This command will attempt to start the container in foreground mode; to stop
-the container run `pct shutdown ID` or `pct stop ID` in a second terminal.
+the container, run `pct shutdown CTID` or `pct stop CTID` in a second terminal.
-The collected debug log is written to `/tmp/lxc-ID.log`.
+The collected debug log is written to `/tmp/lxc-CTID.log`.
NOTE: If you have changed the container's configuration since the last start
attempt with `pct start`, you need to run `pct start` at least once to also
update the configuration used by `lxc-start`.
mount points defined, the migration will copy the content over the network to
the target host if the same storage is defined there.
-Running containers cannot live-migrated due to techincal limitations. You can
+Running containers cannot be live-migrated due to technical limitations. You can
do a restart migration, which shuts down, moves and then starts a container
again on the target node. As containers are very lightweight, this normally
results in a downtime of only a few hundred milliseconds.
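+
+For example, a restart migration of container `100` to the (hypothetical)
+target node `node2` can be started with:
+
+----
+# pct migrate 100 node2 --restart
+----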