X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pct.adoc;h=2cb4bbe8c4eacf0da871ef030ecce624a672415a;hp=da7db1c60936d96131962a751866d57adc3db519;hb=a45c999b4586734621bbc968d67f87390739b270;hpb=5f09af76d7282a043be8fa5439349272f506cf02 diff --git a/pct.adoc b/pct.adoc index da7db1c..2cb4bbe 100644 --- a/pct.adoc +++ b/pct.adoc @@ -1,8 +1,7 @@ +[[chapter_pct]] ifdef::manvolnum[] -PVE({manvolnum}) -================ -include::attributes.txt[] - +pct(1) +====== :pve-toplevel: NAME @@ -23,11 +22,10 @@ endif::manvolnum[] ifndef::manvolnum[] Proxmox Container Toolkit ========================= -include::attributes.txt[] +:pve-toplevel: endif::manvolnum[] - ifdef::wiki[] -:pve-toplevel: +:title: Linux Container endif::wiki[] Containers are a lightweight alternative to fully virtualized @@ -68,133 +66,46 @@ NOTE: If you want to run micro-containers (with docker, rkt, ...), it is best to run them inside a VM. -Security Considerations ------------------------ - -Containers use the same kernel as the host, so there is a big attack -surface for malicious users. You should consider this fact if you -provide containers to totally untrusted people. In general, fully -virtualized VMs provide better isolation. - -The good news is that LXC uses many kernel security features like -AppArmor, CGroups and PID and user namespaces, which makes containers -usage quite secure. We distinguish two types of containers: - - -Privileged Containers -~~~~~~~~~~~~~~~~~~~~~ - -Security is done by dropping capabilities, using mandatory access -control (AppArmor), SecComp filters and namespaces. The LXC team -considers this kind of container as unsafe, and they will not consider -new container escape exploits to be security issues worthy of a CVE -and quick fix. So you should use this kind of containers only inside a -trusted environment, or when no untrusted task is running as root in -the container. 
- - -Unprivileged Containers -~~~~~~~~~~~~~~~~~~~~~~~ - -This kind of containers use a new kernel feature called user -namespaces. The root UID 0 inside the container is mapped to an -unprivileged user outside the container. This means that most security -issues (container escape, resource abuse, ...) in those containers -will affect a random unprivileged user, and so would be a generic -kernel security bug rather than an LXC issue. The LXC team thinks -unprivileged containers are safe by design. - - -Configuration -------------- - -The `/etc/pve/lxc/.conf` file stores container configuration, -where `` is the numeric ID of the given container. Like all -other files stored inside `/etc/pve/`, they get automatically -replicated to all other cluster nodes. - -NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be -unique cluster wide. - -.Example Container Configuration ----- -ostype: debian -arch: amd64 -hostname: www -memory: 512 -swap: 512 -net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth -rootfs: local:107/vm-107-disk-1.raw,size=7G ----- - -Those configuration files are simple text files, and you can edit them -using a normal text editor (`vi`, `nano`, ...). This is sometimes -useful to do small corrections, but keep in mind that you need to -restart the container to apply such changes. - -For that reason, it is usually better to use the `pct` command to -generate and modify those files, or do the whole thing using the GUI. -Our toolkit is smart enough to instantaneously apply most changes to -running containers. This feature is called "hot plug", and there is no -need to restart the container in that case. - - -File Format -~~~~~~~~~~~ +Technology Overview +------------------- -Container configuration files use a simple colon separated key/value -format. 
Each line has the following format: +* LXC (https://linuxcontainers.org/) ------ -# this is a comment -OPTION: value ------ +* Integrated into {pve} graphical user interface (GUI) -Blank lines in those files are ignored, and lines starting with a `#` -character are treated as comments and are also ignored. +* Easy to use command line tool `pct` -It is possible to add low-level, LXC style configuration directly, for -example: +* Access via {pve} REST API - lxc.init_cmd: /sbin/my_own_init +* lxcfs to provide containerized /proc file system -or +* AppArmor/Seccomp to improve security - lxc.init_cmd = /sbin/my_own_init +* CRIU: for live migration (planned) -Those settings are directly passed to the LXC low-level tools. +* Runs on modern Linux kernels +* Image based deployment (templates) -Snapshots -~~~~~~~~~ +* Use {pve} storage library -When you create a snapshot, `pct` stores the configuration at snapshot -time into a separate snapshot section within the same configuration -file. For example, after creating a snapshot called ``testsnapshot'', -your configuration file will look like this: +* Container setup from host (network, DNS, storage, ...) -.Container configuration with snapshot ----- -memory: 512 -swap: 512 -parent: testsnaphot -... -[testsnaphot] -memory: 512 -swap: 512 -snaptime: 1457170803 -... ----- +Security Considerations +----------------------- -There are a few snapshot related properties like `parent` and -`snaptime`. The `parent` property is used to store the parent/child -relationship between snapshots. `snaptime` is the snapshot creation -time stamp (Unix epoch). +Containers use the same kernel as the host, so there is a big attack +surface for malicious users. You should consider this fact if you +provide containers to totally untrusted people. In general, fully +virtualized VMs provide better isolation. 
+The good news is that LXC uses many kernel security features like +AppArmor, CGroups and PID and user namespaces, which makes containers +usage quite secure. Guest Operating System Configuration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------------ We normally try to detect the operating system type inside the container, and then modify some files inside the container to make @@ -232,7 +143,7 @@ and will not be moved. Modification of a file can be prevented by adding a `.pve-ignore.` file for it. For instance, if the file `/etc/.pve-ignore.hosts` exists then the `/etc/hosts` file will not be touched. This can be a -simple empty file creatd via: +simple empty file created via: # touch /etc/.pve-ignore.hosts @@ -261,12 +172,7 @@ NOTE: Container start fails if the configured `ostype` differs from the auto detected type. -Options -~~~~~~~ - -include::pct.conf.5-opts.adoc[] - - +[[pct_container_images]] Container Images ---------------- @@ -333,6 +239,7 @@ example you can delete that image later with: pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz +[[pct_container_storage]] Container Storage ----------------- @@ -348,7 +255,8 @@ allows you to choose a suitable storage for each application. For example, you can use a relatively slow (and thus cheap) storage for the container root file system. Then you can use a second mount point to mount a very fast, distributed storage for your database -application. +application. See section <> for further +details. The second big improvement is that you can use any storage type supported by the {pve} storage library. That means that you can store @@ -364,9 +272,193 @@ local storage inside containers with zero overhead. Such bind mounts also provide an easy way to share data between different containers. 
+FUSE Mounts +~~~~~~~~~~~ + +WARNING: Because of existing issues in the Linux kernel's freezer +subsystem the usage of FUSE mounts inside a container is strongly +advised against, as containers need to be frozen for suspend or +snapshot mode backups. + +If FUSE mounts cannot be replaced by other mounting mechanisms or storage +technologies, it is possible to establish the FUSE mount on the Proxmox host +and use a bind mount point to make it accessible inside the container. + + +Using Quotas Inside Containers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Quotas allow to set limits inside a container for the amount of disk +space that each user can use. This only works on ext4 image based +storage types and currently does not work with unprivileged +containers. + +Activating the `quota` option causes the following mount options to be +used for a mount point: +`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0` + +This allows quotas to be used like you would on any other system. You +can initialize the `/aquota.user` and `/aquota.group` files by running + +---- +quotacheck -cmug / +quotaon / +---- + +and edit the quotas via the `edquota` command. Refer to the documentation +of the distribution running inside the container for details. + +NOTE: You need to run the above commands for every mount point by passing +the mount point's path instead of just `/`. + + +Using ACLs Inside Containers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The standard Posix **A**ccess **C**ontrol **L**ists are also available inside containers. +ACLs allow you to set more detailed file ownership than the traditional user/ +group/others model. + + +Backup of Containers mount points +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By default additional mount points besides the Root Disk mount point are not +included in backups. You can reverse this default behavior by setting the +*Backup* option on a mount point. 
+// see PVE::VZDump::LXC::prepare() + +Replication of Containers mount points +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By default additional mount points are replicated when the Root Disk +is replicated. If you want the {pve} storage replication mechanism to skip a + mount point when starting a replication job, you can set the +*Skip replication* option on that mount point. + +As of {pve} 5.0, replication requires a storage of type `zfspool`, so adding a + mount point to a different type of storage when the container has replication + configured requires to *Skip replication* for that mount point. + + +[[pct_settings]] +Container Settings +------------------ + +[[pct_general]] +General Settings +~~~~~~~~~~~~~~~~ + +[thumbnail="screenshot/gui-create-ct-general.png"] + +General settings of a container include + +* the *Node* : the physical server on which the container will run +* the *CT ID*: a unique number in this {pve} installation used to identify your container +* *Hostname*: the hostname of the container +* *Resource Pool*: a logical group of containers and VMs +* *Password*: the root password of the container +* *SSH Public Key*: a public key for connecting to the root account over SSH +* *Unprivileged container*: this option allows to choose at creation time +if you want to create a privileged or unprivileged container. + + +Privileged Containers +^^^^^^^^^^^^^^^^^^^^^ + +Security is done by dropping capabilities, using mandatory access +control (AppArmor), SecComp filters and namespaces. The LXC team +considers this kind of container as unsafe, and they will not consider +new container escape exploits to be security issues worthy of a CVE +and quick fix. So you should use this kind of containers only inside a +trusted environment, or when no untrusted task is running as root in +the container. + + +Unprivileged Containers +^^^^^^^^^^^^^^^^^^^^^^^ + +This kind of containers use a new kernel feature called user +namespaces. 
The root UID 0 inside the container is mapped to an +unprivileged user outside the container. This means that most security +issues (container escape, resource abuse, ...) in those containers +will affect a random unprivileged user, and so would be a generic +kernel security bug rather than an LXC issue. The LXC team thinks +unprivileged containers are safe by design. + +NOTE: If the container uses systemd as an init system, please be +aware the systemd version running inside the container should be equal +or greater than 220. + +[[pct_cpu]] +CPU +~~~ + +[thumbnail="screenshot/gui-create-ct-cpu.png"] + +You can restrict the number of visible CPUs inside the container using +the `cores` option. This is implemented using the Linux 'cpuset' +cgroup (**c**ontrol *group*). A special task inside `pvestatd` tries +to distribute running containers among available CPUs. You can view +the assigned CPUs using the following command: + +---- +# pct cpusets + --------------------- + 102: 6 7 + 105: 2 3 4 5 + 108: 0 1 + --------------------- +---- + +Containers use the host kernel directly, so all task inside a +container are handled by the host CPU scheduler. {pve} uses the Linux +'CFS' (**C**ompletely **F**air **S**cheduler) scheduler by default, +which has additional bandwidth control options. + +[horizontal] + +`cpulimit`: :: You can use this option to further limit assigned CPU +time. Please note that this is a floating point number, so it is +perfectly valid to assign two cores to a container, but restrict +overall CPU consumption to half a core. ++ +---- +cores: 2 +cpulimit: 0.5 +---- + +`cpuunits`: :: This is a relative weight passed to the kernel +scheduler. The larger the number is, the more CPU time this container +gets. Number is relative to the weights of all the other running +containers. The default is 1024. You can use this setting to +prioritize some containers. 
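The interaction between `cores`, `cpulimit` and `cpuunits` described above can be illustrated with a small calculation. The following Python sketch is only a simplified model and not how the kernel scheduler or {pve} computes anything: it assumes every container is fully busy, treats a missing `cpulimit` as "no limit", and does not hand capacity left unused by a capped container to the others. The function name `cpu_entitlement` and the container IDs are hypothetical.

```python
def cpu_entitlement(containers, host_cores):
    """Estimate how many cores each container may use under full contention.

    containers maps a container name to a dict with the optional keys
    'cores', 'cpulimit' and 'cpuunits' (the latter defaults to 1024).
    """
    total_units = sum(c.get("cpuunits", 1024) for c in containers.values())
    result = {}
    for name, c in containers.items():
        # Weighted fair share: proportional to this container's cpuunits.
        fair = host_cores * c.get("cpuunits", 1024) / total_units
        # Hard caps: visible cores, then the optional cpulimit.
        cap = min(c.get("cores", host_cores), host_cores)
        if c.get("cpulimit") is not None:
            cap = min(cap, c["cpulimit"])
        result[name] = min(fair, cap)
    return result


example = cpu_entitlement(
    {
        "102": {"cores": 2, "cpulimit": 0.5},
        "105": {"cores": 4, "cpuunits": 2048},
        "108": {"cores": 2},
    },
    host_cores=8,
)
```

In this example container 102 is held to half a core by `cpulimit` even though it sees two cores, while container 105 gets twice the weighted share of container 108 because its `cpuunits` value is twice as large.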
+ + +[[pct_memory]] +Memory +~~~~~~ + +[thumbnail="screenshot/gui-create-ct-memory.png"] + +Container memory is controlled using the cgroup memory controller. + +[horizontal] + +`memory`: :: Limit overall memory usage. This corresponds +to the `memory.limit_in_bytes` cgroup setting. + +`swap`: :: Allows the container to use additional swap memory from the +host swap space. This corresponds to the `memory.memsw.limit_in_bytes` +cgroup setting, which is set to the sum of both value (`memory + +swap`). + + +[[pct_mount_points]] Mount Points ~~~~~~~~~~~~ +[thumbnail="screenshot/gui-create-ct-root-disk.png"] + The root mount point is configured with the `rootfs` property, and you can configure up to 10 additional mount points. The corresponding options are called `mp0` to `mp9`, and they can contain the following setting: @@ -395,6 +487,13 @@ in three different flavors: - Directories: passing `size=0` triggers a special case where instead of a raw image a directory is created. +NOTE: The special option syntax `STORAGE_ID:SIZE_IN_GB` for storage backed +mount point volumes will automatically allocate a volume of the specified size +on the specified storage. E.g., calling +`pct set 100 -mp0 thin1:10,mp=/path/in/container` will allocate a 10GB volume +on the storage `thin1` and replace the volume ID place holder `10` with the +allocated volume ID. + Bind Mount Points ^^^^^^^^^^^^^^^^^ @@ -442,63 +541,65 @@ more features. NOTE: The contents of device mount points are not backed up when using `vzdump`. -FUSE Mounts -~~~~~~~~~~~ - -WARNING: Because of existing issues in the Linux kernel's freezer -subsystem the usage of FUSE mounts inside a container is strongly -advised against, as containers need to be frozen for suspend or -snapshot mode backups. - -If FUSE mounts cannot be replaced by other mounting mechanisms or storage -technologies, it is possible to establish the FUSE mount on the Proxmox host -and use a bind mount point to make it accessible inside the container. 
- +[[pct_container_network]] +Network +~~~~~~~ -Using Quotas Inside Containers -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +[thumbnail="screenshot/gui-create-ct-network.png"] -Quotas allow to set limits inside a container for the amount of disk -space that each user can use. This only works on ext4 image based -storage types and currently does not work with unprivileged -containers. +You can configure up to 10 network interfaces for a single +container. The corresponding options are called `net0` to `net9`, and +they can contain the following setting: -Activating the `quota` option causes the following mount options to be -used for a mount point: -`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0` +include::pct-network-opts.adoc[] -This allows quotas to be used like you would on any other system. You -can initialize the `/aquota.user` and `/aquota.group` files by running ----- -quotacheck -cmug / -quotaon / ----- +[[pct_startup_and_shutdown]] +Automatic Start and Shutdown of Containers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -and edit the quotas via the `edquota` command. Refer to the documentation -of the distribution running inside the container for details. +After creating your containers, you probably want them to start automatically +when the host system boots. For this you need to select the option 'Start at +boot' from the 'Options' Tab of your container in the web interface, or set it with +the following command: -NOTE: You need to run the above commands for every mount point by passing -the mount point's path instead of just `/`. + pct set -onboot 1 +.Start and Shutdown Order +// use the screenshot from qemu - its the same +[thumbnail="screenshot/gui-qemu-edit-start-order.png"] -Using ACLs Inside Containers -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If you want to fine tune the boot order of your containers, you can use the following +parameters : -The standard Posix **A**ccess **C**ontrol **L**ists are also available inside containers. 
-ACLs allow you to set more detailed file ownership than the traditional user/ -group/others model. +* *Start/Shutdown order*: Defines the start order priority. E.g. set it to 1 if +you want the CT to be the first to be started. (We use the reverse startup +order for shutdown, so a container with a start order of 1 would be the last to +be shut down) +* *Startup delay*: Defines the interval between this container start and subsequent +containers starts . E.g. set it to 240 if you want to wait 240 seconds before starting +other containers. +* *Shutdown timeout*: Defines the duration in seconds {pve} should wait +for the container to be offline after issuing a shutdown command. +By default this value is set to 60, which means that {pve} will issue a +shutdown request, wait 60s for the machine to be offline, and if after 60s +the machine is still online will notify that the shutdown action failed. +Please note that containers without a Start/Shutdown order parameter will always +start after those where the parameter is set, and this parameter only +makes sense between the machines running locally on a host, and not +cluster-wide. -Container Network ------------------ +Hookscripts +~~~~~~~~~~~ -You can configure up to 10 network interfaces for a single -container. The corresponding options are called `net0` to `net9`, and -they can contain the following setting: +You can add a hook script to CTs with the config property `hookscript`. -include::pct-network-opts.adoc[] + pct set 100 -hookscript local:snippets/hookscript.pl +It will be called during various phases of the guests lifetime. +For an example and documentation see the example script under +`/usr/share/pve-docs/examples/guest-example-hookscript.pl`. Backup and Restore ------------------ @@ -628,59 +729,147 @@ NOTE: If you have changed the container's configuration since the last start attempt with `pct start`, you need to run `pct start` at least once to also update the configuration used by `lxc-start`. 
+[[pct_migration]] +Migration +--------- -Files ------- +If you have a cluster, you can migrate your Containers with -`/etc/pve/lxc/.conf`:: + pct migrate -Configuration file for the container ''. +This works as long as your Container is offline. If it has local volumes or +mountpoints defined, the migration will copy the content over the network to +the target host if there is the same storage defined. +If you want to migrate online Containers, the only way is to use +restart migration. This can be initiated with the -restart flag and the optional +-timeout parameter. -Container Advantages --------------------- +A restart migration will shut down the Container and kill it after the specified +timeout (the default is 180 seconds). Then it will migrate the Container +like an offline migration and when finished, it starts the Container on the +target node. -* Simple, and fully integrated into {pve}. Setup looks similar to a normal - VM setup. +[[pct_configuration]] +Configuration +------------- -** Storage (ZFS, LVM, NFS, Ceph, ...) +The `/etc/pve/lxc/.conf` file stores container configuration, +where `` is the numeric ID of the given container. Like all +other files stored inside `/etc/pve/`, they get automatically +replicated to all other cluster nodes. -** Network +NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be +unique cluster wide. -** Authentication +.Example Container Configuration +---- +ostype: debian +arch: amd64 +hostname: www +memory: 512 +swap: 512 +net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth +rootfs: local:107/vm-107-disk-1.raw,size=7G +---- -** Cluster +Those configuration files are simple text files, and you can edit them +using a normal text editor (`vi`, `nano`, ...). This is sometimes +useful to do small corrections, but keep in mind that you need to +restart the container to apply such changes. 
-* Fast: minimal overhead, as fast as bare metal +For that reason, it is usually better to use the `pct` command to +generate and modify those files, or do the whole thing using the GUI. +Our toolkit is smart enough to instantaneously apply most changes to +running containers. This feature is called "hot plug", and there is no +need to restart the container in that case. -* High density (perfect for idle workloads) -* REST API +File Format +~~~~~~~~~~~ -* Direct hardware access +Container configuration files use a simple colon separated key/value +format. Each line has the following format: +----- +# this is a comment +OPTION: value +----- -Technology Overview -------------------- +Blank lines in those files are ignored, and lines starting with a `#` +character are treated as comments and are also ignored. -* Integrated into {pve} graphical user interface (GUI) +It is possible to add low-level, LXC style configuration directly, for +example: -* LXC (https://linuxcontainers.org/) + lxc.init_cmd: /sbin/my_own_init -* lxcfs to provide containerized /proc file system +or -* AppArmor + lxc.init_cmd = /sbin/my_own_init -* CRIU: for live migration (planned) +Those settings are directly passed to the LXC low-level tools. -* We use latest available kernels (4.4.X) -* Image based deployment (templates) +[[pct_snapshots]] +Snapshots +~~~~~~~~~ -* Container setup from host (network, DNS, storage, ...) +When you create a snapshot, `pct` stores the configuration at snapshot +time into a separate snapshot section within the same configuration +file. For example, after creating a snapshot called ``testsnapshot'', +your configuration file will look like this: + +.Container configuration with snapshot +---- +memory: 512 +swap: 512 +parent: testsnaphot +... + +[testsnaphot] +memory: 512 +swap: 512 +snaptime: 1457170803 +... +---- + +There are a few snapshot related properties like `parent` and +`snaptime`. 
The `parent` property is used to store the parent/child +relationship between snapshots. `snaptime` is the snapshot creation +time stamp (Unix epoch). + + +[[pct_options]] +Options +~~~~~~~ + +include::pct.conf.5-opts.adoc[] + + +Locks +----- + +Container migrations, snapshots and backups (`vzdump`) set a lock to +prevent incompatible concurrent actions on the affected container. Sometimes +you need to remove such a lock manually (e.g., after a power failure). + + pct unlock + +CAUTION: Only do that if you are sure the action which set the lock is +no longer running. ifdef::manvolnum[] + +Files +------ + +`/etc/pve/lxc/.conf`:: + +Configuration file for the container ''. + + include::pve-copyright.adoc[] endif::manvolnum[]
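The file format described in the Configuration section (colon separated key/value lines, `#` comments, blank lines, low-level `lxc.` keys written with `:` or `=`, and `[name]` snapshot sections) can be sketched as a small parser. This is an illustrative Python sketch, not the parser {pve} itself uses; the function name `parse_ct_config` and the returned layout are assumptions made for this example, and it is more lenient than the real format because it accepts `=` for any key and tolerates leading whitespace.

```python
import re

# key, then ':' or '=', then the value (which may itself contain ':' and '=')
_LINE = re.compile(r"^([A-Za-z0-9_.]+)\s*[:=]\s*(.*)$")


def parse_ct_config(text):
    """Parse container config text.

    Returns {"current": {...}, "snapshots": {name: {...}}}; all values
    are kept as strings.
    """
    config = {"current": {}, "snapshots": {}}
    section = config["current"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("[") and line.endswith("]"):
            # a snapshot section: following keys belong to that snapshot
            section = config["snapshots"].setdefault(line[1:-1], {})
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        section[match.group(1)] = match.group(2)
    return config


sample = """\
# this is a comment
memory: 512
swap: 512
net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
lxc.init_cmd = /sbin/my_own_init
parent: testsnapshot

[testsnapshot]
memory: 512
swap: 512
snaptime: 1457170803
"""

parsed = parse_ct_config(sample)
```

Here `parsed["current"]["parent"]` names the snapshot section, and `parsed["snapshots"]["testsnapshot"]["snaptime"]` holds the Unix epoch time stamp as a string.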