X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pct.adoc;h=2cb4bbe8c4eacf0da871ef030ecce624a672415a;hp=59969aa4a027b6f13c3852bbd65d58b5eeb4b10d;hb=856993e4166495537f42e0b9c3a51c966227feab;hpb=8c1189b640ae7d10119ff1c046580f48749d38bd diff --git a/pct.adoc b/pct.adoc index 59969aa..2cb4bbe 100644 --- a/pct.adoc +++ b/pct.adoc @@ -1,7 +1,8 @@ +[[chapter_pct]] ifdef::manvolnum[] -PVE({manvolnum}) -================ -include::attributes.txt[] +pct(1) +====== +:pve-toplevel: NAME ---- @@ -9,7 +10,7 @@ NAME pct - Tool to manage Linux Containers (LXC) on Proxmox VE -SYNOPSYS +SYNOPSIS -------- include::pct.1-synopsis.adoc[] @@ -21,9 +22,11 @@ endif::manvolnum[] ifndef::manvolnum[] Proxmox Container Toolkit ========================= -include::attributes.txt[] +:pve-toplevel: endif::manvolnum[] - +ifdef::wiki[] +:title: Linux Container +endif::wiki[] Containers are a lightweight alternative to fully virtualized VMs. Instead of emulating a complete Operating System (OS), containers @@ -63,127 +66,46 @@ NOTE: If you want to run micro-containers (with docker, rkt, ...), it is best to run them inside a VM. -Security Considerations ------------------------ - -Containers use the same kernel as the host, so there is a big attack -surface for malicious users. You should consider this fact if you -provide containers to totally untrusted people. In general, fully -virtualized VMs provide better isolation. - -The good news is that LXC uses many kernel security features like -AppArmor, CGroups and PID and user namespaces, which makes containers -usage quite secure. We distinguish two types of containers: - -Privileged containers -~~~~~~~~~~~~~~~~~~~~~ - -Security is done by dropping capabilities, using mandatory access -control (AppArmor), SecComp filters and namespaces. The LXC team -considers this kind of container as unsafe, and they will not consider -new container escape exploits to be security issues worthy of a CVE -and quick fix. So you should use this kind of containers only inside a -trusted environment, or when no untrusted task is running as root in -the container. - -Unprivileged containers -~~~~~~~~~~~~~~~~~~~~~~~ - -This kind of containers use a new kernel feature called user -namespaces. The root uid 0 inside the container is mapped to an -unprivileged user outside the container. This means that most security -issues (container escape, resource abuse, ...) in those containers -will affect a random unprivileged user, and so would be a generic -kernel security bug rather than an LXC issue. The LXC team thinks -unprivileged containers are safe by design. - - -Configuration -------------- - -The `/etc/pve/lxc/.conf` file stores container configuration, -where `` is the numeric ID of the given container. Like all -other files stored inside `/etc/pve/`, they get automatically -replicated to all other cluster nodes. - -NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be -unique cluster wide. - -.Example Container Configuration ----- -ostype: debian -arch: amd64 -hostname: www -memory: 512 -swap: 512 -net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth -rootfs: local:107/vm-107-disk-1.raw,size=7G ----- - -Those configuration files are simple text files, and you can edit them -using a normal text editor (`vi`, `nano`, ...). This is sometimes -useful to do small corrections, but keep in mind that you need to -restart the container to apply such changes. 
-
-For that reason, it is usually better to use the `pct` command to
-generate and modify those files, or do the whole thing using the GUI.
-Our toolkit is smart enough to instantaneously apply most changes to
-running containers. This feature is called "hot plug", and there is no
-need to restart the container in that case.
+Technology Overview
+-------------------

-File Format
-~~~~~~~~~~~
+* LXC (https://linuxcontainers.org/)

-Container configuration files use a simple colon separated key/value
-format. Each line has the following format:
+* Integrated into {pve} graphical user interface (GUI)

- # this is a comment
- OPTION: value
+* Easy to use command line tool `pct`

-Blank lines in those files are ignored, and lines starting with a `#`
-character are treated as comments and are also ignored.
+* Access via {pve} REST API

-It is possible to add low-level, LXC style configuration directly, for
-example:
+* lxcfs to provide containerized /proc file system

- lxc.init_cmd: /sbin/my_own_init
+* AppArmor/Seccomp to improve security

-or
+* CRIU: for live migration (planned)

- lxc.init_cmd = /sbin/my_own_init
+* Runs on modern Linux kernels

-Those settings are directly passed to the LXC low-level tools.
+* Image based deployment (templates)

-Snapshots
-~~~~~~~~~
+* Use {pve} storage library

-When you create a snapshot, `pct` stores the configuration at snapshot
-time into a separate snapshot section within the same configuration
-file. For example, after creating a snapshot called ``testsnapshot'',
-your configuration file will look like this:
+* Container setup from host (network, DNS, storage, ...)

-.Container Configuration with Snapshot
-----
-memory: 512
-swap: 512
-parent: testsnaphot
-...
-[testsnaphot]
-memory: 512
-swap: 512
-snaptime: 1457170803
-...
-----
+Security Considerations
+-----------------------

-There are a few snapshot related properties like `parent` and
-`snaptime`. The `parent` property is used to store the parent/child
-relationship between snapshots. `snaptime` is the snapshot creation
-time stamp (Unix epoch).
+Containers use the same kernel as the host, so there is a big attack
+surface for malicious users. You should consider this fact if you
+provide containers to totally untrusted people. In general, fully
+virtualized VMs provide better isolation.

+The good news is that LXC uses many kernel security features like
+AppArmor, CGroups and PID and user namespaces, which makes container
+usage quite secure.

Guest Operating System Configuration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------------

We normally try to detect the operating system type inside the
container, and then modify some files inside the container to make
@@ -221,7 +143,7 @@ and will not be moved.

Modification of a file can be prevented by adding a `.pve-ignore.`
file for it. For instance, if the file `/etc/.pve-ignore.hosts`
exists then the `/etc/hosts` file will not be touched. This can be a
-simple empty file creatd via:
+simple empty file created via:

 # touch /etc/.pve-ignore.hosts

@@ -249,12 +171,8 @@ Gentoo:: test /etc/gentoo-release

NOTE: Container start fails if the configured `ostype` differs from
the auto-detected type.
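+
+If you prefer to manage these files yourself, you can skip the automatic
+modifications altogether. As a minimal, hedged sketch (assuming container ID
+`100`, and that your {pve} version supports the `unmanaged` OS type), set
+`ostype` accordingly:
+
+ pct set 100 -ostype unmanaged
+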
-Options
-~~~~~~~
-
-include::pct.conf.5-opts.adoc[]
-
+[[pct_container_images]]
Container Images
----------------

@@ -266,7 +184,7 @@ new container, for example:

 pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz

-Proxmox itself ships a set of basic templates for most common
+{pve} itself ships a set of basic templates for most common
operating systems, and you can download them using the `pveam` (short
for {pve} Appliance Manager) command line utility. You can also
download https://www.turnkeylinux.org/[TurnKey Linux] containers using
@@ -316,11 +234,12 @@ local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz 190.20MB

The above command shows you the full {pve} volume identifiers. They include
the storage name, and most other {pve} commands can use them. For
-examply you can delete that image later with:
+example you can delete that image later with:

 pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz


+[[pct_container_storage]]
Container Storage
-----------------

@@ -336,7 +255,8 @@ allows you to choose a suitable storage for each
application. For example, you can use a relatively slow (and thus cheap)
storage for the container root file system. Then you can use a second
mount point to mount a very fast, distributed storage for your database
-application.
+application. See section <<pct_mount_points>> for further
+details.

The second big improvement is that you can use any storage type supported by
the {pve} storage library. That means that you can store your containers on
@@ -352,9 +272,193 @@ local storage inside containers with zero overhead. Such bind mounts
also provide an easy way to share data between different containers.

+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer
+subsystem the usage of FUSE mounts inside a container is strongly
+advised against, as containers need to be frozen for suspend or
+snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow you to set limits inside a container for the amount of disk
+space that each user can use. This only works on ext4 image based
+storage types and currently does not work with unprivileged
+containers.
+
+Activating the `quota` option causes the following mount options to be
+used for a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used like you would on any other system. You
+can initialize the `/aquota.user` and `/aquota.group` files by running
+
+----
+quotacheck -cmug /
+quotaon /
+----
+
+and edit the quotas via the `edquota` command. Refer to the documentation
+of the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing
+the mount point's path instead of just `/`.
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard POSIX **A**ccess **C**ontrol **L**ists are also available inside containers.
+ACLs allow you to set more detailed file ownership than the traditional user/
+group/others model.
+
+
+Backup of Container Mount Points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default additional mount points besides the Root Disk mount point are not
+included in backups. You can reverse this default behavior by setting the
+*Backup* option on a mount point.
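+
+For example, the following sketch (container ID, storage and volume name are
+hypothetical) sets the *Backup* flag on an existing mount point `mp0` from
+the command line:
+
+ pct set 100 -mp0 thin1:vm-100-disk-1,mp=/data,backup=1
+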
+// see PVE::VZDump::LXC::prepare()

+Replication of Container Mount Points
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default additional mount points are replicated when the Root Disk
+is replicated. If you want the {pve} storage replication mechanism to skip a
+mount point when starting a replication job, you can set the
+*Skip replication* option on that mount point.
+
+As of {pve} 5.0, replication requires a storage of type `zfspool`, so adding a
+mount point to a different type of storage when the container has replication
+configured requires the *Skip replication* option to be set for that mount
+point.
+
+
+[[pct_settings]]
+Container Settings
+------------------
+
+[[pct_general]]
+General Settings
+~~~~~~~~~~~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-general.png"]
+
+General settings of a container include
+
+* the *Node*: the physical server on which the container will run
+* the *CT ID*: a unique number in this {pve} installation used to identify your container
+* *Hostname*: the hostname of the container
+* *Resource Pool*: a logical group of containers and VMs
+* *Password*: the root password of the container
+* *SSH Public Key*: a public key for connecting to the root account over SSH
+* *Unprivileged container*: this option allows you to choose at creation time
+if you want to create a privileged or unprivileged container.
+
+
+Privileged Containers
+^^^^^^^^^^^^^^^^^^^^^
+
+Security is done by dropping capabilities, using mandatory access
+control (AppArmor), SecComp filters and namespaces. The LXC team
+considers this kind of container as unsafe, and they will not consider
+new container escape exploits to be security issues worthy of a CVE
+and quick fix. So you should use this kind of container only inside a
+trusted environment, or when no untrusted task is running as root in
+the container.
+
+
+Unprivileged Containers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+This kind of container uses a new kernel feature called user
+namespaces. The root UID 0 inside the container is mapped to an
+unprivileged user outside the container. This means that most security
+issues (container escape, resource abuse, ...) in those containers
+will affect a random unprivileged user, and so would be a generic
+kernel security bug rather than an LXC issue. The LXC team thinks
+unprivileged containers are safe by design.
+
+NOTE: If the container uses systemd as an init system, please be
+aware that the systemd version running inside the container should be
+equal to or greater than 220.
+
+[[pct_cpu]]
+CPU
+~~~
+
+[thumbnail="screenshot/gui-create-ct-cpu.png"]
+
+You can restrict the number of visible CPUs inside the container using
+the `cores` option. This is implemented using the Linux 'cpuset'
+cgroup (**c**ontrol **g**roup). A special task inside `pvestatd` tries
+to distribute running containers among available CPUs. You can view
+the assigned CPUs using the following command:
+
+----
+# pct cpusets
+ ---------------------
+ 102: 6 7
+ 105: 2 3 4 5
+ 108: 0 1
+ ---------------------
+----
+
+Containers use the host kernel directly, so all tasks inside a
+container are handled by the host CPU scheduler. {pve} uses the Linux
+'CFS' (**C**ompletely **F**air **S**cheduler) by default,
+which has additional bandwidth control options.
+
+[horizontal]
+
+`cpulimit`: :: You can use this option to further limit assigned CPU
+time. Please note that this is a floating point number, so it is
+perfectly valid to assign two cores to a container, but restrict
+overall CPU consumption to half a core.
++
+----
+cores: 2
+cpulimit: 0.5
+----
+
+`cpuunits`: :: This is a relative weight passed to the kernel
+scheduler. The larger the number is, the more CPU time this container
+gets. The number is relative to the weights of all the other running
+containers. The default is 1024. You can use this setting to
+prioritize some containers.
+
+
+[[pct_memory]]
+Memory
+~~~~~~
+
+[thumbnail="screenshot/gui-create-ct-memory.png"]
+
+Container memory is controlled using the cgroup memory controller.
+
+[horizontal]
+
+`memory`: :: Limit overall memory usage. This corresponds
+to the `memory.limit_in_bytes` cgroup setting.
+
+`swap`: :: Allows the container to use additional swap memory from the
+host swap space. This corresponds to the `memory.memsw.limit_in_bytes`
+cgroup setting, which is set to the sum of both values (`memory +
+swap`).
+
+
+[[pct_mount_points]]
Mount Points
~~~~~~~~~~~~

+[thumbnail="screenshot/gui-create-ct-root-disk.png"]
+
The root mount point is configured with the `rootfs` property, and you can
configure up to 10 additional mount points. The corresponding options
are called `mp0` to `mp9`, and they can contain the following settings:

include::pct-mountpoint-opts.adoc[]

Currently there are three types of mount points: storage backed
mount points, bind mounts and device mounts.

-.Typical Container `rootfs` configuration
+.Typical container `rootfs` configuration
----
rootfs: thin1:base-100-disk-1,size=8G
----

-Storage backed mount points
+Storage Backed Mount Points
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Storage backed mount points are managed by the {pve} storage subsystem and come
in three different flavors:

-- Image based: These are raw images containing a single ext4 formatted file
+- Image based: these are raw images containing a single ext4 formatted file
 system.
-- ZFS Subvolumes: These are technically bind mounts, but with managed storage,
+- ZFS subvolumes: these are technically bind mounts, but with managed storage,
 and thus allow resizing and snapshotting.
- Directories: passing `size=0` triggers a special case where instead of a raw
 image a directory is created.

+NOTE: The special option syntax `STORAGE_ID:SIZE_IN_GB` for storage backed
+mount point volumes will automatically allocate a volume of the specified size
+on the specified storage. E.g., calling
+`pct set 100 -mp0 thin1:10,mp=/path/in/container` will allocate a 10GB volume
+on the storage `thin1` and replace the volume ID placeholder `10` with the
+allocated volume ID.


-Bind mount points
+Bind Mount Points
^^^^^^^^^^^^^^^^^

Bind mounts allow you to access arbitrary directories from your Proxmox VE host
@@ -416,7 +527,7 @@ Alternatively, use `pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared` to
achieve the same result.


-Device mount points
+Device Mount Points
^^^^^^^^^^^^^^^^^^^

Device mount points allow you to mount block devices of the host directly into the
@@ -430,67 +541,70 @@ more features.

NOTE: The contents of device mount points are not backed up when using `vzdump`.

-FUSE mounts
-~~~~~~~~~~~
-
-WARNING: Because of existing issues in the Linux kernel's freezer
-subsystem the usage of FUSE mounts inside a container is strongly
-advised against, as containers need to be frozen for suspend or
-snapshot mode backups.
-
-If FUSE mounts cannot be replaced by other mounting mechanisms or storage
-technologies, it is possible to establish the FUSE mount on the Proxmox host
-and use a bind mount point to make it accessible inside the container.
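+
+To illustrate the device mount points described above, a minimal sketch
+(the device path and mount path are hypothetical):
+
+ pct set 100 -mp0 /dev/sdb1,mp=/mnt/device
+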
+[[pct_container_network]]
+Network
+~~~~~~~

+[thumbnail="screenshot/gui-create-ct-network.png"]

-Using quotas inside containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You can configure up to 10 network interfaces for a single
+container. The corresponding options are called `net0` to `net9`, and
+they can contain the following settings:

-Quotas allow to set limits inside a container for the amount of disk
-space that each user can use. This only works on ext4 image based
-storage types and currently does not work with unprivileged
-containers.
+include::pct-network-opts.adoc[]

-Activating the `quota` option causes the following mount options to be
-used for a mount point:
-`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`

-This allows quotas to be used like you would on any other system. You
-can initialize the `/aquota.user` and `/aquota.group` files by running
+[[pct_startup_and_shutdown]]
+Automatic Start and Shutdown of Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----
-quotacheck -cmug /
-quotaon /
-----
+After creating your containers, you probably want them to start automatically
+when the host system boots. For this you need to select the option 'Start at
+boot' from the 'Options' tab of your container in the web interface, or set it with
+the following command:

-and edit the quotas via the `edquota` command. Refer to the documentation
-of the distribution running inside the container for details.
-
-NOTE: You need to run the above commands for every mount point by passing
-the mount point's path instead of just `/`.
+ pct set <CTID> -onboot 1

+.Start and Shutdown Order
+// use the screenshot from qemu - it's the same
+[thumbnail="screenshot/gui-qemu-edit-start-order.png"]

-Using ACLs inside containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you want to fine-tune the boot order of your containers, you can use the following
+parameters:

-The standard Posix Access Control Lists are also available inside containers.
-ACLs allow you to set more detailed file ownership than the traditional user/
-group/others model.
+* *Start/Shutdown order*: Defines the start order priority. E.g. set it to 1 if
+you want the CT to be the first to be started. (We use the reverse startup
+order for shutdown, so a container with a start order of 1 would be the last to
+be shut down)
+* *Startup delay*: Defines the interval between this container's start and the
+start of subsequent containers. E.g. set it to 240 if you want to wait 240
+seconds before starting other containers.
+* *Shutdown timeout*: Defines the duration in seconds {pve} should wait
+for the container to be offline after issuing a shutdown command.
+By default this value is set to 60, which means that {pve} will issue a
+shutdown request, wait 60s for the machine to be offline, and if after 60s
+the machine is still online, it will notify that the shutdown action failed.

+Please note that containers without a Start/Shutdown order parameter will always
+start after those where the parameter is set, and this parameter only
+makes sense between machines running locally on a host, and not
+cluster-wide.

-Container Network
------------------
+Hookscripts
+~~~~~~~~~~~

-You can configure up to 10 network interfaces for a single
-container. The corresponding options are called `net0` to `net9`, and
-they can contain the following setting:
+You can add a hook script to CTs with the config property `hookscript`.
-include::pct-network-opts.adoc[]
+
+ pct set 100 -hookscript local:snippets/hookscript.pl
+
+It will be called during various phases of the guest's lifetime.
+For an example and documentation see the example script under
+`/usr/share/pve-docs/examples/guest-example-hookscript.pl`.

Backup and Restore
------------------

+
Container Backup
~~~~~~~~~~~~~~~~

@@ -563,11 +677,12 @@ and destroy containers, and control execution (start, stop, migrate,
...). You can use pct to set parameters in the associated config file, like
network configuration or memory limits.

+
CLI Usage Examples
~~~~~~~~~~~~~~~~~~

Create a container based on a Debian template (provided you have
-already downloaded the template via the webgui)
+already downloaded the template via the web interface)

 pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz

@@ -597,60 +712,164 @@ Reduce the memory of the container to 512MB

 pct set 100 -memory 512

-Files
------
+Obtaining Debugging Logs
+~~~~~~~~~~~~~~~~~~~~~~~~

-`/etc/pve/lxc/.conf`::
+In case `pct start` is unable to start a specific container, it might be
+helpful to collect debugging output by running `lxc-start` (replace `ID` with
+the container's ID):

-Configuration file for the container ''.
+ lxc-start -n ID -F -l DEBUG -o /tmp/lxc-ID.log
+
+This command will attempt to start the container in foreground mode. To stop
+the container, run `pct shutdown ID` or `pct stop ID` in a second terminal.
+
+The collected debug log is written to `/tmp/lxc-ID.log`.
+
+NOTE: If you have changed the container's configuration since the last start
+attempt with `pct start`, you need to run `pct start` at least once to also
+update the configuration used by `lxc-start`.
+
+[[pct_migration]]
+Migration
+---------
+If you have a cluster, you can migrate your Containers with

-Container Advantages
---------------------
+ pct migrate <CTID> <target>

-* Simple, and fully integrated into {pve}. Setup looks similar to a normal
-  VM setup.
+This works as long as your Container is offline. If it has local volumes or
+mountpoints defined, the migration will copy the content over the network to
+the target host, provided the same storage is defined there.

-** Storage (ZFS, LVM, NFS, Ceph, ...)
+If you want to migrate online Containers, the only way is to use
+restart migration. This can be initiated with the -restart flag and the optional
+-timeout parameter.

-** Network
+A restart migration will shut down the Container and kill it after the specified
+timeout (the default is 180 seconds). Then it will migrate the Container
+like an offline migration and when finished, it starts the Container on the
+target node.

-** Authentication
+[[pct_configuration]]
+Configuration
+-------------

-** Cluster
+The `/etc/pve/lxc/<CTID>.conf` file stores container configuration,
+where `<CTID>` is the numeric ID of the given container. Like all
+other files stored inside `/etc/pve/`, they get automatically
+replicated to all other cluster nodes.

-* Fast: minimal overhead, as fast as bare metal
+NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
+unique cluster wide.

-* High density (perfect for idle workloads)
+.Example Container Configuration
+----
+ostype: debian
+arch: amd64
+hostname: www
+memory: 512
+swap: 512
+net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
+rootfs: local:107/vm-107-disk-1.raw,size=7G
+----

-* REST API
+Those configuration files are simple text files, and you can edit them
+using a normal text editor (`vi`, `nano`, ...). 
This is sometimes
+useful to do small corrections, but keep in mind that you need to
+restart the container to apply such changes.

-* Direct hardware access
+For that reason, it is usually better to use the `pct` command to
+generate and modify those files, or do the whole thing using the GUI.
+Our toolkit is smart enough to instantaneously apply most changes to
+running containers. This feature is called "hot plug", and there is no
+need to restart the container in that case.

-Technology Overview
--------------------
+File Format
+~~~~~~~~~~~

-- Integrated into {pve} graphical user interface (GUI)
+Container configuration files use a simple colon separated key/value
+format. Each line has the following format:

-- LXC (https://linuxcontainers.org/)
+-----
+# this is a comment
+OPTION: value
+-----

-- cgmanager for cgroup management
+Blank lines in those files are ignored, and lines starting with a `#`
+character are treated as comments and are also ignored.

-- lxcfs to provive containerized /proc file system
+It is possible to add low-level, LXC style configuration directly, for
+example:

-- apparmor
+ lxc.init_cmd: /sbin/my_own_init

-- CRIU: for live migration (planned)
+or

-- We use latest available kernels (4.4.X)
+ lxc.init_cmd = /sbin/my_own_init

-- Image based deployment (templates)
+Those settings are directly passed to the LXC low-level tools.

-- Container setup from host (Network, DNS, Storage, ...)
+
+[[pct_snapshots]]
+Snapshots
+~~~~~~~~~
+
+When you create a snapshot, `pct` stores the configuration at snapshot
+time into a separate snapshot section within the same configuration
+file. For example, after creating a snapshot called ``testsnapshot'',
+your configuration file will look like this:
+
+.Container configuration with snapshot
+----
+memory: 512
+swap: 512
+parent: testsnapshot
+...
+
+[testsnapshot]
+memory: 512
+swap: 512
+snaptime: 1457170803
+...
+----
+
+There are a few snapshot related properties like `parent` and
+`snaptime`. The `parent` property is used to store the parent/child
+relationship between snapshots. `snaptime` is the snapshot creation
+time stamp (Unix epoch).
+
+
+[[pct_options]]
+Options
+~~~~~~~
+
+include::pct.conf.5-opts.adoc[]
+
+
+Locks
+-----
+
+Container migrations, snapshots and backups (`vzdump`) set a lock to
+prevent incompatible concurrent actions on the affected container. Sometimes
+you need to remove such a lock manually (e.g., after a power failure).
+
+ pct unlock <CTID>
+
+CAUTION: Only do that if you are sure the action which set the lock is
+no longer running.

ifdef::manvolnum[]
+
+Files
+-----
+
+`/etc/pve/lxc/<CTID>.conf`::
+
+Configuration file for the container `<CTID>`.
+
+
include::pve-copyright.adoc[]
endif::manvolnum[]