X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pct.adoc;h=6a81b14c5ac30a0491474375f8e224990f4c61d0;hp=611ff484b9c7bf0272e2dcdc97373cbf788340ed;hb=26ca7ff55309331b9b11b10b64fab2d819454909;hpb=38fd0958719a329859b3d0d719c37d5df15a2d8d

diff --git a/pct.adoc b/pct.adoc
index 611ff48..6a81b14 100644
--- a/pct.adoc
+++ b/pct.adoc
@@ -24,17 +24,559 @@ Proxmox Container Toolkit
 include::attributes.txt[]
 endif::manvolnum[]

-'pct' is a tool to manages Linux Containers (LXC). You can create and
-destroy containers, and control execution
-(start/stop/suspend/resume). Besides that, you can use pct to set
-parameters in the associated config file, like network configuration
-or memory.

-CLI Usage Examples
+Containers are a lightweight alternative to fully virtualized
+VMs. Instead of emulating a complete Operating System (OS), containers
+simply use the OS of the host they run on. This implies that all
+containers use the same kernel, and that they can access resources
+from the host directly.
+
+This is great because containers do not waste CPU power or memory on
+kernel emulation. Container run-time costs are close to zero and
+usually negligible. But there are also some drawbacks you need to
+consider:
+
+* You can only run Linux-based OS inside containers, i.e. it is not
+  possible to run FreeBSD or MS Windows inside a container.
+
+* For security reasons, access to host resources needs to be
+  restricted. This is done with AppArmor, SecComp filters and other
+  kernel features. Be aware that some syscalls are not allowed
+  inside containers.
+
+{pve} uses https://linuxcontainers.org/[LXC] as the underlying container
+technology. We consider LXC a low-level library, which provides
+countless options. It would be too difficult to use those tools
+directly. Instead, we provide a small wrapper called `pct`, the
+"Proxmox Container Toolkit".
+
+The toolkit is tightly coupled with {pve}.
+That means that it is aware
+of the cluster setup, and it can use the same network and storage
+resources as fully virtualized VMs. You can even use the {pve}
+firewall, or manage containers using the HA framework.
+
+Our primary goal is to offer an environment similar to that of a
+VM, but without the additional overhead. We call this "System
+Containers".
+
+NOTE: If you want to run micro-containers (with docker, rkt, ...), it
+is best to run them inside a VM.
+
+
+Security Considerations
+-----------------------
+
+Containers use the same kernel as the host, so there is a big attack
+surface for malicious users. You should consider this fact if you
+provide containers to totally untrusted people. In general, fully
+virtualized VMs provide better isolation.
+
+The good news is that LXC uses many kernel security features like
+AppArmor, CGroups and PID and user namespaces, which makes container
+usage quite secure. We distinguish two types of containers:
+
+
+Privileged Containers
+~~~~~~~~~~~~~~~~~~~~~
+
+Security is achieved by dropping capabilities, using mandatory access
+control (AppArmor), SecComp filters and namespaces. The LXC team
+considers this kind of container as unsafe, and they will not consider
+new container escape exploits to be security issues worthy of a CVE
+and quick fix. So you should use this kind of container only inside a
+trusted environment, or when no untrusted task is running as root in
+the container.
+
+
+Unprivileged Containers
+~~~~~~~~~~~~~~~~~~~~~~~
+
+This kind of container uses a new kernel feature called user
+namespaces. The root UID 0 inside the container is mapped to an
+unprivileged user outside the container. This means that most security
+issues (container escape, resource abuse, ...) in those containers
+will affect a random unprivileged user, and would therefore be a generic
+kernel security bug rather than an LXC issue. The LXC team thinks
+unprivileged containers are safe by design.
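The following shell sketch makes the user namespace mapping more concrete. It translates a UID inside an unprivileged container into the UID seen on the host. The sketch assumes the common default mapping of 65536 IDs starting at host UID 100000, as configured for `root` in `/etc/subuid` on many installations. Your actual mapping is set up by LXC and may differ. The helper `host_uid` is hypothetical and is not part of `pct` or LXC.

```shell
#!/bin/sh
# Hypothetical helper: translate a UID inside an unprivileged container
# into the host UID. ASSUMPTION: the container uses the mapping
# "65536 IDs starting at host UID 100000".
MAP_START=100000
MAP_RANGE=65536

host_uid() {
    ct_uid=$1
    # IDs outside the mapped range have no corresponding host user
    if [ "$ct_uid" -lt 0 ] || [ "$ct_uid" -ge "$MAP_RANGE" ]; then
        echo "unmapped"
        return 1
    fi
    echo $((MAP_START + ct_uid))
}

host_uid 0      # root inside the container
host_uid 1000   # first regular user inside the container
```

Under this assumed mapping, root inside the container runs as an ordinary, unprivileged UID on the host.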
+
+
+Configuration
+-------------
+
+The `/etc/pve/lxc/.conf` file stores container configuration,
+where `` is the numeric ID of the given container. Like all
+other files stored inside `/etc/pve/`, it gets automatically
+replicated to all other cluster nodes.
+
+NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
+unique cluster-wide.
+
+.Example Container Configuration
+----
+ostype: debian
+arch: amd64
+hostname: www
+memory: 512
+swap: 512
+net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
+rootfs: local:107/vm-107-disk-1.raw,size=7G
+----
+
+Those configuration files are simple text files, and you can edit them
+using a normal text editor (`vi`, `nano`, ...). This is sometimes
+useful for small corrections, but keep in mind that you need to
+restart the container to apply such changes.
+
+For that reason, it is usually better to use the `pct` command to
+generate and modify those files, or do the whole thing using the GUI.
+Our toolkit is smart enough to instantaneously apply most changes to
+running containers. This feature is called "hot plug", and there is no
+need to restart the container in that case.
+
+
+File Format
+~~~~~~~~~~~
+
+Container configuration files use a simple colon-separated key/value
+format. Each line has the following format:
+
+-----
+# this is a comment
+OPTION: value
+-----
+
+Blank lines in those files are ignored, and lines starting with a `#`
+character are treated as comments and are also ignored.
+
+It is possible to add low-level, LXC-style configuration directly, for
+example:
+
+ lxc.init_cmd: /sbin/my_own_init
+
+or
+
+ lxc.init_cmd = /sbin/my_own_init
+
+Those settings are passed directly to the LXC low-level tools.
+
+
+Snapshots
+~~~~~~~~~
+
+When you create a snapshot, `pct` stores the configuration at snapshot
+time in a separate snapshot section within the same configuration
+file.
+For example, after creating a snapshot called ``testsnapshot'',
+your configuration file will look like this:
+
+.Container configuration with snapshot
+----
+memory: 512
+swap: 512
+parent: testsnapshot
+...
+
+[testsnapshot]
+memory: 512
+swap: 512
+snaptime: 1457170803
+...
+----
+
+There are a few snapshot-related properties like `parent` and
+`snaptime`. The `parent` property is used to store the parent/child
+relationship between snapshots. `snaptime` is the snapshot creation
+timestamp (Unix epoch).
+
+
+Guest Operating System Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We normally try to detect the operating system type inside the
+container, and then modify some files inside the container to make
+them work as expected. Here is a short list of things we do at
+container startup:
+
+set /etc/hostname:: to set the container name
+
+modify /etc/hosts:: to allow lookup of the local hostname
+
+network setup:: pass the complete network setup to the container
+
+configure DNS:: pass information about DNS servers
+
+adapt the init system:: for example, fix the number of spawned getty processes
+
+set the root password:: when creating a new container
+
+rewrite ssh_host_keys:: so that each container has unique keys
+
+randomize crontab:: so that cron does not start at the same time on all containers
+
+Changes made by {PVE} are enclosed by comment markers:
+
+----
+# --- BEGIN PVE ---
+
+# --- END PVE ---
+----
+
+Those markers will be inserted at a reasonable location in the
+file. If such a section already exists, it will be updated in place
+and will not be moved.
+
+Modification of a file can be prevented by adding a `.pve-ignore.`
+file for it. For instance, if the file `/etc/.pve-ignore.hosts`
+exists, then the `/etc/hosts` file will not be touched. This can be a
+simple empty file created via:
+
+ # touch /etc/.pve-ignore.hosts
+
+Most modifications are OS-dependent, so they differ between
+distributions and versions.
+You can completely disable modifications by manually setting the `ostype` to `unmanaged`.
+
+OS type detection is done by testing for certain files inside the
+container:
+
+Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)
+
+Debian:: test /etc/debian_version
+
+Fedora:: test /etc/fedora-release
+
+RedHat or CentOS:: test /etc/redhat-release
+
+ArchLinux:: test /etc/arch-release
+
+Alpine:: test /etc/alpine-release
+
+Gentoo:: test /etc/gentoo-release
+
+NOTE: Container start fails if the configured `ostype` differs from the
+auto-detected type.
+
+
+Options
+~~~~~~~
+
+include::pct.conf.5-opts.adoc[]
+
+
+Container Images
+----------------
+
+Container images, sometimes also referred to as ``templates'' or
+``appliances'', are `tar` archives which contain everything needed to run a
+container. You can think of them as tidy container backups. Like most
+modern container toolkits, `pct` uses those images when you create a
+new container, for example:
+
+ pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+
+{pve} itself provides a set of basic templates for most common
+operating systems, and you can download them using the `pveam` (short
+for {pve} Appliance Manager) command-line utility. You can also
+download https://www.turnkeylinux.org/[TurnKey Linux] containers using
+that tool (or the graphical user interface).
+
+Our image repositories contain a list of available images, and a
+cron job runs each day to download that list.
+You can trigger that update manually with:
+
+ pveam update
+
+After that, you can view the list of available images using:
+
+ pveam available
+
+You can restrict this large list by specifying the `section` you are
+interested in, for example basic `system` images:
+
+.List available system images
+----
+# pveam available --section system
+system archlinux-base_2015-24-29-1_x86_64.tar.gz
+system centos-7-default_20160205_amd64.tar.xz
+system debian-6.0-standard_6.0-7_amd64.tar.gz
+system debian-7.0-standard_7.0-3_amd64.tar.gz
+system debian-8.0-standard_8.0-1_amd64.tar.gz
+system ubuntu-12.04-standard_12.04-1_amd64.tar.gz
+system ubuntu-14.04-standard_14.04-1_amd64.tar.gz
+system ubuntu-15.04-standard_15.04-1_amd64.tar.gz
+system ubuntu-15.10-standard_15.10-1_amd64.tar.gz
+----
+
+Before you can use such a template, you need to download it into one
+of your storages. You can simply use storage `local` for that
+purpose. For clustered installations, it is preferable to use shared
+storage so that all nodes can access those images.
+
+ pveam download local debian-8.0-standard_8.0-1_amd64.tar.gz
+
+You are now ready to create containers using that image, and you can
+list all downloaded images on storage `local` with:
+
+----
+# pveam list local
+local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz 190.20MB
+----
+
+The above command shows you the full {pve} volume identifiers. They include
+the storage name, and most other {pve} commands can use them. For
+example, you can delete that image later with:
+
+ pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+
+
+Container Storage
+-----------------
+
+Traditional containers use a very simple storage model, only allowing
+a single mount point, the root file system. This was further
+restricted to specific file system types like `ext4` and `nfs`.
+Additional mounts are often done by user-provided scripts. This turned
+out to be complex and error-prone, so we try to avoid that now.
+
+Our new LXC-based container model is more flexible regarding
+storage. First, you can have more than a single mount point. This
+allows you to choose a suitable storage for each application. For
+example, you can use a relatively slow (and thus cheap) storage for
+the container root file system. Then you can use a second mount point
+to mount a very fast, distributed storage for your database
+application.
+
+The second big improvement is that you can use any storage type
+supported by the {pve} storage library. That means that you can store
+your containers on local `lvmthin` or `zfs`, shared `iSCSI` storage,
+or even on distributed storage systems like `ceph`. It also enables us
+to use advanced storage features like snapshots and clones. `vzdump`
+can also use the snapshot feature to provide consistent container
+backups.
+
+Last but not least, you can also mount local devices directly, or
+mount local directories using bind mounts. That way, you can access
+local storage inside containers with zero overhead. Such bind mounts
+also provide an easy way to share data between different containers.
+
+
+Mount Points
+~~~~~~~~~~~~
+
+The root mount point is configured with the `rootfs` property, and you can
+configure up to 10 additional mount points. The corresponding options
+are called `mp0` to `mp9`, and they can contain the following settings:
+
+include::pct-mountpoint-opts.adoc[]
+
+Currently there are basically three types of mount points: storage-backed
+mount points, bind mounts and device mounts.
+
+.Typical container `rootfs` configuration
+----
+rootfs: thin1:base-100-disk-1,size=8G
+----
+
+
+Storage Backed Mount Points
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Storage-backed mount points are managed by the {pve} storage subsystem and come
+in three different flavors:
+
+- Image based: these are raw images containing a single ext4-formatted file
+  system.
+- ZFS subvolumes: these are technically bind mounts, but with managed storage,
+  and thus allow resizing and snapshotting.
+- Directories: passing `size=0` triggers a special case where a directory is
+  created instead of a raw image.
+
+
+Bind Mount Points
+^^^^^^^^^^^^^^^^^
+
+Bind mounts allow you to access arbitrary directories from your Proxmox VE host
+inside a container. Some potential use cases are:
+
+- Accessing your home directory in the guest
+- Accessing a USB device directory in the guest
+- Accessing an NFS mount from the host in the guest
+
+Bind mounts are not managed by the storage subsystem, so you
+cannot make snapshots or deal with quotas from inside the container. With
+unprivileged containers, you might run into permission problems caused by the
+user mapping and cannot use ACLs.
+
+NOTE: The contents of bind mount points are not backed up when using `vzdump`.
+
+WARNING: For security reasons, bind mounts should only be established
+using source directories specially reserved for this purpose, e.g., a
+directory hierarchy under `/mnt/bindmounts`. Never bind mount system
+directories like `/`, `/var` or `/etc` into a container, as this poses a
+great security risk.
+
+NOTE: The bind mount source path must not contain any symlinks.
+
+For example, to make the directory `/mnt/bindmounts/shared` accessible in the
+container with ID `100` under the path `/shared`, use a configuration line like
+`mp0: /mnt/bindmounts/shared,mp=/shared` in `/etc/pve/lxc/100.conf`.
+Alternatively, use `pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared` to
+achieve the same result.
+
+
+Device Mount Points
+^^^^^^^^^^^^^^^^^^^
+
+Device mount points allow you to mount block devices of the host directly into the
+container. Similar to bind mounts, device mounts are not managed by {PVE}'s
+storage subsystem, but the `quota` and `acl` options will be honored.
+
+NOTE: Device mount points should only be used under special circumstances.
+In most cases, a storage-backed mount point offers the same performance and a lot
+more features.
+
+NOTE: The contents of device mount points are not backed up when using `vzdump`.
+
+
+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer
+subsystem, the usage of FUSE mounts inside a container is strongly
+advised against, as containers need to be frozen for suspend or
+snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow you to set limits inside a container for the amount of disk
+space that each user can use. This only works on ext4 image-based
+storage types and currently does not work with unprivileged
+containers.
+
+Activating the `quota` option causes the following mount options to be
+used for a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used as on any other system. You
+can initialize the `/aquota.user` and `/aquota.group` files by running
+
+----
+quotacheck -cmug /
+quotaon /
+----
+
+and edit the quotas via the `edquota` command. Refer to the documentation
+of the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing
+the mount point's path instead of just `/`.
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard POSIX **A**ccess **C**ontrol **L**ists are also available inside containers.
+ACLs allow you to set more detailed file permissions than the traditional user/
+group/others model.
+
+
+Container Network
+-----------------
+
+You can configure up to 10 network interfaces for a single
+container.
+The corresponding options are called `net0` to `net9`, and
+they can contain the following settings:
+
+include::pct-network-opts.adoc[]
+
+
+Backup and Restore
 ------------------

-Create a container based on a Debian template (provided you downloaded
-the template via the webgui before)
+
+Container Backup
+~~~~~~~~~~~~~~~~
+
+It is possible to use the `vzdump` tool for container backup. Please
+refer to the `vzdump` manual page for details.
+
+
+Restoring Container Backups
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Restoring container backups made with `vzdump` is possible using the
+`pct restore` command. By default, `pct restore` will attempt to restore as much
+of the backed-up container configuration as possible. It is possible to override
+the backed-up configuration by manually setting container options on the command
+line (see the `pct` manual page for details).
+
+NOTE: `pvesm extractconfig` can be used to view the backed-up configuration
+contained in a vzdump archive.
+
+There are two basic restore modes, which differ only in their handling of mount
+points:
+
+
+``Simple'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If neither the `rootfs` parameter nor any of the optional `mpX` parameters
+are explicitly set, the mount point configuration from the backed-up
+configuration file is restored using the following steps:
+
+. Extract mount points and their options from the backup
+. Create volumes for storage-backed mount points (on the storage provided with the
+`storage` parameter, or the default local storage if unset)
+. Extract files from the backup archive
+. Add bind and device mount points to the restored configuration (limited to the root user)
+
+NOTE: Since bind and device mount points are never backed up, no files are
+restored in the last step, but only the configuration options. The assumption
+is that such mount points are either backed up with another mechanism (e.g.,
+NFS space that is bind-mounted into many containers), or not intended to be
+backed up at all.
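Which restore mode is used depends only on the options passed to `pct restore`: any `rootfs` or `mpX` parameter selects the advanced mode. The following shell sketch expresses that rule. `restore_mode` is a hypothetical helper that only inspects its arguments and never calls `pct`. The option values shown are illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of pct): report which restore mode the
# documented rule selects for a given list of `pct restore` options.
restore_mode() {
    for arg in "$@"; do
        case $arg in
            # any explicit rootfs or mp0..mp9 option selects advanced mode
            -rootfs|-mp[0-9])
                echo advanced
                return 0
                ;;
        esac
    done
    # no mount point options: the backed-up mount point configuration is reused
    echo simple
}

restore_mode -storage local                  # prints "simple"
restore_mode -storage local -rootfs local:8  # prints "advanced"
```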
+
+This simple mode is also used by the container restore operations in the web
+interface.
+
+
+``Advanced'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By setting the `rootfs` parameter (and optionally, any combination of `mpX`
+parameters), the `pct restore` command is automatically switched into an
+advanced mode. This advanced mode completely ignores the `rootfs` and `mpX`
+configuration options contained in the backup archive, and instead only
+uses the options explicitly provided as parameters.
+
+This mode allows flexible configuration of mount point settings at restore time,
+for example:
+
+* Set target storages, volume sizes and other options for each mount point
+individually
+* Redistribute backed-up files according to a new mount point scheme
+* Restore to device and/or bind mount points (limited to the root user)
+
+
+Managing Containers with `pct`
+------------------------------
+
+`pct` is the tool to manage Linux Containers on {pve}. You can create
+and destroy containers, and control execution (start, stop, migrate,
+...). You can also use `pct` to set parameters in the associated config file,
+like network configuration or memory limits.
+
+
+CLI Usage Examples
+~~~~~~~~~~~~~~~~~~
+
+Create a container based on a Debian template (provided you have
+already downloaded the template via the web interface)

  pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz

@@ -54,66 +596,65 @@ Display the configuration

  pct config 100

-Add a network interface called eth0, bridged to the host bridge vmbr0,
+Add a network interface called `eth0`, bridged to the host bridge `vmbr0`,
 set the address and gateway, while it's running

  pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1

 Reduce the memory of the container to 512MB

- pct set -memory 512 100
+ pct set 100 -memory 512
+

 Files
 ------

-'/etc/pve/lxc/.conf'::
+`/etc/pve/lxc/.conf`::

-Configuration file for the container
+Configuration file for the container ''.
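The container configuration file uses plain `OPTION: value` text, so it can be inspected with standard tools. The following POSIX shell sketch prints the current (non-snapshot) settings of such a file as `OPTION=value` pairs. It skips `#` comments and blank lines, and stops at the first `[snapshot]` section header, following the file format and snapshot layout described in the Configuration section. The helper name `current_config` is hypothetical. LXC-style `lxc.* = value` lines without a colon are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the current (non-snapshot) part of a
# container configuration file as OPTION=value pairs.
current_config() {
    awk '
        /^\[/      { exit }   # the first snapshot section ends the current config
        /^[ \t]*#/ { next }   # comment lines are ignored
        /:/ {
            key = $0; sub(/:.*/, "", key)             # text before the first colon
            val = $0; sub(/^[^:]*:[ \t]*/, "", val)   # text after it, minus leading blanks
            print key "=" val
        }
        # blank lines and colon-less lines fall through without output
    ' "$1"
}

# Sample input modeled on the snapshot example in the Configuration section
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# this is a comment
memory: 512
swap: 512
parent: testsnapshot

[testsnapshot]
memory: 512
snaptime: 1457170803
EOF

current_config "$cfg"   # prints memory=512, swap=512, parent=testsnapshot (one per line)
rm -f "$cfg"
```

Values may themselves contain colons (for example a `hwaddr` in `net0`). For that reason, the sketch splits each line only at the first colon.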


 Container Advantages
 --------------------

-- Simple, and fully integrated into {pve}. Setup looks similar to a normal
+* Simple, and fully integrated into {pve}. Setup looks similar to a normal
   VM setup.

- * Storage (ZFS, LVM, NFS, Ceph, ...)
+** Storage (ZFS, LVM, NFS, Ceph, ...)

- * Network
+** Network

- * Authentification
+** Authentication

- * Cluster
+** Cluster

-- Fast: minimal overhead, as fast as bare metal
+* Fast: minimal overhead, as fast as bare metal

-- High density (perfect for idle workloads)
+* High density (perfect for idle workloads)

-- REST API
+* REST API

-- Direct hardware access
+* Direct hardware access


 Technology Overview
 -------------------

-- Integrated into {pve} graphical user interface (GUI)
-
-- LXC (https://linuxcontainers.org/)
+* Integrated into the {pve} graphical user interface (GUI)

-- cgmanager for cgroup management
+* LXC (https://linuxcontainers.org/)

-- lxcfs to provive containerized /proc file system
+* lxcfs to provide a containerized /proc file system

-- apparmor
+* AppArmor

-- CRIU: for live migration (planned)
+* CRIU: for live migration (planned)

-- We use latest available kernels (4.2.X)
+* We use the latest available kernels (4.4.X)

-- image based deployment (templates)
+* Image-based deployment (templates)

-- Container setup from host (Network, DNS, Storage, ...)
+* Container setup from host (network, DNS, storage, ...)

 ifdef::manvolnum[]