X-Git-Url: https://git.proxmox.com/?p=pve-docs.git;a=blobdiff_plain;f=pct.adoc;h=6a81b14c5ac30a0491474375f8e224990f4c61d0;hp=611ff484b9c7bf0272e2dcdc97373cbf788340ed;hb=26ca7ff55309331b9b11b10b64fab2d819454909;hpb=38fd0958719a329859b3d0d719c37d5df15a2d8d

diff --git a/pct.adoc b/pct.adoc
index 611ff48..6a81b14 100644
--- a/pct.adoc
+++ b/pct.adoc
@@ -24,17 +24,559 @@ Proxmox Container Toolkit
 include::attributes.txt[]
 endif::manvolnum[]
 
-'pct' is a tool to manages Linux Containers (LXC). You can create and
-destroy containers, and control execution
-(start/stop/suspend/resume). Besides that, you can use pct to set
-parameters in the associated config file, like network configuration
-or memory.
 
-CLI Usage Examples
+Containers are a lightweight alternative to fully virtualized
+VMs. Instead of emulating a complete Operating System (OS), containers
+simply use the OS of the host they run on. This implies that all
+containers use the same kernel, and that they can access resources
+from the host directly.
+
+This is great because containers do not waste CPU power nor memory due
+to kernel emulation. Container run-time costs are close to zero and
+usually negligible. But there are also some drawbacks you need to
+consider:
+
+* You can only run Linux based OS inside containers, i.e. it is not
+  possible to run FreeBSD or MS Windows inside.
+
+* For security reasons, access to host resources needs to be
+  restricted. This is done with AppArmor, SecComp filters and other
+  kernel features. Be prepared that some syscalls are not allowed
+  inside containers.
+
+{pve} uses https://linuxcontainers.org/[LXC] as underlying container
+technology. We consider LXC as low-level library, which provides
+countless options. It would be too difficult to use those tools
+directly. Instead, we provide a small wrapper called `pct`, the
+"Proxmox Container Toolkit".
+
+The toolkit is tightly coupled with {pve}. That means that it is aware
+of the cluster setup, and it can use the same network and storage
+resources as fully virtualized VMs. You can even use the {pve}
+firewall, or manage containers using the HA framework.
+
+Our primary goal is to offer an environment as one would get from a
+VM, but without the additional overhead. We call this "System
+Containers".
+
+NOTE: If you want to run micro-containers (with docker, rkt, ...), it
+is best to run them inside a VM.
+
+
+Security Considerations
+-----------------------
+
+Containers use the same kernel as the host, so there is a big attack
+surface for malicious users. You should consider this fact if you
+provide containers to totally untrusted people. In general, fully
+virtualized VMs provide better isolation.
+
+The good news is that LXC uses many kernel security features like
+AppArmor, CGroups and PID and user namespaces, which makes containers
+usage quite secure. We distinguish two types of containers:
+
+
+Privileged Containers
+~~~~~~~~~~~~~~~~~~~~~
+
+Security is done by dropping capabilities, using mandatory access
+control (AppArmor), SecComp filters and namespaces. The LXC team
+considers this kind of container as unsafe, and they will not consider
+new container escape exploits to be security issues worthy of a CVE
+and quick fix. So you should use this kind of containers only inside a
+trusted environment, or when no untrusted task is running as root in
+the container.
+
+
+Unprivileged Containers
+~~~~~~~~~~~~~~~~~~~~~~~
+
+This kind of containers use a new kernel feature called user
+namespaces. The root UID 0 inside the container is mapped to an
+unprivileged user outside the container. This means that most security
+issues (container escape, resource abuse, ...) in those containers
+will affect a random unprivileged user, and so would be a generic
+kernel security bug rather than an LXC issue. The LXC team thinks
+unprivileged containers are safe by design.
+
+
+Configuration
+-------------
+
+The `/etc/pve/lxc/<CTID>.conf` file stores container configuration,
+where `<CTID>` is the numeric ID of the given container. Like all
+other files stored inside `/etc/pve/`, they get automatically
+replicated to all other cluster nodes.
+
+NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
+unique cluster wide.
+
+.Example Container Configuration
+----
+ostype: debian
+arch: amd64
+hostname: www
+memory: 512
+swap: 512
+net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
+rootfs: local:107/vm-107-disk-1.raw,size=7G
+----
+
+Those configuration files are simple text files, and you can edit them
+using a normal text editor (`vi`, `nano`, ...). This is sometimes
+useful to do small corrections, but keep in mind that you need to
+restart the container to apply such changes.
+
+For that reason, it is usually better to use the `pct` command to
+generate and modify those files, or do the whole thing using the GUI.
+Our toolkit is smart enough to instantaneously apply most changes to
+running containers. This feature is called "hot plug", and there is no
+need to restart the container in that case.
+
+
+File Format
+~~~~~~~~~~~
+
+Container configuration files use a simple colon separated key/value
+format. Each line has the following format:
+
+-----
+# this is a comment
+OPTION: value
+-----
+
+Blank lines in those files are ignored, and lines starting with a `#`
+character are treated as comments and are also ignored.
+
+It is possible to add low-level, LXC style configuration directly, for
+example:
+
+ lxc.init_cmd: /sbin/my_own_init
+
+or
+
+ lxc.init_cmd = /sbin/my_own_init
+
+Those settings are directly passed to the LXC low-level tools.
+
+
+Snapshots
+~~~~~~~~~
+
+When you create a snapshot, `pct` stores the configuration at snapshot
+time into a separate snapshot section within the same configuration
+file. For example, after creating a snapshot called ``testsnapshot'',
+your configuration file will look like this:
+
+.Container configuration with snapshot
+----
+memory: 512
+swap: 512
+parent: testsnaphot
+...
+
+[testsnaphot]
+memory: 512
+swap: 512
+snaptime: 1457170803
+...
+----
+
+There are a few snapshot related properties like `parent` and
+`snaptime`. The `parent` property is used to store the parent/child
+relationship between snapshots. `snaptime` is the snapshot creation
+time stamp (Unix epoch).
+
+
+Guest Operating System Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We normally try to detect the operating system type inside the
+container, and then modify some files inside the container to make
+them work as expected. Here is a short list of things we do at
+container startup:
+
+set /etc/hostname:: to set the container name
+
+modify /etc/hosts:: to allow lookup of the local hostname
+
+network setup:: pass the complete network setup to the container
+
+configure DNS:: pass information about DNS servers
+
+adapt the init system:: for example, fix the number of spawned getty processes
+
+set the root password:: when creating a new container
+
+rewrite ssh_host_keys:: so that each container has unique keys
+
+randomize crontab:: so that cron does not start at the same time on all containers
+
+Changes made by {PVE} are enclosed by comment markers:
+
+----
+# --- BEGIN PVE ---
+<data>
+# --- END PVE ---
+----
+
+Those markers will be inserted at a reasonable location in the
+file. If such a section already exists, it will be updated in place
+and will not be moved.
+
+Modification of a file can be prevented by adding a `.pve-ignore.`
+file for it.  For instance, if the file `/etc/.pve-ignore.hosts`
+exists then the `/etc/hosts` file will not be touched. This can be a
+simple empty file creatd via:
+
+ # touch /etc/.pve-ignore.hosts
+
+Most modifications are OS dependent, so they differ between different
+distributions and versions. You can completely disable modifications
+by manually setting the `ostype` to `unmanaged`.
+
+OS type detection is done by testing for certain files inside the
+container:
+
+Ubuntu:: inspect /etc/lsb-release (`DISTRIB_ID=Ubuntu`)
+
+Debian:: test /etc/debian_version
+
+Fedora:: test /etc/fedora-release
+
+RedHat or CentOS:: test /etc/redhat-release
+
+ArchLinux:: test /etc/arch-release
+
+Alpine:: test /etc/alpine-release
+
+Gentoo:: test /etc/gentoo-release
+
+NOTE: Container start fails if the configured `ostype` differs from the auto
+detected type.
+
+
+Options
+~~~~~~~
+
+include::pct.conf.5-opts.adoc[]
+
+
+Container Images
+----------------
+
+Container images, sometimes also referred to as ``templates'' or
+``appliances'', are `tar` archives which contain everything to run a
+container. You can think of it as a tidy container backup. Like most
+modern container toolkits, `pct` uses those images when you create a
+new container, for example:
+
+ pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+
+{pve} itself ships a set of basic templates for most common
+operating systems, and you can download them using the `pveam` (short
+for {pve} Appliance Manager) command line utility. You can also
+download https://www.turnkeylinux.org/[TurnKey Linux] containers using
+that tool (or the graphical user interface).
+
+Our image repositories contain a list of available images, and there
+is a cron job run each day to download that list. You can trigger that
+update manually with:
+
+ pveam update
+
+After that you can view the list of available images using:
+
+ pveam available
+
+You can restrict this large list by specifying the `section` you are
+interested in, for example basic `system` images:
+
+.List available system images
+----
+# pveam available --section system
+system          archlinux-base_2015-24-29-1_x86_64.tar.gz
+system          centos-7-default_20160205_amd64.tar.xz
+system          debian-6.0-standard_6.0-7_amd64.tar.gz
+system          debian-7.0-standard_7.0-3_amd64.tar.gz
+system          debian-8.0-standard_8.0-1_amd64.tar.gz
+system          ubuntu-12.04-standard_12.04-1_amd64.tar.gz
+system          ubuntu-14.04-standard_14.04-1_amd64.tar.gz
+system          ubuntu-15.04-standard_15.04-1_amd64.tar.gz
+system          ubuntu-15.10-standard_15.10-1_amd64.tar.gz
+----
+
+Before you can use such a template, you need to download them into one
+of your storages. You can simply use storage `local` for that
+purpose. For clustered installations, it is preferred to use a shared
+storage so that all nodes can access those images.
+
+ pveam download local debian-8.0-standard_8.0-1_amd64.tar.gz
+
+You are now ready to create containers using that image, and you can
+list all downloaded images on storage `local` with:
+
+----
+# pveam list local
+local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz  190.20MB
+----
+
+The above command shows you the full {pve} volume identifiers. They include
+the storage name, and most other {pve} commands can use them. For
+example you can delete that image later with:
+
+ pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
+
+
+Container Storage
+-----------------
+
+Traditional containers use a very simple storage model, only allowing
+a single mount point, the root file system. This was further
+restricted to specific file system types like `ext4` and `nfs`.
+Additional mounts are often done by user provided scripts. This turned
+out to be complex and error prone, so we try to avoid that now.
+
+Our new LXC based container model is more flexible regarding
+storage. First, you can have more than a single mount point. This
+allows you to choose a suitable storage for each application. For
+example, you can use a relatively slow (and thus cheap) storage for
+the container root file system. Then you can use a second mount point
+to mount a very fast, distributed storage for your database
+application.
+
+The second big improvement is that you can use any storage type
+supported by the {pve} storage library. That means that you can store
+your containers on local `lvmthin` or `zfs`, shared `iSCSI` storage,
+or even on distributed storage systems like `ceph`. It also enables us
+to use advanced storage features like snapshots and clones. `vzdump`
+can also use the snapshot feature to provide consistent container
+backups.
+
+Last but not least, you can also mount local devices directly, or
+mount local directories using bind mounts. That way you can access
+local storage inside containers with zero overhead. Such bind mounts
+also provide an easy way to share data between different containers.
+
+
+Mount Points
+~~~~~~~~~~~~
+
+The root mount point is configured with the `rootfs` property, and you can
+configure up to 10 additional mount points. The corresponding options
+are called `mp0` to `mp9`, and they can contain the following setting:
+
+include::pct-mountpoint-opts.adoc[]
+
+Currently there are basically three types of mount points: storage backed
+mount points, bind mounts and device mounts.
+
+.Typical container `rootfs` configuration
+----
+rootfs: thin1:base-100-disk-1,size=8G
+----
+
+
+Storage Backed Mount Points
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Storage backed mount points are managed by the {pve} storage subsystem and come
+in three different flavors:
+
+- Image based: these are raw images containing a single ext4 formatted file
+  system.
+- ZFS subvolumes: these are technically bind mounts, but with managed storage,
+  and thus allow resizing and snapshotting.
+- Directories: passing `size=0` triggers a special case where instead of a raw
+  image a directory is created.
+
+
+Bind Mount Points
+^^^^^^^^^^^^^^^^^
+
+Bind mounts allow you to access arbitrary directories from your Proxmox VE host
+inside a container. Some potential use cases are:
+
+- Accessing your home directory in the guest
+- Accessing an USB device directory in the guest
+- Accessing an NFS mount from the host in the guest
+
+Bind mounts are considered to not be managed by the storage subsystem, so you
+cannot make snapshots or deal with quotas from inside the container. With
+unprivileged containers you might run into permission problems caused by the
+user mapping and cannot use ACLs.
+
+NOTE: The contents of bind mount points are not backed up when using `vzdump`.
+
+WARNING: For security reasons, bind mounts should only be established
+using source directories especially reserved for this purpose, e.g., a
+directory hierarchy under `/mnt/bindmounts`. Never bind mount system
+directories like `/`, `/var` or `/etc` into a container - this poses a
+great security risk.
+
+NOTE: The bind mount source path must not contain any symlinks.
+
+For example, to make the directory `/mnt/bindmounts/shared` accessible in the
+container with ID `100` under the path `/shared`, use a configuration line like
+`mp0: /mnt/bindmounts/shared,mp=/shared` in `/etc/pve/lxc/100.conf`.
+Alternatively, use `pct set 100 -mp0 /mnt/bindmounts/shared,mp=/shared` to
+achieve the same result.
+
+
+Device Mount Points
+^^^^^^^^^^^^^^^^^^^
+
+Device mount points allow to mount block devices of the host directly into the
+container. Similar to bind mounts, device mounts are not managed by {PVE}'s
+storage subsystem, but the `quota` and `acl` options will be honored.
+
+NOTE: Device mount points should only be used under special circumstances. In
+most cases a storage backed mount point offers the same performance and a lot
+more features.
+
+NOTE: The contents of device mount points are not backed up when using `vzdump`.
+
+
+FUSE Mounts
+~~~~~~~~~~~
+
+WARNING: Because of existing issues in the Linux kernel's freezer
+subsystem the usage of FUSE mounts inside a container is strongly
+advised against, as containers need to be frozen for suspend or
+snapshot mode backups.
+
+If FUSE mounts cannot be replaced by other mounting mechanisms or storage
+technologies, it is possible to establish the FUSE mount on the Proxmox host
+and use a bind mount point to make it accessible inside the container.
+
+
+Using Quotas Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Quotas allow to set limits inside a container for the amount of disk
+space that each user can use.  This only works on ext4 image based
+storage types and currently does not work with unprivileged
+containers.
+
+Activating the `quota` option causes the following mount options to be
+used for a mount point:
+`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
+
+This allows quotas to be used like you would on any other system. You
+can initialize the `/aquota.user` and `/aquota.group` files by running
+
+----
+quotacheck -cmug /
+quotaon /
+----
+
+and edit the quotas via the `edquota` command. Refer to the documentation
+of the distribution running inside the container for details.
+
+NOTE: You need to run the above commands for every mount point by passing
+the mount point's path instead of just `/`.
+
+
+Using ACLs Inside Containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The standard Posix **A**ccess **C**ontrol **L**ists are also available inside containers.
+ACLs allow you to set more detailed file ownership than the traditional user/
+group/others model.
+
+
+Container Network
+-----------------
+
+You can configure up to 10 network interfaces for a single
+container. The corresponding options are called `net0` to `net9`, and
+they can contain the following setting:
+
+include::pct-network-opts.adoc[]
+
+
+Backup and Restore
 ------------------
 
-Create a container based on a Debian template (provided you downloaded
-the template via the webgui before)
+
+Container Backup
+~~~~~~~~~~~~~~~~
+
+It is possible to use the `vzdump` tool for container backup. Please
+refer to the `vzdump` manual page for details.
+
+
+Restoring Container Backups
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Restoring container backups made with `vzdump` is possible using the
+`pct restore` command. By default, `pct restore` will attempt to restore as much
+of the backed up container configuration as possible. It is possible to override
+the backed up configuration by manually setting container options on the command
+line (see the `pct` manual page for details).
+
+NOTE: `pvesm extractconfig` can be used to view the backed up configuration
+contained in a vzdump archive.
+
+There are two basic restore modes, only differing by their handling of mount
+points:
+
+
+``Simple'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If neither the `rootfs` parameter nor any of the optional `mpX` parameters
+are explicitly set, the mount point configuration from the backed up
+configuration file is restored using the following steps:
+
+. Extract mount points and their options from backup
+. Create volumes for storage backed mount points (on storage provided with the
+`storage` parameter, or default local storage if unset)
+. Extract files from backup archive
+. Add bind and device mount points to restored configuration (limited to root user)
+
+NOTE: Since bind and device mount points are never backed up, no files are
+restored in the last step, but only the configuration options. The assumption
+is that such mount points are either backed up with another mechanism (e.g.,
+NFS space that is bind mounted into many containers), or not intended to be
+backed up at all.
+
+This simple mode is also used by the container restore operations in the web
+interface.
+
+
+``Advanced'' Restore Mode
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By setting the `rootfs` parameter (and optionally, any combination of `mpX`
+parameters), the `pct restore` command is automatically switched into an
+advanced mode. This advanced mode completely ignores the `rootfs` and `mpX`
+configuration options contained in the backup archive, and instead only
+uses the options explicitly provided as parameters.
+
+This mode allows flexible configuration of mount point settings at restore time,
+for example:
+
+* Set target storages, volume sizes and other options for each mount point
+individually
+* Redistribute backed up files according to new mount point scheme
+* Restore to device and/or bind mount points (limited to root user)
+
+
+Managing Containers with `pct`
+------------------------------
+
+`pct` is the tool to manage Linux Containers on {pve}. You can create
+and destroy containers, and control execution (start, stop, migrate,
+...). You can use pct to set parameters in the associated config file,
+like network configuration or memory limits.
+
+
+CLI Usage Examples
+~~~~~~~~~~~~~~~~~~
+
+Create a container based on a Debian template (provided you have
+already downloaded the template via the web interface)
 
  pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz
 
@@ -54,66 +596,65 @@ Display the configuration
 
  pct config 100
 
-Add a network interface called eth0, bridged to the host bridge vmbr0,
+Add a network interface called `eth0`, bridged to the host bridge `vmbr0`,
 set the address and gateway, while it's running
 
  pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
 
 Reduce the memory of the container to 512MB
 
- pct set -memory 512 100
+ pct set 100 -memory 512
+
 
 Files
 ------
 
-'/etc/pve/lxc/<vmid>.conf'::
+`/etc/pve/lxc/<CTID>.conf`::
 
-Configuration file for the container <vmid>
+Configuration file for the container '<CTID>'.
 
 
 Container Advantages
 --------------------
 
-- Simple, and fully integrated into {pve}. Setup looks similar to a normal
+* Simple, and fully integrated into {pve}. Setup looks similar to a normal
   VM setup. 
 
-  * Storage (ZFS, LVM, NFS, Ceph, ...)
+** Storage (ZFS, LVM, NFS, Ceph, ...)
 
-  * Network
+** Network
 
-  * Authentification
+** Authentication
 
-  * Cluster
+** Cluster
 
-- Fast: minimal overhead, as fast as bare metal
+* Fast: minimal overhead, as fast as bare metal
 
-- High density (perfect for idle workloads)
+* High density (perfect for idle workloads)
 
-- REST API
+* REST API
 
-- Direct hardware access
+* Direct hardware access
 
 
 Technology Overview
 -------------------
 
-- Integrated into {pve} graphical user interface (GUI)
-
-- LXC (https://linuxcontainers.org/)
+* Integrated into {pve} graphical user interface (GUI)
 
-- cgmanager for cgroup management
+* LXC (https://linuxcontainers.org/)
 
-- lxcfs to provive containerized /proc file system
+* lxcfs to provide containerized /proc file system
 
-- apparmor
+* AppArmor
 
-- CRIU: for live migration (planned)
+* CRIU: for live migration (planned)
 
-- We use latest available kernels (4.2.X)
+* We use latest available kernels (4.4.X)
 
-- image based deployment (templates)
+* Image based deployment (templates)
 
-- Container setup from host (Network, DNS, Storage, ...)
+* Container setup from host (network, DNS, storage, ...)
 
 
 ifdef::manvolnum[]