add vxlan l3 routing

[pve-docs.git] / qm.adoc
diff --git a/qm.adoc b/qm.adoc

index cdd2829d25de8af332deaabe4a9083ba03c83054..06e88e3ae975b3ac0cbba4b3f257f358238c6ceb 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -163,14 +163,14 @@ On each controller you attach a number of emulated hard disks, which are backed
  by a file or a block device residing in the configured storage. The choice of
  a storage type will determine the format of the hard disk image. Storages which
  present block devices (LVM, ZFS, Ceph) will require the *raw disk image format*,
-whereas files based storages (Ext4, NFS, GlusterFS) will let you to choose
+whereas file based storages (Ext4, NFS, CIFS, GlusterFS) will let you choose
  either the *raw disk image format* or the *QEMU image format*.
  
   * the *QEMU image format* is a copy on write format which allows snapshots, and
    thin provisioning of the disk image.
   * the *raw disk image* is a bit-to-bit image of a hard disk, similar to what
   you would get when executing the `dd` command on a block device in Linux. This
- format do not support thin provisioning or snapshots by itself, requiring
+ format does not support thin provisioning or snapshots by itself, requiring
   cooperation from the storage layer for these tasks. It may, however, be up to
   10% faster than the *QEMU image format*. footnote:[See this benchmark for details
   http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf]
@@ -230,14 +230,58 @@ virtual cpus, as for each virtual cpu you add, Qemu will create a new thread of
  execution on the host system. If you're not sure about the workload of your VM,
  it is usually a safe bet to set the number of *Total cores* to 2.
  
-NOTE: It is perfectly safe to set the _overall_ number of total cores in all
-your VMs to be greater than the number of of cores you have on your server (i.e.
-4 VMs with each 4 Total cores running in a 8 core machine is OK) In that case
-the host system will balance the Qemu execution threads between your server
-cores just like if you were running a standard multithreaded application.
-However {pve} will prevent you to allocate on a _single_ machine more vcpus than
-physically available, as this will only bring the performance down due to the
-cost of context switches.
+NOTE: It is perfectly safe if the _overall_ number of cores of all your VMs
+is greater than the number of cores on the server (e.g., 4 VMs with each 4
+cores on a machine with only 8 cores). In that case the host system will
+balance the Qemu execution threads between your server cores, just like if you
+were running a standard multithreaded application. However, {pve} will prevent
+you from assigning more virtual CPU cores than physically available, as this will
+only bring the performance down due to the cost of context switches.
+
+[[qm_cpu_resource_limits]]
+Resource Limits
+^^^^^^^^^^^^^^^
+
+In addition to the number of virtual cores, you can configure how many resources
+a VM can get in relation to the host CPU time and also in relation to other
+VMs.
+With the *cpulimit* (``Host CPU Time'') option you can limit how much CPU time
+the whole VM can use on the host. It is a floating point value representing CPU
+time in percent, so `1.0` is equal to `100%`, `2.5` to `250%` and so on. If a
+single process would fully use one single core it would have `100%` CPU Time
+usage. If a VM with four cores utilizes all its cores fully it would
+theoretically use `400%`. In reality the usage may be even a bit higher as Qemu
+can have additional threads for VM peripherals besides the vCPU core ones.
+This setting can be useful if a VM should have multiple vCPUs, as it runs a few
+processes in parallel, but the VM as a whole should not be able to run all
+vCPUs at 100% at the same time. Using a specific example: let's say we have a VM
+which would profit from having 8 vCPUs, but at no time should all of those 8
+cores run at full load, as this would overload the server so much that other
+VMs and CTs would get too little CPU time. So, we set the *cpulimit* to
+`4.0` (=400%). If all cores do the same heavy work, they would each get 50% of a
+real host core's CPU time. But if only 4 do work, they could still get
+almost 100% of a real core each.
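
The arithmetic of this example can be sketched in shell. This is an idealized illustration only: `per_vcpu_share` is a hypothetical helper, not part of {pve}, and it assumes that the limit is split evenly between the busy vCPUs and that no additional Qemu threads consume CPU time.

```shell
# Hypothetical helper: share of one host core (in percent) that each busy
# vCPU can get when the whole VM is capped by cpulimit.
# $1 = cpulimit (e.g. 4.0), $2 = number of vCPUs doing heavy work (>= 1)
per_vcpu_share() {
    awk -v limit="$1" -v busy="$2" 'BEGIN {
        share = limit * 100 / busy
        if (share > 100) share = 100   # one vCPU thread cannot exceed one core
        printf "%d\n", share
    }'
}

per_vcpu_share 4.0 8   # prints 50
per_vcpu_share 4.0 4   # prints 100
```

The second call prints an even 100 because the sketch ignores scheduling overhead; the text's "almost 100%" is the more realistic figure.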
+
+NOTE: VMs can, depending on their configuration, use additional threads, e.g.,
+for networking or IO operations but also live migration. Thus a VM can show up
+as using more CPU time than just its virtual CPUs could use. To ensure that a VM
+never uses more CPU time than its assigned virtual CPUs could, set the
+*cpulimit* setting to the same value as the total core count.
+
+The second CPU resource limiting setting, *cpuunits* (nowadays often called CPU
+shares or CPU weight), controls how much CPU time a VM gets in regards to other
+running VMs. It is a relative weight which defaults to `1024`. If you increase
+this for a VM, it will be prioritized by the scheduler in comparison to other
+VMs with lower weight. E.g., if VM 100 has the default of `1024` and VM 200 was
+changed to `2048`, VM 200 would receive twice the CPU bandwidth of VM 100.
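
As a rough sketch of how such relative weights translate into CPU time, the following shell function computes the share of one VM. `cpu_share_percent` is a hypothetical helper, and the result only holds under the assumption that all listed VMs compete for the same fully loaded CPUs; without contention a VM can use more than its share.

```shell
# Hypothetical helper: percentage of contended CPU time for one VM.
# $1 = cpuunits of the VM, remaining arguments = cpuunits of all competing
# VMs (including the VM itself). Integer arithmetic, so results round down.
cpu_share_percent() {
    weight=$1
    shift
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    echo $((100 * weight / total))
}

cpu_share_percent 1024 1024 2048   # VM 100: prints 33
cpu_share_percent 2048 1024 2048   # VM 200: prints 66
```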
+
+For more information see `man systemd.resource-control`. There, `CPUQuota`
+corresponds to `cpulimit` and `CPUShares` corresponds to our `cpuunits`
+setting. See its Notes section for references and implementation details.
+
+CPU Type
+^^^^^^^^
  
  Qemu can emulate a number different of *CPU types* from 486 to the latest Xeon
  processors. Each new processor generation adds new features, like hardware
@@ -256,22 +300,114 @@ kvm64 is a Pentium 4 look a like CPU type, which has a reduced CPU flags set,
  but is guaranteed to work everywhere.
  
  In short, if you care about live migration and moving VMs between nodes, leave
-the kvm64 default. If you don’t care about live migration, set the CPU type to
-host, as in theory this will give your guests maximum performance.
+the kvm64 default. If you don’t care about live migration or have a homogeneous
+cluster where all nodes have the same CPU, set the CPU type to host, as in
+theory this will give your guests maximum performance.
  
-You can also optionally emulate a *NUMA* architecture in your VMs. The basics of
-the NUMA architecture mean that instead of having a global memory pool available
-to all your cores, the memory is spread into local banks close to each socket.
+Meltdown / Spectre related CPU flags
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There are two CPU flags related to the Meltdown and Spectre vulnerabilities
+footnote:[Meltdown Attack https://meltdownattack.com/] which need to be set
+manually unless the selected CPU type of your VM already enables them by default.
+
+The first, called 'pcid', helps to reduce the performance impact of the Meltdown
+mitigation called 'Kernel Page-Table Isolation (KPTI)', which effectively hides
+the Kernel memory from the user space. Without PCID, KPTI is quite an expensive
+mechanism footnote:[PCID is now a critical performance/security feature on x86
+https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU].
+
+The second CPU flag is called 'spec-ctrl', which allows an operating system to
+selectively disable or restrict speculative execution in order to limit the
+ability of attackers to exploit the Spectre vulnerability.
+
+There are two requirements that need to be fulfilled in order to use these two
+CPU flags:
+
+* The host CPU(s) must support the feature and propagate it to the guest's virtual CPU(s)
+* The guest operating system must be updated to a version which mitigates the
+  attacks and is able to utilize the CPU feature
+
+In order to use 'spec-ctrl', your CPU or system vendor also needs to provide a
+so-called ``microcode update'' footnote:[You can use `intel-microcode' /
+`amd-microcode' from Debian non-free if your vendor does not provide such an
+update. Note that not all affected CPUs can be updated to support spec-ctrl.]
+for your CPU.
+
+To check if the {pve} host supports PCID, execute the following command as root:
+
+----
+# grep ' pcid ' /proc/cpuinfo
+----
+
+If the output is not empty, your host's CPU has support for 'pcid'.
+
+To check if the {pve} host supports spec-ctrl, execute the following command as root:
+
+----
+# grep ' spec_ctrl ' /proc/cpuinfo
+----
+
+If the output is not empty, your host's CPU has support for 'spec-ctrl'.
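
Both checks can be combined in a small script. This is a sketch, with `has_cpu_flag` being a hypothetical helper name. It uses `grep -w` instead of surrounding the flag with spaces, so that a flag at the very end of the line is also found, while longer names such as `invpcid` still do not match.

```shell
# Hypothetical helper: succeed if a flag is listed in a cpuinfo-style file.
# $1 = flag name as spelled in /proc/cpuinfo, $2 = file (default /proc/cpuinfo)
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

for flag in pcid spec_ctrl; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not found"
    fi
done
```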
+
+If you use `host' or another CPU type which enables the desired flags by
+default, and you updated your guest OS to make use of the associated CPU
+features, you're already set.
+
+Otherwise you need to set the desired CPU flag of the virtual CPU, either by
+editing the CPU options in the WebUI, or by setting the 'flags' property of the
+'cpu' option in the VM configuration file.
+
+NUMA
+^^^^
+You can also optionally emulate a *NUMA*
+footnote:[https://en.wikipedia.org/wiki/Non-uniform_memory_access] architecture
+in your VMs. The basics of the NUMA architecture mean that instead of having a
+global memory pool available to all your cores, the memory is spread into local
+banks close to each socket.
  This can bring speed improvements as the memory bus is not a bottleneck
  anymore. If your system has a NUMA architecture footnote:[if the command
  `numactl --hardware | grep available` returns more than one node, then your host
  system has a NUMA architecture] we recommend to activate the option, as this
-will allow proper distribution of the VM resources on the host system. This
-option is also required in {pve} to allow hotplugging of cores and RAM to a VM.
+will allow proper distribution of the VM resources on the host system.
+This option is also required to hot-plug cores or RAM in a VM.
  
  If the NUMA option is used, it is recommended to set the number of sockets to
  the number of sockets of the host system.
  
+vCPU hot-plug
+^^^^^^^^^^^^^
+
+Modern operating systems introduced the capability to hot-plug and, to a
+certain extent, hot-unplug CPUs in a running system. Virtualisation allows us
+to avoid a lot of the (physical) problems real hardware can cause in such
+scenarios.
+Still, this is a rather new and complicated feature, so its use should be
+restricted to cases where it's absolutely needed. Most of the functionality can
+be replicated with other, well tested and less complicated features; see
+xref:qm_cpu_resource_limits[Resource Limits].
+
+In {pve} the maximum number of plugged CPUs is always `cores * sockets`.
+To start a VM with fewer than this total core count of CPUs, you may use the
+*vcpus* setting. It denotes how many vCPUs should be plugged in at VM start.
+
+Currently this feature is only supported on Linux. A kernel newer than 3.10
+is needed; a kernel newer than 4.7 is recommended.
+
+You can use a udev rule as follows to automatically set new CPUs as online in
+the guest:
+
+----
+SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
+----
+
+Save this under /etc/udev/rules.d/ as a file ending in `.rules`.
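
One possible way to create that file from a shell inside the guest is sketched below. The function name `install_cpu_hotplug_rule` and the file name `99-cpu-hotplug.rules` are arbitrary choices made for this illustration; only the `.rules` suffix is required by the text.

```shell
# Hypothetical helper: write the rule into a udev rules directory.
# $1 = target directory (default /etc/udev/rules.d)
install_cpu_hotplug_rule() {
    rules_dir="${1:-/etc/udev/rules.d}"
    # Quoted heredoc delimiter: the rule is written verbatim, no expansion.
    cat > "$rules_dir/99-cpu-hotplug.rules" <<'EOF'
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
EOF
}

# Dry run against a scratch directory. Run the function as root without an
# argument to install the rule for real.
scratch=$(mktemp -d)
install_cpu_hotplug_rule "$scratch"
cat "$scratch/99-cpu-hotplug.rules"
```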
+
+NOTE: CPU hot-remove is machine dependent and requires guest cooperation.
+The deletion command does not guarantee that the CPU removal actually happens.
+Typically it is a request forwarded to the guest using a target dependent
+mechanism, e.g., ACPI on x86/amd64.
+
  
  [[qm_memory]]
  Memory
@@ -282,27 +418,26 @@ For each VM you have the option to set a fixed size memory or asking
  host.
  
  .Fixed Memory Allocation
-[thumbnail="gui-create-vm-memory-fixed.png"]
+[thumbnail="gui-create-vm-memory.png"]
  
-When choosing a *fixed size memory* {pve} will simply allocate what you
-specify to your VM.
+When setting memory and minimum memory to the same amount
+{pve} will simply allocate what you specify to your VM.
  
  Even when using a fixed memory size, the ballooning device gets added to the
  VM, because it delivers useful information such as how much memory the guest
  really uses.
  In general, you should leave *ballooning* enabled, but if you want to disable
  it (e.g. for debugging purposes), simply uncheck
-*Ballooning* or set
+*Ballooning Device* or set
  
   balloon: 0
  
  in the configuration.
  
  .Automatic Memory Allocation
-[thumbnail="gui-create-vm-memory-dynamic.png", float="left"]
  
  // see autoballoon() in pvestatd.pm
-When choosing to *automatically allocate memory*, {pve} will make sure that the
+When setting the minimum memory lower than memory, {pve} will make sure that the
  minimum amount you specified is always available to the VM, and if RAM usage on
  the host is below 80%, will dynamically add memory to the guest up to the
  maximum memory specified.
@@ -367,7 +502,8 @@ have direct access to the Ethernet LAN on which the host is located.
  the Qemu user networking stack, where a built-in router and DHCP server can
  provide network access. This built-in DHCP will serve addresses in the private
  10.0.2.0/24 range. The NAT mode is much slower than the bridged mode, and
-should only be used for testing.
+should only be used for testing. This mode is only available via CLI or the API,
+but not via the WebUI.
  
  You can also skip adding a network device when creating a VM by selecting *No
  network device*.
@@ -493,15 +629,16 @@ parameters:
  * *Start/Shutdown order*: Defines the start order priority. E.g. set it to 1 if
  you want the VM to be the first to be started. (We use the reverse startup
  order for shutdown, so a machine with a start order of 1 would be the last to
-be shut down)
+be shut down). If multiple VMs have the same order defined on a host, they will
+additionally be ordered by 'VMID' in ascending order.
  * *Startup delay*: Defines the interval between this VM start and subsequent
  VMs starts . E.g. set it to 240 if you want to wait 240 seconds before starting
  other VMs.
  * *Shutdown timeout*: Defines the duration in seconds {pve} should wait
  for the VM to be offline after issuing a shutdown command.
-By default this value is set to 60, which means that {pve} will issue a
-shutdown request, wait 60s for the machine to be offline, and if after 60s
-the machine is still online will notify that the shutdown action failed.
+By default this value is set to 180, which means that {pve} will issue a
+shutdown request and wait 180 seconds for the machine to be offline. If
+the machine is still online after the timeout it will be stopped forcefully.
  
  NOTE: VMs managed by the HA stack do not follow the 'start on boot' and
  'boot order' options currently. Those VMs will be skipped by the startup and
@@ -509,8 +646,8 @@ shutdown algorithm as the HA manager itself ensures that VMs get started and
  stopped.
  
  Please note that machines without a Start/Shutdown order parameter will always
-start after those where the parameter is set, and this parameter only
-makes sense between the machines running locally on a host, and not
+start after those where the parameter is set. Further, this parameter can only
+be enforced between virtual machines running on the same host, not
  cluster-wide.
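
The ordering rules described in this section can be illustrated with a short shell sketch. It is not the algorithm {pve} actually runs: `startup_sequence` is a hypothetical helper, the input format is invented for the example, and sorting the VMs without an order parameter by VMID is an assumption. The shutdown sequence would be the reverse of the output.

```shell
# Hypothetical helper: print VMIDs in start order for one host.
# stdin: one "VMID ORDER" pair per line; ORDER is "-" when the parameter
# is not set for that VM.
startup_sequence() {
    awk '{
        last = ($2 == "-")        # VMs without an order value start last
        ord  = last ? 0 : $2
        print last, ord, $1
    }' |
        sort -k1,1n -k2,2n -k3,3n |   # order value first, then ascending VMID
        awk '{ print $3 }'
}

printf '%s\n' '105 2' '101 -' '103 1' '102 1' | startup_sequence
# prints 102, 103, 105, 101 (one per line)
```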
  
  
@@ -722,7 +859,7 @@ foreign hypervisor, or one that you created yourself.
  Suppose you created a Debian/Ubuntu disk image with the 'vmdebootstrap' tool:
  
   vmdebootstrap --verbose \
-  --size 10G --serial-console \
+  --size 10GiB --serial-console \
    --grub --no-extlinux \
    --package openssh-server \
    --package avahi-daemon \
@@ -746,6 +883,13 @@ Finally attach the unused disk to the SCSI controller of the VM:
  
  The VM is ready to be started.
  
+
+ifndef::wiki[]
+include::qm-cloud-init.adoc[]
+endif::wiki[]
+
+
+
  Managing Virtual Machines with `qm`
  ------------------------------------
  
@@ -876,6 +1020,16 @@ CAUTION: Only do that if you are sure the action which set the lock is
  no longer running.
  
  
+ifdef::wiki[]
+
+See Also
+~~~~~~~~
+
+* link:/wiki/Cloud-Init_Support[Cloud-Init Support]
+
+endif::wiki[]
+
+
  ifdef::manvolnum[]
  
  Files