update static/schema information

[pve-docs.git] / qm.adoc
diff --git a/qm.adoc b/qm.adoc

index 45ec17fb18da692b369eecc6e39116fb24aa3a22..b3c3034896385aee8760a6c3ae6ff3d1855970dd 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -65,7 +65,7 @@ SCSI, IDE and SATA controllers, serial ports (the complete list can be seen in
  the `kvm(1)` man page) all of them emulated in software. All these devices
  are the exact software equivalent of existing hardware devices, and if the OS
  running in the guest has the proper drivers it will use the devices as if it
-were running on real hardware. This allows QEMU to runs _unmodified_ operating
+were running on real hardware. This allows QEMU to run _unmodified_ operating
  systems.
  
  This however has a performance cost, as running in software what was meant to
@@ -79,13 +79,13 @@ paravirtualized virtio devices, which includes a paravirtualized generic disk
  controller, a paravirtualized network card, a paravirtualized serial port,
  a paravirtualized SCSI controller, etc ...
  
-It is highly recommended to use the virtio devices whenever you can, as they
-provide a big performance improvement. Using  the virtio generic disk controller
-versus an emulated IDE controller will double the sequential write throughput,
-as measured with `bonnie++(8)`. Using the virtio network interface can deliver
-up to three times the throughput of an emulated Intel E1000 network card, as
-measured with `iperf(1)`. footnote:[See this benchmark on the KVM wiki
-https://www.linux-kvm.org/page/Using_VirtIO_NIC]
+TIP: It is *highly recommended* to use the virtio devices whenever you can, as
+they provide a big performance improvement and are generally better maintained.
+Using the virtio generic disk controller versus an emulated IDE controller will
+double the sequential write throughput, as measured with `bonnie++(8)`. Using
+the virtio network interface can deliver up to three times the throughput of an
+emulated Intel E1000 network card, as measured with `iperf(1)`. footnote:[See
+this benchmark on the KVM wiki https://www.linux-kvm.org/page/Using_VirtIO_NIC]
  
  
  [[qm_virtual_machines_settings]]
@@ -155,6 +155,9 @@ Bus/Controller
  ^^^^^^^^^^^^^^
  QEMU can emulate a number of storage controllers:
  
+TIP: It is highly recommended to use the *VirtIO SCSI* or *VirtIO Block*
+controller for performance reasons and because they are better maintained.
+
  * the *IDE* controller, has a design which goes back to the 1984 PC/AT disk
  controller. Even if this controller has been superseded by recent designs,
  each and every OS you can think of has support for it, making it a great choice
@@ -169,16 +172,15 @@ connected. You can connect up to 6 devices on this controller.
  hardware, and can connect up to 14 storage devices. {pve} emulates by default a
  LSI 53C895A controller.
  +
-A SCSI controller of type _VirtIO SCSI_ is the recommended setting if you aim for
-performance and is automatically selected for newly created Linux VMs since
-{pve} 4.3. Linux distributions have support for this controller since 2012, and
-FreeBSD since 2014. For Windows OSes, you need to provide an extra iso
-containing the drivers during the installation.
+Using a SCSI controller of type _VirtIO SCSI single_ and enabling the
+xref:qm_hard_disk_iothread[IO Thread] setting for the attached disks is
+recommended if you aim for performance. This is the default for newly created
+Linux VMs since {pve} 7.3. Each disk will have its own _VirtIO SCSI_ controller,
+and QEMU will handle the disk's I/O in a dedicated thread. Linux distributions
+have support for this controller since 2012, and FreeBSD since 2014. For Windows
+OSes, you need to provide an extra ISO containing the drivers during the
+installation.
  // https://pve.proxmox.com/wiki/Paravirtualized_Block_Drivers_for_Windows#During_windows_installation.
-If you aim at maximum performance, you can select a SCSI controller of type
-_VirtIO SCSI single_ which will allow you to select the *IO Thread* option.
-When selecting _VirtIO SCSI single_ QEMU will create a new controller for
-each disk, instead of adding all disks to the same controller.
  
  * The *VirtIO Block* controller, often just called VirtIO or virtio-blk,
  is an older type of paravirtualized controller. It has been superseded by the
@@ -252,7 +254,7 @@ IO Thread
  The option *IO Thread* can only be used when using a disk with the *VirtIO*
  controller, or with the *SCSI* controller, when the emulated controller type is
  *VirtIO SCSI single*. With *IO Thread* enabled, QEMU creates one I/O thread per
-storage controller, rather than handling all I/O in the main event loop or vCPU
+storage controller rather than handling all I/O in the main event loop or vCPU
  threads. One benefit is better work distribution and utilization of the
  underlying storage. Another benefit is reduced latency (hangs) in the guest for
  very I/O-intensive host workloads, since neither the main thread nor a vCPU
@@ -350,7 +352,10 @@ CPU Type
  
  QEMU can emulate a number different of *CPU types* from 486 to the latest Xeon
  processors. Each new processor generation adds new features, like hardware
-assisted 3d rendering, random number generation, memory protection, etc ...
+assisted 3D rendering, random number generation, memory protection, etc. Also,
+a current generation can be upgraded through a microcode update with bug or
+security fixes.
+
  Usually you should select for your VM a processor type which closely matches the
  CPU of the host system, as it means that the host CPU features (also called _CPU
  flags_ ) will be available in your VMs. If you want an exact match, you can set
@@ -358,16 +363,71 @@ the CPU type to *host* in which case the VM will have exactly the same CPU flags
  as your host system.
  
  This has a downside though. If you want to do a live migration of VMs between
-different hosts, your VM might end up on a new system with a different CPU type.
-If the CPU flags passed to the guest are missing, the qemu process will stop. To
-remedy this QEMU has also its own CPU type *kvm64*, that {pve} uses by defaults.
-kvm64 is a Pentium 4 look a like CPU type, which has a reduced CPU flags set,
-but is guaranteed to work everywhere.
+different hosts, your VM might end up on a new system with a different CPU type
+or a different microcode version.
+If the CPU flags passed to the guest are missing, the QEMU process will stop. To
+remedy this, QEMU also has its own virtual CPU types, which {pve} uses by default.
+
+The backend default is 'kvm64' which works on essentially all x86_64 host CPUs
+and the UI default when creating a new VM is 'x86-64-v2-AES', which requires a
+host CPU starting from Westmere for Intel or at least a fourth generation
+Opteron for AMD.
+
+In short:
+
+If you don’t care about live migration or have a homogeneous cluster where all
+nodes have the same CPU and same microcode version, set the CPU type to host, as
+in theory this will give your guests maximum performance.
+
+If you care about live migration and security, and you have only Intel CPUs or
+only AMD CPUs, choose the lowest generation CPU model of your cluster.
+
+If you care about live migration without security, or have a mixed Intel/AMD
+cluster, choose the lowest compatible virtual QEMU CPU type.
+
+NOTE: Live migrations between Intel and AMD host CPUs have no guarantee to work.
+
+See also
+xref:chapter_qm_vcpu_list[List of AMD and Intel CPU Types as Defined in QEMU].
+
+QEMU CPU Types
+^^^^^^^^^^^^^^
+
+QEMU also provides virtual CPU types, compatible with both Intel and AMD host
+CPUs.
+
+NOTE: To mitigate the Spectre vulnerability for virtual CPU types, you need to
+add the relevant CPU flags, see
+xref:qm_meltdown_spectre[Meltdown / Spectre related CPU flags].
+
+Historically, {pve} had the 'kvm64' CPU model, with CPU flags at the level of
+Pentium 4 enabled, so performance was not great for certain workloads.
+
+In the summer of 2020, AMD, Intel, Red Hat, and SUSE collaborated to define
+three x86-64 microarchitecture levels on top of the x86-64 baseline, with modern
+flags enabled. For details, see the
+https://gitlab.com/x86-psABIs/x86-64-ABI[x86-64-ABI specification].
+
+NOTE: Some newer distributions like CentOS 9 are now built with 'x86-64-v2'
+flags as a minimum requirement.
  
-In short, if you care about live migration and moving VMs between nodes, leave
-the kvm64 default. If you don’t care about live migration or have a homogeneous
-cluster where all nodes have the same CPU, set the CPU type to host, as in
-theory this will give your guests maximum performance.
+* 'kvm64 (x86-64-v1)': Compatible with Intel CPU >= Pentium 4, AMD CPU >=
+Phenom.
++
+* 'x86-64-v2': Compatible with Intel CPU >= Nehalem, AMD CPU >= Opteron_G3.
+Added CPU flags compared to 'x86-64-v1': '+cx16', '+lahf-lm', '+popcnt', '+pni',
+'+sse4.1', '+sse4.2', '+ssse3'.
++
+* 'x86-64-v2-AES': Compatible with Intel CPU >= Westmere, AMD CPU >= Opteron_G4.
+Added CPU flags compared to 'x86-64-v2': '+aes'.
++
+* 'x86-64-v3': Compatible with Intel CPU >= Broadwell, AMD CPU >= EPYC. Added
+CPU flags compared to 'x86-64-v2-AES': '+avx', '+avx2', '+bmi1', '+bmi2',
+'+f16c', '+fma', '+movbe', '+xsave'.
++
+* 'x86-64-v4': Compatible with Intel CPU >= Skylake, AMD CPU >= EPYC v4 Genoa.
+Added CPU flags compared to 'x86-64-v3': '+avx512f', '+avx512bw', '+avx512cd',
+'+avx512dq', '+avx512vl'.
  
  Custom CPU Types
  ^^^^^^^^^^^^^^^^
@@ -380,6 +440,7 @@ Specified custom types can be selected by any user with the `Sys.Audit`
  privilege on `/nodes`. When configuring a custom CPU type for a VM via the CLI
  or API, the name needs to be prefixed with 'custom-'.
  
+[[qm_meltdown_spectre]]
  Meltdown / Spectre related CPU flags
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
@@ -751,6 +812,10 @@ if you use a SPICE client which supports it. If you add a SPICE USB port
  to your VM, you can passthrough a USB device from where your SPICE client is,
  directly to the VM (for example an input device or hardware dongle).
  
+It is also possible to map devices on a cluster level, so that they can be
+properly used with HA, hardware changes are detected, and non-root users
+can configure them. See xref:resource_mapping[Resource Mapping]
+for details on that.
  
  [[qm_bios_and_uefi]]
  BIOS and UEFI
@@ -765,7 +830,7 @@ open-source, x86 BIOS implementation. SeaBIOS is a good choice for most
  standard setups.
  
  Some operating systems (such as Windows 11) may require use of an UEFI
-compatible implementation instead. In such cases, you must rather use *OVMF*,
+compatible implementation. In such cases, you must use *OVMF* instead,
  which is an open-source UEFI implementation. footnote:[See the OVMF Project https://github.com/tianocore/tianocore.github.io/wiki/OVMF]
  
  There are other scenarios in which the SeaBIOS may not be the ideal firmware to
@@ -1045,6 +1110,7 @@ For Windows, it can be installed from the
  https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso[Fedora
  VirtIO driver ISO].
  
+[[qm_qga_enable]]
  Enable Guest Agent Communication
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
@@ -1052,6 +1118,10 @@ Communication from {pve} with the guest agent can be enabled in the VM's
  *Options* panel. A fresh start of the VM is necessary for the changes to take
  effect.
  
+[[qm_qga_auto_trim]]
+Automatic TRIM Using QGA
+^^^^^^^^^^^^^^^^^^^^^^^^
+
  It is possible to enable the 'Run guest-trim' option. With this enabled,
  {pve} will issue a trim command to the guest after the following
  operations that have the potential to write out zeros to the storage:
@@ -1061,6 +1131,35 @@ operations that have the potential to write out zeros to the storage:
  
  On a thin provisioned storage, this can help to free up unused space.
  
+NOTE: There is a caveat with ext4 on Linux, because it uses an in-memory
+optimization to avoid issuing duplicate TRIM requests. Since the guest doesn't
+know about the change in the underlying storage, only the first guest-trim will
+run as expected. Subsequent ones, until the next reboot, will only consider
+parts of the filesystem that changed since then.
+
+[[qm_qga_fsfreeze]]
+Filesystem Freeze & Thaw on Backup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, guest filesystems are synced via the 'fs-freeze' QEMU Guest Agent
+Command when a backup is performed, to provide consistency.
+
+On Windows guests, some applications might handle consistent backups themselves
+by hooking into the Windows VSS (Volume Shadow Copy Service) layer. A
+'fs-freeze' might then interfere with that. For example, it has been observed
+that calling 'fs-freeze' with some SQL Servers triggers VSS to call the SQL
+Writer VSS module in a mode that breaks the SQL Server backup chain for
+differential backups.
+
+For such setups you can configure {pve} to not issue a freeze-and-thaw cycle on
+backup by setting the `freeze-fs-on-backup` QGA option to `0`. This can also be
+done via the GUI with the 'Freeze/thaw guest filesystems on backup for
+consistency' option.
+
+IMPORTANT: Disabling this option can potentially lead to backups with inconsistent
+filesystems and should therefore only be disabled if you know what you are
+doing.
+
  Troubleshooting
  ^^^^^^^^^^^^^^^
  
@@ -1475,6 +1574,95 @@ chosen, the first of:
  3. The first non-shared storage from any VM disk.
  4. The storage `local` as a fallback.
  
+[[resource_mapping]]
+Resource Mapping
+----------------
+
+[thumbnail="screenshot/gui-datacenter-resource-mappings.png"]
+
+When using or referencing local resources (e.g. address of a PCI device), using
+the raw address or ID is sometimes problematic, for example:
+
+* when using HA, a different device with the same id or path may exist on the
+  target node, and if one is not careful when assigning such guests to HA
+  groups, the wrong device could be used, breaking configurations.
+
+* changing hardware can change ids and paths, so one would have to check all
+  assigned devices and see if the path or id is still correct.
+
+To handle this better, one can define cluster wide resource mappings, such that
+a resource has a cluster unique, user selected identifier which can correspond
+to different devices on different hosts. With this, HA won't start a guest with
+a wrong device, and hardware changes can be detected.
+
+Creating such a mapping can be done with the {pve} web GUI under `Datacenter`
+in the relevant tab in the `Resource Mappings` category, or on the CLI with
+
+----
+# pvesh create /cluster/mapping/<type> <options>
+----
+
+[thumbnail="screenshot/gui-datacenter-mapping-pci-edit.png"]
+
+Where `<type>` is the hardware type (currently either `pci` or `usb`) and
+`<options>` are the device mappings and other configuration parameters.
+
+Note that the options must include a map property with all identifying
+properties of that hardware, so that it's possible to verify the hardware did
+not change and the correct device is passed through.
+
+For example to add a PCI device as `device1` with the path `0000:01:00.0` that
+has the device id `0001` and the vendor id `0002` on the node `node1`, and
+`0000:02:00.0` on `node2` you can add it with:
+
+----
+# pvesh create /cluster/mapping/pci --id device1 \
+ --map node=node1,path=0000:01:00.0,id=0002:0001 \
+ --map node=node2,path=0000:02:00.0,id=0002:0001
+----
+
+You must repeat the `map` parameter for each node where that device should have
+a mapping (note that you can currently only map one USB device per node per
+mapping).
+
+Using the GUI makes this much easier, as the correct properties are
+automatically picked up and sent to the API.
+
+[thumbnail="screenshot/gui-datacenter-mapping-usb-edit.png"]
+
+It's also possible for PCI devices to provide multiple devices per node with
+multiple map properties for the nodes. If such a device is assigned to a guest,
+the first free one will be used when the guest is started. The order of the
+paths given is also the order in which they are tried, so arbitrary allocation
+policies can be implemented.
+
+This is useful for devices with SR-IOV, since sometimes it is not important
+which exact virtual function is passed through.
+
+You can assign such a device to a guest either with the GUI or with
+
+----
+# qm set <vmid> -hostpci0 <name>
+----
+
+for PCI devices, or
+
+----
+# qm set <vmid> -usb0 <name>
+----
+
+for USB devices.
+
+Where `<vmid>` is the guest's ID and `<name>` is the chosen name for the created
+mapping. All usual options for passing through the devices are allowed, such as
+`mdev`.
+
+To create mappings `Mapping.Modify` on `/mapping/<type>/<name>` is necessary
+(where `<type>` is the device type and `<name>` is the name of the mapping).
+
+To use these mappings, `Mapping.Use` on `/mapping/<type>/<name>` is necessary
+(in addition to the normal guest privileges to edit the configuration).
+
  Managing Virtual Machines with `qm`
  ------------------------------------
  
@@ -1641,7 +1829,6 @@ remove such a lock manually (for example after a power failure).
  CAUTION: Only do that if you are sure the action which set the lock is
  no longer running.
  
-
  ifdef::wiki[]
  
  See Also