ifdef::manvolnum[]
PVE({manvolnum})
================
include::attributes.txt[]

NAME
----

qm - Qemu/KVM Virtual Machine Manager


SYNOPSIS
--------

include::qm.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Qemu/KVM Virtual Machines
=========================
include::attributes.txt[]
endif::manvolnum[]

// deprecates
// http://pve.proxmox.com/wiki/Container_and_Full_Virtualization
// http://pve.proxmox.com/wiki/KVM
// http://pve.proxmox.com/wiki/Qemu_Server

Qemu (short form for Quick Emulator) is an open source hypervisor that emulates a
physical computer. From the perspective of the host system where Qemu is
running, Qemu is a user program which has access to a number of local resources
like partitions, files and network cards, which are then passed to an
emulated computer which sees them as if they were real devices.

A guest operating system running in the emulated computer accesses these
devices and runs as if it were running on real hardware. For instance, you can
pass an ISO image as a parameter to Qemu, and the OS running in the emulated
computer will see a real CD-ROM inserted in a CD drive.

Qemu can emulate a great variety of hardware from ARM to Sparc, but {pve} is
only concerned with 32-bit and 64-bit PC clone emulation, since it represents
the overwhelming majority of server hardware. The emulation of PC clones is also
one of the fastest due to the availability of processor extensions which greatly
speed up Qemu when the emulated architecture is the same as the host
architecture.

NOTE: You may sometimes encounter the term _KVM_ (Kernel-based Virtual Machine).
It means that Qemu is running with the support of the virtualization processor
extensions, via the Linux kvm module. In the context of {pve}, _Qemu_ and
_KVM_ can be used interchangeably, as Qemu in {pve} will always try to load the
kvm module.

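To check whether a host CPU offers the processor extensions the kvm module relies on, you can look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The following is a minimal sketch; `has_hw_virt` is a hypothetical helper name, and it only inspects the flags string. It does not check whether the kvm module is actually loaded.

```shell
#!/bin/sh
# has_hw_virt (hypothetical helper): reads a CPU "flags" line on stdin and
# prints "yes" if it lists vmx (Intel VT-x) or svm (AMD-V), "no" otherwise.
has_hw_virt() {
    if grep -qwE 'vmx|svm'; then
        echo yes
    else
        echo no
    fi
}

# Example on a real host:
#   grep -m1 '^flags' /proc/cpuinfo | has_hw_virt
```

Whether the module itself is loaded can be seen with `lsmod | grep kvm`.
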
Qemu inside {pve} runs as a root process, since this is required to access block
and PCI devices.

Emulated devices and paravirtualized devices
--------------------------------------------

The PC hardware emulated by Qemu includes a mainboard, network controllers,
SCSI, IDE and SATA controllers, and serial ports (the complete list can be seen
in the `kvm(1)` man page), all of them emulated in software. All these devices
are the exact software equivalent of existing hardware devices, and if the OS
running in the guest has the proper drivers it will use the devices as if it
were running on real hardware. This allows Qemu to run _unmodified_ operating
systems.

This however has a performance cost, as running in software what was meant to
run in hardware involves a lot of extra work for the host CPU. To mitigate this,
Qemu can present to the guest operating system _paravirtualized devices_, where
the guest OS recognizes it is running inside Qemu and cooperates with the
hypervisor.

Qemu relies on the virtio virtualization standard, and is thus able to present
paravirtualized virtio devices, including a paravirtualized generic disk
controller, a paravirtualized network card, a paravirtualized serial port,
a paravirtualized SCSI controller, etc.

It is highly recommended to use the virtio devices whenever you can, as they
provide a big performance improvement. Using the virtio generic disk controller
instead of an emulated IDE controller will double the sequential write
throughput, as measured with `bonnie++(8)`. Using the virtio network interface
can deliver up to three times the throughput of an emulated Intel E1000 network
card, as measured with `iperf(1)`. footnote:[See this benchmark on the KVM wiki
http://www.linux-kvm.org/page/Using_VirtIO_NIC]

Virtual Machines settings
-------------------------
Generally speaking, {pve} tries to choose sane defaults for virtual machines
(VM). Make sure you understand the meaning of the settings you change, as
changing them could slow down your VM or put your data at risk.

General Settings
~~~~~~~~~~~~~~~~
General settings of a VM include:

* the *Node*: the physical server on which the VM will run
* the *VM ID*: a unique number in this {pve} installation used to identify your VM
* *Name*: a free form text string you can use to describe the VM
* *Resource Pool*: a logical group of VMs

OS Settings
~~~~~~~~~~~
When creating a VM, setting the proper Operating System (OS) allows {pve} to
optimize some low level parameters. For instance, Windows expects the BIOS
clock to use the local time, while Unix-based operating systems expect the BIOS
clock to use UTC.

Hard Disk
~~~~~~~~~
Qemu can emulate a number of storage controllers:

* the *IDE* controller has a design which goes back to the 1984 PC/AT disk
controller. Even if this controller has been superseded by more modern designs,
each and every OS you can think of has support for it, making it a great choice
if you want to run an OS released before 2003. You can connect up to 4 devices
on this controller.

* the *SATA* (Serial ATA) controller, dating from 2003, has a more modern
design, allowing higher throughput and a greater number of devices to be
connected. You can connect up to 6 devices on this controller.

* the *SCSI* controller, designed in 1985, is commonly found on server
grade hardware, and can connect up to 14 storage devices. By default, {pve}
emulates an LSI 53C895A controller.

* the *Virtio* controller is a generic paravirtualized controller, and is the
recommended setting if you aim for performance. To use this controller, the OS
needs special drivers, which may or may not be included in your installation
ISO. Linux distributions have supported the Virtio controller since 2010, and
FreeBSD since 2014. For Windows OSes, you need to provide an extra ISO
containing the Virtio drivers during the installation.
// see: https://pve.proxmox.com/wiki/Paravirtualized_Block_Drivers_for_Windows#During_windows_installation.
You can connect up to 16 devices on this controller.

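In the VM configuration, each disk is referenced by a key made of the controller name and a slot index, for example `ide0` or `virtio3`. The sketch below, built around a hypothetical `valid_disk_key` function, checks such a key against the per-controller device limits listed above (4 IDE, 6 SATA, 14 SCSI, 16 VirtIO), assuming slots are numbered from 0.

```shell
#!/bin/sh
# valid_disk_key (hypothetical helper): succeeds if a key such as "ide3" or
# "virtio15" names a slot that exists on that controller type, using the
# device limits described above and assuming slots are numbered from 0.
valid_disk_key() (
    key=$1
    bus=${key%%[0-9]*}       # leading letters, e.g. "sata"
    idx=${key#"$bus"}        # trailing digits, e.g. "5"
    case $idx in
        ''|*[!0-9]*) exit 1 ;;   # no index, or index is not all digits
    esac
    case $bus in
        ide)    max=4 ;;
        sata)   max=6 ;;
        scsi)   max=14 ;;
        virtio) max=16 ;;
        *)      exit 1 ;;        # unknown controller type
    esac
    [ "$idx" -lt "$max" ]
)
```

For example, `valid_disk_key sata5` succeeds while `valid_disk_key sata6` fails. The same key names show up in the output of `qm config <vmid>`.
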
On each controller you can attach a number of emulated hard disks, which are
backed by a file or a block device residing in the configured storage. The
choice of a storage type will determine the format of the hard disk image.
Storages which present block devices (LVM, ZFS, Ceph) will require the
*raw disk image format*, whereas file-based storages (Ext4, NFS, GlusterFS)
will let you choose either the *raw disk image format* or the
*QEMU image format*.

* the *QEMU image format* is a copy-on-write format which allows snapshots and
thin provisioning of the disk image.
* the *raw disk image* is a bit-for-bit image of a hard disk, similar to what
you would get when executing the `dd` command on a block device in Linux. This
format does not support thin provisioning or snapshotting by itself, requiring
cooperation from the storage layer for these tasks. It is, however, 10% faster
than the *QEMU image format*. footnote:[See this benchmark for details
http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf]
* the *VMware image format* only makes sense if you intend to import/export the
disk image to other hypervisors.

Setting the *Cache* mode of the hard drive will impact how the host system will
notify the guest systems of block write completions. The *No cache* default
means that the guest system will be notified that a write is complete when each
block reaches the physical storage write queue, ignoring the host page cache.
This provides a good balance between safety and speed.

If you want the {pve} backup manager to skip a disk when doing a backup of a VM,
you can set the *No backup* option on that disk.

If your storage supports _thin provisioning_ (see the storage chapter in the
{pve} guide), and your VM has a *SCSI* controller, you can activate the
*Discard* option on the hard disks connected to that controller. With *Discard*
enabled, when the filesystem of a VM marks blocks as unused after removing
files, the emulated SCSI controller will relay this information to the storage,
which will then shrink the disk image accordingly.

.IO Thread
The option *IO Thread* can only be enabled when using a disk with the *VirtIO*
controller, or with the *SCSI* controller when the emulated controller type is
*VirtIO SCSI*. With this enabled, Qemu uses one thread per disk, instead of one
thread for all, so it should increase performance when using multiple disks.
Note that backups do not currently work with *IO Thread* enabled.

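The *IO Thread* rule above can be written down as a small check. This is only a sketch: `iothread_allowed` is a hypothetical name, and it assumes the emulated SCSI controller is identified by a model name such as `virtio-scsi-pci`, the value used with the `scsihw` VM option.

```shell
#!/bin/sh
# iothread_allowed (hypothetical helper): succeeds if IO Thread may be enabled.
#   $1 = disk controller: ide, sata, scsi or virtio
#   $2 = emulated SCSI controller model (only looked at for scsi)
iothread_allowed() {
    case $1 in
        virtio) return 0 ;;
        scsi)
            # Assumption: VirtIO SCSI models are named virtio-scsi-*,
            # like the scsihw value virtio-scsi-pci.
            case $2 in
                virtio-scsi*) return 0 ;;
            esac
            return 1 ;;
        *) return 1 ;;
    esac
}
```

For instance, `qm set 300 -scsihw virtio-scsi-pci` switches VM 300 to a VirtIO SCSI controller, the case in which `iothread_allowed scsi virtio-scsi-pci` succeeds.
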
CPU
~~~
A *CPU socket* is a physical slot on a PC motherboard where you can plug a CPU.
This CPU can then contain one or many *cores*, which are independent
processing units. Whether you have a single CPU socket with 4 cores, or two CPU
sockets with two cores is mostly irrelevant from a performance point of view.
However, some software is licensed depending on the number of sockets you have
in your machine; in that case it makes sense to set the number of sockets to
what the license allows you, and increase the number of cores. +
Increasing the number of virtual CPUs (cores and sockets) will usually provide a
performance improvement, though that is heavily dependent on the use of the VM.
Multithreaded applications will of course benefit from a large number of
virtual CPUs, as for each virtual CPU you add, Qemu will create a new thread of
execution on the host system. If you're not sure about the workload of your VM,
it is usually a safe bet to set the number of *Total cores* to 2.

NOTE: It is perfectly safe to set the _overall_ number of total cores in all
your VMs to be greater than the number of cores you have on your server (i.e.
4 VMs with 4 total cores each, running on an 8-core machine, is OK). In that
case the host system will balance the Qemu execution threads between your
server cores just as if you were running a standard multithreaded application.
However, {pve} will prevent you from allocating more vCPUs to a _single_ VM
than are physically available, as this would only bring the performance down
due to the cost of context switches.

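The per-VM limit from the note can be checked with simple arithmetic: the VM's vCPU count is sockets times cores, compared with the host's cores. In this sketch the hypothetical `vcpu_check` falls back to `nproc` when no host core count is given; `nproc` counts logical CPUs (including hyper-threads), and whether {pve} counts cores the same way is an assumption not covered here.

```shell
#!/bin/sh
# vcpu_check (hypothetical helper):
#   $1 = sockets, $2 = cores per socket,
#   $3 = host cores (defaults to the logical CPU count from nproc).
# Prints the VM's total vCPU count followed by "ok" or "too many".
vcpu_check() (
    sockets=$1 cores=$2 host=${3:-$(nproc)}
    total=$((sockets * cores))
    if [ "$total" -le "$host" ]; then
        echo "$total ok"
    else
        echo "$total too many"
    fi
)
```

Each of the four 4-core VMs from the note passes `vcpu_check 1 4 8` on an 8-core host, even though together they use 16 vCPUs.
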
Qemu can emulate a number of different *CPU types* from 486 to the latest Xeon
processors. Each new processor generation adds new features, like hardware
assisted 3D rendering, random number generation, memory protection, etc.
Usually you should select for your VM a processor type which closely matches the
CPU of the host system, as it means that the host CPU features (also called
_CPU flags_) will be available in your VMs. If you want an exact match, you can
set the CPU type to *host*, in which case the VM will have exactly the same CPU
flags as your host system. +
This has a downside though. If you want to do a live migration of VMs between
different hosts, your VM might end up on a new system with a different CPU type.
If the CPU flags passed to the guest are missing, the qemu process will stop. To
remedy this, Qemu also has its own CPU type *kvm64*, which {pve} uses by default.
kvm64 is a Pentium 4 look-alike CPU type, which has a reduced CPU flag set,
but is guaranteed to work everywhere. +
In short, if you care about live migration and moving VMs between nodes, leave
the kvm64 default. If you don't care about live migration, set the CPU type to
*host*, as in theory this will give your guests maximum performance.

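Live migration with the *host* CPU type fails when the target lacks flags the guest was started with. The hypothetical `missing_flags` helper below compares two space-separated flag lists and prints what the target would be missing. It is a sketch for comparing hosts by hand, not a check that {pve} performs.

```shell
#!/bin/sh
# missing_flags (hypothetical helper):
#   $1 = CPU flags of the source host, $2 = CPU flags of the target host,
#   both as space-separated lists.
# Prints, one per line, each source flag that the target does not have.
missing_flags() (
    set -f                      # treat flags as plain words, never glob them
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;     # present on the target
            *) echo "$flag" ;;  # missing on the target
        esac
    done
)
```

On real nodes, the two arguments could be the output of `grep -m1 '^flags' /proc/cpuinfo` taken from each host.
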
You can also optionally emulate a *NUMA* architecture in your VMs. In a NUMA
architecture, instead of having a global memory pool available to all your
cores, the memory is spread into local banks close to each socket.
This can bring speed improvements as the memory bus is not a bottleneck
anymore. If your system has a NUMA architecture footnote:[if the command
`numactl --hardware | grep available` returns more than one node, then your host
system has a NUMA architecture] we recommend activating the option, as this
will allow proper distribution of the VM resources on the host system. This
option is also required in {pve} to allow hotplugging of cores and RAM to a VM.

If the NUMA option is used, it is recommended to set the number of sockets to
the number of sockets of the host system.

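The test from the NUMA footnote can be scripted. This sketch uses hypothetical `numa_nodes` and `is_numa_host` helpers that read the output of `numactl --hardware` and look at its `available: N nodes (...)` line; that output format is assumed from typical numactl versions.

```shell
#!/bin/sh
# numa_nodes (hypothetical helper): reads "numactl --hardware" output on stdin
# and prints N from its "available: N nodes (...)" line, or 0 if none is found.
numa_nodes() {
    awk '/^available:/ { print $2; found = 1; exit }
         END { if (!found) print 0 }'
}

# is_numa_host (hypothetical helper): succeeds if that output reports more
# than one NUMA node.
is_numa_host() {
    [ "$(numa_nodes)" -gt 1 ]
}

# Example on a real host:
#   numactl --hardware | is_numa_host && echo "NUMA host"
```

If the host reports more than one node, the option can be enabled from the command line with `qm set <vmid> -numa 1`.
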
Memory
~~~~~~
For each VM you have the option to set a fixed size memory or to ask
{pve} to dynamically allocate memory based on the current RAM usage of the
host.

When choosing a *fixed size memory*, {pve} will simply allocate what you
specify to your VM.

// see autoballoon() in pvestatd.pm
When choosing to *automatically allocate memory*, {pve} will make sure that the
minimum amount you specified is always available to the VM, and if RAM usage on
the host is below 80%, will dynamically add memory to the guest up to the
maximum memory specified. +
When the host is becoming short on RAM, the VM will then release some memory
back to the host, swapping running processes if needed and starting the OOM
killer as a last resort. The passing around of memory between host and guest is
done via a special `balloon` kernel driver running inside the guest, which will
grab or release memory pages from the host.
footnote:[A good explanation of the inner workings of the balloon driver can be found here https://rwmj.wordpress.com/2010/07/17/virtio-balloon/]

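The grow/shrink behaviour of automatic allocation can be summarised as a decision rule. The sketch below is a simplified model for illustration only, not the `autoballoon()` logic in pvestatd: `balloon_action` is a hypothetical name, and treating the same 80% mark as the point where the host is short on RAM is an assumption.

```shell
#!/bin/sh
# balloon_action (hypothetical, simplified model; NOT pvestatd's autoballoon()):
#   $1 = host RAM usage in percent, $2 = VM's current RAM (MB),
#   $3 = VM's minimum RAM (MB),     $4 = VM's maximum RAM (MB)
# Prints "grow", "shrink" or "hold". Using 80% as the "short on RAM"
# threshold is an assumption of this sketch.
balloon_action() {
    if [ "$1" -lt 80 ] && [ "$2" -lt "$4" ]; then
        echo grow
    elif [ "$1" -ge 80 ] && [ "$2" -gt "$3" ]; then
        echo shrink
    else
        echo hold
    fi
}
```

With qm, the maximum is the VM's `memory` value and the guaranteed minimum its `balloon` value, e.g. `qm set 300 -memory 4096 -balloon 1024`.
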
When multiple VMs use the autoallocate facility, it is possible to set a
*Shares* coefficient which indicates the relative amount of the free host memory
that each VM should take. Suppose for instance you have four VMs, three of them
running an HTTP server and the last one a database server. To cache more
database blocks in the database server RAM, you would like to prioritize the
database VM when spare RAM is available. For this you assign a Shares property
of 3000 to the database VM, leaving the other VMs at the Shares default setting
of 1000. The host server has 32GB of RAM, and is currently using 16GB, leaving
32 * 80/100 - 16 = 9.6GB RAM to be allocated to the VMs. The database VM will
get 9.6 * 3000 / (3000 + 1000 + 1000 + 1000) = 4.8GB extra RAM and each HTTP
server will get 9.6 * 1000 / 6000 = 1.6GB.

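The Shares split can be computed with a short awk-based sketch, using the example's inputs (32GB host, 16GB in use, Shares of 3000, 1000, 1000 and 1000). It assumes, as the text does, that the pool handed out is 80% of host RAM minus the RAM already in use, divided in proportion to each VM's Shares value. `share_extra_ram` is a hypothetical name and this is not the code {pve} runs.

```shell
#!/bin/sh
# share_extra_ram (hypothetical helper): extra RAM (GB) a VM would get.
#   $1 = host RAM (GB)   $2 = host RAM in use (GB)   $3 = this VM's Shares
#   $4... = Shares of every autoballooning VM, this one included
# Assumes the pool is 80% of host RAM minus the used RAM, split by Shares.
share_extra_ram() (
    total=$1 used=$2 mine=$3
    shift 3
    echo "$total $used $mine $*" | awk '{
        pool = $1 * 0.8 - $2            # RAM available for ballooning
        sum = 0
        for (i = 4; i <= NF; i++) sum += $i
        if (sum == 0) exit 1            # no Shares given
        printf "%.1f\n", pool * $3 / sum
    }'
)
```

`share_extra_ram 32 16 3000 3000 1000 1000 1000` prints `4.8` (the database VM) and `share_extra_ram 32 16 1000 3000 1000 1000 1000` prints `1.6` (each HTTP server).
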
All Linux distributions released after 2010 have the balloon kernel driver
included. For Windows OSes, the balloon driver needs to be added manually and can
incur a slowdown of the guest, so we don't recommend using it on critical
systems.
// see https://forum.proxmox.com/threads/solved-hyper-threading-vs-no-hyper-threading-fixed-vs-variable-memory.20265/

When allocating RAM to your VMs, a good rule of thumb is always to leave 1GB
of RAM available to the host.

Network Device
~~~~~~~~~~~~~~
Each VM can have many _Network interface controllers_ (NIC), of four different
types:

* *Intel E1000* is the default, and emulates an Intel Gigabit network card.
* the *VirtIO* paravirtualized NIC should be used if you aim for maximum
performance. Like all VirtIO devices, the guest OS should have the proper driver
installed.
* the *Realtek 8139* emulates an older 100 Mbit/s network card, and should
only be used when emulating older operating systems (released before 2002).
* the *vmxnet3* is another paravirtualized device, which should only be used
when importing a VM from another hypervisor.

{pve} will generate a random *MAC address* for each NIC, so that your VM is
addressable on Ethernet networks.

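A common way to generate random MAC addresses is to take them from the locally administered unicast range, so they cannot collide with addresses assigned by NIC vendors. The sketch below shows that idea; it is an illustration under that assumption, not necessarily how {pve} generates its addresses, and `random_mac` is a hypothetical name.

```shell
#!/bin/sh
# random_mac (hypothetical helper; not necessarily {pve}'s method): prints a
# random unicast MAC address from the locally administered range.
random_mac() (
    # Six random bytes as two-digit lowercase hex words.
    set -- $(od -An -N6 -tx1 /dev/urandom)
    # First octet: set bit 0x02 (locally administered), clear 0x01 (multicast).
    first=$(( (0x$1 | 0x02) & 0xfe ))
    printf '%02x:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
)
```
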
The NIC you added to the VM can follow one of two different models:

* in the default *Bridged mode* each virtual NIC is backed on the host by a
_tap device_ (a software loopback device simulating an Ethernet NIC). This
tap device is added to a bridge, by default vmbr0 in {pve}. In this mode, VMs
have direct access to the Ethernet LAN on which the host is located.
* in the alternative *NAT mode*, each virtual NIC will only communicate with
the Qemu user networking stack, where a built-in router and DHCP server can
provide network access. This built-in DHCP will serve addresses in the private
10.0.2.0/24 range. The NAT mode is much slower than the bridged mode, and
should only be used for testing.

You can also skip adding a network device when creating a VM by selecting
*No network device*.

.Multiqueue
If you are using the VirtIO driver, you can optionally activate the
*Multiqueue* option. This option allows the guest OS to process networking
packets using multiple virtual CPUs, providing an increase in the total number
of packets transferred.

//http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
When using the VirtIO driver with {pve}, each NIC network queue is passed to the
host kernel, where the queue will be processed by a kernel thread spawned by the
vhost driver. With this option activated, it is possible to pass _multiple_
network queues to the host kernel for each NIC.

//https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Networking-Techniques.html#sect-Virtualization_Tuning_Optimization_Guide-Networking-Multi-queue_virtio-net
When using Multiqueue, it is recommended to set it to a value equal
to the number of Total Cores of your guest. You also need to set the number of
multi-purpose channels on each VirtIO NIC inside the VM with the ethtool
command:

`ethtool -L eth0 combined X`

where X is the number of vCPUs of the VM.

You should note that setting the Multiqueue parameter to a value greater
than one will increase the CPU load on the host and guest systems as the
traffic increases. We recommend setting this option only when the VM has to
process a great number of incoming connections, such as when the VM is running
as a router, reverse proxy or a busy HTTP server doing long polling.

USB Passthrough
~~~~~~~~~~~~~~~
There are two different types of USB passthrough devices:

* Host USB passthrough
* SPICE USB passthrough

Host USB passthrough works by giving a VM a USB device of the host.
This can either be done via the vendor- and product-id, or
via the host bus and port.

The vendor/product-id looks like this: *0123:abcd*,
where *0123* is the id of the vendor, and *abcd* is the id
of the product, meaning two pieces of the same USB device
have the same id.

The bus/port looks like this: *1-2.3.4*, where *1* is the bus
and *2.3.4* is the port path. This represents the physical
ports of your host (depending on the internal order of the
USB controllers).

If a device is present in a VM configuration when the VM starts up,
but the device is not present in the host, the VM can boot without problems.
As soon as the device/port is available in the host, it gets passed through.

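The two notations can be told apart mechanically. The hypothetical `usb_spec_kind` below is a sketch that classifies a string by shape only; it does not check that such a device exists. On the host, `lsusb` shows the vendor/product pair after `ID` (for example `ID 0123:abcd`).

```shell
#!/bin/sh
# usb_spec_kind (hypothetical helper): prints "vendor-product" for a spec such
# as 0123:abcd, "bus-port" for one such as 1-2.3.4, and "invalid" otherwise.
usb_spec_kind() (
    hex4='[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]'
    case $1 in
        $hex4:$hex4) echo vendor-product; exit 0 ;;
        *-*) ;;                          # candidate bus-port spec
        *) echo invalid; exit 0 ;;
    esac
    bus=${1%%-*}                         # "1" in 1-2.3.4
    path=${1#*-}                         # "2.3.4" in 1-2.3.4
    case $bus in
        ''|*[!0-9]*) echo invalid; exit 0 ;;
    esac
    case $path in
        ''|*[!0-9.]*|.*|*.|*..*) echo invalid ;;
        *) echo bus-port ;;
    esac
)
```

A spec classified this way could then be given to qm, e.g. `qm set 300 -usb0 host=0123:abcd` or `qm set 300 -usb0 host=1-2.3.4`.
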
WARNING: Using this kind of USB passthrough means that you cannot move
a VM online to another host, since the hardware is only available
on the host the VM is currently residing on.

The second type of passthrough is SPICE USB passthrough. This is useful
if you use a SPICE client which supports it. If you add a SPICE USB port
to your VM, you can pass through a USB device from where your SPICE client is,
directly to the VM (for example an input device or hardware dongle).

Managing Virtual Machines with 'qm'
------------------------------------

qm is the tool to manage Qemu/KVM virtual machines on {pve}. You can
create and destroy virtual machines, and control execution
(start/stop/suspend/resume). Besides that, you can use qm to set
parameters in the associated config file. It is also possible to
create and delete virtual disks.

CLI Usage Examples
~~~~~~~~~~~~~~~~~~

Create a new VM with a 4 GB IDE disk.

 qm create 300 -ide0 4 -net0 e1000 -cdrom proxmox-mailgateway_2.1.iso

Start the new VM.

 qm start 300

Send a shutdown request, then wait until the VM is stopped.

 qm shutdown 300 && qm wait 300

Same as above, but only wait for 40 seconds.

 qm shutdown 300 && qm wait 300 -timeout 40

Configuration
-------------

All configuration files consist of lines in the form

 PARAMETER: value

Configuration files are stored inside the Proxmox cluster file
system, and can be accessed at '/etc/pve/qemu-server/<VMID>.conf'.

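Because every line has the `PARAMETER: value` form, a configuration can be read with standard text tools. The sketch below (hypothetical `conf_get`) prints the value of one parameter. It assumes that lines starting with `#` are comments and that a line starting with `[` opens a separate section, such as a snapshot, where parsing should stop; `qm config <vmid>` remains the regular way to display a VM's current configuration.

```shell
#!/bin/sh
# conf_get (hypothetical helper): reads a VM config in "PARAMETER: value"
# form on stdin and prints the value of parameter $1 (nothing if absent).
# Assumes "#" starts a comment line and "[" starts a separate section.
conf_get() {
    awk -v key="$1" '
        /^\[/ { exit }                  # stop at the first section header
        /^#/  { next }                  # skip comment lines
        {
            i = index($0, ": ")
            if (i > 0 && substr($0, 1, i - 1) == key) {
                print substr($0, i + 2)
                exit
            }
        }'
}

# Example:
#   conf_get memory < /etc/pve/qemu-server/300.conf
```
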
Options
~~~~~~~

include::qm.conf.5-opts.adoc[]


Locks
-----

Online migrations and backups ('vzdump') set a lock to prevent incompatible
concurrent actions on the affected VMs. Sometimes you need to remove such a
lock manually (e.g., after a power failure).

 qm unlock <vmid>


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]