The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description
----------------------

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to four classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem. In addition a system ioctl is used to create
   virtual machines.

 - VM ioctls: These query and set attributes that affect an entire virtual
   machine, for example memory layout. In addition a VM ioctl is used to
   create virtual cpus (vcpus) and devices.

   VM ioctls must be issued from the same process (address space) that was
   used to create the VM.

 - vcpu ioctls: These query and set attributes that control the operation
   of a single virtual cpu.

   vcpu ioctls should be issued from the same thread that was used to create
   the vcpu, except for asynchronous vcpu ioctls that are marked as such in
   the documentation. Otherwise, the first ioctl after switching threads
   could see a performance impact.

 - device ioctls: These query and set attributes that control the operation
   of a single device.

   device ioctls must be issued from the same process (address space) that
   was used to create the VM.

2. File descriptors
-------------------

The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
create a virtual cpu or device and return a file descriptor pointing to
the new resource. Finally, ioctls on a vcpu or device fd can be used
to control the vcpu or device. For vcpus, this includes the important
task of actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets. These
kinds of tricks are explicitly not supported by kvm. While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API. See "General description" for details on the ioctl usage
model that is supported by KVM.

It is important to note that although VM ioctls may only be issued from
the process that created the VM, a VM's lifecycle is associated with its
file descriptor, not its creator (process). In other words, the VM and
its resources, *including the associated address space*, are not freed
until the last reference to the VM's file descriptor has been released.
For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
not be freed until both the parent (original) process and its child have
put their references to the VM's file descriptor.

Because a VM's resources are not freed until the last reference to its
file descriptor is released, creating additional references to a VM
via fork(), dup(), etc. without careful consideration is strongly
discouraged and may have unwanted side effects, e.g. memory allocated
by and on behalf of the VM's process may not be freed/unaccounted when
the VM is shut down.


3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible changes are allowed. However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available. If it is, a
set of ioctls is available for application use.


4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl. Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4), or 'none' which means that while not all kernels
      support this ioctl, there's no capability bit to check its
      availability: for kernels that don't support the ioctl,
      the ioctl returns -ENOTTY.

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


4.1 KVM_GET_API_VERSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)

This identifies the API version as the stable kvm API. It is not
expected that this number will change. However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported. Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12. If this check passes, all ioctls
described as 'basic' will be available.
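
The version check recommended above could look roughly like the sketch
below. It assumes a Linux host with the UAPI header <linux/kvm.h>
installed. The helper name open_kvm_checked() and its path parameter are
hypothetical and only serve the illustration; the path is a parameter so
that the failure path can be exercised without a KVM device.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the kvm device node at 'path' (normally
 * "/dev/kvm") and refuse to continue unless the kernel reports the
 * stable API version. Returns the system fd, or -1 on any failure.
 */
int open_kvm_checked(const char *path)
{
        int fd, version;

        assert(path != NULL);

        fd = open(path, O_RDWR);
        if (fd < 0)
                return -1;

        /* KVM_GET_API_VERSION is a system ioctl and takes no parameter. */
        version = ioctl(fd, KVM_GET_API_VERSION, 0);
        if (version != KVM_API_VERSION) {       /* expected to be 12 */
                close(fd);
                return -1;
        }
        return fd;
}
```

A caller would pass "/dev/kvm" and treat -1 as "kvm is unavailable or
reports an unsupported API version".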


4.2 KVM_CREATE_VM

Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.

The new VM has no virtual cpus and no memory.
You probably want to use 0 as machine type.

In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as a
privileged user (CAP_SYS_ADMIN).

To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.

On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40 bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
machine type identifier.

e.g., to configure a guest to use a 48-bit physical address size:

    vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));

The requested size (IPA_Bits) must be:
  0 - Implies default size, 40 bits (for backward compatibility)

  or

  N - Implies N bits, where N is a positive integer such that,
      32 <= N <= Host_IPA_Limit

Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
is dependent on the CPU capability and the kernel configuration. The limit can
be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
ioctl() at run-time.

Please note that configuring the IPA size does not affect the capability
exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
the size of the address translated by the stage2 level (guest physical to
host physical address translations).
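
The range rules for IPA_Bits can be checked in userspace before calling
KVM_CREATE_VM. The following is a minimal sketch of that check. The
function names are hypothetical, and host_ipa_limit is assumed to be the
value returned by KVM_CHECK_EXTENSION for KVM_CAP_ARM_VM_IPA_SIZE (0 when
the extension is absent).

```c
#include <assert.h>

/* Default IPA size selected by IPA_Bits == 0, per the text above. */
#define DEFAULT_IPA_BITS 40

/*
 * Hypothetical helper: return the value to place in bits[7-0] of the
 * machine type identifier, or -1 if the requested size is not allowed.
 */
long ipa_machine_type(unsigned int ipa_bits, unsigned int host_ipa_limit)
{
        assert(host_ipa_limit <= 0xff);

        if (ipa_bits == 0)
                return 0;               /* default size, 40 bits */
        if (ipa_bits < 32 || ipa_bits > host_ipa_limit)
                return -1;              /* outside 32 <= N <= Host_IPA_Limit */
        return ipa_bits & 0xff;         /* IPA_Bits occupies bits[7-0] */
}

/* Effective IPA width, in bits, for an accepted machine type value. */
unsigned int effective_ipa_bits(long machine_type)
{
        unsigned int bits;

        assert(machine_type >= 0);
        bits = (unsigned int)machine_type & 0xff;
        return bits ? bits : DEFAULT_IPA_BITS;
}
```

With a host limit of 48, a request for 48 bits is accepted, while
requests for 31 or 49 bits are rejected.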


4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST

Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
Architectures: x86
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  EFAULT:    the msr index list cannot be read from or written to
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.

struct kvm_msr_list {
        __u32 nmsrs; /* number of msrs in entries */
        __u32 indices[0];
};

The user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in the
indices array with their numbers.

KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list
varies by kvm version and host processor, but does not change otherwise.

Note: if kvm indicates that it supports MCE (KVM_CAP_MCE), then the MCE bank
MSRs are not returned in the MSR list, as different vcpus can have a different
number of banks, as set via the KVM_X86_SETUP_MCE ioctl.

KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed
to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities
and processor features that are exposed via MSRs (e.g., VMX capabilities).
This list also varies by kvm version and host processor, but does not change
otherwise.


4.4 KVM_CHECK_EXTENSION

Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
Architectures: all
Type: system ioctl, vm ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

The API allows the application to query about extensions to the core
kvm API. Userspace passes an extension identifier (an integer) and
receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.

Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
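
A thin wrapper keeps capability queries uniform. The sketch below is one
possible shape, assuming <linux/kvm.h> is available. The name
kvm_check_cap() is hypothetical, and treating an ioctl failure as
"unsupported" is a choice made for this example, not something the API
requires.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: query an extension on a system fd or a vm fd.
 * Returns 0 if the extension is unsupported (or the query itself
 * failed), otherwise the positive value reported by kvm, which may
 * carry extra information for some extensions.
 */
int kvm_check_cap(int fd, long cap)
{
        int ret;

        assert(cap >= 0);

        ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);
        return ret < 0 ? 0 : ret;
}
```

For example, kvm_check_cap(vm_fd, KVM_CAP_IRQCHIP) would be nonzero when
the in-kernel interrupt controller is available for that VM.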


4.5 KVM_GET_VCPU_MMAP_SIZE

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: size of vcpu mmap area, in bytes

The KVM_RUN ioctl (see section 4.10) communicates with userspace via a
shared memory region. This ioctl returns the size of that region. See the
KVM_RUN documentation for details.


4.6 KVM_SET_MEMORY_REGION

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: struct kvm_memory_region (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.


4.7 KVM_CREATE_VCPU

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: vcpu id (apic id on x86)
Returns: vcpu fd on success, -1 on error

This API adds a vcpu to a virtual machine. No more than max_vcpus may be added.
The vcpu id is an integer in the range [0, max_vcpu_id).

The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
the KVM_CHECK_EXTENSION ioctl() at run-time.
The maximum possible value for max_vcpus can be retrieved using the
KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4.
If the KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that max_vcpu_id
is the same as the value returned from KVM_CAP_MAX_VCPUS.

On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores. (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.) The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore). The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids. For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
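
The vcpu id arithmetic described above is small enough to show directly.
In this sketch the function names are hypothetical, and vcpus_per_vcore is
assumed to be the value reported for KVM_CAP_PPC_SMT.

```c
#include <assert.h>

/* vcore id = vcpu id divided by the number of vcpus per vcore. */
unsigned int vcore_of(unsigned int vcpu_id, unsigned int vcpus_per_vcore)
{
        assert(vcpus_per_vcore > 0);
        return vcpu_id / vcpus_per_vcore;
}

/*
 * vcpu id to use for the n-th vcpu of a single-threaded guest: ids are
 * multiples of vcpus_per_vcore, so every vcpu lands in its own vcore.
 */
unsigned int single_threaded_vcpu_id(unsigned int n,
                                     unsigned int vcpus_per_vcore)
{
        assert(vcpus_per_vcore > 0);
        return n * vcpus_per_vcore;
}
```

With 4 vcpus per vcore, a single-threaded guest would use vcpu ids 0, 4,
8, 12 and so on, which map to vcores 0, 1, 2, 3.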

For virtual cpus that have been created with S390 user controlled virtual
machines, the resulting vcpu fd can be memory mapped at page offset
KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.


4.8 KVM_GET_DIRTY_LOG (vm ioctl)

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in/out)
Returns: 0 on success, -1 on error

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
        __u32 slot;
        __u32 padding1;
        union {
                void __user *dirty_bitmap; /* one bit per page */
                __u64 padding2;
        };
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl. Bit 0 is the first page in the
memory slot. Ensure the entire structure is cleared to avoid padding
issues.

If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.

The bits in the dirty bitmap are cleared before the ioctl returns, unless
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
see the description of the capability.
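
Two pieces of bookkeeping follow from the description above: encoding the
address space into the slot field, and walking the returned bitmap. The
sketch below shows both. All names are hypothetical, and the bitmap is
assumed to be laid out as an array of unsigned long in host byte order,
the way kernel bitmaps usually are.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Slot number in bits 0-15, address space id in bits 16-31. */
uint32_t dirty_log_slot(uint16_t slot, uint16_t as_id)
{
        return ((uint32_t)as_id << 16) | slot;
}

/* Bit 0 of the bitmap is the first page of the memory slot. */
int page_is_dirty(const unsigned long *bitmap, size_t page)
{
        assert(bitmap != NULL);
        return (int)((bitmap[page / BITS_PER_LONG] >>
                      (page % BITS_PER_LONG)) & 1UL);
}

/* Count the dirty pages among the first npages pages of the slot. */
size_t count_dirty_pages(const unsigned long *bitmap, size_t npages)
{
        size_t page, dirty = 0;

        assert(bitmap != NULL);
        for (page = 0; page < npages; page++)
                dirty += (size_t)page_is_dirty(bitmap, page);
        return dirty;
}
```

The buffer passed in dirty_bitmap needs one bit per page in the slot. The
sketch assumes the caller rounds the size up to a whole number of longs.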


4.9 KVM_SET_MEMORY_ALIAS

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_memory_alias (in)
Returns: 0 (success), -1 (error)

This ioctl is obsolete and has been removed.


4.10 KVM_RUN

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
  EINTR:     an unmasked signal is pending

This ioctl is used to run a guest virtual cpu. While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
kvm_run' (see below).
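
Obtaining the implicit parameter block could be sketched as below,
assuming <linux/kvm.h> is available. The helper name map_kvm_run() is
hypothetical. kvm_fd is the system fd from open("/dev/kvm") and vcpu_fd is
the fd returned by KVM_CREATE_VCPU.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map the kvm_run area of a vcpu. On success the
 * mapping is returned and its length stored in *size_out; on failure
 * NULL is returned and *size_out is left untouched.
 */
struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd, size_t *size_out)
{
        int size;
        void *p;

        assert(size_out != NULL);

        /* System ioctl: size of the shared region, in bytes. */
        size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
        if (size <= 0)
                return NULL;

        /* The region is shared with the kernel: MAP_SHARED, offset 0. */
        p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
                 vcpu_fd, 0);
        if (p == MAP_FAILED)
                return NULL;

        *size_out = (size_t)size;
        return p;
}
```

After each return from ioctl(vcpu_fd, KVM_RUN, 0), userspace would inspect
the mapped structure to find out why the guest stopped. The mapping can be
released with munmap() using the stored size.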


4.11 KVM_GET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (out)
Returns: 0 on success, -1 on error

Reads the general purpose registers from the vcpu.

/* x86 */
struct kvm_regs {
        /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
        __u64 rax, rbx, rcx, rdx;
        __u64 rsi, rdi, rsp, rbp;
        __u64 r8,  r9,  r10, r11;
        __u64 r12, r13, r14, r15;
        __u64 rip, rflags;
};

/* mips */
struct kvm_regs {
        /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
        __u64 gpr[32];
        __u64 hi;
        __u64 lo;
        __u64 pc;
};


4.12 KVM_SET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (in)
Returns: 0 on success, -1 on error

Writes the general purpose registers into the vcpu.

See KVM_GET_REGS for the data structure.


4.13 KVM_GET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (out)
Returns: 0 on success, -1 on error

Reads special registers from the vcpu.

/* x86 */
struct kvm_sregs {
        struct kvm_segment cs, ds, es, fs, gs, ss;
        struct kvm_segment tr, ldt;
        struct kvm_dtable gdt, idt;
        __u64 cr0, cr2, cr3, cr4, cr8;
        __u64 efer;
        __u64 apic_base;
        __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
};

/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */

interrupt_bitmap is a bitmap of pending external interrupts. At most
one bit may be set. This interrupt has been acknowledged by the APIC
but not yet injected into the cpu core.


4.14 KVM_SET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (in)
Returns: 0 on success, -1 on error

Writes special registers into the vcpu. See KVM_GET_SREGS for the
data structures.


4.15 KVM_TRANSLATE

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_translation (in/out)
Returns: 0 on success, -1 on error

Translates a virtual address according to the vcpu's current address
translation mode.

struct kvm_translation {
        /* in */
        __u64 linear_address;

        /* out */
        __u64 physical_address;
        __u8  valid;
        __u8  writeable;
        __u8  usermode;
        __u8  pad[5];
};


4.16 KVM_INTERRUPT

Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
Returns: 0 on success, negative on failure.

Queues a hardware interrupt vector to be injected.

/* for KVM_INTERRUPT */
struct kvm_interrupt {
        /* in */
        __u32 irq;
};

X86:

Returns: 0 on success,
         -EEXIST if an interrupt is already enqueued
         -EINVAL if the irq number is invalid
         -ENXIO if the PIC is in the kernel
         -EFAULT if the pointer is invalid

Note 'irq' is an interrupt vector, not an interrupt pin or line. This
ioctl is useful if the in-kernel PIC is not used.

PPC:

Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:

a) KVM_INTERRUPT_SET

  This injects an edge type external interrupt into the guest once it's ready
  to receive interrupts. When injected, the interrupt is done.

b) KVM_INTERRUPT_UNSET

  This unsets any pending interrupt.

  Only available with KVM_CAP_PPC_UNSET_IRQ.

c) KVM_INTERRUPT_SET_LEVEL

  This injects a level type external interrupt into the guest context. The
  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
  is triggered.

  Only available with KVM_CAP_PPC_IRQ_LEVEL.

Note that any value for 'irq' other than the ones stated above is invalid
and incurs unexpected behavior.

This is an asynchronous vcpu ioctl and can be invoked from any thread.

MIPS:

Queues an external interrupt to be injected into the virtual CPU. A negative
interrupt number dequeues the interrupt.

This is an asynchronous vcpu ioctl and can be invoked from any thread.


4.17 KVM_DEBUG_GUEST

Capability: basic
Architectures: none
Type: vcpu ioctl
Parameters: none
Returns: -1 on error

Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.


4.18 KVM_GET_MSRS

Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system)
Architectures: x86
Type: system ioctl, vcpu ioctl
Parameters: struct kvm_msrs (in/out)
Returns: number of msrs successfully returned;
         -1 on error

When used as a system ioctl:
Reads the values of MSR-based features that are available for the VM. This
is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values.
The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST
in a system ioctl.

When used as a vcpu ioctl:
Reads model-specific registers from the vcpu. Supported msr indices can
be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl.

struct kvm_msrs {
        __u32 nmsrs; /* number of msrs in entries */
        __u32 pad;

        struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
        __u32 index;
        __u32 reserved;
        __u64 data;
};

Application code should set the 'nmsrs' member (which indicates the
size of the entries array) and the 'index' member of each array entry.
kvm will fill in the 'data' member.
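
Because struct kvm_msrs ends in a variable-length array, the request
buffer has to be sized by the caller. The sketch below shows one way to
build it, assuming an x86 host where <linux/kvm.h> defines struct kvm_msrs
and struct kvm_msr_entry. The helper names alloc_msrs() and read_msrs()
are hypothetical.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: allocate a request for 'n' MSRs and fill in the
 * 'nmsrs' and 'index' members. The caller releases it with free().
 */
struct kvm_msrs *alloc_msrs(const __u32 *indices, __u32 n)
{
        struct kvm_msrs *msrs;
        __u32 i;

        assert(indices != NULL || n == 0);

        /* calloc() also zeroes 'pad', 'reserved' and 'data'. */
        msrs = calloc(1, sizeof(*msrs) + n * sizeof(struct kvm_msr_entry));
        if (!msrs)
                return NULL;

        msrs->nmsrs = n;
        for (i = 0; i < n; i++)
                msrs->entries[i].index = indices[i];
        return msrs;
}

/* Returns the number of msrs successfully read, or -1 on error. */
int read_msrs(int vcpu_fd, struct kvm_msrs *msrs)
{
        assert(msrs != NULL);
        return ioctl(vcpu_fd, KVM_GET_MSRS, msrs);
}
```

A return value smaller than nmsrs means that only the leading entries
were filled in, so callers may want to compare the two.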


4.19 KVM_SET_MSRS

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in)
Returns: 0 on success, -1 on error

Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
data structures.

Application code should set the 'nmsrs' member (which indicates the
size of the entries array), and the 'index' and 'data' members of each
array entry.


4.20 KVM_SET_CPUID

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_cpuid (in)
Returns: 0 on success, -1 on error

Defines the vcpu responses to the cpuid instruction. Applications
should use the KVM_SET_CPUID2 ioctl if available.


struct kvm_cpuid_entry {
        __u32 function;
        __u32 eax;
        __u32 ebx;
        __u32 ecx;
        __u32 edx;
        __u32 padding;
};

/* for KVM_SET_CPUID */
struct kvm_cpuid {
        __u32 nent;
        __u32 padding;
        struct kvm_cpuid_entry entries[0];
};


4.21 KVM_SET_SIGNAL_MASK

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_signal_mask (in)
Returns: 0 on success, -1 on error

Defines which signals are blocked during execution of KVM_RUN. This
signal mask temporarily overrides the thread's signal mask. Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.

Note the signal will only be delivered if not blocked by the original
signal mask.

/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
        __u32 len;
        __u8  sigset[0];
};


4.22 KVM_GET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (out)
Returns: 0 on success, -1 on error

Reads the floating point state from the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
        __u8  fpr[8][16];
        __u16 fcw;
        __u16 fsw;
        __u8  ftwx;  /* in fxsave format */
        __u8  pad1;
        __u16 last_opcode;
        __u64 last_ip;
        __u64 last_dp;
        __u8  xmm[16][16];
        __u32 mxcsr;
        __u32 pad2;
};


4.23 KVM_SET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (in)
Returns: 0 on success, -1 on error

Writes the floating point state to the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
        __u8  fpr[8][16];
        __u16 fcw;
        __u16 fsw;
        __u8  ftwx;  /* in fxsave format */
        __u8  pad1;
        __u16 last_opcode;
        __u64 last_ip;
        __u64 last_dp;
        __u8  xmm[16][16];
        __u32 mxcsr;
        __u32 pad2;
};


4.24 KVM_CREATE_IRQCHIP

Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ARM, arm64, s390
Type: vm ioctl
Parameters: none
Returns: 0 on success, -1 on error

Creates an interrupt controller model in the kernel.
On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both
PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of
KVM_CREATE_DEVICE, which also supports creating a GICv2. Using
KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
On s390, a dummy irq routing table is created.

Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.


4.25 KVM_IRQ_LINE

Capability: KVM_CAP_IRQCHIP
Architectures: x86, arm, arm64
Type: vm ioctl
Parameters: struct kvm_irq_level
Returns: 0 on success, -1 on error

Sets the level of a GSI input to the interrupt controller model in the kernel.
On some architectures it is required that an interrupt controller model has
been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
interrupts require the level to be set to 1 and then back to 0.

On real hardware, interrupt pins can be active-low or active-high. This
does not matter for the level field of struct kvm_irq_level: 1 always
means active (asserted), 0 means inactive (deasserted).

x86 allows the operating system to program the interrupt polarity
(active-low/active-high) for level-triggered interrupts, and KVM used
to consider the polarity. However, due to bitrot in the handling of
active-low interrupts, the above convention is now valid on x86 too.
This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace
should not present interrupts to the guest as active-low unless this
capability is present (or unless it is not using the in-kernel irqchip,
of course).


ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ...  16 | 15    ...    0 |
  field: | irq_type  | vcpu_index  |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)

In both cases, level is used to assert/deassert the line.

struct kvm_irq_level {
        union {
                __u32 irq;     /* GSI */
                __s32 status;  /* not used for KVM_IRQ_LEVEL */
        };
        __u32 level;           /* 0 or 1 */
};
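
The ARM/arm64 layout of the irq field can be packed with a few shifts. The
following sketch does so under the bit layout given above. The enum and
function names are hypothetical and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* irq_type values from the list above (names are illustrative). */
enum arm_irq_type {
        ARM_IRQ_TYPE_CPU = 0,   /* out-of-kernel GIC: 0 = IRQ, 1 = FIQ */
        ARM_IRQ_TYPE_SPI = 1,   /* in-kernel GIC: irq_id 32..1019 */
        ARM_IRQ_TYPE_PPI = 2,   /* in-kernel GIC: irq_id 16..31 */
};

/* Pack irq_type (31..24), vcpu_index (23..16) and irq_id (15..0). */
uint32_t arm_irq_field(enum arm_irq_type type, uint8_t vcpu_index,
                       uint16_t irq_id)
{
        switch (type) {
        case ARM_IRQ_TYPE_CPU:
                assert(irq_id <= 1);
                break;
        case ARM_IRQ_TYPE_SPI:
                assert(irq_id >= 32 && irq_id <= 1019);
                break;
        case ARM_IRQ_TYPE_PPI:
                assert(irq_id >= 16 && irq_id <= 31);
                break;
        }
        return ((uint32_t)type << 24) | ((uint32_t)vcpu_index << 16) | irq_id;
}
```

The result would be stored in the irq member of struct kvm_irq_level,
with level set to 1 to assert the line or 0 to deassert it.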
793
414fa985 794
68ba6974 7954.26 KVM_GET_IRQCHIP
5dadbfd6
AK
796
797Capability: KVM_CAP_IRQCHIP
c32a4272 798Architectures: x86
5dadbfd6
AK
799Type: vm ioctl
800Parameters: struct kvm_irqchip (in/out)
801Returns: 0 on success, -1 on error
802
803Reads the state of a kernel interrupt controller created with
804KVM_CREATE_IRQCHIP into a buffer provided by the caller.
805
806struct kvm_irqchip {
807 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
808 __u32 pad;
809 union {
810 char dummy[512]; /* reserving space */
811 struct kvm_pic_state pic;
812 struct kvm_ioapic_state ioapic;
813 } chip;
814};
815
414fa985 816
68ba6974 8174.27 KVM_SET_IRQCHIP
5dadbfd6
AK
818
819Capability: KVM_CAP_IRQCHIP
c32a4272 820Architectures: x86
5dadbfd6
AK
821Type: vm ioctl
822Parameters: struct kvm_irqchip (in)
823Returns: 0 on success, -1 on error
824
825Sets the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP from a buffer provided by the caller.

struct kvm_irqchip {
        __u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
        __u32 pad;
        union {
                char dummy[512];  /* reserving space */
                struct kvm_pic_state pic;
                struct kvm_ioapic_state ioapic;
        } chip;
};


4.28 KVM_XEN_HVM_CONFIG

Capability: KVM_CAP_XEN_HVM
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_xen_hvm_config (in)
Returns: 0 on success, -1 on error

Sets the MSR that the Xen HVM guest uses to initialize its hypercall
page, and provides the starting address and size of the hypercall
blobs in userspace. When the guest writes the MSR, kvm copies one
page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
memory.

struct kvm_xen_hvm_config {
        __u32 flags;
        __u32 msr;
        __u64 blob_addr_32;
        __u64 blob_addr_64;
        __u8 blob_size_32;
        __u8 blob_size_64;
        __u8 pad2[30];
};


4.29 KVM_GET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (out)
Returns: 0 on success, -1 on error

Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flags member.

The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
value is the exact kvmclock value seen by all VCPUs at the instant
when KVM_GET_CLOCK was called. If clear, the returned value is simply
CLOCK_MONOTONIC plus a constant offset; the offset can be modified
with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
but the exact value read by each VCPU could differ, because the host
TSC is not stable.

struct kvm_clock_data {
        __u64 clock;  /* kvmclock current value */
        __u32 flags;
        __u32 pad[9];
};


4.30 KVM_SET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (in)
Returns: 0 on success, -1 on error

Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

struct kvm_clock_data {
        __u64 clock;  /* kvmclock current value */
        __u32 flags;
        __u32 pad[9];
};

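As an illustration of how KVM_GET_CLOCK and KVM_SET_CLOCK pair up around a
migration, here is a minimal sketch rather than a reference implementation.
The helper names (save_kvmclock, kvmclock_is_exact, make_restore_arg,
restore_kvmclock) are hypothetical. Clearing the flags and padding before
KVM_SET_CLOCK is an assumption of the sketch; the text above does not say
which flags KVM_SET_CLOCK accepts.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Read kvmclock on the source side. Returns 0 on success, -1 on error. */
int save_kvmclock(int vm_fd, struct kvm_clock_data *out)
{
        memset(out, 0, sizeof(*out));
        return ioctl(vm_fd, KVM_GET_CLOCK, out);
}

/* Nonzero if the saved value is the exact value seen by all VCPUs. */
int kvmclock_is_exact(const struct kvm_clock_data *data)
{
        return (data->flags & KVM_CLOCK_TSC_STABLE) != 0;
}

/* Build the KVM_SET_CLOCK argument: only the clock value is carried over
 * (assumption: flags and padding are passed as zero). */
struct kvm_clock_data make_restore_arg(const struct kvm_clock_data *saved)
{
        struct kvm_clock_data in;

        memset(&in, 0, sizeof(in));
        in.clock = saved->clock;
        return in;
}

/* Write the saved kvmclock value on the destination side. */
int restore_kvmclock(int vm_fd, const struct kvm_clock_data *saved)
{
        struct kvm_clock_data in = make_restore_arg(saved);

        return ioctl(vm_fd, KVM_SET_CLOCK, &in);
}
```

A caller would typically check kvmclock_is_exact() after saving to decide
whether the value can be treated as exact for every VCPU.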

4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error

X86:

Gets currently pending exceptions, interrupts, and NMIs as well as related
states of the vcpu.

struct kvm_vcpu_events {
        struct {
                __u8 injected;
                __u8 nr;
                __u8 has_error_code;
                __u8 pending;
                __u32 error_code;
        } exception;
        struct {
                __u8 injected;
                __u8 nr;
                __u8 soft;
                __u8 shadow;
        } interrupt;
        struct {
                __u8 injected;
                __u8 pending;
                __u8 masked;
                __u8 pad;
        } nmi;
        __u32 sipi_vector;
        __u32 flags;
        struct {
                __u8 smm;
                __u8 pending;
                __u8 smm_inside_nmi;
                __u8 latched_init;
        } smi;
        __u8 reserved[27];
        __u8 exception_has_payload;
        __u64 exception_payload;
};

The following bits are defined in the flags field:

- KVM_VCPUEVENT_VALID_SHADOW may be set to signal that
  interrupt.shadow contains a valid state.

- KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a
  valid state.

- KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the
  exception_has_payload, exception_payload, and exception.pending
  fields contain a valid state. This bit will be set whenever
  KVM_CAP_EXCEPTION_PAYLOAD is enabled.

ARM/ARM64:

If the guest accesses a device that is being emulated by the host kernel in
such a way that a real device would generate a physical SError, KVM may make
a virtual SError pending for that VCPU. This system error interrupt remains
pending until the guest takes the exception by unmasking PSTATE.A.

Running the VCPU may cause it to take a pending SError, or make an access that
causes an SError to become pending. The event's description is only valid while
the VCPU is not running.

This API provides a way to read and write the pending 'event' state that is not
visible to the guest. To save, restore or migrate a VCPU the struct representing
the state can be read then written using this GET/SET API, along with the other
guest-visible registers. It is not possible to 'cancel' an SError that has been
made pending.

A device being emulated in user-space may also wish to generate an SError. To do
this the events structure can be populated by user-space. The current state
should be read first, to ensure no existing SError is pending. If an existing
SError is pending, the architecture's 'Multiple SError interrupts' rules should
be followed (section 2.5.3 of DDI0587.a, "ARM Reliability, Availability, and
Serviceability (RAS) Specification").

SError exceptions always have an ESR value. Some CPUs have the ability to
specify what the virtual SError's ESR value should be. These systems will
advertise KVM_CAP_ARM_INJECT_SERROR_ESR. In this case exception.serror_has_esr
will always have a non-zero value when read, and the agent making an SError
pending should specify the ISS field in the lower 24 bits of
exception.serror_esr. If the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but
user-space sets the events with exception.serror_has_esr as zero, KVM will
choose an ESR.

Specifying exception.serror_has_esr on a system that does not support it will
return -EINVAL. Setting anything other than the lower 24 bits of
exception.serror_esr will return -EINVAL.

struct kvm_vcpu_events {
        struct {
                __u8 serror_pending;
                __u8 serror_has_esr;
                /* Align it to 8 bytes */
                __u8 pad[6];
                __u64 serror_esr;
        } exception;
        __u32 reserved[12];
};


4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (in)
Returns: 0 on success, -1 on error

X86:

Sets pending exceptions, interrupts, and NMIs as well as related states of the
vcpu.

See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
from the update. These fields are nmi.pending, sipi_vector, smi.smm,
smi.pending. Keep the corresponding bits in the flags field cleared to
suppress overwriting the current in-kernel state. The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
KVM_VCPUEVENT_VALID_SMM         - transfer the smi sub-struct.

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.

KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.

If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD
can be set in the flags field to signal that the
exception_has_payload, exception_payload, and exception.pending fields
contain a valid state and shall be written into the VCPU.

ARM/ARM64:

Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an SError that has been made pending.

See KVM_GET_VCPU_EVENTS for the data structure.

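The ARM/ARM64 description under KVM_GET_VCPU_EVENTS states that only the lower
24 bits of exception.serror_esr may be set, and that anything else returns
-EINVAL. A user-space agent could check a candidate ESR before calling
KVM_SET_VCPU_EVENTS along these lines. This is only a sketch: the macro and
function names are hypothetical and do not come from any kernel header.

```c
#include <stdint.h>

/* The ISS field occupies the lower 24 bits of the ESR value. */
#define SERROR_ESR_ISS_MASK 0xffffffULL

/* Nonzero if esr only uses bits that the text above permits. */
int serror_esr_is_valid(uint64_t esr)
{
        return (esr & ~SERROR_ESR_ISS_MASK) == 0;
}
```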

4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error

Reads debug registers from the vcpu.

struct kvm_debugregs {
        __u64 db[4];
        __u64 dr6;
        __u64 dr7;
        __u64 flags;
        __u64 reserved[9];
};


4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
used and must be cleared on entry.


4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEM
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error

struct kvm_userspace_memory_region {
        __u32 slot;
        __u32 flags;
        __u64 guest_phys_addr;
        __u64 memory_size;     /* bytes */
        __u64 userspace_addr;  /* start of the userspace allocated memory */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot. When changing an existing slot, it may be moved in the guest
physical memory space, or its flags may be modified. It may not be
resized. Slots may not overlap in guest physical address space.
Bits 0-15 of "slot" specify the slot id and this value should be
less than the maximum number of user memory slots supported per VM.
The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
if this capability is supported by the architecture.

If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
specify the address space which is being modified. They must be
less than the value that KVM_CHECK_EXTENSION returns for the
KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces
are unrelated; the restriction on overlapping slots only applies within
each address space.

Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size. Any object may back this memory, including
anonymous memory, ordinary files, and hugetlbfs.

It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical. This allows large pages in the guest to be backed by large
pages in the host.

The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
writes to memory within the slot. See the KVM_GET_DIRTY_LOG ioctl for how to
use it. The latter can be set, if the KVM_CAP_READONLY_MEM capability allows
it, to make a new slot read-only. In this case, writes to this memory will be
posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION ioctl does not allow fine-grained control over memory
allocation and is deprecated.

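The slot encoding and the alignment recommendation above can be sketched as
follows. This is an illustrative sketch under stated assumptions, not the
canonical way to call the ioctl: make_slot, large_page_friendly and
map_guest_ram are hypothetical helper names, and error handling is left to
the caller.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Combine an address space (bits 16-31) and a slot id (bits 0-15). */
uint32_t make_slot(uint16_t address_space, uint16_t slot_id)
{
        return ((uint32_t)address_space << 16) | slot_id;
}

/* Nonzero if the lower 21 bits of both addresses are identical. */
int large_page_friendly(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
        const uint64_t mask = (1ULL << 21) - 1;

        return (guest_phys_addr & mask) == (userspace_addr & mask);
}

/* Create or modify a slot backed by host memory at 'host'. */
int map_guest_ram(int vm_fd, uint32_t slot, uint64_t gpa, void *host,
                  uint64_t size, int log_dirty)
{
        struct kvm_userspace_memory_region region;

        memset(&region, 0, sizeof(region));
        region.slot = slot;
        region.flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0;
        region.guest_phys_addr = gpa;
        region.memory_size = size;
        region.userspace_addr = (uint64_t)(uintptr_t)host;
        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}
```

A nonzero address space in make_slot() is only meaningful when
KVM_CAP_MULTI_ADDRESS_SPACE is available.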

4.36 KVM_SET_TSS_ADDR

Capability: KVM_CAP_SET_TSS_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long tss_address (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a three-page region in the guest
physical address space. The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address. The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts. This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).


4.37 KVM_ENABLE_CAP

Capability: KVM_CAP_ENABLE_CAP
Architectures: mips, ppc, s390
Type: vcpu ioctl
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Capability: KVM_CAP_ENABLE_CAP_VM
Architectures: all
Type: vm ioctl
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Not all extensions are enabled by default. Using this ioctl the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails. On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
       /* in */
       __u32 cap;

The capability that is supposed to get enabled.

       __u32 flags;

A bitfield indicating future enhancements. Has to be 0 for now.

       __u64 args[4];

Arguments for enabling a feature. If a feature needs initial values to
function properly, this is the place to put them.

       __u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.

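A minimal sketch of filling in struct kvm_enable_cap is shown below. The
helper names make_enable_cap and enable_cap are hypothetical, and passing a
single argument in args[0] is just one example of how a capability might be
parameterized. The same call is issued on a vcpu file descriptor or a VM
file descriptor depending on the kind of capability.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build the argument; flags, unused args and pad are left as zero. */
struct kvm_enable_cap make_enable_cap(__u32 cap, __u64 arg0)
{
        struct kvm_enable_cap ec;

        memset(&ec, 0, sizeof(ec));
        ec.cap = cap;
        ec.args[0] = arg0;
        return ec;
}

/* fd is a vcpu fd for vcpu-specific capabilities, a VM fd for vm-wide ones. */
int enable_cap(int fd, __u32 cap, __u64 arg0)
{
        struct kvm_enable_cap ec = make_enable_cap(cap, arg0);

        return ioctl(fd, KVM_ENABLE_CAP, &ec);
}
```

Before calling enable_cap(), KVM_CHECK_EXTENSION should be used to confirm
that the capability is available.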

4.38 KVM_GET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
        __u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:       the vcpu is currently running [x86,arm/arm64]
 - KVM_MP_STATE_UNINITIALIZED:  the vcpu is an application processor (AP)
                                which has not yet received an INIT signal [x86]
 - KVM_MP_STATE_INIT_RECEIVED:  the vcpu has received an INIT signal, and is
                                now ready for a SIPI [x86]
 - KVM_MP_STATE_HALTED:         the vcpu has executed a HLT instruction and
                                is waiting for an interrupt [x86]
 - KVM_MP_STATE_SIPI_RECEIVED:  the vcpu has just received a SIPI (vector
                                accessible via KVM_GET_VCPU_EVENTS) [x86]
 - KVM_MP_STATE_STOPPED:        the vcpu is stopped [s390,arm/arm64]
 - KVM_MP_STATE_CHECK_STOP:     the vcpu is in a special error state [s390]
 - KVM_MP_STATE_OPERATING:      the vcpu is operating (running or halted)
                                [s390]
 - KVM_MP_STATE_LOAD:           the vcpu is in a special load/startup state
                                [s390]

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
this architecture.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu is paused or not.

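For debugging or logging it can be handy to turn the mp_state value into a
readable name. The following sketch does that and also shows a query through
KVM_GET_MP_STATE. The function names and the returned strings are
hypothetical choices of this example, not part of the API.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Map an mp_state value to a short name; unknown values are reported as such. */
const char *mp_state_name(__u32 mp_state)
{
        switch (mp_state) {
        case KVM_MP_STATE_RUNNABLE:      return "runnable";
        case KVM_MP_STATE_UNINITIALIZED: return "uninitialized";
        case KVM_MP_STATE_INIT_RECEIVED: return "init-received";
        case KVM_MP_STATE_HALTED:        return "halted";
        case KVM_MP_STATE_SIPI_RECEIVED: return "sipi-received";
        case KVM_MP_STATE_STOPPED:       return "stopped";
        case KVM_MP_STATE_CHECK_STOP:    return "check-stop";
        case KVM_MP_STATE_OPERATING:     return "operating";
        case KVM_MP_STATE_LOAD:          return "load";
        default:                         return "unknown";
        }
}

/* Returns 1 if the vcpu is stopped, 0 if it is not, -1 if the ioctl fails. */
int vcpu_is_stopped(int vcpu_fd)
{
        struct kvm_mp_state state;

        memset(&state, 0, sizeof(state));
        if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &state) < 0)
                return -1;
        return state.mp_state == KVM_MP_STATE_STOPPED;
}
```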

4.39 KVM_SET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
this architecture.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu should be paused or not.

4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space. The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address. The guest may malfunction if it accesses this memory
region.

Setting the address to 0 will result in resetting the address to its default
(0xfffbc000).

This ioctl is required on Intel-based hosts. This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).

Fails if any VCPU has already been created.

4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86
Type: vm ioctl
Parameters: unsigned long vcpu_id
Returns: 0 on success, -1 on error

Define which vcpu is the Bootstrap Processor (BSP). Values are the same
as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
is vcpu 0.


4.42 KVM_GET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

struct kvm_xsave {
        __u32 region[1024];
};

This ioctl copies the current vcpu's xsave struct to userspace.


4.43 KVM_SET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

struct kvm_xsave {
        __u32 region[1024];
};

This ioctl copies userspace's xsave struct to the kernel.


4.44 KVM_GET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

struct kvm_xcr {
        __u32 xcr;
        __u32 reserved;
        __u64 value;
};

struct kvm_xcrs {
        __u32 nr_xcrs;
        __u32 flags;
        struct kvm_xcr xcrs[KVM_MAX_XCRS];
        __u64 padding[16];
};

This ioctl copies the current vcpu's xcrs to userspace.


4.45 KVM_SET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

struct kvm_xcr {
        __u32 xcr;
        __u32 reserved;
        __u64 value;
};

struct kvm_xcrs {
        __u32 nr_xcrs;
        __u32 flags;
        struct kvm_xcr xcrs[KVM_MAX_XCRS];
        __u64 padding[16];
};

This ioctl sets the vcpu's xcrs to the values userspace specified.


4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
        __u32 nent;
        __u32 padding;
        struct kvm_cpuid_entry2 entries[0];
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX         BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC            BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT          BIT(2)

struct kvm_cpuid_entry2 {
        __u32 function;
        __u32 index;
        __u32 flags;
        __u32 eax;
        __u32 ebx;
        __u32 ecx;
        __u32 edx;
        __u32 padding[3];
};

This ioctl returns x86 cpuid features which are supported by both the
hardware and kvm in its default configuration. Userspace can use the
information returned by this ioctl to construct cpuid information (for
KVM_SET_CPUID2) that is consistent with hardware, kernel, and
userspace capabilities, and with user requirements (for example, the
user may wish to constrain cpuid to emulate older hardware, or for
feature consistency across a cluster).

Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
is responsible for modifying the results of this ioctl appropriately.

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned. If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out. Some features (for example,
x2apic) may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support. Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

If that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.

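The 'nent' negotiation described above is usually handled with a retry loop.
The sketch below shows one possible shape for x86 hosts. The function names
(cpuid2_alloc, get_supported_cpuid, cpuid2_find), the starting guess of 32
entries and the doubling strategy are all assumptions of this example.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Allocate a zeroed kvm_cpuid2 with room for nent entries. */
struct kvm_cpuid2 *cpuid2_alloc(__u32 nent)
{
        struct kvm_cpuid2 *c;

        c = calloc(1, sizeof(*c) + nent * sizeof(struct kvm_cpuid_entry2));
        if (c)
                c->nent = nent;
        return c;
}

/* Grow the array while KVM reports E2BIG. The caller frees the result. */
struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
{
        __u32 nent = 32;  /* arbitrary starting guess */

        for (;;) {
                struct kvm_cpuid2 *c = cpuid2_alloc(nent);
                int err;

                if (!c)
                        return NULL;
                if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, c) == 0)
                        return c;  /* c->nent now holds the valid count */
                err = errno;       /* save before free() can disturb it */
                free(c);
                if (err != E2BIG)
                        return NULL;
                nent *= 2;
        }
}

/* Find the entry for a function/index pair, or NULL if there is none. */
struct kvm_cpuid_entry2 *cpuid2_find(struct kvm_cpuid2 *c, __u32 function,
                                     __u32 index)
{
        __u32 i;

        for (i = 0; i < c->nent; i++) {
                struct kvm_cpuid_entry2 *e = &c->entries[i];

                if (e->function != function)
                        continue;
                /* The index only matters when the entry says it is valid. */
                if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) &&
                    e->index != index)
                        continue;
                return e;
        }
        return NULL;
}
```

Here kvm_fd is the file descriptor obtained from open("/dev/kvm"), since
KVM_GET_SUPPORTED_CPUID is a system ioctl.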

4.47 KVM_PPC_GET_PVINFO

Capability: KVM_CAP_PPC_GET_PVINFO
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_pvinfo (out)
Returns: 0 on success, !0 on error

struct kvm_ppc_pvinfo {
        __u32 flags;
        __u32 hcall[4];
        __u8  pad[108];
};

This ioctl fetches PV specific information that needs to be passed to the guest
using the device tree or other means from vm context.

The hcall array defines 4 instructions that make up a hypercall.

If any additional field gets added to this structure later on, a bit for that
additional piece of information will be set in the flags bitmap.

The flags bitmap is defined as:

   /* the host supports the ePAPR idle hcall */
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)

4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86, s390, arm, arm64
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

On arm/arm64, GSI routing has the following limitation:
- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.

struct kvm_irq_routing {
        __u32 nr;
        __u32 flags;
        struct kvm_irq_routing_entry entries[0];
};

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_entry {
        __u32 gsi;
        __u32 type;
        __u32 flags;
        __u32 pad;
        union {
                struct kvm_irq_routing_irqchip irqchip;
                struct kvm_irq_routing_msi msi;
                struct kvm_irq_routing_s390_adapter adapter;
                struct kvm_irq_routing_hv_sint hv_sint;
                __u32 pad[8];
        } u;
};

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
#define KVM_IRQ_ROUTING_HV_SINT 4

flags:
- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
  type, specifies that the devid field contains a valid value. The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID. If this capability is not available, userspace should
  never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
- zero otherwise

struct kvm_irq_routing_irqchip {
        __u32 irqchip;
        __u32 pin;
};

struct kvm_irq_routing_msi {
        __u32 address_lo;
        __u32 address_hi;
        __u32 data;
        union {
                __u32 pad;
                __u32 devid;
        };
};

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message. For PCI, this is usually a
BDF (bus/device/function) identifier in the lower 16 bits.

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
address_hi must be zero.

struct kvm_irq_routing_s390_adapter {
        __u64 ind_addr;
        __u64 summary_addr;
        __u64 ind_offset;
        __u32 summary_offset;
        __u32 adapter_id;
};

struct kvm_irq_routing_hv_sint {
        __u32 vcpu;
        __u32 sint;
};

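Because KVM_SET_GSI_ROUTING replaces the whole table, userspace normally
builds the complete set of entries and submits it in one call. The sketch
below builds a table with one irqchip route and one MSI route. The helper
names (routing_alloc, route_irqchip, route_msi, commit_routing) are
hypothetical, and the example leaves KVM_MSI_VALID_DEVID unset.

```c
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Allocate a zeroed routing table with nr entries; flags stay zero. */
struct kvm_irq_routing *routing_alloc(__u32 nr)
{
        struct kvm_irq_routing *r;

        r = calloc(1, sizeof(*r) + nr * sizeof(struct kvm_irq_routing_entry));
        if (r)
                r->nr = nr;
        return r;
}

/* Route a GSI to a pin of an in-kernel irqchip. */
void route_irqchip(struct kvm_irq_routing_entry *e, __u32 gsi,
                   __u32 irqchip, __u32 pin)
{
        e->gsi = gsi;
        e->type = KVM_IRQ_ROUTING_IRQCHIP;
        e->flags = 0;
        e->u.irqchip.irqchip = irqchip;
        e->u.irqchip.pin = pin;
}

/* Route a GSI to an MSI message; the device ID is not provided here. */
void route_msi(struct kvm_irq_routing_entry *e, __u32 gsi,
               __u64 address, __u32 data)
{
        e->gsi = gsi;
        e->type = KVM_IRQ_ROUTING_MSI;
        e->flags = 0;
        e->u.msi.address_lo = (__u32)address;
        e->u.msi.address_hi = (__u32)(address >> 32);
        e->u.msi.data = data;
}

/* Install the table, overwriting any previously set entries. */
int commit_routing(int vm_fd, struct kvm_irq_routing *r)
{
        return ioctl(vm_fd, KVM_SET_GSI_ROUTING, r);
}
```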

4.55 KVM_SET_TSC_KHZ

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is kHz.


4.56 KVM_GET_TSC_KHZ

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
kHz. If the host has an unstable tsc, this ioctl returns -EIO instead as an
error.


4.57 KVM_GET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
        char regs[KVM_APIC_REG_SIZE];
};

Reads the Local APIC registers and copies them into the input argument. The
data format and layout are the same as documented in the architecture manual.

If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
enabled, then the format of APIC_ID register depends on the APIC mode
(reported by MSR_IA32_APICBASE) of its VCPU. x2APIC stores APIC ID in
the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID
which is stored in bits 31-24 of the APIC_ID register, or equivalently in
byte 35 of struct kvm_lapic_state's regs field. KVM_GET_LAPIC must then
be called after MSR_IA32_APICBASE has been set with KVM_SET_MSRS.

If KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state
always uses xAPIC format.

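The two APIC_ID layouts described above can be decoded from the regs buffer
as in the following sketch. The function name lapic_id_from_regs and the
x2apic_format parameter are hypothetical. The caller is assumed to know which
format applies, and the register is assumed to be stored little-endian, as on
x86.

```c
#include <stdint.h>

#define APIC_ID_OFFSET 0x20  /* bytes 32-35 of the regs field */

/*
 * Extract the APIC ID from the regs field of struct kvm_lapic_state.
 * x2APIC format: the whole 32-bit register is the ID.
 * xAPIC format: the ID is bits 31-24, i.e. byte 35.
 */
uint32_t lapic_id_from_regs(const unsigned char *regs, int x2apic_format)
{
        uint32_t reg = (uint32_t)regs[APIC_ID_OFFSET]
                     | (uint32_t)regs[APIC_ID_OFFSET + 1] << 8
                     | (uint32_t)regs[APIC_ID_OFFSET + 2] << 16
                     | (uint32_t)regs[APIC_ID_OFFSET + 3] << 24;

        return x2apic_format ? reg : reg >> 24;
}
```

Since regs is declared as char, a caller would pass it with a cast to
const unsigned char *.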

4.58 KVM_SET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
        char regs[KVM_APIC_REG_SIZE];
};

Copies the input argument into the Local APIC registers. The data format
and layout are the same as documented in the architecture manual.

The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
See the note in KVM_GET_LAPIC.


4.59 KVM_IOEVENTFD

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest. A guest write to the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {
        __u64 datamatch;
        __u64 addr;        /* legal pio/mmio address */
        __u32 len;         /* 0, 1, 2, 4, or 8 bytes */
        __s32 fd;
        __u32 flags;
        __u8  pad[36];
};

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
        (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If the datamatch flag is set, the event will be signaled only if the value
written to the registered address is equal to datamatch in struct kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.

With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
the kernel will ignore the length of guest write and may get a faster vmexit.
The speedup may only apply to specific architectures, but the ioeventfd will
work anyway.

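The sketch below shows how struct kvm_ioeventfd might be filled in for a
pio or mmio address, with an optional datamatch value, and how the same
description can later be used to detach the eventfd. The helper names
(make_ioeventfd, assign_ioeventfd, deassign_ioeventfd) are hypothetical. The
fd argument is assumed to be an event file descriptor that the caller has
already created.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Describe an ioeventfd; match may be NULL to signal on any written value. */
struct kvm_ioeventfd make_ioeventfd(__u64 addr, __u32 len, int fd,
                                    int is_pio, const __u64 *match)
{
        struct kvm_ioeventfd e;

        memset(&e, 0, sizeof(e));
        e.addr = addr;
        e.len = len;
        e.fd = fd;
        if (is_pio)
                e.flags |= KVM_IOEVENTFD_FLAG_PIO;
        if (match) {
                e.datamatch = *match;
                e.flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
        }
        return e;
}

/* Attach the eventfd to the address. */
int assign_ioeventfd(int vm_fd, struct kvm_ioeventfd e)
{
        return ioctl(vm_fd, KVM_IOEVENTFD, &e);
}

/* Detach it again, using the same description plus the DEASSIGN flag. */
int deassign_ioeventfd(int vm_fd, struct kvm_ioeventfd e)
{
        e.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
        return ioctl(vm_fd, KVM_IOEVENTFD, &e);
}
```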

4.60 KVM_DIRTY_TLB

Capability: KVM_CAP_SW_TLB
Architectures: ppc
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {
        __u64 bitmap;
        __u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array. This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: the bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything. It must
be set to the number of set bits in the bitmap.

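The bitmap layout described above can be handled with byte-wise operations,
which keeps the code independent of the host word size. The following is a
sketch with hypothetical helper names (tlb_bitmap_bytes, tlb_mark_dirty,
tlb_count_dirty). The count it produces is the value that num_dirty is
expected to carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_entries bits, rounded up to a multiple of 64 bits. */
size_t tlb_bitmap_bytes(size_t num_entries)
{
        return ((num_entries + 63) / 64) * 8;
}

/* Set the bit for one TLB entry: bit 0 is the LSB of the first byte. */
void tlb_mark_dirty(uint8_t *bitmap, size_t entry)
{
        bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Count the set bits. */
uint32_t tlb_count_dirty(const uint8_t *bitmap, size_t num_entries)
{
        size_t nbytes = tlb_bitmap_bytes(num_entries);
        uint32_t count = 0;
        size_t i;

        for (i = 0; i < nbytes; i++) {
                uint8_t byte = bitmap[i];

                while (byte) {
                        count += byte & 1u;
                        byte >>= 1;
                }
        }
        return count;
}
```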
414fa985 1742
54738c09
DG
17434.62 KVM_CREATE_SPAPR_TCE
1744
1745Capability: KVM_CAP_SPAPR_TCE
1746Architectures: powerpc
1747Type: vm ioctl
1748Parameters: struct kvm_create_spapr_tce (in)
1749Returns: file descriptor for manipulating the created TCE table
1750
1751This creates a virtual TCE (translation control entry) table, which
1752is an IOMMU for PAPR-style virtual I/O. It is used to translate
1753logical addresses used in virtual I/O into guest physical addresses,
1754and provides a scatter/gather capability for PAPR virtual I/O.
1755
1756/* for KVM_CAP_SPAPR_TCE */
1757struct kvm_create_spapr_tce {
1758 __u64 liobn;
1759 __u32 window_size;
1760};
1761
1762The liobn field gives the logical IO bus number for which to create a
1763TCE table. The window_size field specifies the size of the DMA window
1764which this TCE table will translate - the table will contain one 64
1765bit TCE entry for every 4kiB of the DMA window.
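
As a worked example of the sizing rule above, the number of entries and
the size of the table follow directly from window_size. The function and
macro names below are illustrative only.

```c
#include <stdint.h>

#define TCE_PAGE_SHIFT 12  /* one TCE per 4 KiB of DMA window */

/* Number of 64-bit TCE entries for a DMA window of window_size bytes. */
uint64_t spapr_tce_entries(uint32_t window_size)
{
        return (uint64_t)window_size >> TCE_PAGE_SHIFT;
}

/* Size in bytes of the resulting TCE table. */
uint64_t spapr_tce_table_bytes(uint32_t window_size)
{
        return spapr_tce_entries(window_size) * sizeof(uint64_t);
}
```

For a 256 MiB window this gives 65536 entries, i.e. a 512 KiB table.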

When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table. H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.

The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace. This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.


4.63 KVM_ALLOCATE_RMA

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
        __u64 rma_size;
};

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.


4.64 KVM_NMI

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu. Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu
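
The check in the third step might look like the sketch below. Note the
assumptions: the register offset (0x360) and the bit positions come from
the xAPIC register layout rather than from this document, the function
name is made up, and "regs" is taken to be the register page filled in by
KVM_GET_LAPIC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APIC_LVT_LINT1_OFFSET 0x360u      /* LVT LINT1 register (assumed) */
#define APIC_LVT_MASKED       (1u << 16)  /* mask bit */
#define APIC_LVT_MODE_MASK    (7u << 8)   /* delivery mode, bits 10:8 */
#define APIC_LVT_MODE_NMI     (4u << 8)   /* delivery mode 100b = NMI */

/* Returns true if asserting LINT1 would deliver an NMI. */
bool lint1_would_queue_nmi(const uint8_t *regs)
{
        uint32_t lvt;

        memcpy(&lvt, regs + APIC_LVT_LINT1_OFFSET, sizeof(lvt));
        if (lvt & APIC_LVT_MASKED)
                return false;
        return (lvt & APIC_LVT_MODE_MASK) == APIC_LVT_MODE_NMI;
}
```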

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.


4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
        struct kvm_s390_ucas_mapping {
                __u64 user_addr;
                __u64 vcpu_addr;
                __u64 length;
        };

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters must
be aligned to 1 megabyte.


4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
        struct kvm_s390_ucas_mapping {
                __u64 user_addr;
                __u64 vcpu_addr;
                __u64 length;
        };

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters must be aligned to 1 megabyte.


4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
so it is recommended to access the memory page in question via the user
page table beforehand. This is useful to handle validity intercepts for
user controlled virtual machines to fault in the virtual cpu's lowcore
pages prior to calling the KVM_RUN ioctl.


4.68 KVM_SET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure

struct kvm_one_reg {
        __u64 id;
        __u64 addr;
};

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable of the respective size. There can be architecture-agnostic
and architecture-specific registers. Each has its own range of operation,
constants and width. The implemented registers are listed below:

  Arch  | Register                        | Width (bits)
        |                                 |
  PPC   | KVM_REG_PPC_HIOR                | 64
  PPC   | KVM_REG_PPC_IAC1                | 64
  PPC   | KVM_REG_PPC_IAC2                | 64
  PPC   | KVM_REG_PPC_IAC3                | 64
  PPC   | KVM_REG_PPC_IAC4                | 64
  PPC   | KVM_REG_PPC_DAC1                | 64
  PPC   | KVM_REG_PPC_DAC2                | 64
  PPC   | KVM_REG_PPC_DABR                | 64
  PPC   | KVM_REG_PPC_DSCR                | 64
  PPC   | KVM_REG_PPC_PURR                | 64
  PPC   | KVM_REG_PPC_SPURR               | 64
  PPC   | KVM_REG_PPC_DAR                 | 64
  PPC   | KVM_REG_PPC_DSISR               | 32
  PPC   | KVM_REG_PPC_AMR                 | 64
  PPC   | KVM_REG_PPC_UAMOR               | 64
  PPC   | KVM_REG_PPC_MMCR0               | 64
  PPC   | KVM_REG_PPC_MMCR1               | 64
  PPC   | KVM_REG_PPC_MMCRA               | 64
  PPC   | KVM_REG_PPC_MMCR2               | 64
  PPC   | KVM_REG_PPC_MMCRS               | 64
  PPC   | KVM_REG_PPC_SIAR                | 64
  PPC   | KVM_REG_PPC_SDAR                | 64
  PPC   | KVM_REG_PPC_SIER                | 64
  PPC   | KVM_REG_PPC_PMC1                | 32
  PPC   | KVM_REG_PPC_PMC2                | 32
  PPC   | KVM_REG_PPC_PMC3                | 32
  PPC   | KVM_REG_PPC_PMC4                | 32
  PPC   | KVM_REG_PPC_PMC5                | 32
  PPC   | KVM_REG_PPC_PMC6                | 32
  PPC   | KVM_REG_PPC_PMC7                | 32
  PPC   | KVM_REG_PPC_PMC8                | 32
  PPC   | KVM_REG_PPC_FPR0                | 64
          ...
  PPC   | KVM_REG_PPC_FPR31               | 64
  PPC   | KVM_REG_PPC_VR0                 | 128
          ...
  PPC   | KVM_REG_PPC_VR31                | 128
  PPC   | KVM_REG_PPC_VSR0                | 128
          ...
  PPC   | KVM_REG_PPC_VSR31               | 128
  PPC   | KVM_REG_PPC_FPSCR               | 64
  PPC   | KVM_REG_PPC_VSCR                | 32
  PPC   | KVM_REG_PPC_VPA_ADDR            | 64
  PPC   | KVM_REG_PPC_VPA_SLB             | 128
  PPC   | KVM_REG_PPC_VPA_DTL             | 128
  PPC   | KVM_REG_PPC_EPCR                | 32
  PPC   | KVM_REG_PPC_EPR                 | 32
  PPC   | KVM_REG_PPC_TCR                 | 32
  PPC   | KVM_REG_PPC_TSR                 | 32
  PPC   | KVM_REG_PPC_OR_TSR              | 32
  PPC   | KVM_REG_PPC_CLEAR_TSR           | 32
  PPC   | KVM_REG_PPC_MAS0                | 32
  PPC   | KVM_REG_PPC_MAS1                | 32
  PPC   | KVM_REG_PPC_MAS2                | 64
  PPC   | KVM_REG_PPC_MAS7_3              | 64
  PPC   | KVM_REG_PPC_MAS4                | 32
  PPC   | KVM_REG_PPC_MAS6                | 32
  PPC   | KVM_REG_PPC_MMUCFG              | 32
  PPC   | KVM_REG_PPC_TLB0CFG             | 32
  PPC   | KVM_REG_PPC_TLB1CFG             | 32
  PPC   | KVM_REG_PPC_TLB2CFG             | 32
  PPC   | KVM_REG_PPC_TLB3CFG             | 32
  PPC   | KVM_REG_PPC_TLB0PS              | 32
  PPC   | KVM_REG_PPC_TLB1PS              | 32
  PPC   | KVM_REG_PPC_TLB2PS              | 32
  PPC   | KVM_REG_PPC_TLB3PS              | 32
  PPC   | KVM_REG_PPC_EPTCFG              | 32
  PPC   | KVM_REG_PPC_ICP_STATE           | 64
  PPC   | KVM_REG_PPC_TB_OFFSET           | 64
  PPC   | KVM_REG_PPC_SPMC1               | 32
  PPC   | KVM_REG_PPC_SPMC2               | 32
  PPC   | KVM_REG_PPC_IAMR                | 64
  PPC   | KVM_REG_PPC_TFHAR               | 64
  PPC   | KVM_REG_PPC_TFIAR               | 64
  PPC   | KVM_REG_PPC_TEXASR              | 64
  PPC   | KVM_REG_PPC_FSCR                | 64
  PPC   | KVM_REG_PPC_PSPB                | 32
  PPC   | KVM_REG_PPC_EBBHR               | 64
  PPC   | KVM_REG_PPC_EBBRR               | 64
  PPC   | KVM_REG_PPC_BESCR               | 64
  PPC   | KVM_REG_PPC_TAR                 | 64
  PPC   | KVM_REG_PPC_DPDES               | 64
  PPC   | KVM_REG_PPC_DAWR                | 64
  PPC   | KVM_REG_PPC_DAWRX               | 64
  PPC   | KVM_REG_PPC_CIABR               | 64
  PPC   | KVM_REG_PPC_IC                  | 64
  PPC   | KVM_REG_PPC_VTB                 | 64
  PPC   | KVM_REG_PPC_CSIGR               | 64
  PPC   | KVM_REG_PPC_TACR                | 64
  PPC   | KVM_REG_PPC_TCSCR               | 64
  PPC   | KVM_REG_PPC_PID                 | 64
  PPC   | KVM_REG_PPC_ACOP                | 64
  PPC   | KVM_REG_PPC_VRSAVE              | 32
  PPC   | KVM_REG_PPC_LPCR                | 32
  PPC   | KVM_REG_PPC_LPCR_64             | 64
  PPC   | KVM_REG_PPC_PPR                 | 64
  PPC   | KVM_REG_PPC_ARCH_COMPAT         | 32
  PPC   | KVM_REG_PPC_DABRX               | 32
  PPC   | KVM_REG_PPC_WORT                | 64
  PPC   | KVM_REG_PPC_SPRG9               | 64
  PPC   | KVM_REG_PPC_DBSR                | 32
  PPC   | KVM_REG_PPC_TIDR                | 64
  PPC   | KVM_REG_PPC_PSSCR               | 64
  PPC   | KVM_REG_PPC_DEC_EXPIRY          | 64
  PPC   | KVM_REG_PPC_PTCR                | 64
  PPC   | KVM_REG_PPC_TM_GPR0             | 64
          ...
  PPC   | KVM_REG_PPC_TM_GPR31            | 64
  PPC   | KVM_REG_PPC_TM_VSR0             | 128
          ...
  PPC   | KVM_REG_PPC_TM_VSR63            | 128
  PPC   | KVM_REG_PPC_TM_CR               | 64
  PPC   | KVM_REG_PPC_TM_LR               | 64
  PPC   | KVM_REG_PPC_TM_CTR              | 64
  PPC   | KVM_REG_PPC_TM_FPSCR            | 64
  PPC   | KVM_REG_PPC_TM_AMR              | 64
  PPC   | KVM_REG_PPC_TM_PPR              | 64
  PPC   | KVM_REG_PPC_TM_VRSAVE           | 64
  PPC   | KVM_REG_PPC_TM_VSCR             | 32
  PPC   | KVM_REG_PPC_TM_DSCR             | 64
  PPC   | KVM_REG_PPC_TM_TAR              | 64
  PPC   | KVM_REG_PPC_TM_XER              | 64
        |                                 |
  MIPS  | KVM_REG_MIPS_R0                 | 64
          ...
  MIPS  | KVM_REG_MIPS_R31                | 64
  MIPS  | KVM_REG_MIPS_HI                 | 64
  MIPS  | KVM_REG_MIPS_LO                 | 64
  MIPS  | KVM_REG_MIPS_PC                 | 64
  MIPS  | KVM_REG_MIPS_CP0_INDEX          | 32
  MIPS  | KVM_REG_MIPS_CP0_ENTRYLO0       | 64
  MIPS  | KVM_REG_MIPS_CP0_ENTRYLO1       | 64
  MIPS  | KVM_REG_MIPS_CP0_CONTEXT        | 64
  MIPS  | KVM_REG_MIPS_CP0_CONTEXTCONFIG  | 32
  MIPS  | KVM_REG_MIPS_CP0_USERLOCAL      | 64
  MIPS  | KVM_REG_MIPS_CP0_XCONTEXTCONFIG | 64
  MIPS  | KVM_REG_MIPS_CP0_PAGEMASK       | 32
  MIPS  | KVM_REG_MIPS_CP0_PAGEGRAIN      | 32
  MIPS  | KVM_REG_MIPS_CP0_SEGCTL0        | 64
  MIPS  | KVM_REG_MIPS_CP0_SEGCTL1        | 64
  MIPS  | KVM_REG_MIPS_CP0_SEGCTL2        | 64
  MIPS  | KVM_REG_MIPS_CP0_PWBASE         | 64
  MIPS  | KVM_REG_MIPS_CP0_PWFIELD        | 64
  MIPS  | KVM_REG_MIPS_CP0_PWSIZE         | 64
  MIPS  | KVM_REG_MIPS_CP0_WIRED          | 32
  MIPS  | KVM_REG_MIPS_CP0_PWCTL          | 32
  MIPS  | KVM_REG_MIPS_CP0_HWRENA         | 32
  MIPS  | KVM_REG_MIPS_CP0_BADVADDR       | 64
  MIPS  | KVM_REG_MIPS_CP0_BADINSTR       | 32
  MIPS  | KVM_REG_MIPS_CP0_BADINSTRP      | 32
  MIPS  | KVM_REG_MIPS_CP0_COUNT          | 32
  MIPS  | KVM_REG_MIPS_CP0_ENTRYHI        | 64
  MIPS  | KVM_REG_MIPS_CP0_COMPARE        | 32
  MIPS  | KVM_REG_MIPS_CP0_STATUS         | 32
  MIPS  | KVM_REG_MIPS_CP0_INTCTL         | 32
  MIPS  | KVM_REG_MIPS_CP0_CAUSE          | 32
  MIPS  | KVM_REG_MIPS_CP0_EPC            | 64
  MIPS  | KVM_REG_MIPS_CP0_PRID           | 32
  MIPS  | KVM_REG_MIPS_CP0_EBASE          | 64
  MIPS  | KVM_REG_MIPS_CP0_CONFIG         | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG1        | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG2        | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG3        | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG4        | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG5        | 32
  MIPS  | KVM_REG_MIPS_CP0_CONFIG7        | 32
  MIPS  | KVM_REG_MIPS_CP0_XCONTEXT       | 64
  MIPS  | KVM_REG_MIPS_CP0_ERROREPC       | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH1      | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH2      | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH3      | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH4      | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH5      | 64
  MIPS  | KVM_REG_MIPS_CP0_KSCRATCH6      | 64
  MIPS  | KVM_REG_MIPS_CP0_MAAR(0..63)    | 64
  MIPS  | KVM_REG_MIPS_COUNT_CTL          | 64
  MIPS  | KVM_REG_MIPS_COUNT_RESUME       | 64
  MIPS  | KVM_REG_MIPS_COUNT_HZ           | 64
  MIPS  | KVM_REG_MIPS_FPR_32(0..31)      | 32
  MIPS  | KVM_REG_MIPS_FPR_64(0..31)      | 64
  MIPS  | KVM_REG_MIPS_VEC_128(0..31)     | 128
  MIPS  | KVM_REG_MIPS_FCR_IR             | 32
  MIPS  | KVM_REG_MIPS_FCR_CSR            | 32
  MIPS  | KVM_REG_MIPS_MSA_IR             | 32
  MIPS  | KVM_REG_MIPS_MSA_CSR            | 32

ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:

ARM core registers have the following id bit patterns:
  0x4020 0000 0010 <index into the kvm_regs struct:16>

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

ARM CCSIDR registers are demultiplexed by CSSELR value:
  0x4020 0000 0011 00 <csselr:8>

ARM 32-bit VFP control registers have the following id bit patterns:
  0x4020 0000 0012 1 <regno:12>

ARM 64-bit FP registers have the following id bit patterns:
  0x4030 0000 0012 0 <regno:12>

ARM firmware pseudo-registers have the following bit pattern:
  0x4030 0000 0014 <regno:16>


arm64 registers are mapped using the lower 32 bits. The upper 16 of
that is the register group type, or coprocessor number:

arm64 core/FP-SIMD registers have the following id bit patterns. Note
that the size of the access is variable, as the kvm_regs structure
contains elements ranging from 32 to 128 bits. The index is a 32bit
value in the kvm_regs structure seen as a 32bit array.
  0x60x0 0000 0010 <index into the kvm_regs struct:16>

arm64 CCSIDR registers are demultiplexed by CSSELR value:
  0x6020 0000 0011 00 <csselr:8>

arm64 system registers have the following id bit patterns:
  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
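
The arm64 system register pattern above can be assembled as in the
following sketch. The function name is hypothetical; only the bit layout
is taken from the pattern. For example, MPIDR_EL1 (op0=3, op1=0, crn=0,
crm=0, op2=5) yields 0x603000000013c005.

```c
#include <stdint.h>

/* 0x6030 0000 0013 0000: arm64, 64-bit access, system register group. */
#define ARM64_SYSREG_BASE 0x6030000000130000ULL

/* Pack <op0:2> <op1:3> <crn:4> <crm:4> <op2:3> into the low 16 bits. */
uint64_t arm64_sysreg_id(unsigned op0, unsigned op1, unsigned crn,
                         unsigned crm, unsigned op2)
{
        return ARM64_SYSREG_BASE |
               ((uint64_t)(op0 & 0x3) << 14) |
               ((uint64_t)(op1 & 0x7) << 11) |
               ((uint64_t)(crn & 0xf) << 7) |
               ((uint64_t)(crm & 0xf) << 3) |
               (uint64_t)(op2 & 0x7);
}
```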

arm64 firmware pseudo-registers have the following bit pattern:
  0x6030 0000 0014 <regno:16>


MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
the register group type:

MIPS core registers (see above) have the following id bit patterns:
  0x7030 0000 0000 <reg:16>

MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit
patterns depending on whether they're 32-bit or 64-bit registers:
  0x7020 0000 0001 00 <reg:5> <sel:3>   (32-bit)
  0x7030 0000 0001 00 <reg:5> <sel:3>   (64-bit)
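
A minimal sketch of building a MIPS CP0 register id from these two
patterns is given below. The function name is made up for illustration,
and the caller is assumed to know the register width from the table above.
For instance, CP0 Status (register 12, select 0, 32 bits wide) maps to
0x7020000000010060.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the id for CP0 register <reg:5>, select <sel:3>. */
uint64_t mips_cp0_reg_id(unsigned reg, unsigned sel, bool is_64bit)
{
        uint64_t base = is_64bit ? 0x7030000000010000ULL
                                 : 0x7020000000010000ULL;

        return base | ((uint64_t)(reg & 0x1f) << 3) | (uint64_t)(sel & 0x7);
}
```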

Note: KVM_REG_MIPS_CP0_ENTRYLO0 and KVM_REG_MIPS_CP0_ENTRYLO1 are the MIPS64
versions of the EntryLo registers regardless of the word size of the host
hardware, host kernel, guest, and whether XPA is present in the guest, i.e.
with the RI and XI bits (if they exist) in bits 63 and 62 respectively, and
the PFNX field starting at bit 30.

MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit
patterns:
  0x7030 0000 0001 01 <reg:8>

MIPS KVM control registers (see above) have the following id bit patterns:
  0x7030 0000 0002 <reg:16>

MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following
id bit patterns depending on the size of the register being accessed. They are
always accessed according to the current guest FPU mode (Status.FR and
Config5.FRE), i.e. as the guest would see them, and they become unpredictable
if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector
registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they
overlap the FPU registers:
  0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)
  0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)
  0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)

MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 01 <0:3> <reg:5>

MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 02 <0:3> <reg:5>


4.69 KVM_GET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in and out)
Returns: 0 on success, negative value on failure

This ioctl reads the value of a single register implemented in a vcpu.
The register to read is indicated by the "id" field of the kvm_one_reg
struct passed in. On success, the register value can be found at the
memory location pointed to by "addr".

The list of registers accessible using this interface is identical to the
list in 4.68.
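
A call might be wrapped as in the following sketch. It assumes a 64-bit
wide register and a vcpu file descriptor obtained elsewhere; the wrapper
name is hypothetical, while struct kvm_one_reg and KVM_GET_ONE_REG come
from <linux/kvm.h>.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read one 64-bit register from the vcpu behind vcpu_fd.
 * Returns the ioctl() result; on success *value holds the register. */
int get_one_reg_u64(int vcpu_fd, uint64_t id, uint64_t *value)
{
        struct kvm_one_reg reg = {
                .id   = id,
                .addr = (uint64_t)(uintptr_t)value,
        };

        return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```

The same structure, pointing at a variable that holds the new value, would
be passed to KVM_SET_ONE_REG.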


4.70 KVM_KVMCLOCK_CTRL

Capability: KVM_CAP_KVMCLOCK_CTRL
Architectures: Any that implement pvclocks (currently x86 only)
Type: vcpu ioctl
Parameters: None
Returns: 0 on success, -1 on error

This signals to the host kernel that the specified guest is being paused by
userspace. The host will set a flag in the pvclock structure that is checked
from the soft lockup watchdog. The flag is part of the pvclock structure that
is shared between guest and host, specifically the second bit of the flags
field of the pvclock_vcpu_time_info structure. It will be set exclusively by
the host and read/cleared exclusively by the guest. The guest operation of
checking and clearing the flag must be an atomic operation, so
load-link/store-conditional or an equivalent must be used. There are two cases
where the guest will clear the flag: when the soft lockup watchdog timer resets
itself or when a soft lockup is detected. This ioctl can be called any time
after pausing the vcpu, but before it is resumed.
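
The guest-side check-and-clear can be illustrated with C11 atomics. This
is only a sketch of the atomicity requirement, not the guest kernel's
actual code; the names are made up, and the flags field is modelled as a
single atomic byte.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define GUEST_STOPPED_FLAG 0x02u  /* second bit of the flags field */

/* Atomically clear the flag and report whether it had been set. */
bool check_and_clear_guest_stopped(atomic_uchar *flags)
{
        unsigned char old =
                atomic_fetch_and(flags, (unsigned char)~GUEST_STOPPED_FLAG);

        return (old & GUEST_STOPPED_FLAG) != 0;
}
```

A single atomic read-modify-write ensures that a flag set by the host
between the check and the clear is not lost.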


4.71 KVM_SIGNAL_MSI

Capability: KVM_CAP_SIGNAL_MSI
Architectures: x86 arm arm64
Type: vm ioctl
Parameters: struct kvm_msi (in)
Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error

Directly injects an MSI message. Only valid with an in-kernel irqchip that
handles MSI messages.

struct kvm_msi {
        __u32 address_lo;
        __u32 address_hi;
        __u32 data;
        __u32 flags;
        __u32 devid;
        __u8  pad[12];
};

flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID. If this capability is not available, userspace
  should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message. For PCI, this is usually a
BDF (bus/device/function) identifier in the lower 16 bits.

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
address_hi must be zero.
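
The split of a 32-bit destination id across the two address words could
be sketched as below. The placement of the low 8 id bits in address_lo
bits 19-12, together with the 0xfee00000 base, follows the conventional
x86 MSI address format and is an assumption here, not something this
document specifies. The structure and function names are illustrative.

```c
#include <stdint.h>

/* Local stand-in for the two address fields of struct kvm_msi. */
struct msi_addr {
        uint32_t address_lo;
        uint32_t address_hi;
};

struct msi_addr x2apic_msi_addr(uint32_t dest_id)
{
        struct msi_addr a;

        /* Assumption: standard x86 MSI address, low 8 id bits in 19:12. */
        a.address_lo = 0xfee00000u | ((dest_id & 0xffu) << 12);
        /* Bits 31-8 of the id go to address_hi bits 31-8; bits 7-0 zero. */
        a.address_hi = dest_id & 0xffffff00u;
        return a;
}
```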


4.71 KVM_CREATE_PIT2

Capability: KVM_CAP_PIT2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_config (in)
Returns: 0 on success, -1 on error

Creates an in-kernel device model for the i8254 PIT. This call is only valid
after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
parameters have to be passed:

struct kvm_pit_config {
        __u32 flags;
        __u32 pad[15];
};

Valid flags are:

#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */

PIT timer interrupts may use a per-VM kernel thread for injection. If it
exists, this thread will have a name of the following pattern:

kvm-pit/<owner-process-pid>

When running a guest with elevated priorities, the scheduling parameters of
this thread may have to be adjusted accordingly.

This IOCTL replaces the obsolete KVM_CREATE_PIT.


4.72 KVM_GET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (out)
Returns: 0 on success, -1 on error

Retrieves the state of the in-kernel PIT model. Only valid after
KVM_CREATE_PIT2. The state is returned in the following structure:

struct kvm_pit_state2 {
        struct kvm_pit_channel_state channels[3];
        __u32 flags;
        __u32 reserved[9];
};

Valid flags are:

/* disable PIT in HPET legacy mode */
#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001

This IOCTL replaces the obsolete KVM_GET_PIT.


4.73 KVM_SET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (in)
Returns: 0 on success, -1 on error

Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
See KVM_GET_PIT2 for details on struct kvm_pit_state2.

This IOCTL replaces the obsolete KVM_SET_PIT.


4.74 KVM_PPC_GET_SMMU_INFO

Capability: KVM_CAP_PPC_GET_SMMU_INFO
Architectures: powerpc
Type: vm ioctl
Parameters: None
Returns: 0 on success, -1 on error

This populates and returns a structure describing the features of
the "Server" class MMU emulation supported by KVM.
This can in turn be used by userspace to generate the appropriate
device-tree properties for the guest operating system.

The structure contains some global information, followed by an
array of supported segment page sizes:

      struct kvm_ppc_smmu_info {
             __u64 flags;
             __u32 slb_size;
             __u32 pad;
             struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
      };

The supported flags are:

    - KVM_PPC_PAGE_SIZES_REAL:
        When that flag is set, guest page sizes must "fit" the backing
        store page sizes. When not set, any page size in the list can
        be used regardless of how they are backed by userspace.

    - KVM_PPC_1T_SEGMENTS
        The emulated MMU supports 1T segments in addition to the
        standard 256M ones.

    - KVM_PPC_NO_HASH
        This flag indicates that HPT guests are not supported by KVM,
        thus all guests must use radix MMU mode.

The "slb_size" field indicates how many SLB entries are supported.

The "sps" array contains 8 entries indicating the supported base
page sizes for a segment in increasing order. Each entry is defined
as follows:

   struct kvm_ppc_one_seg_page_size {
        __u32 page_shift;       /* Base page shift of segment (or 0) */
        __u32 slb_enc;          /* SLB encoding for BookS */
        struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
   };

An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encountering
such an entry.

The "slb_enc" field provides the encoding to use in the SLB for the
page size. The bits are in positions such that the value can directly
be OR'ed into the "vsid" argument of the slbmte instruction.

The "enc" array is a list which, for each of those segment base page
sizes, provides the list of supported actual page sizes (which can
only be larger than or equal to the base page size), along with the
corresponding encoding in the hash PTE. Similarly, the array has
8 entries sorted by increasing sizes, and an entry with a "0" shift
is an empty entry and a terminator:

   struct kvm_ppc_one_page_size {
        __u32 page_shift;       /* Page shift (or 0) */
        __u32 pte_enc;          /* Encoding in the HPTE (>>12) */
   };

The "pte_enc" field provides a value that can be OR'ed into the hash
PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it
into the hash PTE second double word).
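
A lookup over these arrays might be written as in the sketch below. The
structures are local stand-ins for the ones shown above (so that the
example builds without the powerpc headers), and the function name is
made up. The walk stops at the first entry with a zero page_shift, as
described.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZES_MAX 8  /* stands in for KVM_PPC_PAGE_SIZES_MAX_SZ */

/* Local stand-ins for the structures described above. */
struct one_page_size { uint32_t page_shift; uint32_t pte_enc; };
struct one_seg_page_size {
        uint32_t page_shift;
        uint32_t slb_enc;
        struct one_page_size enc[PAGE_SIZES_MAX];
};

/* Look up the HPTE encoding for (base_shift, actual_shift).
 * Returns 0 and stores pte_enc << 12 in *hpte_bits, or -1 if the
 * combination is not listed. */
int find_hpte_enc(const struct one_seg_page_size *sps,
                  uint32_t base_shift, uint32_t actual_shift,
                  uint64_t *hpte_bits)
{
        for (size_t i = 0; i < PAGE_SIZES_MAX && sps[i].page_shift; i++) {
                if (sps[i].page_shift != base_shift)
                        continue;
                for (size_t j = 0; j < PAGE_SIZES_MAX; j++) {
                        const struct one_page_size *e = &sps[i].enc[j];

                        if (!e->page_shift)
                                break;  /* terminator */
                        if (e->page_shift == actual_shift) {
                                *hpte_bits = (uint64_t)e->pte_enc << 12;
                                return 0;
                        }
                }
                return -1;
        }
        return -1;
}
```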

4.75 KVM_IRQFD

Capability: KVM_CAP_IRQFD
Architectures: x86 s390 arm arm64
Type: vm ioctl
Parameters: struct kvm_irqfd (in)
Returns: 0 on success, -1 on error

Allows setting an eventfd to directly trigger a guest interrupt.
kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
an event is triggered on the eventfd, an interrupt is injected into
the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.
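
Assignment and removal could be wrapped as in the following sketch. The
wrapper name is hypothetical; struct kvm_irqfd, KVM_IRQFD and
KVM_IRQFD_FLAG_DEASSIGN are taken from <linux/kvm.h>. The VM file
descriptor and the eventfd are assumed to have been created elsewhere.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Attach (deassign == 0) or detach (deassign != 0) eventfd efd
 * to/from the given gsi.  Returns the ioctl() result. */
int irqfd_update(int vm_fd, int efd, uint32_t gsi, int deassign)
{
        struct kvm_irqfd irqfd;

        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd = (uint32_t)efd;
        irqfd.gsi = gsi;
        irqfd.flags = deassign ? KVM_IRQFD_FLAG_DEASSIGN : 0;

        return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}
```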

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
mechanism allowing emulation of level-triggered, irqfd-based
interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
additional eventfd in the kvm_irqfd.resamplefd field. When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
the specified gsi in the irqchip. When the irqchip is resampled, such
as from an EOI, the gsi is de-asserted and the user is notified via
kvm_irqfd.resamplefd. It is the user's responsibility to re-queue
the interrupt if the device making use of it still requires service.
Note that closing the resamplefd is not sufficient to disable the
irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.

On arm/arm64, where gsi routing is supported, the following can happen:
- in case no routing entry is associated to this gsi, injection fails
- in case the gsi is associated to an irqchip routing entry,
  irqchip.pin + 32 corresponds to the injected SPI ID.
- in case the gsi is associated to an MSI routing entry, the MSI
  message and device ID are translated into an LPI (support restricted
  to GICv3 ITS in-kernel emulation).

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to u32 containing hash table order (in/out)
Returns: 0 on success, -1 on error

This requests the host kernel to allocate an MMU hash table for a
guest using the PAPR paravirtualization interface. This only does
anything if the kernel is configured to use the Book 3S HV style of
virtualization. Otherwise the capability doesn't exist and the ioctl
returns an ENOTTY error. The rest of this description assumes Book 3S
HV.

There must be no vcpus running when this ioctl is called; if there
are, it will do nothing and return an EBUSY error.

The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).
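
Since the parameter is an order rather than a byte count, a small worked
example may help. The helper names below are illustrative only; the
limits are the ones stated above. An order of 24 corresponds to the
16 MB default.

```c
#include <stdbool.h>
#include <stdint.h>

#define HTAB_MIN_ORDER 18
#define HTAB_MAX_ORDER 46

/* Check that an order lies in the accepted range. */
bool htab_order_valid(uint32_t order)
{
        return order >= HTAB_MIN_ORDER && order <= HTAB_MAX_ORDER;
}

/* Size in bytes of a hash table of the given order (log base 2). */
uint64_t htab_bytes(uint32_t order)
{
        return (uint64_t)1 << order;
}
```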

If this ioctl is called when a hash table has already been allocated,
with a different order from the existing hash table, the existing hash
table will be freed and a new one allocated. If this ioctl is
called when a hash table has already been allocated of the same order
as specified, the kernel will clear out the existing hash table (zero
all HPTEs). In either case, if the guest is using the virtualized
real-mode area (VRMA) facility, the kernel will re-create the VRMA
HPTEs on the next KVM_RUN of any vcpu.

4.77 KVM_S390_INTERRUPT

Capability: basic
Architectures: s390
Type: vm ioctl, vcpu ioctl
Parameters: struct kvm_s390_interrupt (in)
Returns: 0 on success, -1 on error

Allows injecting an interrupt into the guest. Interrupts can be floating
(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.

Interrupt parameters are passed via kvm_s390_interrupt:

struct kvm_s390_interrupt {
        __u32 type;
        __u32 parm;
        __u64 parm64;
};

type can be one of the following:

KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm
KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
KVM_S390_RESTART (vcpu) - restart
KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
                           parameters in parm and parm64
KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
    interruption subclass)
KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
                           machine check interrupt code in parm64 (note that
                           machine checks needing further payload are not
                           supported by this ioctl)

This is an asynchronous vcpu ioctl and can be invoked from any thread.
416ad65f 2525
a2932923
PM
25264.78 KVM_PPC_GET_HTAB_FD
2527
2528Capability: KVM_CAP_PPC_HTAB_FD
2529Architectures: powerpc
2530Type: vm ioctl
2531Parameters: Pointer to struct kvm_get_htab_fd (in)
2532Returns: file descriptor number (>= 0) on success, -1 on error
2533
2534This returns a file descriptor that can be used either to read out the
2535entries in the guest's hashed page table (HPT), or to write entries to
2536initialize the HPT. The returned fd can only be written to if the
2537KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
2538can only be read if that bit is clear. The argument struct looks like
2539this:
2540
2541/* For KVM_PPC_GET_HTAB_FD */
2542struct kvm_get_htab_fd {
2543 __u64 flags;
2544 __u64 start_index;
2545 __u64 reserved[2];
2546};
2547
2548/* Values for kvm_get_htab_fd.flags */
2549#define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1)
2550#define KVM_GET_HTAB_WRITE ((__u64)0x2)
2551
2552The `start_index' field gives the index in the HPT of the entry at
2553which to start reading. It is ignored when writing.
2554
2555Reads on the fd will initially supply information about all
2556"interesting" HPT entries. Interesting entries are those with the
2557bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
2558all entries. When the end of the HPT is reached, the read() will
2559return. If read() is called again on the fd, it will start again from
2560the beginning of the HPT, but will only return HPT entries that have
2561changed since they were last read.
2562
2563Data read or written is structured as a header (8 bytes) followed by a
2564series of valid HPT entries (16 bytes each). The header indicates how
2565many valid HPT entries there are and how many invalid entries follow
2566the valid entries. The invalid entries are not represented explicitly
2567in the stream. The header format is:
2568
2569struct kvm_get_htab_header {
2570 __u32 index;
2571 __u16 n_valid;
2572 __u16 n_invalid;
2573};
2574
2575Writes to the fd create HPT entries starting at the index given in the
2576header; first `n_valid' valid entries with contents from the data
2577written, then `n_invalid' invalid entries, invalidating any previously
2578valid entries found.
2579
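The stream layout described above can be walked as in the following
minimal sketch. It assumes the buffer was filled by read() on the HTAB fd;
the names htab_header, HPTE_SIZE and htab_count_entries are local to this
example (htab_header mirrors struct kvm_get_htab_header) and are not part
of the KVM API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_get_htab_header as shown above. */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16 /* bytes per valid HPT entry, per the text above */

/*
 * Walk a buffer in the HTAB stream format and total up the valid and
 * invalid entries it describes. Returns 0 on success, or -1 if the
 * buffer ends in the middle of a header or a run of valid entries.
 */
static int htab_count_entries(const uint8_t *buf, size_t len,
			      uint64_t *n_valid, uint64_t *n_invalid)
{
	size_t off = 0;

	*n_valid = 0;
	*n_invalid = 0;
	while (off < len) {
		struct htab_header hdr;
		size_t payload;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* Only valid entries carry data in the stream. */
		payload = (size_t)hdr.n_valid * HPTE_SIZE;
		if (len - off < payload)
			return -1;
		off += payload;

		*n_valid += hdr.n_valid;
		*n_invalid += hdr.n_invalid;
	}
	return 0;
}
```

Because invalid entries occupy no bytes, the next header follows directly
after the last valid entry of the previous record.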
852b6d57
SW
25804.79 KVM_CREATE_DEVICE
2581
2582Capability: KVM_CAP_DEVICE_CTRL
2583Type: vm ioctl
2584Parameters: struct kvm_create_device (in/out)
2585Returns: 0 on success, -1 on error
2586Errors:
2587 ENODEV: The device type is unknown or unsupported
2588 EEXIST: Device already created, and this type of device may not
2589 be instantiated multiple times
2590
2591 Other error conditions may be defined by individual device types or
2592 have their standard meanings.
2593
2594Creates an emulated device in the kernel. The file descriptor returned
2595in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
2596
2597If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
2598device type is supported (not necessarily whether it can be created
2599in the current vm).
2600
2601Individual devices should not define flags. Attributes should be used
2602for specifying any behavior that is not implied by the device type
2603number.
2604
2605struct kvm_create_device {
2606 __u32 type; /* in: KVM_DEV_TYPE_xxx */
2607 __u32 fd; /* out: device handle */
2608 __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */
2609};
2610
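As an illustration of the KVM_CREATE_DEVICE_TEST flag, the sketch below
probes for a device type without creating it. It assumes vm_fd is a VM
file descriptor obtained from KVM_CREATE_VM and that <linux/kvm.h> is
available; the helper name kvm_device_type_supported is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Ask KVM whether a device type is supported, without instantiating it.
 * Returns 0 if the type is supported, -1 (with errno set) otherwise.
 */
static int kvm_device_type_supported(int vm_fd, uint32_t type)
{
	struct kvm_create_device cd;

	memset(&cd, 0, sizeof(cd));
	cd.type = type;                    /* KVM_DEV_TYPE_xxx */
	cd.flags = KVM_CREATE_DEVICE_TEST; /* test only, no fd is returned */
	return ioctl(vm_fd, KVM_CREATE_DEVICE, &cd);
}
```

A successful probe only says the type is known to the kernel; as noted
above, creation in the current vm may still fail.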
26114.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
2612
f577f6c2
SZ
2613Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
2614 KVM_CAP_VCPU_ATTRIBUTES for vcpu device
2615Type: device ioctl, vm ioctl, vcpu ioctl
852b6d57
SW
2616Parameters: struct kvm_device_attr
2617Returns: 0 on success, -1 on error
2618Errors:
2619 ENXIO: The group or attribute is unknown/unsupported for this device
f9cbd9b0 2620 or hardware support is missing.
852b6d57
SW
2621 EPERM: The attribute cannot (currently) be accessed this way
2622 (e.g. read-only attribute, or attribute that only makes
2623 sense when the device is in a different state)
2624
2625 Other error conditions may be defined by individual device types.
2626
2627Gets/sets a specified piece of device configuration and/or state. The
2628semantics are device-specific. See individual device documentation in
2629the "devices" directory. As with ONE_REG, the size of the data
2630transferred is defined by the particular attribute.
2631
2632struct kvm_device_attr {
2633 __u32 flags; /* no flags currently defined */
2634 __u32 group; /* device-defined */
2635 __u64 attr; /* group-defined */
2636 __u64 addr; /* userspace address of attr data */
2637};
2638
26394.81 KVM_HAS_DEVICE_ATTR
2640
f577f6c2
SZ
2641Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
2642 KVM_CAP_VCPU_ATTRIBUTES for vcpu device
2643Type: device ioctl, vm ioctl, vcpu ioctl
852b6d57
SW
2644Parameters: struct kvm_device_attr
2645Returns: 0 on success, -1 on error
2646Errors:
2647 ENXIO: The group or attribute is unknown/unsupported for this device
f9cbd9b0 2648 or hardware support is missing.
852b6d57
SW
2649
2650Tests whether a device supports a particular attribute. A successful
2651return indicates the attribute is implemented. It does not necessarily
2652indicate that the attribute can be read or written in the device's
2653current state. "addr" is ignored.
f36992e3 2654
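A common pattern is to test for an attribute before reading it. The
sketch below assumes dev_fd is a device fd returned by KVM_CREATE_DEVICE
and that data points to a buffer of the size the attribute defines; the
helper name kvm_device_get_attr_checked is hypothetical.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Read an attribute only if the device reports it as implemented.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
static int kvm_device_get_attr_checked(int dev_fd, uint32_t group,
				       uint64_t attr, void *data)
{
	struct kvm_device_attr da = {
		.flags = 0, /* no flags currently defined */
		.group = group,
		.attr = attr,
		.addr = (uint64_t)(uintptr_t)data,
	};

	/* "addr" is ignored by KVM_HAS_DEVICE_ATTR. */
	if (ioctl(dev_fd, KVM_HAS_DEVICE_ATTR, &da) < 0)
		return -1;
	return ioctl(dev_fd, KVM_GET_DEVICE_ATTR, &da);
}
```

The second call can still fail, for example with EPERM, if the attribute
cannot be read in the device's current state.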
d8968f1f 26554.82 KVM_ARM_VCPU_INIT
749cf76c
CD
2656
2657Capability: basic
379e04c7 2658Architectures: arm, arm64
749cf76c 2659Type: vcpu ioctl
beb11fc7 2660Parameters: struct kvm_vcpu_init (in)
749cf76c
CD
2661Returns: 0 on success; -1 on error
2662Errors:
2663  EINVAL:    the target is unknown, or the combination of features is invalid.
2664  ENOENT:    a specified feature bit is unknown.
2665
2666This tells KVM what type of CPU to present to the guest, and what
2667optional features it should have.  This will cause a reset of the cpu
2668registers to their initial values.  If this is not called, KVM_RUN will
2669return ENOEXEC for that vcpu.
2670
2671Note that because some registers reflect machine topology, all vcpus
2672should be created before this ioctl is invoked.
2673
f7fa034d
CD
2674Userspace can call this function multiple times for a given vcpu, including
2675after the vcpu has been run. This will reset the vcpu to its initial
2676state. All calls to this function after the initial call must use the same
2677target and same set of feature flags, otherwise EINVAL will be returned.
2678
aa024c2f
MZ
2679Possible features:
2680 - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
3ad8b3de
CD
2681 Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on
2682 and execute guest code when KVM_RUN is called.
379e04c7
MZ
2683 - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
2684 Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
85bd0ba1
MZ
2685 - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
2686 backward compatible with v0.2) for the CPU.
50bb0c94 2687 Depends on KVM_CAP_ARM_PSCI_0_2.
808e7381
SZ
2688 - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
2689 Depends on KVM_CAP_ARM_PMU_V3.
aa024c2f 2690
749cf76c 2691
740edfc0
AP
26924.83 KVM_ARM_PREFERRED_TARGET
2693
2694Capability: basic
2695Architectures: arm, arm64
2696Type: vm ioctl
2697Parameters: struct kvm_vcpu_init (out)
2698Returns: 0 on success; -1 on error
2699Errors:
a7265fb1 2700 ENODEV: no preferred target available for the host
740edfc0
AP
2701
2702This queries KVM for the preferred CPU target type which can be emulated
2703by KVM on the underlying host.
2704
2705The ioctl returns a struct kvm_vcpu_init instance containing information
2706about the preferred CPU target type and recommended features for it. The
2707kvm_vcpu_init->features bitmap returned will have feature bits set if
2708the preferred target recommends setting these features, but this is
2709not mandatory.
2710
2711The information returned by this ioctl can be used to prepare an instance
2712of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
2713in a VCPU matching the underlying host.
2714
2715
27164.84 KVM_GET_REG_LIST
749cf76c
CD
2717
2718Capability: basic
c2d2c21b 2719Architectures: arm, arm64, mips
749cf76c
CD
2720Type: vcpu ioctl
2721Parameters: struct kvm_reg_list (in/out)
2722Returns: 0 on success; -1 on error
2723Errors:
2724  E2BIG:     the reg index list is too big to fit in the array specified by
2725             the user (the number required will be written into n).
2726
2727struct kvm_reg_list {
2728 __u64 n; /* number of registers in reg[] */
2729 __u64 reg[0];
2730};
2731
2732This ioctl returns the guest registers that are supported for the
2733KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
2734
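Since the required array size is written into n when E2BIG is returned,
the register list can be fetched with two calls. The following is a
minimal sketch, assuming vcpu_fd is a vcpu file descriptor on an
architecture that supports this ioctl; the helper name kvm_get_reg_list
is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Return a heap-allocated register list for the vcpu, or NULL on error.
 * The caller is responsible for free()ing the result.
 */
static struct kvm_reg_list *kvm_get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 };
	struct kvm_reg_list *list;

	/* First call: with n == 0, E2BIG is expected and n is filled in. */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;

	list = calloc(1, sizeof(*list) + probe.n * sizeof(list->reg[0]));
	if (!list)
		return NULL;

	/* Second call: the array is now large enough. */
	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
```

Each reg[] value returned can then be passed as the id for
KVM_GET_ONE_REG/KVM_SET_ONE_REG.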
ce01e4e8
CD
2735
27364.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
3401d546
CD
2737
2738Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
379e04c7 2739Architectures: arm, arm64
3401d546
CD
2740Type: vm ioctl
2741Parameters: struct kvm_arm_device_addr (in)
2742Returns: 0 on success, -1 on error
2743Errors:
2744 ENODEV: The device id is unknown
2745 ENXIO: Device not supported on current system
2746 EEXIST: Address already set
2747 E2BIG: Address outside guest physical address space
330690cd 2748 EBUSY: Address overlaps with other device range
3401d546
CD
2749
2750struct kvm_arm_device_addr {
2751 __u64 id;
2752 __u64 addr;
2753};
2754
2755Specify a device address in the guest's physical address space where guests
2756can access emulated or directly exposed devices, which the host kernel needs
2757to know about. The id field is an architecture specific identifier for a
2758specific device.
2759
379e04c7
MZ
2760ARM/arm64 divides the id field into two parts, a device id and an
2761address type id specific to the individual device.
3401d546
CD
2762
2763  bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 |
2764 field: | 0x00000000 | device id | addr type id |
2765
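The bit layout above can be expressed as a small helper. This is only a
sketch of the packing shown in the table; the function name
arm_device_addr_id is hypothetical.

```c
#include <stdint.h>

/*
 * Compose the 64-bit id field: bits 31..16 hold the device id,
 * bits 15..0 hold the address type id, and bits 63..32 stay zero.
 */
static uint64_t arm_device_addr_id(uint16_t device_id, uint16_t addr_type)
{
	return ((uint64_t)device_id << 16) | addr_type;
}
```

For example, device id 2 with address type id 3 yields 0x20003.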
379e04c7
MZ
2766ARM/arm64 currently only require this when using the in-kernel GIC
2767support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
2768as the device id. When setting the base address for the guest's
2769mapping of the VGIC virtual CPU and distributor interface, the ioctl
2770must be called after calling KVM_CREATE_IRQCHIP, but before calling
2771KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
2772base addresses will return -EEXIST.
3401d546 2773
ce01e4e8
CD
2774Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
2775should be used instead.
2776
2777
740edfc0 27784.86 KVM_PPC_RTAS_DEFINE_TOKEN
8e591cb7
ME
2779
2780Capability: KVM_CAP_PPC_RTAS
2781Architectures: ppc
2782Type: vm ioctl
2783Parameters: struct kvm_rtas_token_args
2784Returns: 0 on success, -1 on error
2785
2786Defines a token value for a RTAS (Run Time Abstraction Services)
2787service in order to allow it to be handled in the kernel. The
2788argument struct gives the name of the service, which must be the name
2789of a service that has a kernel-side implementation. If the token
2790value is non-zero, it will be associated with that service, and
2791subsequent RTAS calls by the guest specifying that token will be
2792handled by the kernel. If the token value is 0, then any token
2793associated with the service will be forgotten, and subsequent RTAS
2794calls by the guest for that service will be passed to userspace to be
2795handled.
2796
4bd9d344
AB
27974.87 KVM_SET_GUEST_DEBUG
2798
2799Capability: KVM_CAP_SET_GUEST_DEBUG
0e6f07f2 2800Architectures: x86, s390, ppc, arm64
4bd9d344
AB
2801Type: vcpu ioctl
2802Parameters: struct kvm_guest_debug (in)
2803Returns: 0 on success; -1 on error
2804
2805struct kvm_guest_debug {
2806 __u32 control;
2807 __u32 pad;
2808 struct kvm_guest_debug_arch arch;
2809};
2810
2811Set up the processor specific debug registers and configure the vcpu for
2812handling guest debug events. There are two parts to the structure. The
2813first is a control bitfield that indicates the type of debug events to
2814handle when running. Common control bits are:
2815
2816 - KVM_GUESTDBG_ENABLE: guest debugging is enabled
2817 - KVM_GUESTDBG_SINGLESTEP: the next run should single-step
2818
2819The top 16 bits of the control field are architecture specific control
2820flags which can include the following:
2821
4bd611ca 2822 - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
834bf887 2823 - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
4bd9d344
AB
2824 - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
2825 - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
2826 - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
2827
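The split of the control field into common and architecture specific
halves can be sketched as follows. The mask name GUESTDBG_ARCH_MASK and
the helper names are hypothetical and only encode the "top 16 bits"
statement above.

```c
#include <stdint.h>

/* Top 16 bits of the control field: architecture specific flags. */
#define GUESTDBG_ARCH_MASK 0xffff0000u

/* Return only the common control bits (bottom 16 bits). */
static uint32_t guestdbg_common_bits(uint32_t control)
{
	return control & ~GUESTDBG_ARCH_MASK;
}

/* Return only the architecture specific control bits (top 16 bits). */
static uint32_t guestdbg_arch_bits(uint32_t control)
{
	return control & GUESTDBG_ARCH_MASK;
}
```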
2828For example, KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
2829are enabled in memory, so we need to ensure breakpoint exceptions are
2830correctly trapped and the KVM run loop exits at the breakpoint instead of
2831running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP
2832we need to ensure the guest vCPU's architecture specific registers are
2833updated to the correct (supplied) values.
2834
2835The second part of the structure is architecture specific and
2836typically contains a set of debug registers.
2837
834bf887
AB
2838For arm64 the number of debug registers is implementation defined and
2839can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
2840KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
2841indicating the number of supported registers.
2842
4bd9d344
AB
2843When debug events exit the main run loop, they do so with the reason
2844KVM_EXIT_DEBUG, and the kvm_debug_exit_arch part of the kvm_run
2845structure contains architecture specific debug information.
3401d546 2846
209cf19f
AB
28474.88 KVM_GET_EMULATED_CPUID
2848
2849Capability: KVM_CAP_EXT_EMUL_CPUID
2850Architectures: x86
2851Type: system ioctl
2852Parameters: struct kvm_cpuid2 (in/out)
2853Returns: 0 on success, -1 on error
2854
2855struct kvm_cpuid2 {
2856 __u32 nent;
2857 __u32 flags;
2858 struct kvm_cpuid_entry2 entries[0];
2859};
2860
2861The member 'flags' is used for passing flags from userspace.
2862
2863#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
2864#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
2865#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
2866
2867struct kvm_cpuid_entry2 {
2868 __u32 function;
2869 __u32 index;
2870 __u32 flags;
2871 __u32 eax;
2872 __u32 ebx;
2873 __u32 ecx;
2874 __u32 edx;
2875 __u32 padding[3];
2876};
2877
2878This ioctl returns x86 cpuid features which are emulated by
2879kvm. Userspace can use the information returned by this ioctl to query
2880which features are emulated by kvm instead of being present natively.
2881
2882Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
2883structure with the 'nent' field indicating the number of entries in
2884the variable-size array 'entries'. If the number of entries is too low
2885to describe the cpu capabilities, an error (E2BIG) is returned. If the
2886number is too high, the 'nent' field is adjusted and an error (ENOMEM)
2887is returned. If the number is just right, the 'nent' field is adjusted
2888to the number of valid entries in the 'entries' array, which is then
2889filled.
2890
2891The entries returned are the set CPUID bits of the respective features
2892which kvm emulates, as returned by the CPUID instruction, with unknown
2893or unsupported feature bits cleared.
2894
2895Features like x2apic, for example, may not be present in the host cpu
2896but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
2897emulated efficiently and are thus not included here.
2898
2899The fields in each entry are defined as follows:
2900
2901 function: the eax value used to obtain the entry
2902 index: the ecx value used to obtain the entry (for entries that are
2903 affected by ecx)
2904 flags: an OR of zero or more of the following:
2905 KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
2906 if the index field is valid
2907 KVM_CPUID_FLAG_STATEFUL_FUNC:
2908 if cpuid for this function returns different values for successive
2909 invocations; there will be several entries with the same function,
2910 all with this flag set
2911 KVM_CPUID_FLAG_STATE_READ_NEXT:
2912 for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
2913 the first entry to be read by a cpu
2914 eax, ebx, ecx, edx: the values returned by the cpuid instruction for
2915 this function/index combination
2916
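Looking up an entry in the returned array has to honour the index flag
described above. The sketch below uses a local mirror of struct
kvm_cpuid_entry2 so that it is self-contained; cpuid_entry, cpuid_find
and CPUID_FLAG_SIGNIFCANT_INDEX are names local to this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors KVM_CPUID_FLAG_SIGNIFCANT_INDEX, defined above as BIT(0). */
#define CPUID_FLAG_SIGNIFCANT_INDEX (1u << 0)

/* Local mirror of struct kvm_cpuid_entry2 as shown above. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Find the entry for a function/index pair. The index is compared only
 * when the entry marks its index field as valid.
 */
static const struct cpuid_entry *cpuid_find(const struct cpuid_entry *e,
					    uint32_t nent, uint32_t function,
					    uint32_t index)
{
	for (uint32_t i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & CPUID_FLAG_SIGNIFCANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```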
41408c28
TH
29174.89 KVM_S390_MEM_OP
2918
2919Capability: KVM_CAP_S390_MEM_OP
2920Architectures: s390
2921Type: vcpu ioctl
2922Parameters: struct kvm_s390_mem_op (in)
2923Returns: = 0 on success,
2924 < 0 on generic error (e.g. -EFAULT or -ENOMEM),
2925 > 0 if an exception occurred while walking the page tables
2926
5d4f6f3d 2927Read or write data from/to the logical (virtual) memory of a VCPU.
41408c28
TH
2928
2929Parameters are specified via the following structure:
2930
2931struct kvm_s390_mem_op {
2932 __u64 gaddr; /* the guest address */
2933 __u64 flags; /* flags */
2934 __u32 size; /* amount of bytes */
2935 __u32 op; /* type of operation */
2936 __u64 buf; /* buffer in userspace */
2937 __u8 ar; /* the access register number */
2938 __u8 reserved[31]; /* should be set to 0 */
2939};
2940
2941The type of operation is specified in the "op" field. It is either
2942KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
2943KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
2944KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check
2945whether the corresponding memory access would create an access exception
2946(without touching the data in the memory at the destination). In case an
2947access exception occurred while walking the MMU tables of the guest, the
2948ioctl returns a positive error number to indicate the type of exception.
2949This exception is also raised directly at the corresponding VCPU if the
2950flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.
2951
2952The start address of the memory region has to be specified in the "gaddr"
2953field, and the length of the region in the "size" field. "buf" is the buffer
2954supplied by the userspace application where the read data should be written
2955to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
2956is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL
2957when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access
2958register number to be used.
2959
2960The "reserved" field is meant for future extensions. It is not used by
2961KVM with the currently defined set of flags.
2962
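The three classes of return value listed above can be told apart with a
small helper. This is a sketch only; the enum and function names are
hypothetical.

```c
/* Outcome classes for a KVM_S390_MEM_OP return value. */
enum memop_result {
	MEMOP_OK,              /* = 0: the access (or check) succeeded */
	MEMOP_ERROR,           /* < 0: generic error */
	MEMOP_GUEST_EXCEPTION, /* > 0: exception while walking page tables */
};

/* Classify the value returned by the ioctl. */
static enum memop_result memop_classify(int ret)
{
	if (ret == 0)
		return MEMOP_OK;
	return ret < 0 ? MEMOP_ERROR : MEMOP_GUEST_EXCEPTION;
}
```

In the MEMOP_GUEST_EXCEPTION case the positive value itself indicates the
type of exception.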
30ee2a98
JH
29634.90 KVM_S390_GET_SKEYS
2964
2965Capability: KVM_CAP_S390_SKEYS
2966Architectures: s390
2967Type: vm ioctl
2968Parameters: struct kvm_s390_skeys
2969Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage
2970 keys, negative value on error
2971
2972This ioctl is used to get guest storage key values on the s390
2973architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
2974
2975struct kvm_s390_skeys {
2976 __u64 start_gfn;
2977 __u64 count;
2978 __u64 skeydata_addr;
2979 __u32 flags;
2980 __u32 reserved[9];
2981};
2982
2983The start_gfn field is the number of the first guest frame whose storage keys
2984you want to get.
2985
2986The count field is the number of consecutive frames (starting from start_gfn)
2987whose storage keys to get. The count field must be at least 1 and the maximum
2988allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
2989will cause the ioctl to return -EINVAL.
2990
2991The skeydata_addr field is the address to a buffer large enough to hold count
2992bytes. This buffer will be filled with storage key data by the ioctl.
2993
29944.91 KVM_S390_SET_SKEYS
2995
2996Capability: KVM_CAP_S390_SKEYS
2997Architectures: s390
2998Type: vm ioctl
2999Parameters: struct kvm_s390_skeys
3000Returns: 0 on success, negative value on error
3001
3002This ioctl is used to set guest storage key values on the s390
3003architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
3004See section on KVM_S390_GET_SKEYS for struct definition.
3005
3006The start_gfn field is the number of the first guest frame whose storage keys
3007you want to set.
3008
3009The count field is the number of consecutive frames (starting from start_gfn)
3010whose storage keys to set. The count field must be at least 1 and the maximum
3011allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
3012will cause the ioctl to return -EINVAL.
3013
3014The skeydata_addr field is the address to a buffer containing count bytes of
3015storage keys. Each byte in the buffer will be set as the storage key for a
3016single frame starting at start_gfn for count frames.
3017
3018Note: If any architecturally invalid key value is found in the given data then
3019the ioctl will return -EINVAL.
3020
47b43c52
JF
30214.92 KVM_S390_IRQ
3022
3023Capability: KVM_CAP_S390_INJECT_IRQ
3024Architectures: s390
3025Type: vcpu ioctl
3026Parameters: struct kvm_s390_irq (in)
3027Returns: 0 on success, -1 on error
3028Errors:
3029 EINVAL: interrupt type is invalid
3030 type is KVM_S390_SIGP_STOP and flag parameter has an invalid value
3031 type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
3032 than the maximum of VCPUs
3033 EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
3034 type is KVM_S390_SIGP_STOP and a stop irq is already pending
3035 type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
3036 is already pending
3037
3038Allows userspace to inject an interrupt into the guest.
3039
3040Using struct kvm_s390_irq as a parameter makes it possible
3041to inject additional payload, which is not
3042possible via KVM_S390_INTERRUPT.
3043
3044Interrupt parameters are passed via kvm_s390_irq:
3045
3046struct kvm_s390_irq {
3047 __u64 type;
3048 union {
3049 struct kvm_s390_io_info io;
3050 struct kvm_s390_ext_info ext;
3051 struct kvm_s390_pgm_info pgm;
3052 struct kvm_s390_emerg_info emerg;
3053 struct kvm_s390_extcall_info extcall;
3054 struct kvm_s390_prefix_info prefix;
3055 struct kvm_s390_stop_info stop;
3056 struct kvm_s390_mchk_info mchk;
3057 char reserved[64];
3058 } u;
3059};
3060
3061type can be one of the following:
3062
3063KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
3064KVM_S390_PROGRAM_INT - program check; parameters in .pgm
3065KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
3066KVM_S390_RESTART - restart; no parameters
3067KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
3068KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
3069KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
3070KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
3071KVM_S390_MCHK - machine check interrupt; parameters in .mchk
3072
5e124900 3073This is an asynchronous vcpu ioctl and can be invoked from any thread.
47b43c52 3074
816c7667
JF
30754.94 KVM_S390_GET_IRQ_STATE
3076
3077Capability: KVM_CAP_S390_IRQ_STATE
3078Architectures: s390
3079Type: vcpu ioctl
3080Parameters: struct kvm_s390_irq_state (out)
3081Returns: >= 0: number of bytes copied into buffer,
3082 -EINVAL if buffer size is 0,
3083 -ENOBUFS if buffer size is too small to fit all pending interrupts,
3084 -EFAULT if the buffer address was invalid
3085
3086This ioctl allows userspace to retrieve the complete state of all currently
3087pending interrupts in a single buffer. Use cases include migration
3088and introspection. The parameter structure contains the address of a
3089userspace buffer and its length:
3090
3091struct kvm_s390_irq_state {
3092 __u64 buf;
bb64da9a 3093 __u32 flags; /* will stay unused for compatibility reasons */
816c7667 3094 __u32 len;
bb64da9a 3095 __u32 reserved[4]; /* will stay unused for compatibility reasons */
816c7667
JF
3096};
3097
3098Userspace passes in the above struct and for each pending interrupt a
3099struct kvm_s390_irq is copied to the provided buffer.
3100
bb64da9a
CB
3101The structure contains a flags and a reserved field for future extensions. As
3102the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
3103reserved, these fields can not be used in the future without breaking
3104compatibility.
3105
816c7667
JF
3106If -ENOBUFS is returned the buffer provided was too small and userspace
3107may retry with a bigger buffer.
3108
31094.95 KVM_S390_SET_IRQ_STATE
3110
3111Capability: KVM_CAP_S390_IRQ_STATE
3112Architectures: s390
3113Type: vcpu ioctl
3114Parameters: struct kvm_s390_irq_state (in)
3115Returns: 0 on success,
3116 -EFAULT if the buffer address was invalid,
3117 -EINVAL for an invalid buffer length (see below),
3118 -EBUSY if there were already interrupts pending,
3119 errors occurring when actually injecting the
3120 interrupt. See KVM_S390_IRQ.
3121
3122This ioctl allows userspace to set the complete state of all cpu-local
3123interrupts currently pending for the vcpu. It is intended for restoring
3124interrupt state after a migration. The input parameter is a userspace buffer
3125containing a struct kvm_s390_irq_state:
3126
3127struct kvm_s390_irq_state {
3128 __u64 buf;
bb64da9a 3129 __u32 flags; /* will stay unused for compatibility reasons */
816c7667 3130 __u32 len;
bb64da9a 3131 __u32 reserved[4]; /* will stay unused for compatibility reasons */
816c7667
JF
3132};
3133
bb64da9a
CB
3134The restrictions for flags and reserved apply as well.
3135(see KVM_S390_GET_IRQ_STATE)
3136
816c7667
JF
3137The userspace memory referenced by buf contains a struct kvm_s390_irq
3138for each interrupt to be injected into the guest.
3139If one of the interrupts could not be injected for some reason the
3140ioctl aborts.
3141
3142len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
3143and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
3144which is the maximum number of possibly pending cpu-local interrupts.
47b43c52 3145
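The constraints on len can be checked before issuing the ioctl. The
sketch below assumes sizeof(struct kvm_s390_irq) is 72 bytes (an 8-byte
type followed by the 64-byte union shown in the KVM_S390_IRQ section);
S390_IRQ_SIZE and irq_state_len_valid are names local to this example.

```c
#include <stdint.h>

/* Assumed sizeof(struct kvm_s390_irq): __u64 type + 64-byte union. */
#define S390_IRQ_SIZE 72u

/*
 * Return 1 if len is acceptable for KVM_S390_SET_IRQ_STATE: non-zero,
 * a multiple of the irq struct size, and no larger than
 * (max_vcpus + 32) * sizeof(struct kvm_s390_irq). Return 0 otherwise.
 */
static int irq_state_len_valid(uint32_t len, uint32_t max_vcpus)
{
	uint64_t max_len = ((uint64_t)max_vcpus + 32) * S390_IRQ_SIZE;

	return len > 0 && len % S390_IRQ_SIZE == 0 && len <= max_len;
}
```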
ed8e5a24 31464.96 KVM_SMI
f077825a
PB
3147
3148Capability: KVM_CAP_X86_SMM
3149Architectures: x86
3150Type: vcpu ioctl
3151Parameters: none
3152Returns: 0 on success, -1 on error
3153
3154Queues an SMI on the thread's vcpu.
3155
d3695aa4
AK
31564.97 KVM_CAP_PPC_MULTITCE
3157
3158Capability: KVM_CAP_PPC_MULTITCE
3159Architectures: ppc
3160Type: vm
3161
3162This capability means the kernel is capable of handling hypercalls
3163H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
3164space. This significantly accelerates DMA operations for PPC KVM guests.
3165User space should expect that its handlers for these hypercalls
3166are not going to be called if user space previously registered LIOBN
3167in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
3168
3169In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
3170user space might have to advertise it for the guest. For example,
3171IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
3172present in the "ibm,hypertas-functions" device-tree property.
3173
3174The hypercalls mentioned above may or may not be processed successfully
3175in the kernel based fast path. If they can not be handled by the kernel,
3176they will get passed on to user space. So user space still has to have
3177an implementation for these despite the in kernel acceleration.
3178
3179This capability is always enabled.
3180
58ded420
AK
31814.98 KVM_CREATE_SPAPR_TCE_64
3182
3183Capability: KVM_CAP_SPAPR_TCE_64
3184Architectures: powerpc
3185Type: vm ioctl
3186Parameters: struct kvm_create_spapr_tce_64 (in)
3187Returns: file descriptor for manipulating the created TCE table
3188
3189This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit
3190windows, described in 4.62 KVM_CREATE_SPAPR_TCE.
3191
3192This capability uses an extended struct in the ioctl interface:
3193
3194/* for KVM_CAP_SPAPR_TCE_64 */
3195struct kvm_create_spapr_tce_64 {
3196 __u64 liobn;
3197 __u32 page_shift;
3198 __u32 flags;
3199 __u64 offset; /* in pages */
3200 __u64 size; /* in pages */
3201};
3202
3203The aim of the extension is to support an additional bigger DMA window with
3204a variable page size.
3205KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
3206a bus offset of the corresponding DMA window; @size and @offset are numbers
3207of IOMMU pages.
3208
3209@flags are not used at the moment.
3210
3211The rest of the functionality is identical to KVM_CREATE_SPAPR_TCE.
3212
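Because @size and @offset are expressed in IOMMU pages, converting them
to bytes depends on page_shift. A minimal sketch of that arithmetic,
with hypothetical helper names:

```c
#include <stdint.h>

/* Size of the DMA window in bytes: number of IOMMU pages << page_shift. */
static uint64_t tce_window_bytes(uint32_t page_shift, uint64_t size_pages)
{
	return size_pages << page_shift;
}

/* Bus address at which the window starts: offset pages << page_shift. */
static uint64_t tce_window_bus_start(uint32_t page_shift,
				     uint64_t offset_pages)
{
	return offset_pages << page_shift;
}
```

For example, with 64 KiB IOMMU pages (page_shift = 16), a size of 0x8000
pages describes a 2 GiB window.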
ccc4df4e 32134.99 KVM_REINJECT_CONTROL
107d44a2
RK
3214
3215Capability: KVM_CAP_REINJECT_CONTROL
3216Architectures: x86
3217Type: vm ioctl
3218Parameters: struct kvm_reinject_control (in)
3219Returns: 0 on success,
3220 -EFAULT if struct kvm_reinject_control cannot be read,
3221 -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
3222
3223i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
3224where KVM queues elapsed i8254 ticks and monitors completion of interrupt from
3225vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its
3226interrupt whenever there isn't a pending interrupt from i8254.
3227!reinject mode injects an interrupt as soon as a tick arrives.
3228
3229struct kvm_reinject_control {
3230 __u8 pit_reinject;
3231 __u8 reserved[31];
3232};
3233
3234pit_reinject = 0 (!reinject mode) is recommended, unless running an old
3235operating system that uses the PIT for timing (e.g. Linux 2.4.x).
3236
ccc4df4e 32374.100 KVM_PPC_CONFIGURE_V3_MMU
c9270132
PM
3238
3239Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
3240Architectures: ppc
3241Type: vm ioctl
3242Parameters: struct kvm_ppc_mmuv3_cfg (in)
3243Returns: 0 on success,
3244 -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
3245 -EINVAL if the configuration is invalid
3246
3247This ioctl controls whether the guest will use radix or HPT (hashed
3248page table) translation, and sets the pointer to the process table for
3249the guest.
3250
3251struct kvm_ppc_mmuv3_cfg {
3252 __u64 flags;
3253 __u64 process_table;
3254};
3255
3256There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and
3257KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
3258to use radix tree translation, and if clear, to use HPT translation.
3259KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
3260to be able to use the global TLB and SLB invalidation instructions;
3261if clear, the guest may not use these instructions.
3262
3263The process_table field specifies the address and size of the guest
3264process table, which is in the guest's space. This field is formatted
3265as the second doubleword of the partition table entry, as defined in
3266the Power ISA V3.00, Book III section 5.7.6.1.
3267
ccc4df4e 32684.101 KVM_PPC_GET_RMMU_INFO
c9270132
PM
3269
3270Capability: KVM_CAP_PPC_RADIX_MMU
3271Architectures: ppc
3272Type: vm ioctl
3273Parameters: struct kvm_ppc_rmmu_info (out)
3274Returns: 0 on success,
3275 -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
3276 -EINVAL if no useful information can be returned
3277
3278This ioctl returns a structure containing two things: (a) a list
3279containing supported radix tree geometries, and (b) a list that maps
3280page sizes to put in the "AP" (actual page size) field for the tlbie
3281(TLB invalidate entry) instruction.
3282
3283struct kvm_ppc_rmmu_info {
3284 struct kvm_ppc_radix_geom {
3285 __u8 page_shift;
3286 __u8 level_bits[4];
3287 __u8 pad[3];
3288 } geometries[8];
3289 __u32 ap_encodings[8];
3290};
3291
3292The geometries[] field gives up to 8 supported geometries for the
3293radix page table, in terms of the log base 2 of the smallest page
3294size, and the number of bits indexed at each level of the tree, from
3295the PTE level up to the PGD level in that order. Any unused entries
3296will have 0 in the page_shift field.
3297
3298The ap_encodings field gives the supported page sizes and their AP field
3299encodings, encoded with the AP value in the top 3 bits and the log
3300base 2 of the page size in the bottom 6 bits.
3301
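Decoding an ap_encodings[] value follows directly from the layout just
described. The helper names below are hypothetical.

```c
#include <stdint.h>

/* AP value: the top 3 bits of the 32-bit encoding. */
static unsigned int ap_encoding_ap(uint32_t enc)
{
	return enc >> 29;
}

/* Log base 2 of the page size: the bottom 6 bits of the encoding. */
static unsigned int ap_encoding_page_shift(uint32_t enc)
{
	return enc & 0x3f;
}
```

An encoding of (5 << 29) | 16, for instance, decodes to AP value 5 for a
page size of 2^16 bytes.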
ef1ead0c
DG
33024.102 KVM_PPC_RESIZE_HPT_PREPARE
3303
3304Capability: KVM_CAP_SPAPR_RESIZE_HPT
3305Architectures: powerpc
3306Type: vm ioctl
3307Parameters: struct kvm_ppc_resize_hpt (in)
3308Returns: 0 on successful completion,
3309 >0 if a new HPT is being prepared, the value is an estimated
3310 number of milliseconds until preparation is complete
3311 -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
3312 -EINVAL if the supplied shift or flags are invalid
3313 -ENOMEM if unable to allocate the new HPT
3314 -ENOSPC if there was a hash collision when moving existing
3315 HPT entries to the new HPT
3316 -EIO on other error conditions
3317
3318Used to implement the PAPR extension for runtime resizing of a guest's
3319Hashed Page Table (HPT). Specifically this starts, stops or monitors
3320the preparation of a new potential HPT for the guest, essentially
3321implementing the H_RESIZE_HPT_PREPARE hypercall.
3322
3323If called with shift > 0 when there is no pending HPT for the guest,
3324this begins preparation of a new pending HPT of size 2^(shift) bytes.
3325It then returns a positive integer with the estimated number of
3326milliseconds until preparation is complete.
3327
3328If called when there is a pending HPT whose size does not match that
3329requested in the parameters, discards the existing pending HPT and
3330creates a new one as above.
3331
3332If called when there is a pending HPT of the size requested, will:
3333 * If preparation of the pending HPT is already complete, return 0
3334 * If preparation of the pending HPT has failed, return an error
3335 code, then discard the pending HPT.
3336 * If preparation of the pending HPT is still in progress, return an
3337 estimated number of milliseconds until preparation is complete.
3338
3339If called with shift == 0, discards any currently pending HPT and
3340returns 0 (i.e. cancels any in-progress preparation).
3341
flags is reserved for future expansion; currently, setting any bits in
flags will result in -EINVAL.
3344
3345Normally this will be called repeatedly with the same parameters until
3346it returns <= 0. The first call will initiate preparation, subsequent
3347ones will monitor preparation until it completes or fails.
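
The polling rule above can be expressed as a small classifier over the
value returned by the ioctl. This is only a sketch: the enum and function
names are hypothetical, and it assumes the usual userspace convention in
which a failing ioctl() returns a negative value and reports the error
code through errno.

```c
enum hpt_prepare_step {
	HPT_PREPARE_READY,	/* ret == 0: pending HPT is fully prepared */
	HPT_PREPARE_WAIT,	/* ret > 0: poll again after *wait_ms ms */
	HPT_PREPARE_FAILED,	/* ret < 0: preparation failed */
};

/* Decide what to do after one KVM_PPC_RESIZE_HPT_PREPARE call. */
static enum hpt_prepare_step hpt_prepare_next_step(int ret,
						   unsigned int *wait_ms)
{
	*wait_ms = 0;
	if (ret == 0)
		return HPT_PREPARE_READY;
	if (ret > 0) {
		/* The value is only an estimate of the remaining time. */
		*wait_ms = (unsigned int)ret;
		return HPT_PREPARE_WAIT;
	}
	return HPT_PREPARE_FAILED;
}
```

A caller would repeat the ioctl with unchanged parameters while the result
is HPT_PREPARE_WAIT, sleeping for roughly wait_ms between calls, and stop
on either of the other two results.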
3348
3349struct kvm_ppc_resize_hpt {
3350 __u64 flags;
3351 __u32 shift;
3352 __u32 pad;
3353};
3354
4.103 KVM_PPC_RESIZE_HPT_COMMIT
3356
3357Capability: KVM_CAP_SPAPR_RESIZE_HPT
3358Architectures: powerpc
3359Type: vm ioctl
3360Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
         -EINVAL if the supplied shift or flags are invalid,
         -ENXIO if there is no pending HPT, or the pending HPT doesn't
                have the requested size,
         -EBUSY if the pending HPT is not fully prepared,
         -ENOSPC if there was a hash collision when moving existing
                 HPT entries to the new HPT,
         -EIO on other error conditions
3370
3371Used to implement the PAPR extension for runtime resizing of a guest's
3372Hashed Page Table (HPT). Specifically this requests that the guest be
3373transferred to working with the new HPT, essentially implementing the
3374H_RESIZE_HPT_COMMIT hypercall.
3375
3376This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
3377returned 0 with the same parameters. In other cases
3378KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
3379-EBUSY, though others may be possible if the preparation was started,
3380but failed).
3381
3382This will have undefined effects on the guest if it has not already
3383placed itself in a quiescent state where no vcpu will make MMU enabled
3384memory accesses.
3385
On successful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.
3388
3389On failure, the guest will still be operating on its previous HPT.
3390
3391struct kvm_ppc_resize_hpt {
3392 __u64 flags;
3393 __u32 shift;
3394 __u32 pad;
3395};
3396
4.104 KVM_X86_GET_MCE_CAP_SUPPORTED
3398
3399Capability: KVM_CAP_MCE
3400Architectures: x86
3401Type: system ioctl
3402Parameters: u64 mce_cap (out)
3403Returns: 0 on success, -1 on error
3404
3405Returns supported MCE capabilities. The u64 mce_cap parameter
3406has the same format as the MSR_IA32_MCG_CAP register. Supported
3407capabilities will have the corresponding bits set.
3408
4.105 KVM_X86_SETUP_MCE
3410
3411Capability: KVM_CAP_MCE
3412Architectures: x86
3413Type: vcpu ioctl
3414Parameters: u64 mcg_cap (in)
3415Returns: 0 on success,
3416 -EFAULT if u64 mcg_cap cannot be read,
3417 -EINVAL if the requested number of banks is invalid,
3418 -EINVAL if requested MCE capability is not supported.
3419
3420Initializes MCE support for use. The u64 mcg_cap parameter
3421has the same format as the MSR_IA32_MCG_CAP register and
3422specifies which capabilities should be enabled. The maximum
3423supported number of error-reporting banks can be retrieved when
3424checking for KVM_CAP_MCE. The supported capabilities can be
3425retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
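
One possible way to build the mcg_cap argument from the two values
mentioned above is sketched below. It assumes the MSR_IA32_MCG_CAP layout
in which bits 7:0 hold the number of error-reporting banks; the function
and macro names are hypothetical, and the choice to request every
supported capability is an illustration rather than a recommendation.

```c
#include <stdint.h>

/* Bits 7:0 of MCG_CAP hold the number of error-reporting banks. */
#define MCG_BANK_COUNT_MASK	0xffULL

/*
 * supported_caps: value returned by KVM_X86_GET_MCE_CAP_SUPPORTED
 * max_banks:      bank limit reported when checking for KVM_CAP_MCE
 * wanted_banks:   number of banks userspace would like to expose
 */
static uint64_t build_mcg_cap(uint64_t supported_caps,
			      unsigned int max_banks,
			      unsigned int wanted_banks)
{
	unsigned int banks = wanted_banks < max_banks ? wanted_banks
						      : max_banks;

	if (banks > MCG_BANK_COUNT_MASK)
		banks = MCG_BANK_COUNT_MASK;
	/* Keep the capability bits, replace the bank count field. */
	return (supported_caps & ~MCG_BANK_COUNT_MASK) | banks;
}
```

The result would then be passed by pointer to KVM_X86_SETUP_MCE on each
vcpu.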
3426
4.106 KVM_X86_SET_MCE
3428
3429Capability: KVM_CAP_MCE
3430Architectures: x86
3431Type: vcpu ioctl
3432Parameters: struct kvm_x86_mce (in)
3433Returns: 0 on success,
3434 -EFAULT if struct kvm_x86_mce cannot be read,
3435 -EINVAL if the bank number is invalid,
3436 -EINVAL if VAL bit is not set in status field.
3437
3438Inject a machine check error (MCE) into the guest. The input
3439parameter is:
3440
3441struct kvm_x86_mce {
3442 __u64 status;
3443 __u64 addr;
3444 __u64 misc;
3445 __u64 mcg_status;
3446 __u8 bank;
3447 __u8 pad1[7];
3448 __u64 pad2[3];
3449};
3450
If the MCE being reported is an uncorrected error, KVM will
inject it as an MCE exception into the guest. If the guest
MCG_STATUS register reports that an MCE is in progress, KVM
causes a KVM_EXIT_SHUTDOWN vmexit.
3455
3456Otherwise, if the MCE is a corrected error, KVM will just
3457store it in the corresponding bank (provided this bank is
3458not holding a previously reported uncorrected error).
3459
4.107 KVM_S390_GET_CMMA_BITS
3461
3462Capability: KVM_CAP_S390_CMMA_MIGRATION
3463Architectures: s390
3464Type: vm ioctl
3465Parameters: struct kvm_s390_cmma_log (in, out)
3466Returns: 0 on success, a negative value on error
3467
3468This ioctl is used to get the values of the CMMA bits on the s390
3469architecture. It is meant to be used in two scenarios:
3470- During live migration to save the CMMA values. Live migration needs
3471 to be enabled via the KVM_REQ_START_MIGRATION VM property.
3472- To non-destructively peek at the CMMA values, with the flag
3473 KVM_S390_CMMA_PEEK set.
3474
3475The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
3476values are written to a buffer whose location is indicated via the "values"
3477member in the kvm_s390_cmma_log struct. The values in the input struct are
3478also updated as needed.
3479Each CMMA value takes up one byte.
3480
3481struct kvm_s390_cmma_log {
3482 __u64 start_gfn;
3483 __u32 count;
3484 __u32 flags;
3485 union {
3486 __u64 remaining;
3487 __u64 mask;
3488 };
3489 __u64 values;
3490};
3491
3492start_gfn is the number of the first guest frame whose CMMA values are
3493to be retrieved,
3494
3495count is the length of the buffer in bytes,
3496
3497values points to the buffer where the result will be written to.
3498
3499If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
3500KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
3501other ioctls.
3502
3503The result is written in the buffer pointed to by the field values, and
3504the values of the input parameter are updated as follows.
3505
3506Depending on the flags, different actions are performed. The only
3507supported flag so far is KVM_S390_CMMA_PEEK.
3508
3509The default behaviour if KVM_S390_CMMA_PEEK is not set is:
3510start_gfn will indicate the first page frame whose CMMA bits were dirty.
3511It is not necessarily the same as the one passed as input, as clean pages
3512are skipped.
3513
3514count will indicate the number of bytes actually written in the buffer.
3515It can (and very often will) be smaller than the input value, since the
3516buffer is only filled until 16 bytes of clean values are found (which
3517are then not copied in the buffer). Since a CMMA migration block needs
3518the base address and the length, for a total of 16 bytes, we will send
3519back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
minimizes the amount of data to be saved or transferred over
the network at the expense of more roundtrips to userspace. The next
3523invocation of the ioctl will skip over all the clean values, saving
3524potentially more than just the 16 bytes we found.
3525
3526If KVM_S390_CMMA_PEEK is set:
3527the existing storage attributes are read even when not in migration
3528mode, and no other action is performed;
3529
3530the output start_gfn will be equal to the input start_gfn,
3531
3532the output count will be equal to the input count, except if the end of
3533memory has been reached.
3534
3535In both cases:
3536the field "remaining" will indicate the total number of dirty CMMA values
3537still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
3538not enabled.
3539
3540mask is unused.
3541
3542values points to the userspace buffer where the result will be stored.
3543
This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
KVM_S390_CMMA_PEEK is not set and migration mode is not enabled, and with
-EFAULT if the userspace address is invalid or if no page table is
present for the addresses (e.g. when using hugepages).
3549
4.108 KVM_S390_SET_CMMA_BITS
3551
3552Capability: KVM_CAP_S390_CMMA_MIGRATION
3553Architectures: s390
3554Type: vm ioctl
3555Parameters: struct kvm_s390_cmma_log (in)
3556Returns: 0 on success, a negative value on error
3557
3558This ioctl is used to set the values of the CMMA bits on the s390
3559architecture. It is meant to be used during live migration to restore
3560the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_log struct.
3562Each CMMA value takes up one byte.
3563
3564struct kvm_s390_cmma_log {
3565 __u64 start_gfn;
3566 __u32 count;
3567 __u32 flags;
3568 union {
3569 __u64 remaining;
3570 __u64 mask;
3571 };
3572 __u64 values;
3573};
3574
3575start_gfn indicates the starting guest frame number,
3576
3577count indicates how many values are to be considered in the buffer,
3578
3579flags is not used and must be 0.
3580
3581mask indicates which PGSTE bits are to be considered.
3582
3583remaining is not used.
3584
values points to the buffer in userspace that holds the values to be set.
3586
3587This ioctl can fail with -ENOMEM if not enough memory can be allocated to
3588complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
3589the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
3590if the flags field was not 0, with -EFAULT if the userspace address is
3591invalid, if invalid pages are written to (e.g. after the end of memory)
3592or if no page table is present for the addresses (e.g. when using
3593hugepages).
3594
4.109 KVM_PPC_GET_CPU_CHAR
3596
3597Capability: KVM_CAP_PPC_GET_CPU_CHAR
3598Architectures: powerpc
3599Type: vm ioctl
3600Parameters: struct kvm_ppc_cpu_char (out)
3601Returns: 0 on successful completion
3602 -EFAULT if struct kvm_ppc_cpu_char cannot be written
3603
3604This ioctl gives userspace information about certain characteristics
3605of the CPU relating to speculative execution of instructions and
3606possible information leakage resulting from speculative execution (see
3607CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
3608returned in struct kvm_ppc_cpu_char, which looks like this:
3609
3610struct kvm_ppc_cpu_char {
3611 __u64 character; /* characteristics of the CPU */
3612 __u64 behaviour; /* recommended software behaviour */
3613 __u64 character_mask; /* valid bits in character */
3614 __u64 behaviour_mask; /* valid bits in behaviour */
3615};
3616
3617For extensibility, the character_mask and behaviour_mask fields
3618indicate which bits of character and behaviour have been filled in by
3619the kernel. If the set of defined bits is extended in future then
3620userspace will be able to tell whether it is running on a kernel that
3621knows about the new bits.
3622
3623The character field describes attributes of the CPU which can help
3624with preventing inadvertent information disclosure - specifically,
3625whether there is an instruction to flash-invalidate the L1 data cache
3626(ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set
3627to a mode where entries can only be used by the thread that created
3628them, whether the bcctr[l] instruction prevents speculation, and
3629whether a speculation barrier instruction (ori 31,31,0) is provided.
3630
3631The behaviour field describes actions that software should take to
3632prevent inadvertent information disclosure, and thus describes which
3633vulnerabilities the hardware is subject to; specifically whether the
3634L1 data cache should be flushed when returning to user mode from the
3635kernel, and whether a speculation barrier should be placed between an
3636array bounds check and the array access.
3637
3638These fields use the same bit definitions as the new
3639H_GET_CPU_CHARACTERISTICS hypercall.
3640
4.110 KVM_MEMORY_ENCRYPT_OP
3642
3643Capability: basic
3644Architectures: x86
Type: vm ioctl
3646Parameters: an opaque platform specific structure (in/out)
3647Returns: 0 on success; -1 on error
3648
3649If the platform supports creating encrypted VMs then this ioctl can be used
3650for issuing platform-specific memory encryption commands to manage those
3651encrypted VMs.
3652
3653Currently, this ioctl is used for issuing Secure Encrypted Virtualization
3654(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.rst.

4.111 KVM_MEMORY_ENCRYPT_REG_REGION
3658
3659Capability: basic
3660Architectures: x86
Type: vm ioctl
3662Parameters: struct kvm_enc_region (in)
3663Returns: 0 on success; -1 on error
3664
3665This ioctl can be used to register a guest memory region which may
3666contain encrypted data (e.g. guest RAM, SMRAM etc).
3667
It is used with SEV-enabled guests. When encryption is enabled, a guest
memory region may contain encrypted data. The SEV memory encryption
engine uses a tweak such that two identical plaintext pages at different
locations have different ciphertexts, so swapping or moving the
ciphertext of those pages does not swap the plaintext. Relocating (or
migrating) the physical backing pages of an SEV guest therefore requires
some additional steps.
3675
3676Note: The current SEV key management spec does not provide commands to
3677swap or migrate (move) ciphertext pages. Hence, for now we pin the guest
3678memory region registered with the ioctl.
3679
4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION
3681
3682Capability: basic
3683Architectures: x86
Type: vm ioctl
3685Parameters: struct kvm_enc_region (in)
3686Returns: 0 on success; -1 on error
3687
3688This ioctl can be used to unregister the guest memory region registered
3689with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
3690
4.113 KVM_HYPERV_EVENTFD
3692
3693Capability: KVM_CAP_HYPERV_EVENTFD
3694Architectures: x86
3695Type: vm ioctl
3696Parameters: struct kvm_hyperv_eventfd (in)
3697
3698This ioctl (un)registers an eventfd to receive notifications from the guest on
3699the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
3700causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
3701(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
3702
3703struct kvm_hyperv_eventfd {
3704 __u32 conn_id;
3705 __s32 fd;
3706 __u32 flags;
3707 __u32 padding[3];
3708};
3709
3710The conn_id field should fit within 24 bits:
3711
3712#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
3713
3714The acceptable values for the flags field are:
3715
3716#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
3717
3718Returns: 0 on success,
3719 -EINVAL if conn_id or flags is outside the allowed range
3720 -ENOENT on deassign if the conn_id isn't registered
3721 -EEXIST on assign if the conn_id is already registered
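
The range checks behind the -EINVAL case can be mirrored in userspace
before the ioctl is issued. The sketch below repeats the two constants
shown above so that it stands alone; the function name is hypothetical,
and the kernel remains the authority on what it accepts.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)

/* True if conn_id fits in 24 bits and only known flag bits are set. */
static bool hyperv_eventfd_args_valid(uint32_t conn_id, uint32_t flags)
{
	if (conn_id & ~(uint32_t)KVM_HYPERV_CONN_ID_MASK)
		return false;
	if (flags & ~(uint32_t)KVM_HYPERV_EVENTFD_DEASSIGN)
		return false;
	return true;
}
```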
3722
4.114 KVM_GET_NESTED_STATE
3724
3725Capability: KVM_CAP_NESTED_STATE
3726Architectures: x86
3727Type: vcpu ioctl
3728Parameters: struct kvm_nested_state (in/out)
3729Returns: 0 on success, -1 on error
3730Errors:
3731 E2BIG: the total state size (including the fixed-size part of struct
3732 kvm_nested_state) exceeds the value of 'size' specified by
3733 the user; the size required will be written into size.
3734
3735struct kvm_nested_state {
3736 __u16 flags;
3737 __u16 format;
3738 __u32 size;
3739 union {
3740 struct kvm_vmx_nested_state vmx;
3741 struct kvm_svm_nested_state svm;
3742 __u8 pad[120];
3743 };
3744 __u8 data[0];
3745};
3746
3747#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
3748#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
3749
3750#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001
3751#define KVM_STATE_NESTED_SMM_VMXON 0x00000002
3752
3753struct kvm_vmx_nested_state {
3754 __u64 vmxon_pa;
3755 __u64 vmcs_pa;
3756
3757 struct {
3758 __u16 flags;
3759 } smm;
3760};
3761
3762This ioctl copies the vcpu's nested virtualization state from the kernel to
3763userspace.
3764
3765The maximum size of the state, including the fixed-size part of struct
3766kvm_nested_state, can be retrieved by passing KVM_CAP_NESTED_STATE to
3767the KVM_CHECK_EXTENSION ioctl().
3768
4.115 KVM_SET_NESTED_STATE
3770
3771Capability: KVM_CAP_NESTED_STATE
3772Architectures: x86
3773Type: vcpu ioctl
3774Parameters: struct kvm_nested_state (in)
3775Returns: 0 on success, -1 on error
3776
3777This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For
the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.

4.116 KVM_(UN)REGISTER_COALESCED_MMIO
3781
Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio)
            KVM_CAP_COALESCED_PIO (for coalesced pio)
Architectures: all
3785Type: vm ioctl
3786Parameters: struct kvm_coalesced_mmio_zone
3787Returns: 0 on success, < 0 on error
3788
Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
hardware registers.

When a hardware register is configured for coalesced I/O, write accesses
do not exit to userspace and their value is recorded in a ring buffer
that is shared between kernel and userspace.

Coalesced I/O is used if one or more write accesses to a hardware
register can be deferred until a read or a write to another hardware
register on the same device. This last access will cause a vmexit and
userspace will process accesses from the ring buffer before emulating
it. That will avoid exiting to userspace on repeated writes.

Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.

4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
3809
3810Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
3811Architectures: x86
3812Type: vm ioctl
Parameters: struct kvm_clear_dirty_log (in)
3814Returns: 0 on success, -1 on error
3815
3816/* for KVM_CLEAR_DIRTY_LOG */
3817struct kvm_clear_dirty_log {
3818 __u32 slot;
3819 __u32 num_pages;
3820 __u64 first_page;
3821 union {
3822 void __user *dirty_bitmap; /* one bit per page */
3823 __u64 padding;
3824 };
3825};
3826
3827The ioctl clears the dirty status of pages in a memory slot, according to
3828the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
3829field. Bit 0 of the bitmap corresponds to page "first_page" in the
3830memory slot, and num_pages is the size in bits of the input bitmap.
3831Both first_page and num_pages must be a multiple of 64. For each bit
3832that is set in the input bitmap, the corresponding page is marked "clean"
3833in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
3834(for example via write-protection, or by clearing the dirty bit in
3835a page table entry).
3836
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space whose dirty status you want to clear.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
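
The alignment rule and the slot encoding can be handled by a few helpers.
This is a sketch under stated assumptions: struct clear_range and all
function names are hypothetical, and widening a request to 64-page
boundaries is just one way to satisfy the rule. When a range is widened,
the bitmap bits for the extra pages must be left clear so that their
dirty status is not touched.

```c
#include <stdint.h>

/* Hypothetical holder for the two aligned fields of the ioctl argument. */
struct clear_range {
	uint64_t first_page;
	uint32_t num_pages;
};

/* Widen [start, start + count) so both values are multiples of 64. */
static struct clear_range clear_range_align(uint64_t start, uint32_t count)
{
	struct clear_range r;
	uint64_t end = (start + count + 63) & ~63ULL;

	r.first_page = start & ~63ULL;
	r.num_pages = (uint32_t)(end - r.first_page);
	return r;
}

/* Size of the input bitmap in bytes: one bit per page. */
static uint64_t clear_bitmap_bytes(uint32_t num_pages)
{
	return num_pages / 8;
}

/* Address space in bits 16-31 of the slot field, slot id in bits 0-15. */
static uint32_t slot_with_address_space(uint16_t as_id, uint16_t slot_id)
{
	return ((uint32_t)as_id << 16) | slot_id;
}
```

With these helpers, page p of the memory slot corresponds to bit
(p - first_page) of the bitmap.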
3841
3842This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
3843is enabled; for more information, see the description of the capability.
3844However, it can always be used as long as KVM_CHECK_EXTENSION confirms
3845that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
3846
4.118 KVM_GET_SUPPORTED_HV_CPUID
3848
3849Capability: KVM_CAP_HYPERV_CPUID
3850Architectures: x86
3851Type: vcpu ioctl
3852Parameters: struct kvm_cpuid2 (in/out)
3853Returns: 0 on success, -1 on error
3854
3855struct kvm_cpuid2 {
3856 __u32 nent;
3857 __u32 padding;
3858 struct kvm_cpuid_entry2 entries[0];
3859};
3860
3861struct kvm_cpuid_entry2 {
3862 __u32 function;
3863 __u32 index;
3864 __u32 flags;
3865 __u32 eax;
3866 __u32 ebx;
3867 __u32 ecx;
3868 __u32 edx;
3869 __u32 padding[3];
3870};
3871
This ioctl returns x86 cpuid feature leaves related to Hyper-V emulation in
3873KVM. Userspace can use the information returned by this ioctl to construct
3874cpuid information presented to guests consuming Hyper-V enlightenments (e.g.
3875Windows or Hyper-V guests).
3876
3877CPUID feature leaves returned by this ioctl are defined by Hyper-V Top Level
3878Functional Specification (TLFS). These leaves can't be obtained with
3879KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM feature
3880leaves (0x40000000, 0x40000001).
3881
Currently, the following CPUID leaves are returned:
3883 HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
3884 HYPERV_CPUID_INTERFACE
3885 HYPERV_CPUID_VERSION
3886 HYPERV_CPUID_FEATURES
3887 HYPERV_CPUID_ENLIGHTMENT_INFO
3888 HYPERV_CPUID_IMPLEMENT_LIMITS
3889 HYPERV_CPUID_NESTED_FEATURES
3890
3891HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
3892enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
3893
Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe all Hyper-V
feature leaves, an error (E2BIG) is returned. If the number is greater than or
equal to the number of Hyper-V feature leaves, the 'nent' field is adjusted to
the number of valid entries in the 'entries' array, which is then filled.
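
Because 'entries' is a variable-size array, the caller has to size the
buffer from 'nent' before issuing the ioctl. A minimal sketch of that
allocation follows; the structs are local mirrors of the definitions
above with shortened, hypothetical names, and the helper names are
likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_cpuid_entry2 (ten 32-bit words). */
struct cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/* Local mirror of struct kvm_cpuid2. */
struct cpuid2 {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry2 entries[];
};

/* Bytes needed for a header plus nent entries. */
static size_t cpuid2_alloc_size(uint32_t nent)
{
	return sizeof(struct cpuid2) +
	       (size_t)nent * sizeof(struct cpuid_entry2);
}

/* Zeroed buffer with nent filled in; NULL on allocation failure. */
static struct cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2 *c = calloc(1, cpuid2_alloc_size(nent));

	if (c)
		c->nent = nent;
	return c;
}
```

If the ioctl fails with E2BIG, a caller could free the buffer and retry
with a larger 'nent'.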
3900
The 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently
reserved; userspace should not expect to get any particular value there.

5. The kvm_run structure
------------------------
3906
3907Application code obtains a pointer to the kvm_run structure by
3908mmap()ing a vcpu fd. From that point, application code can control
3909execution by changing fields in kvm_run prior to calling the KVM_RUN
3910ioctl, and obtain information about the reason KVM_RUN returned by
3911looking up structure members.
3912
3913struct kvm_run {
3914 /* in */
3915 __u8 request_interrupt_window;
3916
3917Request that KVM_RUN return when it becomes possible to inject external
3918interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
3919
	__u8 immediate_exit;
3921
3922This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
3923exits immediately, returning -EINTR. In the common scenario where a
3924signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
3925to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
3926Rather than blocking the signal outside KVM_RUN, userspace can set up
3927a signal handler that sets run->immediate_exit to a non-zero value.
3928
3929This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
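
A signal handler of the kind described above might look like the sketch
below. struct run_header is a hypothetical stand-in for the first bytes
of struct kvm_run, and kick_target is an assumed per-process pointer to
the mmap()ed structure of a single vcpu; a program with several vcpu
threads would need one such pointer per thread.

```c
#include <stdint.h>

/* Stand-in for the leading "in" fields of struct kvm_run. */
struct run_header {
	uint8_t request_interrupt_window;
	volatile uint8_t immediate_exit;
	uint8_t padding1[6];
};

/* Set by the vcpu thread once its kvm_run area has been mapped. */
static struct run_header *kick_target;

/* Signal handler: ask the next KVM_RUN to return immediately. */
static void kick_handler(int signo)
{
	(void)signo;
	if (kick_target)
		kick_target->immediate_exit = 1;
}
```

The vcpu thread would install kick_handler with sigaction() and reset
immediate_exit to 0 once KVM_RUN has returned -EINTR and the pending
request has been handled.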
3930
3931 __u8 padding1[6];
3932
3933 /* out */
3934 __u32 exit_reason;
3935
3936When KVM_RUN has returned successfully (return value 0), this informs
3937application code why KVM_RUN has returned. Allowable values for this
3938field are detailed below.
3939
3940 __u8 ready_for_interrupt_injection;
3941
3942If request_interrupt_window has been specified, this field indicates
3943an interrupt can be injected now with KVM_INTERRUPT.
3944
3945 __u8 if_flag;
3946
3947The value of the current interrupt flag. Only valid if in-kernel
3948local APIC is not used.
3949
	__u16 flags;
3951
3952More architecture-specific flags detailing state of the VCPU that may
3953affect the device's behavior. The only currently defined flag is
3954KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
3955VCPU is in system management mode.
3956
3957 /* in (pre_kvm_run), out (post_kvm_run) */
3958 __u64 cr8;
3959
3960The value of the cr8 register. Only valid if in-kernel local APIC is
3961not used. Both input and output.
3962
3963 __u64 apic_base;
3964
3965The value of the APIC BASE msr. Only valid if in-kernel local
3966APIC is not used. Both input and output.
3967
3968 union {
3969 /* KVM_EXIT_UNKNOWN */
3970 struct {
3971 __u64 hardware_exit_reason;
3972 } hw;
3973
3974If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
3975reasons. Further architecture-specific information is available in
3976hardware_exit_reason.
3977
3978 /* KVM_EXIT_FAIL_ENTRY */
3979 struct {
3980 __u64 hardware_entry_failure_reason;
3981 } fail_entry;
3982
3983If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
3984to unknown reasons. Further architecture-specific information is
3985available in hardware_entry_failure_reason.
3986
3987 /* KVM_EXIT_EXCEPTION */
3988 struct {
3989 __u32 exception;
3990 __u32 error_code;
3991 } ex;
3992
3993Unused.
3994
3995 /* KVM_EXIT_IO */
3996 struct {
3997#define KVM_EXIT_IO_IN 0
3998#define KVM_EXIT_IO_OUT 1
3999 __u8 direction;
4000 __u8 size; /* bytes */
4001 __u16 port;
4002 __u32 count;
4003 __u64 data_offset; /* relative to kvm_run start */
4004 } io;
4005
If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
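
Locating the data for a KVM_EXIT_IO exit then amounts to pointer
arithmetic on the mapped kvm_run area. In the sketch below, struct
io_exit is a local mirror of the 'io' member, the helper names are
hypothetical, and the packed array is assumed to hold 'count' elements
of 'size' bytes each.

```c
#include <stddef.h>
#include <stdint.h>

#define EXIT_IO_IN	0
#define EXIT_IO_OUT	1

/* Local mirror of the 'io' member of struct kvm_run. */
struct io_exit {
	uint8_t direction;
	uint8_t size;		/* bytes per access */
	uint16_t port;
	uint32_t count;
	uint64_t data_offset;	/* relative to kvm_run start */
};

/* Start of the packed data array inside the mapped kvm_run area. */
static uint8_t *io_exit_data(void *run, const struct io_exit *io)
{
	return (uint8_t *)run + io->data_offset;
}

/* Total number of bytes to read (OUT) or to fill in (IN). */
static size_t io_exit_len(const struct io_exit *io)
{
	return (size_t)io->size * io->count;
}
```

For KVM_EXIT_IO_OUT the device model reads io_exit_len() bytes from that
address; for KVM_EXIT_IO_IN it writes them there before calling KVM_RUN
again.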

		/* KVM_EXIT_DEBUG */
4013 struct {
4014 struct kvm_debug_exit_arch arch;
4015 } debug;
4016
4017If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
4018for which architecture specific information is returned.
4019
4020 /* KVM_EXIT_MMIO */
4021 struct {
4022 __u64 phys_addr;
4023 __u8 data[8];
4024 __u32 len;
4025 __u8 is_write;
4026 } mmio;
4027
If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.
4032
4033The 'data' member contains, in its first 'len' bytes, the value as it would
4034appear if the VCPU performed a load or store of the appropriate width directly
4035to the byte array.
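
Since 'data' is treated as a plain byte array, a device model can service
the exit with memcpy(). The sketch below assumes a hypothetical device
consisting of a single 8-byte register; struct mmio_exit is a local
mirror of the 'mmio' member and the function name is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the 'mmio' member of struct kvm_run. */
struct mmio_exit {
	uint64_t phys_addr;
	uint8_t data[8];
	uint32_t len;
	uint8_t is_write;
};

/*
 * Service one exit against an 8-byte register. Only the first 'len'
 * bytes of 'data' are meaningful. Returns 0 on success and -1 if
 * 'len' is out of range.
 */
static int mmio_exit_handle(struct mmio_exit *mmio, uint8_t reg[8])
{
	if (mmio->len == 0 || mmio->len > sizeof(mmio->data))
		return -1;
	if (mmio->is_write)
		memcpy(reg, mmio->data, mmio->len);	/* guest store */
	else
		memcpy(mmio->data, reg, mmio->len);	/* guest load */
	return 0;
}
```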
4036
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
KVM_EXIT_EPR the corresponding operations are complete (and guest state is
consistent) only after userspace has re-entered the kernel with KVM_RUN.
The kernel side will first finish incomplete operations and then check for
pending signals. Userspace can re-enter the guest with an unmasked signal
pending to complete pending operations.

		/* KVM_EXIT_HYPERCALL */
4046 struct {
4047 __u64 nr;
4048 __u64 args[6];
4049 __u64 ret;
4050 __u32 longmode;
4051 __u32 pad;
4052 } hypercall;
4053
4054Unused. This was once used for 'hypercall to userspace'. To implement
4055such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
4056Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
4057
4058 /* KVM_EXIT_TPR_ACCESS */
4059 struct {
4060 __u64 rip;
4061 __u32 is_write;
4062 __u32 pad;
4063 } tpr_access;
4064
4065To be documented (KVM_TPR_ACCESS_REPORTING).
4066
4067 /* KVM_EXIT_S390_SIEIC */
4068 struct {
4069 __u8 icptcode;
4070 __u64 mask; /* psw upper half */
4071 __u64 addr; /* psw lower half */
4072 __u16 ipa;
4073 __u32 ipb;
4074 } s390_sieic;
4075
4076s390 specific.
4077
4078 /* KVM_EXIT_S390_RESET */
4079#define KVM_S390_RESET_POR 1
4080#define KVM_S390_RESET_CLEAR 2
4081#define KVM_S390_RESET_SUBSYSTEM 4
4082#define KVM_S390_RESET_CPU_INIT 8
4083#define KVM_S390_RESET_IPL 16
4084 __u64 s390_reset_flags;
4085
4086s390 specific.
4087
		/* KVM_EXIT_S390_UCONTROL */
4089 struct {
4090 __u64 trans_exc_code;
4091 __u32 pgm_code;
4092 } s390_ucontrol;
4093
s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z/Architecture
Principles of Operation book, in the chapter on Dynamic Address Translation
(DAT).

		/* KVM_EXIT_DCR */
4103 struct {
4104 __u32 dcrn;
4105 __u32 data;
4106 __u8 is_write;
4107 } dcr;
4108
Deprecated - was used for 440 KVM.

		/* KVM_EXIT_OSI */
4112 struct {
4113 __u64 gprs[32];
4114 } osi;
4115
4116MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
4117hypercalls and exit with this exit struct that contains all the guest gprs.
4118
4119If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
4120Userspace can now handle the hypercall and when it's done modify the gprs as
4121necessary. Upon guest entry all guest GPRs will then be replaced by the values
4122in this struct.
4123
		/* KVM_EXIT_PAPR_HCALL */
4125 struct {
4126 __u64 nr;
4127 __u64 ret;
4128 __u64 args[9];
4129 } papr_hcall;
4130
4131This is used on 64-bit PowerPC when emulating a pSeries partition,
4132e.g. with the 'pseries' machine type in qemu. It occurs when the
4133guest does a hypercall using the 'sc 1' instruction. The 'nr' field
4134contains the hypercall number (from the guest R3), and 'args' contains
4135the arguments (from the guest R4 - R12). Userspace should put the
4136return code in 'ret' and any extra returned values in args[].
4137The possible hypercalls are defined in the Power Architecture Platform
4138Requirements (PAPR) document available from www.power.org (free
4139developer registration required to access it).
4140
		/* KVM_EXIT_S390_TSCH */
4142 struct {
4143 __u16 subchannel_id;
4144 __u16 subchannel_nr;
4145 __u32 io_int_parm;
4146 __u32 io_int_word;
4147 __u32 ipb;
4148 __u8 dequeued;
4149 } s390_tsch;
4150
4151s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
4152and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
4153interrupt for the target subchannel has been dequeued and subchannel_id,
4154subchannel_nr, io_int_parm and io_int_word contain the parameters for that
4155interrupt. ipb is needed for instruction parameter decoding.
4156
		/* KVM_EXIT_EPR */
4158 struct {
4159 __u32 epr;
4160 } epr;
4161
4162On FSL BookE PowerPC chips, the interrupt controller has a fast patch
4163interrupt acknowledge path to the core. When the core successfully
4164delivers an interrupt, it automatically populates the EPR register with
4165the interrupt vector number and acknowledges the interrupt inside
4166the interrupt controller.
4167
4168In case the interrupt controller lives in user space, we need to do
4169the interrupt acknowledge cycle through it to fetch the next to be
4170delivered interrupt vector using this exit.
4171
It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.
4175
4176 /* KVM_EXIT_SYSTEM_EVENT */
4177 struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN       1
#define KVM_SYSTEM_EVENT_RESET          2
#define KVM_SYSTEM_EVENT_CRASH          3
4181 __u32 type;
4182 __u64 flags;
4183 } system_event;
4184
4185If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
4186a system-level event using some architecture specific mechanism (hypercall
4187or some special instruction). In case of ARM/ARM64, this is triggered using
4188HVC instruction based PSCI call from the vcpu. The 'type' field describes
4189the system-level event type. The 'flags' field describes architecture
4190specific flags for the system-level event.
4191
Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (i.e. it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace can choose to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again.
  KVM_SYSTEM_EVENT_CRASH -- a guest crash occurred and the guest has
   requested crash condition maintenance. Userspace can choose to ignore
   the request, or to gather a core dump of VM memory and/or reset or
   shut down the VM.
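
As an illustration only, a VMM could map the 'type' value to a deferred
action along these lines. The enum vmm_action and its members are
hypothetical names; the KVM_SYSTEM_EVENT_* values are the ones listed in
the struct above.

```c
#include <stdint.h>

#ifndef KVM_SYSTEM_EVENT_SHUTDOWN
#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET    2
#define KVM_SYSTEM_EVENT_CRASH    3
#endif

/* Hypothetical actions a VMM might queue; none is mandated by KVM. */
enum vmm_action {
	VMM_CONTINUE,           /* ignore the request, call KVM_RUN again */
	VMM_SCHEDULE_SHUTDOWN,  /* tear the VM down later */
	VMM_SCHEDULE_RESET,     /* reset the VM later */
	VMM_DUMP_THEN_RESET     /* collect a memory dump, then reset */
};

/* Pick an action for a KVM_EXIT_SYSTEM_EVENT exit. */
enum vmm_action decide_system_event(uint32_t type)
{
	switch (type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return VMM_SCHEDULE_SHUTDOWN;
	case KVM_SYSTEM_EVENT_RESET:
		return VMM_SCHEDULE_RESET;
	case KVM_SYSTEM_EVENT_CRASH:
		return VMM_DUMP_THEN_RESET;
	default:
		/* Unknown event types are ignored in this sketch. */
		return VMM_CONTINUE;
	}
}
```

Because none of the requests has to be honoured synchronously, returning
an action instead of acting inside the exit handler keeps the vcpu loop
free to call KVM_RUN again.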
cf5d3188 4204
4205 /* KVM_EXIT_IOAPIC_EOI */
4206 struct {
4207 __u8 vector;
4208 } eoi;
4209
4210Indicates that the VCPU's in-kernel local APIC received an EOI for a
4211level-triggered IOAPIC interrupt. This exit only triggers when the
4212IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
4213the userspace IOAPIC should process the EOI and retrigger the interrupt if
4214it is still asserted. Vector is the LAPIC interrupt vector for which the
4215EOI was received.
4216
4217 struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC          1
#define KVM_EXIT_HYPERV_HCALL          2
4220 __u32 type;
4221 union {
4222 struct {
4223 __u32 msr;
4224 __u64 control;
4225 __u64 evt_page;
4226 __u64 msg_page;
4227 } synic;
4228 struct {
4229 __u64 input;
4230 __u64 result;
4231 __u64 params[2];
4232 } hcall;
4233 } u;
4234 };
4235 /* KVM_EXIT_HYPERV */
4236 struct kvm_hyperv_exit hyperv;
4237Indicates that the VCPU exits into userspace to process some tasks
4238related to Hyper-V emulation.
4239Valid values for 'type' are:
4240 KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
4241Hyper-V SynIC state change. Notification is used to remap SynIC
4242event/message pages and to enable/disable SynIC messages/events processing
4243in userspace.
4244
4245 /* Fix the size of the union. */
4246 char padding[256];
4247 };
4248
	/*
	 * shared registers between kvm and userspace.
	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
	 * struct kvm_sync_regs is architecture specific, as well as the
	 * bits for kvm_valid_regs and kvm_dirty_regs
	 */
4256 __u64 kvm_valid_regs;
4257 __u64 kvm_dirty_regs;
4258 union {
4259 struct kvm_sync_regs regs;
7b7e3952 4260 char padding[SYNC_REGS_SIZE_BYTES];
4261 } s;
4262
If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).
4270
4271Please note that the kernel is allowed to use the kvm_run structure as the
4272primary storage for certain register types. Therefore, the kernel may use the
4273values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
4274
9c1b96e3 4275};
821246a5 4276
414fa985 4277
9c15bb1d 4278
42796. Capabilities that can be enabled on vCPUs
4280--------------------------------------------
821246a5 4281
4282There are certain capabilities that change the behavior of the virtual CPU or
4283the virtual machine when enabled. To enable them, please see section 4.37.
4284Below you can find a list of capabilities and what their effect on the vCPU or
4285the virtual machine is when enabling them.
4286
4287The following information is provided along with the description:
4288
4289 Architectures: which instruction set architectures provide this ioctl.
4290 x86 includes both i386 and x86_64.
4291
4292 Target: whether this is a per-vcpu or per-vm capability.
4293
4294 Parameters: what parameters are accepted by the capability.
4295
4296 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
4297 are not detailed, but errors with specific meanings are.
4298
414fa985 4299
43006.1 KVM_CAP_PPC_OSI
4301
4302Architectures: ppc
0907c855 4303Target: vcpu
4304Parameters: none
4305Returns: 0 on success; -1 on error
4306
4307This capability enables interception of OSI hypercalls that otherwise would
4308be treated as normal system calls to be injected into the guest. OSI hypercalls
4309were invented by Mac-on-Linux to have a standardized communication mechanism
4310between the guest and the host.
4311
4312When this capability is enabled, KVM_EXIT_OSI can occur.
4313
414fa985 4314
43156.2 KVM_CAP_PPC_PAPR
4316
4317Architectures: ppc
0907c855 4318Target: vcpu
4319Parameters: none
4320Returns: 0 on success; -1 on error
4321
4322This capability enables interception of PAPR hypercalls. PAPR hypercalls are
4323done using the hypercall instruction "sc 1".
4324
4325It also sets the guest privilege level to "supervisor" mode. Usually the guest
4326runs in "hypervisor" privilege mode with a few missing features.
4327
4328In addition to the above, it changes the semantics of SDR1. In this mode, the
4329HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
4330HTAB invisible to the guest.
4331
4332When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
dc83b8bc 4333
414fa985 4334
43356.3 KVM_CAP_SW_TLB
4336
4337Architectures: ppc
0907c855 4338Target: vcpu
4339Parameters: args[0] is the address of a struct kvm_config_tlb
4340Returns: 0 on success; -1 on error
4341
4342struct kvm_config_tlb {
4343 __u64 params;
4344 __u64 array;
4345 __u32 mmu_type;
4346 __u32 array_len;
4347};
4348
Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM. The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures. The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array. It must be at least the size dictated
by "mmu_type" and "params".
4355
4356While KVM_RUN is active, the shared region is under control of KVM. Its
4357contents are undefined, and any modification by userspace results in
4358boundedly undefined behavior.
4359
4360On return from KVM_RUN, the shared region will reflect the current state of
4361the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
4362to tell KVM which entries have been changed, prior to calling KVM_RUN again
4363on this vcpu.
4364
4365For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
4366 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
4367 - The "array" field points to an array of type "struct
4368 kvm_book3e_206_tlb_entry".
4369 - The array consists of all entries in the first TLB, followed by all
4370 entries in the second TLB.
4371 - Within a TLB, entries are ordered first by increasing set number. Within a
4372 set, entries are ordered by way (increasing ESEL).
4373 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
4374 where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
4375 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
4376 hardware ignores this value for TLB0.
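
The array layout described above can be sketched as an index computation.
This is an illustration under stated assumptions, not code from KVM: the
function names are hypothetical, tlb0_size and tlb0_ways stand for
tlb_sizes[0] and tlb_ways[0] from the params structure, num_sets is
assumed to be a power of two, and TLB1 is assumed to consist of a single
set so that its entries are ordered by ESEL alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of a TLB0 entry in the shared array. */
size_t sw_tlb0_index(uint64_t mas2, uint32_t tlb0_size,
		     uint32_t tlb0_ways, uint32_t way)
{
	uint32_t num_sets = tlb0_size / tlb0_ways;
	/* Hash from the text: (MAS2 >> 12) & (num_sets - 1). */
	uint32_t set = (uint32_t)((mas2 >> 12) & (num_sets - 1));

	/* Entries are ordered by set first, then by way within the set. */
	return (size_t)set * tlb0_ways + way;
}

/* Index of a TLB1 entry: all TLB0 entries come first. */
size_t sw_tlb1_index(uint32_t tlb0_size, uint32_t esel)
{
	return (size_t)tlb0_size + esel;
}
```

For example, with 512 TLB0 entries in 4 ways there are 128 sets, so a
MAS2 value of 0x5000 selects set 5 and way 2 of that set lands at array
index 22.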
4377
43786.4 KVM_CAP_S390_CSS_SUPPORT
4379
4380Architectures: s390
0907c855 4381Target: vcpu
4382Parameters: none
4383Returns: 0 on success; -1 on error
4384
4385This capability enables support for handling of channel I/O instructions.
4386
4387TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
4388handled in-kernel, while the other I/O instructions are passed to userspace.
4389
4390When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
4391SUBCHANNEL intercepts.
1c810636 4392
4393Note that even though this capability is enabled per-vcpu, the complete
4394virtual machine is affected.
4395
43966.5 KVM_CAP_PPC_EPR
4397
4398Architectures: ppc
0907c855 4399Target: vcpu
4400Parameters: args[0] defines whether the proxy facility is active
4401Returns: 0 on success; -1 on error
4402
4403This capability enables or disables the delivery of interrupts through the
4404external proxy facility.
4405
4406When enabled (args[0] != 0), every time the guest gets an external interrupt
4407delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
4408to receive the topmost interrupt vector.
4409
4410When disabled (args[0] == 0), behavior is as if this facility is unsupported.
4411
4412When this capability is enabled, KVM_EXIT_EPR can occur.
4413
44146.6 KVM_CAP_IRQ_MPIC
4415
4416Architectures: ppc
4417Parameters: args[0] is the MPIC device fd
4418 args[1] is the MPIC CPU number for this vcpu
4419
4420This capability connects the vcpu to an in-kernel MPIC device.
4421
44226.7 KVM_CAP_IRQ_XICS
4423
4424Architectures: ppc
0907c855 4425Target: vcpu
4426Parameters: args[0] is the XICS device fd
4427 args[1] is the XICS CPU number (server ID) for this vcpu
4428
4429This capability connects the vcpu to an in-kernel XICS device.
4430
44316.8 KVM_CAP_S390_IRQCHIP
4432
4433Architectures: s390
4434Target: vm
4435Parameters: none
4436
4437This capability enables the in-kernel irqchip for s390. Please refer to
4438"4.24 KVM_CREATE_IRQCHIP" for details.
699a0ea0 4439
44406.9 KVM_CAP_MIPS_FPU
4441
4442Architectures: mips
4443Target: vcpu
4444Parameters: args[0] is reserved for future use (should be 0).
4445
4446This capability allows the use of the host Floating Point Unit by the guest. It
4447allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
4448done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
4449(depending on the current guest FPU register mode), and the Status.FR,
4450Config5.FRE bits are accessible via the KVM API and also from the guest,
4451depending on them being supported by the FPU.
4452
44536.10 KVM_CAP_MIPS_MSA
4454
4455Architectures: mips
4456Target: vcpu
4457Parameters: args[0] is reserved for future use (should be 0).
4458
4459This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
4460It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
4461Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
4462accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
4463the guest.
4464
44656.74 KVM_CAP_SYNC_REGS
4466Architectures: s390, x86
4467Target: s390: always enabled, x86: vcpu
4468Parameters: none
4469Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
4470sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).
4471
4472As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
4473KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
4474without having to call SET/GET_*REGS". This reduces overhead by eliminating
4475repeated ioctl calls for setting and/or getting register values. This is
4476particularly important when userspace is making synchronous guest state
4477modifications, e.g. when emulating and/or intercepting instructions in
4478userspace.
4479
4480For s390 specifics, please refer to the source code.
4481
4482For x86:
- the register sets to be copied out to kvm_run are selectable
  by userspace (rather than all sets being copied out for every exit).
- vcpu_events are available in addition to regs and sregs.
4486
4487For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
4488function as an input bit-array field set by userspace to indicate the
4489specific register sets to be copied out on the next exit.
4490
To indicate when userspace has modified values that should be copied into
the vCPU, the 'kvm_dirty_regs' bit-array field, which exists on all
architectures, must be set. This is done using the same bitflags as for the
'kvm_valid_regs' field. If the dirty bit is not set, then the register set
values will not be copied into the vCPU even if they've been modified.
4496
4497Unused bitfields in the bitarrays must be set to zero.
4498
4499struct kvm_sync_regs {
4500 struct kvm_regs regs;
4501 struct kvm_sregs sregs;
4502 struct kvm_vcpu_events events;
4503};
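
The bit-array handling for x86 might look like the sketch below. It is
only an illustration: struct run_sync_sketch is a local stand-in for the
two fields of struct kvm_run, the helper names are hypothetical, and the
KVM_SYNC_X86_* values are assumed to match
arch/x86/include/uapi/asm/kvm.h. Refusing to dirty a set that was not
requested is a conservative policy of this sketch, not a documented
requirement.

```c
#include <stdint.h>

/* Values assumed from arch/x86/include/uapi/asm/kvm.h. */
#ifndef KVM_SYNC_X86_REGS
#define KVM_SYNC_X86_REGS   (1UL << 0)
#define KVM_SYNC_X86_SREGS  (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
#endif

/* Local stand-in for the two bit-array fields of struct kvm_run. */
struct run_sync_sketch {
	uint64_t kvm_valid_regs;
	uint64_t kvm_dirty_regs;
};

/*
 * Select the register sets to be copied out on the next exit.
 * 'supported' is the bit-array returned by KVM_CHECK_EXTENSION.
 * Masking keeps unused bits at zero, as required.
 */
void request_sync(struct run_sync_sketch *run, uint64_t wanted,
		  uint64_t supported)
{
	run->kvm_valid_regs = wanted & supported;
	run->kvm_dirty_regs = 0;
}

/*
 * Mark register sets as modified by userspace so that they are copied
 * into the vCPU on the next KVM_RUN. Returns -1 if a set was never
 * requested via kvm_valid_regs.
 */
int mark_dirty(struct run_sync_sketch *run, uint64_t sets)
{
	if ((run->kvm_valid_regs & sets) != sets)
		return -1;
	run->kvm_dirty_regs |= sets;
	return 0;
}
```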
4504
45057. Capabilities that can be enabled on VMs
4506------------------------------------------
4507
4508There are certain capabilities that change the behavior of the virtual
4509machine when enabled. To enable them, please see section 4.37. Below
4510you can find a list of capabilities and what their effect on the VM
4511is when enabling them.
4512
4513The following information is provided along with the description:
4514
4515 Architectures: which instruction set architectures provide this ioctl.
4516 x86 includes both i386 and x86_64.
4517
4518 Parameters: what parameters are accepted by the capability.
4519
4520 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
4521 are not detailed, but errors with specific meanings are.
4522
4523
45247.1 KVM_CAP_PPC_ENABLE_HCALL
4525
4526Architectures: ppc
4527Parameters: args[0] is the sPAPR hcall number
4528 args[1] is 0 to disable, 1 to enable in-kernel handling
4529
This capability controls whether individual sPAPR hypercalls (hcalls)
get handled by the kernel or not. Enabling or disabling in-kernel
handling of an hcall is effective across the VM. On creation, an
initial set of hcalls are enabled for in-kernel handling, which
consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented. If disabled, the kernel will
not attempt to handle the hcall, but will always exit to userspace
to handle it. Note that it may not make sense to enable some and
disable others of a group of related hcalls, but KVM does not prevent
userspace from doing that.
4540
4541If the hcall number specified is not one that has an in-kernel
4542implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
4543error.
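
A sketch of how the KVM_ENABLE_CAP argument could be filled in for this
capability follows. struct enable_cap_sketch is a local mirror assumed to
match the layout of struct kvm_enable_cap, make_hcall_toggle() is a
hypothetical helper, and the capability number is passed in by the caller
rather than hard-coded. The ioctl call itself is omitted.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror; layout assumed to match struct kvm_enable_cap. */
struct enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/*
 * Build the argument for enabling or disabling in-kernel handling of
 * one sPAPR hcall. 'cap_id' is the value of KVM_CAP_PPC_ENABLE_HCALL.
 */
struct enable_cap_sketch make_hcall_toggle(uint32_t cap_id,
					   uint64_t hcall_nr, int enable)
{
	struct enable_cap_sketch c;

	/* Zero everything so flags, unused args and padding are 0. */
	memset(&c, 0, sizeof(c));
	c.cap = cap_id;
	c.args[0] = hcall_nr;           /* the sPAPR hcall number */
	c.args[1] = enable ? 1 : 0;     /* 1: in-kernel, 0: userspace */
	return c;
}
```

The resulting structure would be passed to KVM_ENABLE_CAP on the VM file
descriptor; an EINVAL result indicates that the kernel has no handler for
that hcall number.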
4544
45457.2 KVM_CAP_S390_USER_SIGP
4546
4547Architectures: s390
4548Parameters: none
4549
4550This capability controls which SIGP orders will be handled completely in user
4551space. With this capability enabled, all fast orders will be handled completely
4552in the kernel:
4553- SENSE
4554- SENSE RUNNING
4555- EXTERNAL CALL
4556- EMERGENCY SIGNAL
4557- CONDITIONAL EMERGENCY SIGNAL
4558
4559All other orders will be handled completely in user space.
4560
4561Only privileged operation exceptions will be checked for in the kernel (or even
4562in the hardware prior to interception). If this capability is not enabled, the
4563old way of handling SIGP orders is used (partially in kernel and user space).
4564
45657.3 KVM_CAP_S390_VECTOR_REGISTERS
4566
4567Architectures: s390
4568Parameters: none
4569Returns: 0 on success, negative value on error
4570
4571Allows use of the vector registers introduced with z13 processor, and
4572provides for the synchronization between host and user space. Will
4573return -EINVAL if the machine does not support vectors.
4574
45757.4 KVM_CAP_S390_USER_STSI
4576
4577Architectures: s390
4578Parameters: none
4579
4580This capability allows post-handlers for the STSI instruction. After
4581initial handling in the kernel, KVM exits to user space with
4582KVM_EXIT_S390_STSI to allow user space to insert further data.
4583
4584Before exiting to userspace, kvm handlers should fill in s390_stsi field of
4585vcpu->run:
4586struct {
4587 __u64 addr;
4588 __u8 ar;
4589 __u8 reserved;
4590 __u8 fc;
4591 __u8 sel1;
4592 __u16 sel2;
4593} s390_stsi;
4594
4595@addr - guest address of STSI SYSIB
4596@fc - function code
4597@sel1 - selector 1
4598@sel2 - selector 2
4599@ar - access register number
4600
4601KVM handlers should exit to userspace with rc = -EREMOTE.
e928e9cb 4602
46037.5 KVM_CAP_SPLIT_IRQCHIP
4604
4605Architectures: x86
b053b2ae 4606Parameters: args[0] - number of routes reserved for userspace IOAPICs
4607Returns: 0 on success, -1 on error
4608
4609Create a local apic for each processor in the kernel. This can be used
4610instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
4611IOAPIC and PIC (and also the PIT, even though this has to be enabled
4612separately).
4613
This capability also enables in-kernel routing of interrupt requests;
when KVM_CAP_SPLIT_IRQCHIP is enabled, only routes of KVM_IRQ_ROUTING_MSI
type are used in the IRQ routing table. The first args[0] MSI routes are
reserved for the IOAPIC pins. Whenever the LAPIC receives an EOI for these
routes, a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
4619
4620Fails if VCPU has already been created, or if the irqchip is already in the
4621kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
4622
46237.6 KVM_CAP_S390_RI
4624
4625Architectures: s390
4626Parameters: none
4627
4628Allows use of runtime-instrumentation introduced with zEC12 processor.
4629Will return -EINVAL if the machine does not support runtime-instrumentation.
4630Will return -EBUSY if a VCPU has already been created.
e928e9cb 4631
46327.7 KVM_CAP_X2APIC_API
4633
4634Architectures: x86
4635Parameters: args[0] - features that should be enabled
4636Returns: 0 on success, -EINVAL when args[0] contains invalid features
4637
4638Valid feature flags in args[0] are
4639
    #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
    #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
4642
4643Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
4644KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
4645allowing the use of 32-bit APIC IDs. See KVM_CAP_X2APIC_API in their
4646respective sections.
4647
4648KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work
4649in logical mode or with more than 255 VCPUs. Otherwise, KVM treats 0xff
4650as a broadcast even in x2APIC mode in order to support physical x2APIC
4651without interrupt remapping. This is undesirable in logical mode,
4652where 0xff represents CPUs 0-7 in cluster 0.
37131313 4653
46547.8 KVM_CAP_S390_USER_INSTR0
4655
4656Architectures: s390
4657Parameters: none
4658
4659With this capability enabled, all illegal instructions 0x0000 (2 bytes) will
4660be intercepted and forwarded to user space. User space can use this
4661mechanism e.g. to realize 2-byte software breakpoints. The kernel will
4662not inject an operating exception for these instructions, user space has
4663to take care of that.
4664
4665This capability can be enabled dynamically even if VCPUs were already
4666created and are running.
37131313 4667
46687.9 KVM_CAP_S390_GS
4669
4670Architectures: s390
4671Parameters: none
4672Returns: 0 on success; -EINVAL if the machine does not support
4673 guarded storage; -EBUSY if a VCPU has already been created.
4674
4675Allows use of guarded storage for the KVM guest.
4676
46777.10 KVM_CAP_S390_AIS
4678
4679Architectures: s390
4680Parameters: none
4681
4682Allow use of adapter-interruption suppression.
4683Returns: 0 on success; -EBUSY if a VCPU has already been created.
4684
46857.11 KVM_CAP_PPC_SMT
4686
4687Architectures: ppc
4688Parameters: vsmt_mode, flags
4689
4690Enabling this capability on a VM provides userspace with a way to set
4691the desired virtual SMT mode (i.e. the number of virtual CPUs per
4692virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
4693between 1 and 8. On POWER8, vsmt_mode must also be no greater than
4694the number of threads per subcore for the host. Currently flags must
4695be 0. A successful call to enable this capability will result in
4696vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
4697subsequently queried for the VM. This capability is only supported by
4698HV KVM, and can only be set before any VCPUs have been created.
4699The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
4700modes are available.
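
Userspace may want to validate its arguments before issuing the ioctl.
The check below is a sketch of the generic constraints stated above
(power of 2 between 1 and 8, flags equal to 0); vsmt_mode_is_valid() is a
hypothetical helper, and the POWER8 per-subcore limit is not modelled.

```c
/* Returns 1 if the arguments satisfy the generic constraints. */
int vsmt_mode_is_valid(unsigned int vsmt_mode, unsigned int flags)
{
	/* Currently flags must be 0. */
	if (flags != 0)
		return 0;
	if (vsmt_mode < 1 || vsmt_mode > 8)
		return 0;
	/* A power of 2 has exactly one bit set. */
	return (vsmt_mode & (vsmt_mode - 1)) == 0;
}
```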
3c313524 4701
47027.12 KVM_CAP_PPC_FWNMI
4703
4704Architectures: ppc
4705Parameters: none
4706
With this capability a machine check exception in the guest address
space will cause KVM to exit the guest with NMI exit reason. This
enables QEMU to build an error log and branch to the guest kernel's
registered machine check handling routine. Without this capability KVM
will branch to the guest's 0x200 interrupt vector.
4712
47137.13 KVM_CAP_X86_DISABLE_EXITS
4714
4715Architectures: x86
4716Parameters: args[0] defines which exits are disabled
4717Returns: 0 on success, -EINVAL when args[0] contains invalid exits
4718
4719Valid bits in args[0] are
4720
    #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
    #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
4723
4724Enabling this capability on a VM provides userspace with a way to no
4725longer intercept some instructions for improved latency in some
4726workloads, and is suggested when vCPUs are associated to dedicated
4727physical CPUs. More bits can be added in the future; userspace can
4728just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
4729all such vmexits.
4730
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
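
One way to combine the two rules above (pass the KVM_CHECK_EXTENSION
result through, but keep HLT exits when PV unhalt is wanted) is sketched
here. pick_disabled_exits() is a hypothetical helper; the bit values are
the ones listed above.

```c
#include <stdint.h>

#ifndef KVM_X86_DISABLE_EXITS_MWAIT
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT   (1 << 1)
#endif

/*
 * Compute args[0] for KVM_ENABLE_CAP. 'supported' is the value that
 * KVM_CHECK_EXTENSION returned for KVM_CAP_X86_DISABLE_EXITS.
 */
uint64_t pick_disabled_exits(uint64_t supported, int guest_uses_pv_unhalt)
{
	uint64_t mask = supported;

	/* PV unhalt and disabled HLT exits must not be combined. */
	if (guest_uses_pv_unhalt)
		mask &= ~(uint64_t)KVM_X86_DISABLE_EXITS_HLT;
	return mask;
}
```

Starting from the supported mask means that bits added by future kernels
are picked up without changes to userspace.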
4d5422ce 4732
47337.14 KVM_CAP_S390_HPAGE_1M
4734
4735Architectures: s390
4736Parameters: none
4737Returns: 0 on success, -EINVAL if hpage module parameter was not set
4738 or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL
4739 flag set
4740
4741With this capability the KVM support for memory backing with 1m pages
4742through hugetlbfs can be enabled for a VM. After the capability is
4743enabled, cmma can't be enabled anymore and pfmfi and the storage key
4744interpretation are disabled. If cmma has already been enabled or the
4745hpage module parameter is not set to 1, -EINVAL is returned.
4746
4747While it is generally possible to create a huge page backed VM without
4748this capability, the VM will not be able to run.
4749
7.15 KVM_CAP_MSR_PLATFORM_INFO
4751
4752Architectures: x86
4753Parameters: args[0] whether feature should be enabled or not
4754
With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
a #GP would be raised when the guest tries to access it. Currently, this
capability does not enable write permissions of this MSR for the guest.
4758
47597.16 KVM_CAP_PPC_NESTED_HV
4760
4761Architectures: ppc
4762Parameters: none
4763Returns: 0 on success, -EINVAL when the implementation doesn't support
4764 nested-HV virtualization.
4765
4766HV-KVM on POWER9 and later systems allows for "nested-HV"
4767virtualization, which provides a way for a guest VM to run guests that
4768can run using the CPU's supervisor mode (privileged non-hypervisor
4769state). Enabling this capability on a VM depends on the CPU having
4770the necessary functionality and on the facility being enabled with a
4771kvm-hv module parameter.
4772
47737.17 KVM_CAP_EXCEPTION_PAYLOAD
4774
4775Architectures: x86
4776Parameters: args[0] whether feature should be enabled or not
4777
4778With this capability enabled, CR2 will not be modified prior to the
4779emulated VM-exit when L1 intercepts a #PF exception that occurs in
4780L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
4781the emulated VM-exit when L1 intercepts a #DB exception that occurs in
4782L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
4783#DB) exception for L2, exception.has_payload will be set and the
4784faulting address (or the new DR6 bits*) will be reported in the
4785exception_payload field. Similarly, when userspace injects a #PF (or
4786#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
4787exception.has_payload and to put the faulting address (or the new DR6
4788bits*) in the exception_payload field.
4789
4790This capability also enables exception.pending in struct
4791kvm_vcpu_events, which allows userspace to distinguish between pending
4792and injected exceptions.
4793
4794
4795* For the new DR6 bits, note that bit 16 is set iff the #DB exception
4796 will clear DR6.RTM.
4797
47987.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
4799
4800Architectures: all
4801Parameters: args[0] whether feature should be enabled or not
4802
4803With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
4804clear and write-protect all pages that are returned as dirty.
4805Rather, userspace will have to do this operation separately using
4806KVM_CLEAR_DIRTY_LOG.
4807
At the cost of a slightly more complicated operation, this provides better
scalability and responsiveness for two reasons. First, the
KVM_CLEAR_DIRTY_LOG ioctl can operate on a 64-page granularity rather
than requiring a sync of a full memslot; this ensures that KVM does not
take spinlocks for an extended period of time. Second, in some cases a
large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
userspace actually using the data in the page. Pages can be modified
during this time, which is inefficient for both the guest and userspace:
the guest will incur a higher penalty due to write protection faults,
while userspace can see false reports of dirty pages. Manual reprotection
helps reduce this time, improving guest performance and reducing the
number of dirty log false positives.
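
The 64-page granularity means that a range handed to KVM_CLEAR_DIRTY_LOG
has to be widened to 64-page boundaries. The sketch below shows one way
to do that. It assumes that the first page must be a multiple of 64 and
that the page count must be a multiple of 64 unless the range ends at the
end of the memslot; struct clear_range and align_clear_range() are
hypothetical names, and the caller is expected to pass a range that
starts inside the slot.

```c
#include <stdint.h>

#define CLEAR_GRANULE 64u

/* Hypothetical result type: the widened range, in pages. */
struct clear_range {
	uint64_t first_page;
	uint32_t num_pages;
};

/* Widen [first, first + count) to 64-page boundaries within the slot. */
struct clear_range align_clear_range(uint64_t first, uint64_t count,
				     uint64_t slot_pages)
{
	struct clear_range r;
	uint64_t mask = ~(uint64_t)(CLEAR_GRANULE - 1);
	uint64_t start = first & mask;                          /* round down */
	uint64_t end = (first + count + CLEAR_GRANULE - 1) & mask; /* round up */

	/* The last chunk may be shorter if it reaches the end of the slot. */
	if (end > slot_pages)
		end = slot_pages;

	r.first_page = start;
	r.num_pages = (uint32_t)(end - start);
	return r;
}
```

Widening is harmless as long as the bitmap passed along with the range
has zero bits for the pages that were only added for alignment.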
4820
4821
48228. Other capabilities.
4823----------------------
4824
4825This section lists capabilities that give information about other
4826features of the KVM implementation.
4827
48288.1 KVM_CAP_PPC_HWRNG
4829
4830Architectures: ppc
4831
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.
4837
48388.2 KVM_CAP_HYPERV_SYNIC
4839
4840Architectures: x86
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
used to support Windows Hyper-V based guest paravirt drivers (VMBus).
4845
4846In order to use SynIC, it has to be activated by setting this
4847capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
4848will disable the use of APIC hardware virtualization even if supported
4849by the CPU, as it's incompatible with SynIC auto-EOI behavior.
4850
48518.3 KVM_CAP_PPC_RADIX_MMU
4852
4853Architectures: ppc
4854
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).
4859
48608.4 KVM_CAP_PPC_HASH_MMU_V3
4861
4862Architectures: ppc
4863
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.
4868
48698.5 KVM_CAP_MIPS_VZ
4870
4871Architectures: mips
4872
4873This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
4874it is available, means that full hardware assisted virtualization capabilities
4875of the hardware are available for use through KVM. An appropriate
4876KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
4877utilises it.
4878
4879If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
4880available, it means that the VM is using full hardware assisted virtualization
4881capabilities of the hardware. This is useful to check after creating a VM with
4882KVM_VM_MIPS_DEFAULT.
4883
4884The value returned by KVM_CHECK_EXTENSION should be compared against known
4885values (see below). All other values are reserved. This is to allow for the
4886possibility of other hardware assisted virtualization implementations which
4887may be incompatible with the MIPS VZ ASE.
4888
4889 0: The trap & emulate implementation is in use to run guest code in user
4890 mode. Guest virtual memory segments are rearranged to fit the guest in the
4891 user mode address space.
4892
4893 1: The MIPS VZ ASE is in use, providing full hardware assisted
4894 virtualization, including standard guest virtual memory segments.
4895
48968.6 KVM_CAP_MIPS_TE
4897
4898Architectures: mips
4899
4900This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
4901it is available, means that the trap & emulate implementation is available to
4902run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
4903assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
4904to KVM_CREATE_VM to create a VM which utilises it.
4905
4906If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
4907available, it means that the VM is using trap & emulate.
4908
49098.7 KVM_CAP_MIPS_64BIT
4910
4911Architectures: mips
4912
4913This capability indicates the supported architecture type of the guest, i.e. the
4914supported register and address width.
4915
4916The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
4917kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
4918be checked specifically against known values (see below). All other values are
4919reserved.
4920
4921 0: MIPS32 or microMIPS32.
4922 Both registers and addresses are 32-bits wide.
4923 It will only be possible to run 32-bit guest code.
4924
4925 1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
4926 Registers are 64-bits wide, but addresses are 32-bits wide.
4927 64-bit guest code may run but cannot access MIPS64 memory segments.
4928 It will also be possible to run 32-bit guest code.
4929
4930 2: MIPS64 or microMIPS64 with access to all address segments.
4931 Both registers and addresses are 64-bits wide.
4932 It will be possible to run 64-bit or 32-bit guest code.
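
The value table above could be decoded along the following lines. This is
only a sketch: the structure and function names are hypothetical and not part
of the KVM API, and the value is assumed to come from KVM_CHECK_EXTENSION on a
VM handle. Any value other than 0, 1 or 2 is treated as reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the decoded result; not part of the KVM API. */
struct mips_guest_arch {
	int reg_bits;		/* register width in bits */
	int addr_bits;		/* address width in bits */
	bool can_run_64bit;	/* 64-bit guest code may run */
};

/*
 * Decode the value returned for KVM_CAP_MIPS_64BIT.
 * Returns 0 on success, -1 if the value is reserved (unknown).
 */
int decode_mips_64bit_cap(int value, struct mips_guest_arch *out)
{
	assert(out != NULL);

	switch (value) {
	case 0:	/* MIPS32 or microMIPS32 */
		*out = (struct mips_guest_arch){ 32, 32, false };
		return 0;
	case 1:	/* MIPS64, 32-bit compatibility segments only */
		*out = (struct mips_guest_arch){ 64, 32, true };
		return 0;
	case 2:	/* MIPS64, all address segments */
		*out = (struct mips_guest_arch){ 64, 64, true };
		return 0;
	default:
		return -1;
	}
}
```

Rejecting unknown values, rather than treating "nonzero" as "64-bit", follows
the advice above to check specifically against known values.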
4933
49348.9 KVM_CAP_ARM_USER_IRQ
4935
4936Architectures: arm, arm64
4937This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
4938that if userspace creates a VM without an in-kernel interrupt controller, it
4939will be notified of changes to the output level of in-kernel emulated devices,
4940which can generate virtual interrupts, presented to the VM.
4941For such VMs, on every return to userspace, the kernel
4942updates the vcpu's run->s.regs.device_irq_level field to represent the actual
4943output level of the device.
4944
4945Whenever kvm detects a change in the device output level, kvm guarantees at
4946least one return to userspace before running the VM. This exit could either
4947be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
4948userspace can always sample the device output level and re-compute the state of
4949the userspace interrupt controller. Userspace should always check the state
4950of run->s.regs.device_irq_level on every kvm exit.
4951The value in run->s.regs.device_irq_level can represent both level and edge
4952triggered interrupt signals, depending on the device. Edge triggered interrupt
4953signals will exit to userspace with the bit in run->s.regs.device_irq_level
4954set exactly once per edge signal.
4955
4956The field run->s.regs.device_irq_level is available independent of
4957run->kvm_valid_regs or run->kvm_dirty_regs bits.
4958
4959If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
4960number larger than 0 indicating the version of this capability that is
4961implemented and thereby which bits in run->s.regs.device_irq_level can signal values.
4962
4963Currently the following bits are defined for the device_irq_level bitmap:
4964
4965 KVM_CAP_ARM_USER_IRQ >= 1:
4966
4967 KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
4968 KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
4969 KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
4970
4971Future versions of kvm may implement additional events. These will get
4972indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
4973listed above.
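
A userspace interrupt controller might track changes in the bitmap roughly as
sketched below. The helper names are hypothetical. The bit values are assumed
to mirror the arm64 uapi header and are defined locally only so that the
sketch builds on any host. The previous level is assumed to be whatever
userspace saved on the last exit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the arm64 uapi definitions; local copies for the sketch. */
#ifndef KVM_ARM_DEV_EL1_VTIMER
#define KVM_ARM_DEV_EL1_VTIMER	(1u << 0)
#define KVM_ARM_DEV_EL1_PTIMER	(1u << 1)
#define KVM_ARM_DEV_PMU		(1u << 2)
#endif

/* Bits that can signal values for a given KVM_CAP_ARM_USER_IRQ version. */
uint64_t user_irq_known_bits(int cap_version)
{
	assert(cap_version >= 0);

	if (cap_version >= 1)
		return KVM_ARM_DEV_EL1_VTIMER | KVM_ARM_DEV_EL1_PTIMER |
		       KVM_ARM_DEV_PMU;
	return 0;
}

/*
 * Compare the device_irq_level sampled on this exit with the one saved on
 * the previous exit and return the known lines whose level changed.
 */
uint64_t user_irq_changed(uint64_t prev_level, uint64_t cur_level,
			  int cap_version)
{
	return (prev_level ^ cur_level) & user_irq_known_bits(cap_version);
}
```

Masking with the bits known for the reported version keeps userspace from
acting on bits that a newer kernel may define later.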
4974
49758.10 KVM_CAP_PPC_SMT_POSSIBLE
4976
4977Architectures: ppc
4978
4979Querying this capability returns a bitmap indicating the possible
4980virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
4981(counting from the right) is set, then a virtual SMT mode of 2^N is
4982available.
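
As an illustration, the bitmap could be expanded into a list of SMT modes as
follows. The function name is hypothetical, and the bitmap is assumed to be
the non-negative value returned when querying the capability.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Store each possible virtual SMT mode (2^N for every set bit N) in modes[],
 * lowest first, writing at most max entries. Returns the number stored.
 */
size_t smt_possible_modes(unsigned int bitmap, unsigned int *modes, size_t max)
{
	size_t n = 0;
	unsigned int bit;

	assert(modes != NULL || max == 0);

	for (bit = 0; bit < sizeof(bitmap) * CHAR_BIT && n < max; bit++) {
		if (bitmap & (1u << bit))
			modes[n++] = 1u << bit;	/* mode 2^N */
	}
	return n;
}
```

For example, a bitmap of 0xb (bits 0, 1 and 3 set) yields the modes 1, 2
and 8.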
4983
49848.11 KVM_CAP_HYPERV_SYNIC2
4985
4986Architectures: x86
4987
4988This capability enables a newer version of the Hyper-V Synthetic interrupt
4989controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
4990doesn't clear SynIC message and event flags pages when they are enabled by
4991writing to the respective MSRs.
4992
49938.12 KVM_CAP_HYPERV_VP_INDEX
4994
4995Architectures: x86
4996
4997This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. Its
4998value is used to denote the target vcpu for a SynIC interrupt. For
4999compatibility, KVM initializes this msr to KVM's internal vcpu index. When this
5000capability is absent, userspace can still query this msr's value.
5001
50028.13 KVM_CAP_S390_AIS_MIGRATION
5003
5004Architectures: s390
5005Parameters: none
5006
5007This capability indicates whether the flic device will be able to get/set the
5008AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute, and allows
5009userspace to discover this without having to create a flic device.
5010
50118.14 KVM_CAP_S390_PSW
5012
5013Architectures: s390
5014
5015This capability indicates that the PSW is exposed via the kvm_run structure.
5016
50178.15 KVM_CAP_S390_GMAP
5018
5019Architectures: s390
5020
5021This capability indicates that the user space memory used as guest mapping can
5022be anywhere in the user memory address space, as long as the memory slots are
5023aligned and sized to a segment (1MB) boundary.
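
The alignment rule can be checked before registering a memory slot, for
instance as sketched here. The function name is hypothetical, and it is an
assumption of this sketch that the userspace address and the slot size are
the values that need to be multiples of the segment size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define S390_SEGMENT_SIZE	(1ULL << 20)	/* 1MB segment */

static_assert((S390_SEGMENT_SIZE & (S390_SEGMENT_SIZE - 1)) == 0,
	      "segment size must be a power of two for the mask test");

/* True if both the start address and the size fall on 1MB boundaries. */
bool s390_memslot_segment_aligned(uint64_t userspace_addr, uint64_t memory_size)
{
	const uint64_t mask = S390_SEGMENT_SIZE - 1;

	return (userspace_addr & mask) == 0 && (memory_size & mask) == 0;
}
```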
5024
50258.16 KVM_CAP_S390_COW
5026
5027Architectures: s390
5028
5029This capability indicates that the user space memory used as guest mapping can
5030use copy-on-write semantics as well as dirty pages tracking via read-only page
5031tables.
5032
50338.17 KVM_CAP_S390_BPB
5034
5035Architectures: s390
5036
5037This capability indicates that kvm will implement the interfaces to handle
5038reset, migration and nested KVM for branch prediction blocking. The stfle
5039facility 82 should not be provided to the guest without this capability.
5040
50418.18 KVM_CAP_HYPERV_TLBFLUSH
5042
5043Architectures: x86
5044
5045This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
5046hypercalls:
5047HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
5048HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
5049
50508.19 KVM_CAP_ARM_INJECT_SERROR_ESR
5051
5052Architectures: arm, arm64
5053
5054This capability indicates that userspace can specify (via the
5055KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
5056takes a virtual SError interrupt exception.
5057If KVM advertises this capability, userspace can only specify the ISS field for
5058the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
5059CPU when the exception is taken. If this virtual SError is taken to EL1 using
5060AArch64, this value will be reported in the ISS field of ESR_ELx.
5061
5062See KVM_CAP_VCPU_EVENTS for more details.
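
Since only the ISS field can be specified, userspace may want to reduce a full
syndrome value to its ISS bits before handing it to KVM_SET_VCPU_EVENTS. The
sketch below assumes the architectural layout of ESR_ELx, in which the ISS
occupies bits [24:0]. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_ISS_MASK	0x1ffffffULL	/* ESR_ELx.ISS, bits [24:0] */

static_assert(ESR_ISS_MASK == (1ULL << 25) - 1, "ISS is a 25-bit field");

/* Drop the EC, IL and any higher bits, keeping only the ISS field. */
uint64_t serror_iss(uint64_t esr)
{
	return esr & ESR_ISS_MASK;
}
```

For example, the syndrome value 0xbe000011 reduces to an ISS of 0x11.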
50638.20 KVM_CAP_HYPERV_SEND_IPI
5064
5065Architectures: x86
5066
5067This capability indicates that KVM supports paravirtualized Hyper-V IPI send
5068hypercalls:
5069HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.