The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description
----------------------

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to three classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem. In addition a system ioctl is used to create
   virtual machines.

 - VM ioctls: These query and set attributes that affect an entire virtual
   machine, for example memory layout. In addition a VM ioctl is used to
   create virtual cpus (vcpus).

   Only run VM ioctls from the same process (address space) that was used
   to create the VM.

 - vcpu ioctls: These query and set attributes that control the operation
   of a single virtual cpu.

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.
26
414fa985 27
2. File descriptors
-------------------

The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it. Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.
39
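The descriptor chain described above can be sketched as follows. This is
a minimal illustration, assuming <linux/kvm.h> is installed; the helper
kvm_open_chain() and struct kvm_fds are hypothetical names, not part of the
KVM API, and error handling is reduced to returning -1.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the three descriptors; not part of the KVM API. */
struct kvm_fds {
	int kvm_fd;	/* handle to the kvm subsystem, for system ioctls */
	int vm_fd;	/* one virtual machine, for VM ioctls */
	int vcpu_fd;	/* one virtual cpu, for vcpu ioctls */
};

/*
 * Opens the device node at 'path' (normally "/dev/kvm"), creates a VM with
 * machine type 0 and adds a single vcpu with id 0.
 * Returns 0 on success, -1 on failure with all descriptors closed.
 */
int kvm_open_chain(const char *path, struct kvm_fds *fds)
{
	fds->kvm_fd = fds->vm_fd = fds->vcpu_fd = -1;

	fds->kvm_fd = open(path, O_RDWR);
	if (fds->kvm_fd < 0)
		return -1;

	fds->vm_fd = ioctl(fds->kvm_fd, KVM_CREATE_VM, 0UL);
	if (fds->vm_fd < 0)
		goto fail;

	fds->vcpu_fd = ioctl(fds->vm_fd, KVM_CREATE_VCPU, 0UL);
	if (fds->vcpu_fd < 0)
		goto fail;

	return 0;

fail:
	if (fds->vm_fd >= 0)
		close(fds->vm_fd);
	close(fds->kvm_fd);
	fds->kvm_fd = fds->vm_fd = fds->vcpu_fd = -1;
	return -1;
}
```

A caller would pass "/dev/kvm" as the path and keep all three descriptors
in the thread that will later run the vcpu.
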
In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets. These
kinds of tricks are explicitly not supported by kvm. While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API. The only supported use is one virtual machine per process,
and one vcpu per thread.

It is important to note that although VM ioctls may only be issued from
the process that created the VM, a VM's lifecycle is associated with its
file descriptor, not its creator (process). In other words, the VM and
its resources, *including the associated address space*, are not freed
until the last reference to the VM's file descriptor has been released.
For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
not be freed until both the parent (original) process and its child have
put their references to the VM's file descriptor.

Because a VM's resources are not freed until the last reference to its
file descriptor is released, creating additional references to a VM
via fork(), dup(), etc. without careful consideration is strongly
discouraged and may have unwanted side effects, e.g. memory allocated
by and on behalf of the VM's process may not be freed/unaccounted when
the VM is shut down.
63
3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible changes are allowed. However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available. If it is, a
set of ioctls is available for application use.
77
4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl. Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4), or 'none' which means that while not all kernels
      support this ioctl, there's no capability bit to check its
      availability: for kernels that don't support the ioctl,
      the ioctl returns -ENOTTY.

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.
104
4.1 KVM_GET_API_VERSION
107
108Capability: basic
109Architectures: all
110Type: system ioctl
111Parameters: none
112Returns: the constant KVM_API_VERSION (=12)
113
114This identifies the API version as the stable kvm API. It is not
115expected that this number will change. However, Linux 2.6.20 and
1162.6.21 report earlier versions; these are not documented and not
117supported. Applications should refuse to run if KVM_GET_API_VERSION
118returns a value other than 12. If this check passes, all ioctls
119described as 'basic' will be available.
120
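The version check recommended above might look like the following sketch.
The function name kvm_api_is_stable() is hypothetical; KVM_GET_API_VERSION
and KVM_API_VERSION are taken from <linux/kvm.h>.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Returns 1 if the kvm handle reports the stable API version (12),
 * 0 if it reports anything else, and -1 if the ioctl itself fails
 * (for example because kvm_fd is not a valid descriptor).
 */
int kvm_api_is_stable(int kvm_fd)
{
	int version = ioctl(kvm_fd, KVM_GET_API_VERSION, 0UL);

	if (version < 0)
		return -1;
	return version == KVM_API_VERSION;
}
```

An application would call this right after open("/dev/kvm") and refuse to
continue unless the result is 1.
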
414fa985 121
4.2 KVM_CREATE_VM
123
124Capability: basic
125Architectures: all
126Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
128Returns: a VM fd that can be used to control the new virtual machine.
129
The new VM has no virtual cpus and no memory.
Unless one of the cases below applies, you probably want to use 0 as the
machine type.
132
133In order to create user controlled virtual machines on S390, check
134KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
135privileged user (CAP_SYS_ADMIN).
9c1b96e3 136
137To use hardware assisted virtualization on MIPS (VZ ASE) rather than
138the default trap & emulate implementation (which changes the virtual
139memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
140flag KVM_VM_MIPS_VZ.
141
414fa985 142
On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40 bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
machine type identifier.

e.g., to configure a guest to use a 48-bit physical address size:

    vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));

The requested size (IPA_Bits) must be:
  0 - Implies default size, 40 bits (for backward compatibility)

  or

  N - Implies N bits, where N is a positive integer such that
      32 <= N <= Host_IPA_Limit

Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
is dependent on the CPU capability and the kernel configuration. The limit can
be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
ioctl() at run-time.
167
168Please note that configuring the IPA size does not affect the capability
169exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
170size of the address translated by the stage2 level (guest physical to
171host physical address translations).
172
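The rules for the requested IPA size can be expressed as a small
validation helper. This is only a sketch: ipa_machine_type() is a
hypothetical name, and the literal 0xff mask stands in for the bits[7-0]
encoding described above. Real code should use the
KVM_VM_TYPE_ARM_IPA_SIZE() macro.

```c
/*
 * Builds the KVM_CREATE_VM machine type argument for a requested IPA size.
 * 'host_limit' is the value KVM_CHECK_EXTENSION returned for
 * KVM_CAP_ARM_VM_IPA_SIZE (0 if the extension is absent).
 * Returns the machine type, or -1 if the request violates the rules above.
 */
long ipa_machine_type(unsigned int ipa_bits, unsigned int host_limit)
{
	if (ipa_bits == 0)
		return 0;		/* default size, 40 bits */
	if (host_limit == 0)
		return -1;		/* size is not configurable */
	if (ipa_bits < 32 || ipa_bits > host_limit)
		return -1;
	return (long)(ipa_bits & 0xff);	/* IPA_Bits lives in bits[7-0] */
}
```
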
173
4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST

Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
Architectures: x86
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  EFAULT:    the msr index list cannot be read from or written to
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.
185
186struct kvm_msr_list {
187 __u32 nmsrs; /* number of msrs in entries */
188 __u32 indices[0];
189};
190
191The user fills in the size of the indices array in nmsrs, and in return
192kvm adjusts nmsrs to reflect the actual number of msrs and fills in the
193indices array with their numbers.
194
195KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list
196varies by kvm version and host processor, but does not change otherwise.
9c1b96e3 197
Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank MSRs are
not returned in the MSR list, as different vcpus can have a different number
of banks, as set via the KVM_X86_SETUP_MCE ioctl.
201
202KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed
203to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities
204and processor features that are exposed via MSRs (e.g., VMX capabilities).
205This list also varies by kvm version and host processor, but does not change
206otherwise.
207
4.4 KVM_CHECK_EXTENSION

Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
Architectures: all
Type: system ioctl, vm ioctl
214Parameters: extension identifier (KVM_CAP_*)
215Returns: 0 if unsupported; 1 (or some other positive integer) if supported
216
217The API allows the application to query about extensions to the core
218kvm API. Userspace passes an extension identifier (an integer) and
219receives an integer that describes the extension availability.
220Generally 0 means no and 1 means yes, but some extensions may report
221additional information in the integer return value.
222
Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
414fa985 226
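A query along these lines might be wrapped as below. The two helper names
are hypothetical, and treating a failed ioctl as "unsupported" is a design
choice of this sketch rather than something the API requires.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Queries one extension on 'fd', which may be the kvm handle or, when
 * KVM_CAP_CHECK_EXTENSION_VM is available, a VM fd.
 * Returns 0 if the extension is unsupported or the query fails, otherwise
 * the positive value reported by kvm.
 */
int kvm_extension_value(int fd, unsigned long cap)
{
	int ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);

	return ret > 0 ? ret : 0;
}

/*
 * Prefers the VM fd for the query when the kernel supports that, and falls
 * back to the system fd otherwise.
 */
int kvm_vm_extension_value(int kvm_fd, int vm_fd, unsigned long cap)
{
	if (kvm_extension_value(kvm_fd, KVM_CAP_CHECK_EXTENSION_VM))
		return kvm_extension_value(vm_fd, cap);
	return kvm_extension_value(kvm_fd, cap);
}
```

Because some extensions report additional information in the return
value, callers should keep the integer rather than reduce it to a boolean.
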

4.5 KVM_GET_VCPU_MMAP_SIZE
228
229Capability: basic
230Architectures: all
231Type: system ioctl
232Parameters: none
233Returns: size of vcpu mmap area, in bytes
234
The KVM_RUN ioctl (see section 4.10) communicates with userspace via a shared
memory region. This ioctl returns the size of that region. See the
KVM_RUN documentation for details.
238
4.6 KVM_SET_MEMORY_REGION
241
242Capability: basic
243Architectures: all
244Type: vm ioctl
245Parameters: struct kvm_memory_region (in)
246Returns: 0 on success, -1 on error
247
This ioctl is obsolete and has been removed.


4.7 KVM_CREATE_VCPU
252
253Capability: basic
254Architectures: all
255Type: vm ioctl
256Parameters: vcpu id (apic id on x86)
257Returns: vcpu fd on success, -1 on error
258
This API adds a vcpu to a virtual machine. No more than max_vcpus may be added.
The vcpu id is an integer in the range [0, max_vcpu_id).

The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
the KVM_CHECK_EXTENSION ioctl() at run-time.
The maximum possible value for max_vcpus can be retrieved using the
KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is
at most 4.
If the KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that max_vcpu_id
is the same as the value returned from KVM_CAP_MAX_VCPUS.
277
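The fallback rules for the three limits can be sketched as pure logic
over the values returned by KVM_CHECK_EXTENSION. struct vcpu_limits and
both function names are hypothetical.

```c
/* Hypothetical result type; the field names mirror the text above. */
struct vcpu_limits {
	int recommended_vcpus;	/* from KVM_CAP_NR_VCPUS, else 4 */
	int max_vcpus;		/* from KVM_CAP_MAX_VCPUS, else recommended */
	int max_vcpu_id;	/* from KVM_CAP_MAX_VCPU_ID, else max_vcpus */
};

/*
 * Each argument is the value KVM_CHECK_EXTENSION returned for the matching
 * capability, with 0 meaning that the capability does not exist.
 */
struct vcpu_limits resolve_vcpu_limits(int nr_vcpus, int max_vcpus,
				       int max_vcpu_id)
{
	struct vcpu_limits l;

	l.recommended_vcpus = nr_vcpus > 0 ? nr_vcpus : 4;
	l.max_vcpus = max_vcpus > 0 ? max_vcpus : l.recommended_vcpus;
	l.max_vcpu_id = max_vcpu_id > 0 ? max_vcpu_id : l.max_vcpus;
	return l;
}

/* A vcpu id is acceptable if it lies in the range [0, max_vcpu_id). */
int vcpu_id_is_valid(const struct vcpu_limits *l, int id)
{
	return id >= 0 && id < l->max_vcpu_id;
}
```
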
278On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
279threads in one or more virtual CPU cores. (This is because the
280hardware requires all the hardware threads in a CPU core to be in the
281same partition.) The KVM_CAP_PPC_SMT capability indicates the number
282of vcpus per virtual core (vcore). The vcore id is obtained by
283dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
284given vcore will always be in the same physical core as each other
285(though that might be a different physical core from time to time).
286Userspace can control the threading (SMT) mode of the guest by its
287allocation of vcpu ids. For example, if userspace wants
288single-threaded guest vcpus, it should make all vcpu ids be a multiple
289of the number of vcpus per vcore.
290
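The relationship between vcpu ids and vcores can be illustrated with two
small helpers. Both names are hypothetical. The single-threaded case is
the one described above; the generalisation to several guest threads per
core is an assumption that follows from the division rule.

```c
/*
 * 'threads_per_vcore' is the value reported for KVM_CAP_PPC_SMT and must be
 * positive.
 */
int vcore_id(int vcpu_id, int threads_per_vcore)
{
	return vcpu_id / threads_per_vcore;
}

/*
 * vcpu id to request for the n-th guest cpu (n counts from 0) when the guest
 * should run with 'guest_threads' threads per core, where
 * 1 <= guest_threads <= threads_per_vcore.  With guest_threads == 1 every id
 * is a multiple of threads_per_vcore, so each vcpu gets a vcore of its own.
 */
int vcpu_id_for(int n, int guest_threads, int threads_per_vcore)
{
	return (n / guest_threads) * threads_per_vcore + n % guest_threads;
}
```
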
291For virtual cpus that have been created with S390 user controlled virtual
292machines, the resulting vcpu fd can be memory mapped at page offset
293KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
294cpu's hardware control block.
295
414fa985 296
4.8 KVM_GET_DIRTY_LOG (vm ioctl)
298
299Capability: basic
300Architectures: x86
301Type: vm ioctl
302Parameters: struct kvm_dirty_log (in/out)
303Returns: 0 on success, -1 on error
304
/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding2;
	};
};
314
315Given a memory slot, return a bitmap containing any pages dirtied
316since the last call to this ioctl. Bit 0 is the first page in the
317memory slot. Ensure the entire structure is cleared to avoid padding
318issues.
319
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
324
325The bits in the dirty bitmap are cleared before the ioctl returns, unless
326KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
327see the description of the capability.
414fa985 328
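Working with the slot field and the returned bitmap can be sketched with
a few pure helpers. All function names are hypothetical. Two points are
assumptions of this sketch and not stated above: the bitmap is read
byte-wise with the least significant bit first, and its size is rounded
up to a multiple of 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combines a memory slot number with an address space id for the 'slot'
 * field: the slot number sits in bits 0-15, the address space in bits 16-31.
 */
uint32_t dirty_log_slot(uint16_t slot, uint16_t address_space)
{
	return (uint32_t)slot | ((uint32_t)address_space << 16);
}

/* Bytes needed for a bitmap covering 'npages' pages, rounded up to 64 bits. */
size_t dirty_bitmap_bytes(size_t npages)
{
	return ((npages + 63) / 64) * 8;
}

/* Non-zero if page 'page' of the slot is marked dirty; bit 0 is the first page. */
int page_is_dirty(const uint8_t *bitmap, size_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}

/* Counts the dirty pages among the first 'npages' pages of the slot. */
size_t count_dirty_pages(const uint8_t *bitmap, size_t npages)
{
	size_t i, n = 0;

	for (i = 0; i < npages; i++)
		n += (size_t)page_is_dirty(bitmap, i);
	return n;
}
```

The buffer handed to the ioctl through dirty_bitmap would be allocated
with dirty_bitmap_bytes() and the whole struct kvm_dirty_log cleared
first, as recommended above.
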

4.9 KVM_SET_MEMORY_ALIAS
330
331Capability: basic
332Architectures: x86
333Type: vm ioctl
334Parameters: struct kvm_memory_alias (in)
335Returns: 0 (success), -1 (error)
336
This ioctl is obsolete and has been removed.


4.10 KVM_RUN
341
342Capability: basic
343Architectures: all
344Type: vcpu ioctl
345Parameters: none
346Returns: 0 on success, -1 on error
347Errors:
348 EINTR: an unmasked signal is pending
349
350This ioctl is used to run a guest virtual cpu. While there are no
351explicit parameters, there is an implicit parameter block that can be
352obtained by mmap()ing the vcpu fd at offset 0, with the size given by
353KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
354kvm_run' (see below).
355
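Obtaining the implicit parameter block might look like the following
sketch. map_kvm_run() is a hypothetical helper; it assumes the caller
already holds the kvm handle and a vcpu fd.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Maps the implicit parameter block of a vcpu.  'kvm_fd' is the handle to the
 * kvm subsystem, 'vcpu_fd' the vcpu to map.  On success the size of the
 * mapping is stored in *size so the caller can munmap() it later.
 * Returns NULL on failure.
 */
struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd, size_t *size)
{
	void *p;
	int len = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);

	if (len <= 0)
		return NULL;

	p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED,
		 vcpu_fd, 0);
	if (p == MAP_FAILED)
		return NULL;

	*size = (size_t)len;
	return p;
}
```

The guest is then run with ioctl(vcpu_fd, KVM_RUN, 0); a failure with
errno set to EINTR means an unmasked signal is pending.
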
414fa985 356
4.11 KVM_GET_REGS
358
359Capability: basic
Architectures: all except ARM, arm64
361Type: vcpu ioctl
362Parameters: struct kvm_regs (out)
363Returns: 0 on success, -1 on error
364
365Reads the general purpose registers from the vcpu.
366
367/* x86 */
368struct kvm_regs {
369 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
370 __u64 rax, rbx, rcx, rdx;
371 __u64 rsi, rdi, rsp, rbp;
372 __u64 r8, r9, r10, r11;
373 __u64 r12, r13, r14, r15;
374 __u64 rip, rflags;
375};
376
377/* mips */
378struct kvm_regs {
379 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
380 __u64 gpr[32];
381 __u64 hi;
382 __u64 lo;
383 __u64 pc;
384};
385
414fa985 386
4.12 KVM_SET_REGS
388
389Capability: basic
Architectures: all except ARM, arm64
391Type: vcpu ioctl
392Parameters: struct kvm_regs (in)
393Returns: 0 on success, -1 on error
394
395Writes the general purpose registers into the vcpu.
396
397See KVM_GET_REGS for the data structure.
398
414fa985 399
4.13 KVM_GET_SREGS
401
402Capability: basic
Architectures: x86, ppc
404Type: vcpu ioctl
405Parameters: struct kvm_sregs (out)
406Returns: 0 on success, -1 on error
407
408Reads special registers from the vcpu.
409
410/* x86 */
411struct kvm_sregs {
412 struct kvm_segment cs, ds, es, fs, gs, ss;
413 struct kvm_segment tr, ldt;
414 struct kvm_dtable gdt, idt;
415 __u64 cr0, cr2, cr3, cr4, cr8;
416 __u64 efer;
417 __u64 apic_base;
418 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
419};
420
/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */

interrupt_bitmap is a bitmap of pending external interrupts. At most
one bit may be set. This interrupt has been acknowledged by the APIC
but not yet injected into the cpu core.
426
414fa985 427
4.14 KVM_SET_SREGS
429
430Capability: basic
Architectures: x86, ppc
432Type: vcpu ioctl
433Parameters: struct kvm_sregs (in)
434Returns: 0 on success, -1 on error
435
436Writes special registers into the vcpu. See KVM_GET_SREGS for the
437data structures.
438
414fa985 439
4.15 KVM_TRANSLATE
441
442Capability: basic
443Architectures: x86
444Type: vcpu ioctl
445Parameters: struct kvm_translation (in/out)
446Returns: 0 on success, -1 on error
447
448Translates a virtual address according to the vcpu's current address
449translation mode.
450
451struct kvm_translation {
452 /* in */
453 __u64 linear_address;
454
455 /* out */
456 __u64 physical_address;
457 __u8 valid;
458 __u8 writeable;
459 __u8 usermode;
460 __u8 pad[5];
461};
462
414fa985 463
4.16 KVM_INTERRUPT

Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
Returns: 0 on success, negative on failure.

Queues a hardware interrupt vector to be injected.
473
474/* for KVM_INTERRUPT */
475struct kvm_interrupt {
476 /* in */
477 __u32 irq;
478};
479
X86:

Returns: 0 on success,
	 -EEXIST if an interrupt is already enqueued
	 -EINVAL if the irq number is invalid
	 -ENXIO if the PIC is in the kernel
	 -EFAULT if the pointer is invalid

Note 'irq' is an interrupt vector, not an interrupt pin or line. This
ioctl is useful if the in-kernel PIC is not used.

PPC:

Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:
495
496a) KVM_INTERRUPT_SET
497
498 This injects an edge type external interrupt into the guest once it's ready
499 to receive interrupts. When injected, the interrupt is done.
500
501b) KVM_INTERRUPT_UNSET
502
503 This unsets any pending interrupt.
504
505 Only available with KVM_CAP_PPC_UNSET_IRQ.
506
507c) KVM_INTERRUPT_SET_LEVEL
508
509 This injects a level type external interrupt into the guest context. The
510 interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
511 is triggered.
512
513 Only available with KVM_CAP_PPC_IRQ_LEVEL.
514
515Note that any value for 'irq' other than the ones stated above is invalid
516and incurs unexpected behavior.
517
518MIPS:
519
520Queues an external interrupt to be injected into the virtual CPU. A negative
521interrupt number dequeues the interrupt.
522
414fa985 523
4.17 KVM_DEBUG_GUEST
525
526Capability: basic
527Architectures: none
528Type: vcpu ioctl
Parameters: none
530Returns: -1 on error
531
532Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
533
414fa985 534
4.18 KVM_GET_MSRS

Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system)
Architectures: x86
Type: system ioctl, vcpu ioctl
Parameters: struct kvm_msrs (in/out)
Returns: number of msrs successfully returned;
	-1 on error
543
When used as a system ioctl:
Reads the values of MSR-based features that are available for the VM. This
is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values.
The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST
in a system ioctl.

When used as a vcpu ioctl:
Reads model-specific registers from the vcpu. Supported msr indices can
be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl.
553
554struct kvm_msrs {
555 __u32 nmsrs; /* number of msrs in entries */
556 __u32 pad;
557
558 struct kvm_msr_entry entries[0];
559};
560
561struct kvm_msr_entry {
562 __u32 index;
563 __u32 reserved;
564 __u64 data;
565};
566
567Application code should set the 'nmsrs' member (which indicates the
568size of the entries array) and the 'index' member of each array entry.
569kvm will fill in the 'data' member.
570
414fa985 571
4.19 KVM_SET_MSRS
573
574Capability: basic
575Architectures: x86
576Type: vcpu ioctl
577Parameters: struct kvm_msrs (in)
578Returns: 0 on success, -1 on error
579
580Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
581data structures.
582
583Application code should set the 'nmsrs' member (which indicates the
584size of the entries array), and the 'index' and 'data' members of each
585array entry.
586
414fa985 587
4.20 KVM_SET_CPUID
589
590Capability: basic
591Architectures: x86
592Type: vcpu ioctl
593Parameters: struct kvm_cpuid (in)
594Returns: 0 on success, -1 on error
595
596Defines the vcpu responses to the cpuid instruction. Applications
597should use the KVM_SET_CPUID2 ioctl if available.
598
599
600struct kvm_cpuid_entry {
601 __u32 function;
602 __u32 eax;
603 __u32 ebx;
604 __u32 ecx;
605 __u32 edx;
606 __u32 padding;
607};
608
609/* for KVM_SET_CPUID */
610struct kvm_cpuid {
611 __u32 nent;
612 __u32 padding;
613 struct kvm_cpuid_entry entries[0];
614};
615
414fa985 616
4.21 KVM_SET_SIGNAL_MASK
618
619Capability: basic
Architectures: all
621Type: vcpu ioctl
622Parameters: struct kvm_signal_mask (in)
623Returns: 0 on success, -1 on error
624
Defines which signals are blocked during execution of KVM_RUN. This
signal mask temporarily overrides the thread's signal mask. Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.
629
630Note the signal will only be delivered if not blocked by the original
631signal mask.
632
633/* for KVM_SET_SIGNAL_MASK */
634struct kvm_signal_mask {
635 __u32 len;
636 __u8 sigset[0];
637};
638
414fa985 639
4.22 KVM_GET_FPU
641
642Capability: basic
643Architectures: x86
644Type: vcpu ioctl
645Parameters: struct kvm_fpu (out)
646Returns: 0 on success, -1 on error
647
648Reads the floating point state from the vcpu.
649
650/* for KVM_GET_FPU and KVM_SET_FPU */
651struct kvm_fpu {
652 __u8 fpr[8][16];
653 __u16 fcw;
654 __u16 fsw;
655 __u8 ftwx; /* in fxsave format */
656 __u8 pad1;
657 __u16 last_opcode;
658 __u64 last_ip;
659 __u64 last_dp;
660 __u8 xmm[16][16];
661 __u32 mxcsr;
662 __u32 pad2;
663};
664
414fa985 665
4.23 KVM_SET_FPU
667
668Capability: basic
669Architectures: x86
670Type: vcpu ioctl
671Parameters: struct kvm_fpu (in)
672Returns: 0 on success, -1 on error
673
674Writes the floating point state to the vcpu.
675
676/* for KVM_GET_FPU and KVM_SET_FPU */
677struct kvm_fpu {
678 __u8 fpr[8][16];
679 __u16 fcw;
680 __u16 fsw;
681 __u8 ftwx; /* in fxsave format */
682 __u8 pad1;
683 __u16 last_opcode;
684 __u64 last_ip;
685 __u64 last_dp;
686 __u8 xmm[16][16];
687 __u32 mxcsr;
688 __u32 pad2;
689};
690
414fa985 691
4.24 KVM_CREATE_IRQCHIP

Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ARM, arm64, s390
696Type: vm ioctl
697Parameters: none
698Returns: 0 on success, -1 on error
699
700Creates an interrupt controller model in the kernel.
701On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
702future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both
703PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
704On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of
705KVM_CREATE_DEVICE, which also supports creating a GICv2. Using
706KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
707On s390, a dummy irq routing table is created.
708
709Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
710before KVM_CREATE_IRQCHIP can be used.
5dadbfd6 711
414fa985 712
4.25 KVM_IRQ_LINE
714
715Capability: KVM_CAP_IRQCHIP
Architectures: x86, arm, arm64
717Type: vm ioctl
718Parameters: struct kvm_irq_level
719Returns: 0 on success, -1 on error
720
721Sets the level of a GSI input to the interrupt controller model in the kernel.
722On some architectures it is required that an interrupt controller model has
723been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
724interrupts require the level to be set to 1 and then back to 0.
725
726On real hardware, interrupt pins can be active-low or active-high. This
727does not matter for the level field of struct kvm_irq_level: 1 always
728means active (asserted), 0 means inactive (deasserted).
729
730x86 allows the operating system to program the interrupt polarity
731(active-low/active-high) for level-triggered interrupts, and KVM used
732to consider the polarity. However, due to bitrot in the handling of
733active-low interrupts, the above convention is now valid on x86 too.
734This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace
735should not present interrupts to the guest as active-low unless this
736capability is present (or unless it is not using the in-kernel irqchip,
737of course).
738
739
740ARM/arm64 can signal an interrupt either at the CPU level, or at the
741in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
742use PPIs designated for specific cpus. The irq field is interpreted
743like this:
744
  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |
747
748The irq_type field has the following values:
749- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
750- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
751 (the vcpu_index field is ignored)
752- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
753
754(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
755
In both cases, level is used to assert/deassert the line.
757
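The ARM/arm64 encoding of the irq field can be sketched as a packing
helper. The enumerator names and arm_encode_irq() are hypothetical; the
field positions and id ranges are the ones listed above.

```c
#include <stdint.h>

/* irq_type values from the list above; the enumerator names are hypothetical. */
enum arm_irq_type {
	ARM_IRQ_TYPE_CPU = 0,	/* out-of-kernel GIC: irq_id 0 is IRQ, 1 is FIQ */
	ARM_IRQ_TYPE_SPI = 1,	/* in-kernel GIC: irq_id 32..1019 */
	ARM_IRQ_TYPE_PPI = 2,	/* in-kernel GIC: irq_id 16..31 */
};

/*
 * Packs the three fields into the 'irq' member of struct kvm_irq_level.
 * Stores the result in *irq and returns 0, or returns -1 if irq_id is out
 * of range for the given type.
 */
int arm_encode_irq(enum arm_irq_type type, uint8_t vcpu_index,
		   uint16_t irq_id, uint32_t *irq)
{
	switch (type) {
	case ARM_IRQ_TYPE_CPU:
		if (irq_id > 1)
			return -1;
		break;
	case ARM_IRQ_TYPE_SPI:
		if (irq_id < 32 || irq_id > 1019)
			return -1;
		vcpu_index = 0;		/* ignored for SPIs */
		break;
	case ARM_IRQ_TYPE_PPI:
		if (irq_id < 16 || irq_id > 31)
			return -1;
		break;
	default:
		return -1;
	}

	*irq = ((uint32_t)type << 24) | ((uint32_t)vcpu_index << 16) | irq_id;
	return 0;
}
```
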
758struct kvm_irq_level {
759 union {
760 __u32 irq; /* GSI */
761 __s32 status; /* not used for KVM_IRQ_LEVEL */
762 };
763 __u32 level; /* 0 or 1 */
764};
765
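Since edge-triggered interrupts need the level set to 1 and then back to
0, a pulse helper is a natural wrapper around this ioctl. The sketch below
uses hypothetical function names and assumes a VM fd with an in-kernel
interrupt controller.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Drives one GSI input to 'level' (0 or 1).  Returns 0 or -1. */
int kvm_set_gsi(int vm_fd, unsigned int gsi, unsigned int level)
{
	struct kvm_irq_level irq;

	memset(&irq, 0, sizeof(irq));
	irq.irq = gsi;
	irq.level = level;
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0 ? -1 : 0;
}

/* Edge-triggered interrupt: assert the line, then deassert it again. */
int kvm_pulse_gsi(int vm_fd, unsigned int gsi)
{
	if (kvm_set_gsi(vm_fd, gsi, 1) < 0)
		return -1;
	return kvm_set_gsi(vm_fd, gsi, 0);
}
```

For a level-triggered interrupt the caller would use kvm_set_gsi()
directly and deassert the line once the device condition has cleared.
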
414fa985 766
4.26 KVM_GET_IRQCHIP
768
769Capability: KVM_CAP_IRQCHIP
Architectures: x86
771Type: vm ioctl
772Parameters: struct kvm_irqchip (in/out)
773Returns: 0 on success, -1 on error
774
775Reads the state of a kernel interrupt controller created with
776KVM_CREATE_IRQCHIP into a buffer provided by the caller.
777
778struct kvm_irqchip {
779 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
780 __u32 pad;
781 union {
782 char dummy[512]; /* reserving space */
783 struct kvm_pic_state pic;
784 struct kvm_ioapic_state ioapic;
785 } chip;
786};
787
414fa985 788
4.27 KVM_SET_IRQCHIP
790
791Capability: KVM_CAP_IRQCHIP
Architectures: x86
793Type: vm ioctl
794Parameters: struct kvm_irqchip (in)
795Returns: 0 on success, -1 on error
796
797Sets the state of a kernel interrupt controller created with
798KVM_CREATE_IRQCHIP from a buffer provided by the caller.
799
800struct kvm_irqchip {
801 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
802 __u32 pad;
803 union {
804 char dummy[512]; /* reserving space */
805 struct kvm_pic_state pic;
806 struct kvm_ioapic_state ioapic;
807 } chip;
808};
809
414fa985 810
4.28 KVM_XEN_HVM_CONFIG
812
813Capability: KVM_CAP_XEN_HVM
814Architectures: x86
815Type: vm ioctl
816Parameters: struct kvm_xen_hvm_config (in)
817Returns: 0 on success, -1 on error
818
819Sets the MSR that the Xen HVM guest uses to initialize its hypercall
820page, and provides the starting address and size of the hypercall
821blobs in userspace. When the guest writes the MSR, kvm copies one
822page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
823memory.
824
825struct kvm_xen_hvm_config {
826 __u32 flags;
827 __u32 msr;
828 __u64 blob_addr_32;
829 __u64 blob_addr_64;
830 __u8 blob_size_32;
831 __u8 blob_size_64;
832 __u8 pad2[30];
833};
834
414fa985 835
4.29 KVM_GET_CLOCK
837
838Capability: KVM_CAP_ADJUST_CLOCK
839Architectures: x86
840Type: vm ioctl
841Parameters: struct kvm_clock_data (out)
842Returns: 0 on success, -1 on error
843
Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flags member.
850
851The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
852value is the exact kvmclock value seen by all VCPUs at the instant
853when KVM_GET_CLOCK was called. If clear, the returned value is simply
854CLOCK_MONOTONIC plus a constant offset; the offset can be modified
855with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
856but the exact value read by each VCPU could differ, because the host
857TSC is not stable.
858
859struct kvm_clock_data {
860 __u64 clock; /* kvmclock current value */
861 __u32 flags;
862 __u32 pad[9];
863};
864
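Reading the clock and interpreting KVM_CLOCK_TSC_STABLE might be wrapped
as follows. kvm_read_clock() is a hypothetical helper name; the struct
and flag come from <linux/kvm.h>.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Reads kvmclock for a VM.  On success stores the clock value in *clock and
 * sets *exact to 1 if KVM_CLOCK_TSC_STABLE was reported, 0 otherwise.
 * Returns 0 on success, -1 on error.
 */
int kvm_read_clock(int vm_fd, uint64_t *clock, int *exact)
{
	struct kvm_clock_data data;

	memset(&data, 0, sizeof(data));
	if (ioctl(vm_fd, KVM_GET_CLOCK, &data) < 0)
		return -1;

	*clock = data.clock;
	*exact = (data.flags & KVM_CLOCK_TSC_STABLE) != 0;
	return 0;
}
```

When *exact is 0 the value is only CLOCK_MONOTONIC plus an offset, so a
migration tool should not assume that every VCPU reads the same value.
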
414fa985 865
4.30 KVM_SET_CLOCK
867
868Capability: KVM_CAP_ADJUST_CLOCK
869Architectures: x86
870Type: vm ioctl
871Parameters: struct kvm_clock_data (in)
872Returns: 0 on success, -1 on error
873
Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.
877
878struct kvm_clock_data {
879 __u64 clock; /* kvmclock current value */
880 __u32 flags;
881 __u32 pad[9];
882};
883
414fa985 884
4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error

X86:

Gets currently pending exceptions, interrupts, and NMIs as well as related
states of the vcpu.
898
struct kvm_vcpu_events {
	struct {
		__u8 injected;
		__u8 nr;
		__u8 has_error_code;
		__u8 pending;
		__u32 error_code;
	} exception;
	struct {
		__u8 injected;
		__u8 nr;
		__u8 soft;
		__u8 shadow;
	} interrupt;
	struct {
		__u8 injected;
		__u8 pending;
		__u8 masked;
		__u8 pad;
	} nmi;
	__u32 sipi_vector;
	__u32 flags;
	struct {
		__u8 smm;
		__u8 pending;
		__u8 smm_inside_nmi;
		__u8 latched_init;
	} smi;
	__u8 reserved[27];
	__u8 exception_has_payload;
	__u64 exception_payload;
};
931
59073aaf 932The following bits are defined in the flags field:
f077825a 933
59073aaf 934- KVM_VCPUEVENT_VALID_SHADOW may be set to signal that
f077825a 935 interrupt.shadow contains a valid state.
48005f64 936
59073aaf
JM
937- KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a
938 valid state.
939
940- KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the
941 exception_has_payload, exception_payload, and exception.pending
942 fields contain a valid state. This bit will be set whenever
943 KVM_CAP_EXCEPTION_PAYLOAD is enabled.
414fa985 944
b0960b95 945ARM/ARM64:
b7b27fac
DG
946
947If the guest accesses a device that is being emulated by the host kernel in
948such a way that a real device would generate a physical SError, KVM may make
949a virtual SError pending for that VCPU. This system error interrupt remains
950pending until the guest takes the exception by unmasking PSTATE.A.
951
952Running the VCPU may cause it to take a pending SError, or make an access that
953causes an SError to become pending. The event's description is only valid while
954the VCPU is not running.
955
956This API provides a way to read and write the pending 'event' state that is not
957visible to the guest. To save, restore or migrate a VCPU the struct representing
958the state can be read then written using this GET/SET API, along with the other
959guest-visible registers. It is not possible to 'cancel' an SError that has been
960made pending.
961
962A device being emulated in user-space may also wish to generate an SError. To do
963this the events structure can be populated by user-space. The current state
964should be read first, to ensure no existing SError is pending. If an existing
965SError is pending, the architecture's 'Multiple SError interrupts' rules should
966be followed. (2.5.3 of DDI0587.a "ARM Reliability, Availability, and
967Serviceability (RAS) Specification").
968
be26b3a7
DG
969SError exceptions always have an ESR value. Some CPUs have the ability to
970specify what the virtual SError's ESR value should be. These systems will
688e0581 971advertise KVM_CAP_ARM_INJECT_SERROR_ESR. In this case exception.serror_has_esr
be26b3a7
DG
972will always have a non-zero value when read, and the agent making an SError pending
973should specify the ISS field in the lower 24 bits of exception.serror_esr. If
688e0581 974the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but user-space sets the events
be26b3a7
DG
975with exception.serror_has_esr as zero, KVM will choose an ESR.
976
977Specifying exception.serror_has_esr on a system that does not support it will return
978-EINVAL. Setting anything other than the lower 24 bits of exception.serror_esr
979will return -EINVAL.
980
b7b27fac
DG
981struct kvm_vcpu_events {
982 struct {
983 __u8 serror_pending;
984 __u8 serror_has_esr;
985 /* Align it to 8 bytes */
986 __u8 pad[6];
987 __u64 serror_esr;
988 } exception;
989 __u32 reserved[12];
990};
991
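The 24-bit restriction on the ESR value can be checked in userspace before
the events structure is written back. The following is a minimal sketch, not
part of the KVM API: the helper names serror_esr_is_valid() and
serror_esr_from_iss() are hypothetical, and the only fact taken from the text
above is that the ISS field occupies the lower 24 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The ISS field occupies the lower 24 bits of the ESR value. */
#define SERROR_ISS_MASK UINT64_C(0xffffff)

/* Hypothetical helper: true if no bit above bit 23 is set. */
static inline bool serror_esr_is_valid(uint64_t serror_esr)
{
	return (serror_esr & ~SERROR_ISS_MASK) == 0;
}

/* Hypothetical helper: build an serror_esr value from a 24-bit ISS. */
static inline uint64_t serror_esr_from_iss(uint32_t iss)
{
	assert((iss & ~(uint32_t)SERROR_ISS_MASK) == 0);
	return iss;
}
```

A caller would typically run such a check before issuing KVM_SET_VCPU_EVENTS,
so that a malformed value is caught before the kernel returns -EINVAL.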
68ba6974 9924.32 KVM_SET_VCPU_EVENTS
3cfc3092
JK
993
994Capability: KVM_CAP_VCPU_EVENTS
48005f64 995Extended by: KVM_CAP_INTR_SHADOW
b0960b95 996Architectures: x86, arm, arm64
b7b27fac 997Type: vcpu ioctl
3cfc3092
JK
998Parameters: struct kvm_vcpu_events (in)
999Returns: 0 on success, -1 on error
1000
b7b27fac
DG
1001X86:
1002
3cfc3092
JK
1003Set pending exceptions, interrupts, and NMIs as well as related states of the
1004vcpu.
1005
1006See KVM_GET_VCPU_EVENTS for the data structure.
1007
dab4b911 1008Fields that may be modified asynchronously by running VCPUs can be excluded
f077825a
PB
1009from the update. These fields are nmi.pending, sipi_vector, smi.smm,
1010smi.pending. Keep the corresponding bits in the flags field cleared to
1011suppress overwriting the current in-kernel state. The bits are:
dab4b911
JK
1012
1013KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
1014KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
f077825a 1015KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct.
dab4b911 1016
48005f64
JK
1017If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
1018the flags field to signal that interrupt.shadow contains a valid state and
1019shall be written into the VCPU.
1020
f077825a
PB
1021KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
1022
59073aaf
JM
1023If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD
1024can be set in the flags field to signal that the
1025exception_has_payload, exception_payload, and exception.pending fields
1026contain a valid state and shall be written into the VCPU.
1027
b0960b95 1028ARM/ARM64:
b7b27fac
DG
1029
1030Set the pending SError exception state for this VCPU. It is not possible to
1031'cancel' an SError that has been made pending.
1032
1033See KVM_GET_VCPU_EVENTS for the data structure.
1034
414fa985 1035
68ba6974 10364.33 KVM_GET_DEBUGREGS
a1efbe77
JK
1037
1038Capability: KVM_CAP_DEBUGREGS
1039Architectures: x86
1040Type: vcpu ioctl
1041Parameters: struct kvm_debugregs (out)
1042Returns: 0 on success, -1 on error
1043
1044Reads debug registers from the vcpu.
1045
1046struct kvm_debugregs {
1047 __u64 db[4];
1048 __u64 dr6;
1049 __u64 dr7;
1050 __u64 flags;
1051 __u64 reserved[9];
1052};
1053
414fa985 1054
68ba6974 10554.34 KVM_SET_DEBUGREGS
a1efbe77
JK
1056
1057Capability: KVM_CAP_DEBUGREGS
1058Architectures: x86
1059Type: vcpu ioctl
1060Parameters: struct kvm_debugregs (in)
1061Returns: 0 on success, -1 on error
1062
1063Writes debug registers into the vcpu.
1064
1065See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
1066used and must be cleared on entry.
1067
414fa985 1068
68ba6974 10694.35 KVM_SET_USER_MEMORY_REGION
0f2d8f4d
AK
1070
1071Capability: KVM_CAP_USER_MEM
1072Architectures: all
1073Type: vm ioctl
1074Parameters: struct kvm_userspace_memory_region (in)
1075Returns: 0 on success, -1 on error
1076
1077struct kvm_userspace_memory_region {
1078 __u32 slot;
1079 __u32 flags;
1080 __u64 guest_phys_addr;
1081 __u64 memory_size; /* bytes */
1082 __u64 userspace_addr; /* start of the userspace allocated memory */
1083};
1084
1085/* for kvm_memory_region::flags */
4d8b81ab
XG
1086#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
1087#define KVM_MEM_READONLY (1UL << 1)
0f2d8f4d
AK
1088
1089This ioctl allows the user to create or modify a guest physical memory
1090slot. When changing an existing slot, it may be moved in the guest
1091physical memory space, or its flags may be modified. It may not be
1092resized. Slots may not overlap in guest physical address space.
a677e704
LC
1093Bits 0-15 of "slot" specify the slot id and this value should be
1094less than the maximum number of user memory slots supported per VM.
1095The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
1096if this capability is supported by the architecture.
0f2d8f4d 1097
f481b069
PB
1098If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
1099specify the address space which is being modified. They must be
1100less than the value that KVM_CHECK_EXTENSION returns for the
1101KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces
1102are unrelated; the restriction on overlapping slots only applies within
1103each address space.
1104
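The layout of the "slot" field described above can be sketched as a pair of
pack/unpack helpers. The function names are hypothetical. The nr_as and
nr_slots parameters stand for the limits that userspace would obtain from
KVM_CHECK_EXTENSION (KVM_CAP_MULTI_ADDRESS_SPACE and KVM_CAP_NR_MEMSLOTS).

```c
#include <assert.h>
#include <stdint.h>

/* Pack an address space id (bits 16-31) and a slot id (bits 0-15). */
static inline uint32_t kvm_slot_encode(uint16_t as_id, uint16_t slot_id,
				       uint16_t nr_as, uint16_t nr_slots)
{
	assert(as_id < nr_as);       /* must be below the reported limit */
	assert(slot_id < nr_slots);  /* must be below the slot maximum */
	return ((uint32_t)as_id << 16) | slot_id;
}

/* Extract the slot id from bits 0-15. */
static inline uint16_t kvm_slot_id(uint32_t slot)
{
	return (uint16_t)(slot & 0xffff);
}

/* Extract the address space id from bits 16-31. */
static inline uint16_t kvm_slot_as_id(uint32_t slot)
{
	return (uint16_t)(slot >> 16);
}
```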
0f2d8f4d
AK
1105Memory for the region is taken starting at the address denoted by the
1106field userspace_addr, which must point at user addressable memory for
1107the entire memory slot size. Any object may back this memory, including
1108anonymous memory, ordinary files, and hugetlbfs.
1109
1110It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
1111be identical. This allows large pages in the guest to be backed by large
1112pages in the host.
1113
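The recommendation about the lower 21 bits can be expressed as a simple check.
This is a sketch with a hypothetical helper name. It only tests whether the
two addresses share the same offset within a 2 MiB (1 << 21) region and makes
no claim about whether the host actually backs the range with large pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW21_MASK ((UINT64_C(1) << 21) - 1)

/* True if guest_phys_addr and userspace_addr agree in their lower 21 bits. */
static inline bool low21_bits_match(uint64_t guest_phys_addr,
				    uint64_t userspace_addr)
{
	return ((guest_phys_addr ^ userspace_addr) & LOW21_MASK) == 0;
}
```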
75d61fbc
TY
1114The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
1115KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
1116writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
1117use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
1118to make a new slot read-only. In this case, writes to this memory will be
1119posted to userspace as KVM_EXIT_MMIO exits.
7efd8fa1
JK
1120
1121When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
1122the memory region are automatically reflected into the guest. For example, an
1123mmap() that affects the region will be made visible immediately. Another
1124example is madvise(MADV_DROP).
0f2d8f4d
AK
1125
1126It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
1127The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
1128allocation and is deprecated.
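
As an illustration of the fields described in this section, the sketch below
fills in a struct kvm_userspace_memory_region and passes it to the ioctl. The
helper names make_region() and set_region() are hypothetical, and the sketch
assumes the Linux UAPI header <linux/kvm.h> is installed. The caller is
expected to have obtained vm_fd from KVM_CREATE_VM and host_mem from mmap().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Describe a slot backed by host_mem, optionally with dirty logging. */
static inline struct kvm_userspace_memory_region
make_region(uint32_t slot, uint64_t gpa, uint64_t size, void *host_mem,
	    int log_dirty)
{
	struct kvm_userspace_memory_region r = {
		.slot = slot,
		.flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0,
		.guest_phys_addr = gpa,
		.memory_size = size,            /* bytes */
		.userspace_addr = (uint64_t)(uintptr_t)host_mem,
	};
	return r;
}

/* Create or modify the slot; returns 0 on success, -1 on error. */
static inline int set_region(int vm_fd,
			     const struct kvm_userspace_memory_region *r)
{
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, r);
}
```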
3cfc3092 1129
414fa985 1130
68ba6974 11314.36 KVM_SET_TSS_ADDR
8a5416db
AK
1132
1133Capability: KVM_CAP_SET_TSS_ADDR
1134Architectures: x86
1135Type: vm ioctl
1136Parameters: unsigned long tss_address (in)
1137Returns: 0 on success, -1 on error
1138
1139This ioctl defines the physical address of a three-page region in the guest
1140physical address space. The region must be within the first 4GB of the
1141guest physical address space and must not conflict with any memory slot
1142or any mmio address. The guest may malfunction if it accesses this memory
1143region.
1144
1145This ioctl is required on Intel-based hosts. This is needed on Intel hardware
1146because of a quirk in the virtualization implementation (see the internals
1147documentation when it pops into existence).
1148
414fa985 1149
68ba6974 11504.37 KVM_ENABLE_CAP
71fbfd5f 1151
e5d83c74
PB
1152Capability: KVM_CAP_ENABLE_CAP
1153Architectures: mips, ppc, s390
1154Type: vcpu ioctl
1155Parameters: struct kvm_enable_cap (in)
1156Returns: 0 on success; -1 on error
1157
1158Capability: KVM_CAP_ENABLE_CAP_VM
1159Architectures: all
1160Type: vm ioctl
71fbfd5f
AG
1161Parameters: struct kvm_enable_cap (in)
1162Returns: 0 on success; -1 on error
1163
1164Not all extensions are enabled by default. Using this ioctl the application
1165can enable an extension, making it available to the guest.
1166
1167On systems that do not support this ioctl, it always fails. On systems that
1168do support it, it only works for extensions that are supported for enablement.
1169
1170To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
1171be used.
1172
1173struct kvm_enable_cap {
1174 /* in */
1175 __u32 cap;
1176
1177The capability that is supposed to get enabled.
1178
1179 __u32 flags;
1180
1181A bitfield indicating future enhancements. Has to be 0 for now.
1182
1183 __u64 args[4];
1184
1185Arguments for enabling a feature. If a feature needs initial values to
1186function properly, this is the place to put them.
1187
1188 __u8 pad[64];
1189};
1190
d938dc55
CH
1191The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
1192for vm-wide capabilities.
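
A minimal sketch of the sequence described above: query the capability with
KVM_CHECK_EXTENSION, then enable it with KVM_ENABLE_CAP. The helper names are
hypothetical and the sketch assumes <linux/kvm.h> is available. The fd
argument is a vcpu fd or a vm fd depending on the capability, and only
args[0] is filled in here. Capabilities that take more arguments need the
other elements set as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* True if KVM_CHECK_EXTENSION reports the capability as present. */
static inline int cap_supported(int fd, long cap)
{
	return ioctl(fd, KVM_CHECK_EXTENSION, cap) > 0;
}

/* Enable cap with one argument; returns 0 on success, -1 on error. */
static inline int enable_cap(int fd, uint32_t cap, uint64_t arg0)
{
	struct kvm_enable_cap ec;

	memset(&ec, 0, sizeof(ec));  /* flags must be 0, padding is cleared */
	ec.cap = cap;
	ec.args[0] = arg0;
	return ioctl(fd, KVM_ENABLE_CAP, &ec);
}
```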
414fa985 1193
68ba6974 11944.38 KVM_GET_MP_STATE
b843f065
AK
1195
1196Capability: KVM_CAP_MP_STATE
ecccf0cc 1197Architectures: x86, s390, arm, arm64
b843f065
AK
1198Type: vcpu ioctl
1199Parameters: struct kvm_mp_state (out)
1200Returns: 0 on success; -1 on error
1201
1202struct kvm_mp_state {
1203 __u32 mp_state;
1204};
1205
1206Returns the vcpu's current "multiprocessing state" (though also valid on
1207uniprocessor guests).
1208
1209Possible values are:
1210
ecccf0cc 1211 - KVM_MP_STATE_RUNNABLE: the vcpu is currently running [x86,arm/arm64]
b843f065 1212 - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP)
c32a4272 1213 which has not yet received an INIT signal [x86]
b843f065 1214 - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is
c32a4272 1215 now ready for a SIPI [x86]
b843f065 1216 - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and
c32a4272 1217 is waiting for an interrupt [x86]
b843f065 1218 - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector
c32a4272 1219 accessible via KVM_GET_VCPU_EVENTS) [x86]
ecccf0cc 1220 - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64]
6352e4d2
DH
1221 - KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390]
1222 - KVM_MP_STATE_OPERATING: the vcpu is operating (running or halted)
1223 [s390]
1224 - KVM_MP_STATE_LOAD: the vcpu is in a special load/startup state
1225 [s390]
b843f065 1226
c32a4272 1227On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
0b4820d6
DH
1228in-kernel irqchip, the multiprocessing state must be maintained by userspace on
1229these architectures.
b843f065 1230
ecccf0cc
AB
1231For arm/arm64:
1232
1233The only states that are valid are KVM_MP_STATE_STOPPED and
1234KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu is paused.
414fa985 1235
68ba6974 12364.39 KVM_SET_MP_STATE
b843f065
AK
1237
1238Capability: KVM_CAP_MP_STATE
ecccf0cc 1239Architectures: x86, s390, arm, arm64
b843f065
AK
1240Type: vcpu ioctl
1241Parameters: struct kvm_mp_state (in)
1242Returns: 0 on success; -1 on error
1243
1244Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
1245arguments.
1246
c32a4272 1247On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
0b4820d6
DH
1248in-kernel irqchip, the multiprocessing state must be maintained by userspace on
1249these architectures.
b843f065 1250
ecccf0cc
AB
1251For arm/arm64:
1252
1253The only states that are valid are KVM_MP_STATE_STOPPED and
1254KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu should be paused.
414fa985 1255
68ba6974 12564.40 KVM_SET_IDENTITY_MAP_ADDR
47dbb84f
AK
1257
1258Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
1259Architectures: x86
1260Type: vm ioctl
1261Parameters: unsigned long identity (in)
1262Returns: 0 on success, -1 on error
1263
1264This ioctl defines the physical address of a one-page region in the guest
1265physical address space. The region must be within the first 4GB of the
1266guest physical address space and must not conflict with any memory slot
1267or any mmio address. The guest may malfunction if it accesses this memory
1268region.
1269
726b99c4
DH
1270Setting the address to 0 will result in resetting the address to its default
1271(0xfffbc000).
1272
47dbb84f
AK
1273This ioctl is required on Intel-based hosts. This is needed on Intel hardware
1274because of a quirk in the virtualization implementation (see the internals
1275documentation when it pops into existence).
1276
1af1ac91 1277Fails if any VCPU has already been created.
414fa985 1278
68ba6974 12794.41 KVM_SET_BOOT_CPU_ID
57bc24cf
AK
1280
1281Capability: KVM_CAP_SET_BOOT_CPU_ID
c32a4272 1282Architectures: x86
57bc24cf
AK
1283Type: vm ioctl
1284Parameters: unsigned long vcpu_id
1285Returns: 0 on success, -1 on error
1286
1287Define which vcpu is the Bootstrap Processor (BSP). Values are the same
1288as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
1289is vcpu 0.
1290
414fa985 1291
68ba6974 12924.42 KVM_GET_XSAVE
2d5b5a66
SY
1293
1294Capability: KVM_CAP_XSAVE
1295Architectures: x86
1296Type: vcpu ioctl
1297Parameters: struct kvm_xsave (out)
1298Returns: 0 on success, -1 on error
1299
1300struct kvm_xsave {
1301 __u32 region[1024];
1302};
1303
1304This ioctl copies the current vcpu's xsave struct to userspace.
1305
414fa985 1306
68ba6974 13074.43 KVM_SET_XSAVE
2d5b5a66
SY
1308
1309Capability: KVM_CAP_XSAVE
1310Architectures: x86
1311Type: vcpu ioctl
1312Parameters: struct kvm_xsave (in)
1313Returns: 0 on success, -1 on error
1314
1315struct kvm_xsave {
1316 __u32 region[1024];
1317};
1318
1319This ioctl copies userspace's xsave struct to the kernel.
1320
414fa985 1321
68ba6974 13224.44 KVM_GET_XCRS
2d5b5a66
SY
1323
1324Capability: KVM_CAP_XCRS
1325Architectures: x86
1326Type: vcpu ioctl
1327Parameters: struct kvm_xcrs (out)
1328Returns: 0 on success, -1 on error
1329
1330struct kvm_xcr {
1331 __u32 xcr;
1332 __u32 reserved;
1333 __u64 value;
1334};
1335
1336struct kvm_xcrs {
1337 __u32 nr_xcrs;
1338 __u32 flags;
1339 struct kvm_xcr xcrs[KVM_MAX_XCRS];
1340 __u64 padding[16];
1341};
1342
1343This ioctl copies the current vcpu's xcrs to userspace.
1344
414fa985 1345
68ba6974 13464.45 KVM_SET_XCRS
2d5b5a66
SY
1347
1348Capability: KVM_CAP_XCRS
1349Architectures: x86
1350Type: vcpu ioctl
1351Parameters: struct kvm_xcrs (in)
1352Returns: 0 on success, -1 on error
1353
1354struct kvm_xcr {
1355 __u32 xcr;
1356 __u32 reserved;
1357 __u64 value;
1358};
1359
1360struct kvm_xcrs {
1361 __u32 nr_xcrs;
1362 __u32 flags;
1363 struct kvm_xcr xcrs[KVM_MAX_XCRS];
1364 __u64 padding[16];
1365};
1366
1367This ioctl sets the vcpu's xcrs to the values specified by userspace.
1368
414fa985 1369
68ba6974 13704.46 KVM_GET_SUPPORTED_CPUID
d153513d
AK
1371
1372Capability: KVM_CAP_EXT_CPUID
1373Architectures: x86
1374Type: system ioctl
1375Parameters: struct kvm_cpuid2 (in/out)
1376Returns: 0 on success, -1 on error
1377
1378struct kvm_cpuid2 {
1379 __u32 nent;
1380 __u32 padding;
1381 struct kvm_cpuid_entry2 entries[0];
1382};
1383
9c15bb1d
BP
1384#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
1385#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
1386#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
d153513d
AK
1387
1388struct kvm_cpuid_entry2 {
1389 __u32 function;
1390 __u32 index;
1391 __u32 flags;
1392 __u32 eax;
1393 __u32 ebx;
1394 __u32 ecx;
1395 __u32 edx;
1396 __u32 padding[3];
1397};
1398
df9cb9cc
JM
1399This ioctl returns x86 cpuid features which are supported by both the
1400hardware and kvm in its default configuration. Userspace can use the
1401information returned by this ioctl to construct cpuid information (for
1402KVM_SET_CPUID2) that is consistent with hardware, kernel, and
1403userspace capabilities, and with user requirements (for example, the
1404user may wish to constrain cpuid to emulate older hardware, or for
1405feature consistency across a cluster).
1406
1407Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
1408expose cpuid features (e.g. MONITOR) which are not supported by kvm in
1409its default configuration. If userspace enables such capabilities, it
1410is responsible for modifying the results of this ioctl appropriately.
d153513d
AK
1411
1412Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
1413with the 'nent' field indicating the number of entries in the variable-size
1414array 'entries'. If the number of entries is too low to describe the cpu
1415capabilities, an error (E2BIG) is returned. If the number is too high,
1416the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
1417number is just right, the 'nent' field is adjusted to the number of valid
1418entries in the 'entries' array, which is then filled.
1419
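The E2BIG case described above suggests a grow-and-retry loop. The following
is a sketch under stated assumptions, not a reference implementation: the
helper name get_supported_cpuid(), the starting guess of 32 entries and the
upper bound of 4096 are arbitrary choices, and the ENOMEM ("too high") case
is simply treated as a failure. On non-x86 builds, where struct kvm_cpuid2 is
not defined, the sketch compiles to a stub.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#if defined(__x86_64__) || defined(__i386__)
/* Returns a malloc'ed table (caller frees) or NULL with errno set. */
static struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
{
	unsigned int nent = 32;  /* arbitrary starting guess */

	for (;;) {
		struct kvm_cpuid2 *c;
		int err;

		c = calloc(1, sizeof(*c) + nent * sizeof(c->entries[0]));
		if (!c)
			return NULL;
		c->nent = nent;
		if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, c) == 0)
			return c;  /* c->nent now holds the valid count */
		err = errno;
		free(c);
		if (err != E2BIG || nent >= 4096) {
			errno = err;  /* any other error: give up */
			return NULL;
		}
		nent *= 2;  /* too few entries: grow and retry */
	}
}
#else
struct kvm_cpuid2;  /* not defined on this architecture */
static struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
{
	(void)kvm_fd;
	errno = ENOSYS;
	return NULL;
}
#endif
```

The kvm_fd argument is the system fd obtained from open("/dev/kvm"), since
KVM_GET_SUPPORTED_CPUID is a system ioctl.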
1420The entries returned are the host cpuid as returned by the cpuid instruction,
c39cbd2a
AK
1421with unknown or unsupported features masked out. Some features (for example,
1422x2apic), may not be present in the host cpu, but are exposed by kvm if it can
1423emulate them efficiently. The fields in each entry are defined as follows:
d153513d
AK
1424
1425 function: the eax value used to obtain the entry
1426 index: the ecx value used to obtain the entry (for entries that are
1427 affected by ecx)
1428 flags: an OR of zero or more of the following:
1429 KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
1430 if the index field is valid
1431 KVM_CPUID_FLAG_STATEFUL_FUNC:
1432 if cpuid for this function returns different values for successive
1433 invocations; there will be several entries with the same function,
1434 all with this flag set
1435 KVM_CPUID_FLAG_STATE_READ_NEXT:
1436 for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
1437 the first entry to be read by a cpu
1438 eax, ebx, ecx, edx: the values returned by the cpuid instruction for
1439 this function/index combination
1440
4d25a066
JK
1441The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
1442as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
1443support. Instead it is reported via
1444
1445 ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
1446
1447if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
1448feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
1449
414fa985 1450
68ba6974 14514.47 KVM_PPC_GET_PVINFO
15711e9c
AG
1452
1453Capability: KVM_CAP_PPC_GET_PVINFO
1454Architectures: ppc
1455Type: vm ioctl
1456Parameters: struct kvm_ppc_pvinfo (out)
1457Returns: 0 on success, !0 on error
1458
1459struct kvm_ppc_pvinfo {
1460 __u32 flags;
1461 __u32 hcall[4];
1462 __u8 pad[108];
1463};
1464
1465This ioctl fetches PV-specific information that needs to be passed to the guest
1466using the device tree or other means from vm context.
1467
9202e076 1468The hcall array defines 4 instructions that make up a hypercall.
15711e9c
AG
1469
1470If any additional field gets added to this structure later on, a bit for that
1471additional piece of information will be set in the flags bitmap.
1472
9202e076
LYB
1473The flags bitmap is defined as:
1474
1475 /* the host supports the ePAPR idle hcall */
1476 #define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)
414fa985 1477
68ba6974 14784.52 KVM_SET_GSI_ROUTING
49f48172
JK
1479
1480Capability: KVM_CAP_IRQ_ROUTING
180ae7b1 1481Architectures: x86 s390 arm arm64
49f48172
JK
1482Type: vm ioctl
1483Parameters: struct kvm_irq_routing (in)
1484Returns: 0 on success, -1 on error
1485
1486Sets the GSI routing table entries, overwriting any previously set entries.
1487
180ae7b1
EA
1488On arm/arm64, GSI routing has the following limitation:
1489- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
1490
49f48172
JK
1491struct kvm_irq_routing {
1492 __u32 nr;
1493 __u32 flags;
1494 struct kvm_irq_routing_entry entries[0];
1495};
1496
1497No flags are specified so far; the corresponding field must be set to zero.
1498
1499struct kvm_irq_routing_entry {
1500 __u32 gsi;
1501 __u32 type;
1502 __u32 flags;
1503 __u32 pad;
1504 union {
1505 struct kvm_irq_routing_irqchip irqchip;
1506 struct kvm_irq_routing_msi msi;
84223598 1507 struct kvm_irq_routing_s390_adapter adapter;
5c919412 1508 struct kvm_irq_routing_hv_sint hv_sint;
49f48172
JK
1509 __u32 pad[8];
1510 } u;
1511};
1512
1513/* gsi routing entry types */
1514#define KVM_IRQ_ROUTING_IRQCHIP 1
1515#define KVM_IRQ_ROUTING_MSI 2
84223598 1516#define KVM_IRQ_ROUTING_S390_ADAPTER 3
5c919412 1517#define KVM_IRQ_ROUTING_HV_SINT 4
49f48172 1518
76a10b86 1519flags:
6f49b2f3
PB
1520- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
1521 type, specifies that the devid field contains a valid value. The per-VM
1522 KVM_CAP_MSI_DEVID capability advertises the requirement to provide
1523 the device ID. If this capability is not available, userspace should
1524 never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
76a10b86 1525- zero otherwise
49f48172
JK
1526
1527struct kvm_irq_routing_irqchip {
1528 __u32 irqchip;
1529 __u32 pin;
1530};
1531
1532struct kvm_irq_routing_msi {
1533 __u32 address_lo;
1534 __u32 address_hi;
1535 __u32 data;
76a10b86
EA
1536 union {
1537 __u32 pad;
1538 __u32 devid;
1539 };
49f48172
JK
1540};
1541
6f49b2f3
PB
1542If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
1543for the device that wrote the MSI message. For PCI, this is usually a
1544BDF (bus/device/function) identifier in the lower 16 bits.
76a10b86 1545
37131313
RK
1546On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
1547feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
1548address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
1549address_hi must be zero.
1550
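The split of the destination id across the two address words can be sketched
as follows. The helper names are hypothetical. The text above only defines the
role of address_hi; the assumption that the lower 8 bits of the destination id
come from bits 19-12 of address_lo follows the usual x86 MSI address format
and should be checked against the architecture manual.

```c
#include <assert.h>
#include <stdint.h>

/* Combine address_hi bits 31-8 with the 8-bit id in address_lo bits 19-12. */
static inline uint32_t msi_dest_id(uint32_t address_lo, uint32_t address_hi)
{
	uint32_t low8 = (address_lo >> 12) & 0xff;  /* assumed x86 MSI layout */

	return (address_hi & 0xffffff00u) | low8;
}

/* address_hi value for a destination id; bits 7-0 must stay zero. */
static inline uint32_t msi_address_hi_for(uint32_t dest_id)
{
	return dest_id & 0xffffff00u;
}
```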
84223598
CH
1551struct kvm_irq_routing_s390_adapter {
1552 __u64 ind_addr;
1553 __u64 summary_addr;
1554 __u64 ind_offset;
1555 __u32 summary_offset;
1556 __u32 adapter_id;
1557};
1558
5c919412
AS
1559struct kvm_irq_routing_hv_sint {
1560 __u32 vcpu;
1561 __u32 sint;
1562};
414fa985 1563
414fa985
JK
1564
15654.55 KVM_SET_TSC_KHZ
92a1f12d
JR
1566
1567Capability: KVM_CAP_TSC_CONTROL
1568Architectures: x86
1569Type: vcpu ioctl
1570Parameters: virtual tsc_khz
1571Returns: 0 on success, -1 on error
1572
1573Specifies the tsc frequency for the virtual machine. The unit of the
1574frequency is kHz.
1575
414fa985
JK
1576
15774.56 KVM_GET_TSC_KHZ
92a1f12d
JR
1578
1579Capability: KVM_CAP_GET_TSC_KHZ
1580Architectures: x86
1581Type: vcpu ioctl
1582Parameters: none
1583Returns: virtual tsc-khz on success, negative value on error
1584
1585Returns the tsc frequency of the guest. The unit of the return value is
1586kHz. If the host has an unstable tsc, this ioctl returns the error -EIO
1587instead.
1588
414fa985
JK
1589
15904.57 KVM_GET_LAPIC
e7677933
AK
1591
1592Capability: KVM_CAP_IRQCHIP
1593Architectures: x86
1594Type: vcpu ioctl
1595Parameters: struct kvm_lapic_state (out)
1596Returns: 0 on success, -1 on error
1597
1598#define KVM_APIC_REG_SIZE 0x400
1599struct kvm_lapic_state {
1600 char regs[KVM_APIC_REG_SIZE];
1601};
1602
1603Reads the Local APIC registers and copies them into the input argument. The
1604data format and layout are the same as documented in the architecture manual.
1605
37131313
RK
1606If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
1607enabled, then the format of APIC_ID register depends on the APIC mode
1608(reported by MSR_IA32_APICBASE) of its VCPU. x2APIC stores APIC ID in
1609the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID
1610which is stored in bits 31-24 of the APIC register, or equivalently in
1611byte 35 of struct kvm_lapic_state's regs field. KVM_GET_LAPIC must then
1612be called after MSR_IA32_APICBASE has been set with KVM_SET_MSR.
1613
1614If KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state
1615always uses xAPIC format.
1616
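The two APIC ID layouts described above can be decoded with a small helper.
This is a sketch with a hypothetical function name. It reads bytes 32-35 of
the regs array as a little-endian 32-bit register (an assumption that matches
the statement that bits 31-24 correspond to byte 35), and the caller has to
say which format applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_ID_OFFSET 0x20  /* bytes 32-35 of regs */

/* Return the APIC ID from regs, in x2APIC (32-bit) or xAPIC (8-bit) form. */
static inline uint32_t lapic_apic_id(const unsigned char *regs,
				     bool x2apic_format)
{
	const unsigned char *p = regs + APIC_ID_OFFSET;
	uint32_t reg = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

	/* xAPIC keeps the 8-bit ID in bits 31-24, i.e. byte 35. */
	return x2apic_format ? reg : reg >> 24;
}
```

Since struct kvm_lapic_state declares regs as a char array, a caller would
pass (const unsigned char *)state.regs.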
414fa985
JK
1617
16184.58 KVM_SET_LAPIC
e7677933
AK
1619
1620Capability: KVM_CAP_IRQCHIP
1621Architectures: x86
1622Type: vcpu ioctl
1623Parameters: struct kvm_lapic_state (in)
1624Returns: 0 on success, -1 on error
1625
1626#define KVM_APIC_REG_SIZE 0x400
1627struct kvm_lapic_state {
1628 char regs[KVM_APIC_REG_SIZE];
1629};
1630
df5cbb27 1631Copies the input argument into the Local APIC registers. The data format
e7677933
AK
1632and layout are the same as documented in the architecture manual.
1633
37131313
RK
1634The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
1635regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
1636See the note in KVM_GET_LAPIC.
1637
414fa985
JK
1638
16394.59 KVM_IOEVENTFD
55399a02
SL
1640
1641Capability: KVM_CAP_IOEVENTFD
1642Architectures: all
1643Type: vm ioctl
1644Parameters: struct kvm_ioeventfd (in)
1645Returns: 0 on success, !0 on error
1646
1647This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
1648within the guest. A guest write in the registered address will signal the
1649provided event instead of triggering an exit.
1650
1651struct kvm_ioeventfd {
1652 __u64 datamatch;
1653 __u64 addr; /* legal pio/mmio address */
e9ea5069 1654 __u32 len; /* 0, 1, 2, 4, or 8 bytes */
55399a02
SL
1655 __s32 fd;
1656 __u32 flags;
1657 __u8 pad[36];
1658};
1659
2b83451b
CH
1660For the special case of virtio-ccw devices on s390, the ioevent is matched
1661to a subchannel/virtqueue tuple instead.
1662
55399a02
SL
1663The following flags are defined:
1664
1665#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
1666#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
1667#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
2b83451b
CH
1668#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
1669 (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
55399a02
SL
1670
1671If the datamatch flag is set, the event will be signaled only if the value
1672written to the registered address is equal to datamatch in struct kvm_ioeventfd.
1673
2b83451b
CH
1674For virtio-ccw devices, addr contains the subchannel id and datamatch the
1675virtqueue index.
1676
e9ea5069
JW
1677With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
1678the kernel will ignore the length of the guest write, which may allow a faster vmexit.
1679The speedup may only apply to specific architectures, but the ioeventfd will
1680work anyway.
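
A minimal sketch of registering an MMIO ioeventfd, with optional data
matching. The helper names are hypothetical and <linux/kvm.h> is assumed to be
available. event_fd is expected to come from eventfd(2), and a NULL datamatch
pointer means "signal on any written value".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Describe an MMIO ioeventfd at addr; PIO would add KVM_IOEVENTFD_FLAG_PIO. */
static inline struct kvm_ioeventfd
make_mmio_ioeventfd(uint64_t addr, uint32_t len, int event_fd,
		    const uint64_t *datamatch)
{
	struct kvm_ioeventfd io;

	memset(&io, 0, sizeof(io));  /* clears flags and padding */
	io.addr = addr;
	io.len = len;                /* 1, 2, 4 or 8 bytes */
	io.fd = event_fd;
	if (datamatch) {
		io.datamatch = *datamatch;
		io.flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
	}
	return io;
}

/* Attach the ioeventfd to the VM; returns 0 on success. */
static inline int assign_ioeventfd(int vm_fd, const struct kvm_ioeventfd *io)
{
	return ioctl(vm_fd, KVM_IOEVENTFD, io);
}
```

Detaching uses the same structure with KVM_IOEVENTFD_FLAG_DEASSIGN set in
flags.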
414fa985
JK
1681
16824.60 KVM_DIRTY_TLB
dc83b8bc
SW
1683
1684Capability: KVM_CAP_SW_TLB
1685Architectures: ppc
1686Type: vcpu ioctl
1687Parameters: struct kvm_dirty_tlb (in)
1688Returns: 0 on success, -1 on error
1689
1690struct kvm_dirty_tlb {
1691 __u64 bitmap;
1692 __u32 num_dirty;
1693};
1694
1695This must be called whenever userspace has changed an entry in the shared
1696TLB, prior to calling KVM_RUN on the associated vcpu.
1697
1698The "bitmap" field is the userspace address of an array. This array
1699consists of a number of bits, equal to the total number of TLB entries as
1700determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
1701nearest multiple of 64.
1702
1703Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
1704array.
1705
1706The array is little-endian: bit 0 is the least significant bit of the
1707first byte, bit 8 is the least significant bit of the second byte, etc.
1708This avoids any complications with differing word sizes.
1709
1710The "num_dirty" field is a performance hint for KVM to determine whether it
1711should skip processing the bitmap and just invalidate everything. It must
1712be set to the number of set bits in the bitmap.
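
The bitmap layout and the num_dirty rule described above can be sketched with
byte-wise helpers, which avoid any dependence on the host word size. The
function names are hypothetical. The bitmap size is rounded up to a multiple
of 64 bits, bit N lives in byte N / 8 at position N % 8, and num_dirty is the
population count of the whole bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for n_entries bits, rounded up to a multiple of 64 bits. */
static inline size_t tlb_bitmap_bytes(size_t n_entries)
{
	return ((n_entries + 63) / 64) * 8;
}

/* Mark one TLB entry dirty: bit 0 is the LSB of the first byte. */
static inline void tlb_mark_dirty(uint8_t *bitmap, size_t entry)
{
	bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Count set bits, giving the value to store in num_dirty. */
static inline uint32_t tlb_count_dirty(const uint8_t *bitmap, size_t n_bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < n_bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;
	return n;
}
```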
1713
414fa985 1714
54738c09
DG
17154.62 KVM_CREATE_SPAPR_TCE
1716
1717Capability: KVM_CAP_SPAPR_TCE
1718Architectures: powerpc
1719Type: vm ioctl
1720Parameters: struct kvm_create_spapr_tce (in)
1721Returns: file descriptor for manipulating the created TCE table
1722
1723This creates a virtual TCE (translation control entry) table, which
1724is an IOMMU for PAPR-style virtual I/O. It is used to translate
1725logical addresses used in virtual I/O into guest physical addresses,
1726and provides a scatter/gather capability for PAPR virtual I/O.
1727
1728/* for KVM_CAP_SPAPR_TCE */
1729struct kvm_create_spapr_tce {
1730 __u64 liobn;
1731 __u32 window_size;
1732};
1733
1734The liobn field gives the logical IO bus number for which to create a
1735TCE table. The window_size field specifies the size of the DMA window
1736which this TCE table will translate - the table will contain one
173764-bit TCE entry for every 4 KiB of the DMA window.
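
The relationship between window_size and the size of the TCE table is simple
arithmetic: one 8-byte entry per 4 KiB of window. The helpers below are a
sketch with hypothetical names, useful for example when sizing the mmap() of
the returned file descriptor.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SIZE 4096u  /* one TCE entry covers 4 KiB of DMA window */

/* Number of TCE entries needed for a DMA window of window_size bytes. */
static inline uint64_t tce_table_entries(uint32_t window_size)
{
	return window_size / TCE_PAGE_SIZE;
}

/* Size of the TCE table in bytes (64 bits per entry). */
static inline uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_table_entries(window_size) * sizeof(uint64_t);
}
```

For a 1 GiB window this gives 262144 entries, or 2 MiB of table.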
1738
1739When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
1740table has been created using this ioctl(), the kernel will handle it
1741in real mode, updating the TCE table. H_PUT_TCE calls for other
1742liobns will cause a vm exit and must be handled by userspace.
1743
1744The return value is a file descriptor which can be passed to mmap(2)
1745to map the created TCE table into userspace. This lets userspace read
1746the entries written by kernel-handled H_PUT_TCE calls, and also lets
1747userspace update the TCE table directly which is useful in some
1748circumstances.
1749

4.63 KVM_ALLOCATE_RMA

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
        __u64 rma_size;
};

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.


4.64 KVM_NMI

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu. Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.

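The "check whether changing LINT1 will queue an NMI" step of the
algorithm above could look roughly like the sketch below. It assumes
that regs is the raw 1024-byte register page returned by KVM_GET_LAPIC
and uses the architectural xAPIC layout (LVT LINT1 at offset 0x360,
mask in bit 16, delivery mode in bits 10:8). The function and macro
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural xAPIC values; the names are local to this sketch. */
#define APIC_LVT_LINT1_OFFSET  0x360u      /* LVT LINT1 register offset */
#define APIC_LVT_MASKED        (1u << 16)  /* bit 16: entry is masked */
#define APIC_LVT_DELIVERY_MASK (7u << 8)   /* bits 10:8: delivery mode */
#define APIC_LVT_DELIVERY_NMI  (4u << 8)   /* 100b: NMI */

/* Read a little-endian 32-bit register from the raw APIC register page. */
static uint32_t lapic_reg(const unsigned char *regs, uint32_t offset)
{
    return (uint32_t)regs[offset] |
           (uint32_t)regs[offset + 1] << 8 |
           (uint32_t)regs[offset + 2] << 16 |
           (uint32_t)regs[offset + 3] << 24;
}

/* True if asserting LINT1 would deliver an NMI to this vcpu. */
static bool lint1_queues_nmi(const unsigned char *regs)
{
    uint32_t lvt = lapic_reg(regs, APIC_LVT_LINT1_OFFSET);

    return !(lvt & APIC_LVT_MASKED) &&
           (lvt & APIC_LVT_DELIVERY_MASK) == APIC_LVT_DELIVERY_NMI;
}
```

Userspace would issue KVM_NMI only when this check returns true.
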

4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
        struct kvm_s390_ucas_mapping {
                __u64 user_addr;
                __u64 vcpu_addr;
                __u64 length;
        };

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters need to
be aligned by 1 megabyte.

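Since all three fields must be 1 megabyte aligned, userspace may want
to validate a mapping before issuing the ioctl. The following is a
minimal sketch of such a check; struct ucas_mapping is a local mirror
of the structure above using <stdint.h> types, and the helper name is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct kvm_s390_ucas_mapping for this sketch. */
struct ucas_mapping {
    uint64_t user_addr;
    uint64_t vcpu_addr;
    uint64_t length;
};

#define UCAS_ALIGN (UINT64_C(1) << 20)  /* 1 megabyte */

/* True if every field is a multiple of 1 megabyte. */
static bool ucas_mapping_aligned(const struct ucas_mapping *m)
{
    return ((m->user_addr | m->vcpu_addr | m->length) & (UCAS_ALIGN - 1)) == 0;
}
```

OR-ing the fields first lets a single mask test cover all three.
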

4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
        struct kvm_s390_ucas_mapping {
                __u64 user_addr;
                __u64 vcpu_addr;
                __u64 length;
        };

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters need to be aligned by 1 megabyte.


4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
thus it's recommended to access the subject memory page via the user page
table upfront. This is useful to handle validity intercepts for user
controlled virtual machines to fault in the virtual cpu's lowcore pages
prior to calling the KVM_RUN ioctl.


4.68 KVM_SET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure
Errors:
  ENOENT:   no such register
  EPERM:    register access forbidden for architecture-dependent reasons
  EINVAL:   other errors, such as bad size encoding for a known register

struct kvm_one_reg {
        __u64 id;
        __u64 addr;
};

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each has its own range of operation
and its own constants and width. To keep track of the implemented
registers, find a list below:

  Arch | Register | Width (bits)
  | |
  PPC | KVM_REG_PPC_HIOR | 64
  PPC | KVM_REG_PPC_IAC1 | 64
  PPC | KVM_REG_PPC_IAC2 | 64
  PPC | KVM_REG_PPC_IAC3 | 64
  PPC | KVM_REG_PPC_IAC4 | 64
  PPC | KVM_REG_PPC_DAC1 | 64
  PPC | KVM_REG_PPC_DAC2 | 64
  PPC | KVM_REG_PPC_DABR | 64
  PPC | KVM_REG_PPC_DSCR | 64
  PPC | KVM_REG_PPC_PURR | 64
  PPC | KVM_REG_PPC_SPURR | 64
  PPC | KVM_REG_PPC_DAR | 64
  PPC | KVM_REG_PPC_DSISR | 32
  PPC | KVM_REG_PPC_AMR | 64
  PPC | KVM_REG_PPC_UAMOR | 64
  PPC | KVM_REG_PPC_MMCR0 | 64
  PPC | KVM_REG_PPC_MMCR1 | 64
  PPC | KVM_REG_PPC_MMCRA | 64
  PPC | KVM_REG_PPC_MMCR2 | 64
  PPC | KVM_REG_PPC_MMCRS | 64
  PPC | KVM_REG_PPC_SIAR | 64
  PPC | KVM_REG_PPC_SDAR | 64
  PPC | KVM_REG_PPC_SIER | 64
  PPC | KVM_REG_PPC_PMC1 | 32
  PPC | KVM_REG_PPC_PMC2 | 32
  PPC | KVM_REG_PPC_PMC3 | 32
  PPC | KVM_REG_PPC_PMC4 | 32
  PPC | KVM_REG_PPC_PMC5 | 32
  PPC | KVM_REG_PPC_PMC6 | 32
  PPC | KVM_REG_PPC_PMC7 | 32
  PPC | KVM_REG_PPC_PMC8 | 32
  PPC | KVM_REG_PPC_FPR0 | 64
  ...
  PPC | KVM_REG_PPC_FPR31 | 64
  PPC | KVM_REG_PPC_VR0 | 128
  ...
  PPC | KVM_REG_PPC_VR31 | 128
  PPC | KVM_REG_PPC_VSR0 | 128
  ...
  PPC | KVM_REG_PPC_VSR31 | 128
  PPC | KVM_REG_PPC_FPSCR | 64
  PPC | KVM_REG_PPC_VSCR | 32
  PPC | KVM_REG_PPC_VPA_ADDR | 64
  PPC | KVM_REG_PPC_VPA_SLB | 128
  PPC | KVM_REG_PPC_VPA_DTL | 128
  PPC | KVM_REG_PPC_EPCR | 32
  PPC | KVM_REG_PPC_EPR | 32
  PPC | KVM_REG_PPC_TCR | 32
  PPC | KVM_REG_PPC_TSR | 32
  PPC | KVM_REG_PPC_OR_TSR | 32
  PPC | KVM_REG_PPC_CLEAR_TSR | 32
  PPC | KVM_REG_PPC_MAS0 | 32
  PPC | KVM_REG_PPC_MAS1 | 32
  PPC | KVM_REG_PPC_MAS2 | 64
  PPC | KVM_REG_PPC_MAS7_3 | 64
  PPC | KVM_REG_PPC_MAS4 | 32
  PPC | KVM_REG_PPC_MAS6 | 32
  PPC | KVM_REG_PPC_MMUCFG | 32
  PPC | KVM_REG_PPC_TLB0CFG | 32
  PPC | KVM_REG_PPC_TLB1CFG | 32
  PPC | KVM_REG_PPC_TLB2CFG | 32
  PPC | KVM_REG_PPC_TLB3CFG | 32
  PPC | KVM_REG_PPC_TLB0PS | 32
  PPC | KVM_REG_PPC_TLB1PS | 32
  PPC | KVM_REG_PPC_TLB2PS | 32
  PPC | KVM_REG_PPC_TLB3PS | 32
  PPC | KVM_REG_PPC_EPTCFG | 32
  PPC | KVM_REG_PPC_ICP_STATE | 64
  PPC | KVM_REG_PPC_TB_OFFSET | 64
  PPC | KVM_REG_PPC_SPMC1 | 32
  PPC | KVM_REG_PPC_SPMC2 | 32
  PPC | KVM_REG_PPC_IAMR | 64
  PPC | KVM_REG_PPC_TFHAR | 64
  PPC | KVM_REG_PPC_TFIAR | 64
  PPC | KVM_REG_PPC_TEXASR | 64
  PPC | KVM_REG_PPC_FSCR | 64
  PPC | KVM_REG_PPC_PSPB | 32
  PPC | KVM_REG_PPC_EBBHR | 64
  PPC | KVM_REG_PPC_EBBRR | 64
  PPC | KVM_REG_PPC_BESCR | 64
  PPC | KVM_REG_PPC_TAR | 64
  PPC | KVM_REG_PPC_DPDES | 64
  PPC | KVM_REG_PPC_DAWR | 64
  PPC | KVM_REG_PPC_DAWRX | 64
  PPC | KVM_REG_PPC_CIABR | 64
  PPC | KVM_REG_PPC_IC | 64
  PPC | KVM_REG_PPC_VTB | 64
  PPC | KVM_REG_PPC_CSIGR | 64
  PPC | KVM_REG_PPC_TACR | 64
  PPC | KVM_REG_PPC_TCSCR | 64
  PPC | KVM_REG_PPC_PID | 64
  PPC | KVM_REG_PPC_ACOP | 64
  PPC | KVM_REG_PPC_VRSAVE | 32
  PPC | KVM_REG_PPC_LPCR | 32
  PPC | KVM_REG_PPC_LPCR_64 | 64
  PPC | KVM_REG_PPC_PPR | 64
  PPC | KVM_REG_PPC_ARCH_COMPAT | 32
  PPC | KVM_REG_PPC_DABRX | 32
  PPC | KVM_REG_PPC_WORT | 64
  PPC | KVM_REG_PPC_SPRG9 | 64
  PPC | KVM_REG_PPC_DBSR | 32
  PPC | KVM_REG_PPC_TIDR | 64
  PPC | KVM_REG_PPC_PSSCR | 64
  PPC | KVM_REG_PPC_DEC_EXPIRY | 64
  PPC | KVM_REG_PPC_PTCR | 64
  PPC | KVM_REG_PPC_TM_GPR0 | 64
  ...
  PPC | KVM_REG_PPC_TM_GPR31 | 64
  PPC | KVM_REG_PPC_TM_VSR0 | 128
  ...
  PPC | KVM_REG_PPC_TM_VSR63 | 128
  PPC | KVM_REG_PPC_TM_CR | 64
  PPC | KVM_REG_PPC_TM_LR | 64
  PPC | KVM_REG_PPC_TM_CTR | 64
  PPC | KVM_REG_PPC_TM_FPSCR | 64
  PPC | KVM_REG_PPC_TM_AMR | 64
  PPC | KVM_REG_PPC_TM_PPR | 64
  PPC | KVM_REG_PPC_TM_VRSAVE | 64
  PPC | KVM_REG_PPC_TM_VSCR | 32
  PPC | KVM_REG_PPC_TM_DSCR | 64
  PPC | KVM_REG_PPC_TM_TAR | 64
  PPC | KVM_REG_PPC_TM_XER | 64
  | |
  MIPS | KVM_REG_MIPS_R0 | 64
  ...
  MIPS | KVM_REG_MIPS_R31 | 64
  MIPS | KVM_REG_MIPS_HI | 64
  MIPS | KVM_REG_MIPS_LO | 64
  MIPS | KVM_REG_MIPS_PC | 64
  MIPS | KVM_REG_MIPS_CP0_INDEX | 32
  MIPS | KVM_REG_MIPS_CP0_ENTRYLO0 | 64
  MIPS | KVM_REG_MIPS_CP0_ENTRYLO1 | 64
  MIPS | KVM_REG_MIPS_CP0_CONTEXT | 64
  MIPS | KVM_REG_MIPS_CP0_CONTEXTCONFIG | 32
  MIPS | KVM_REG_MIPS_CP0_USERLOCAL | 64
  MIPS | KVM_REG_MIPS_CP0_XCONTEXTCONFIG | 64
  MIPS | KVM_REG_MIPS_CP0_PAGEMASK | 32
  MIPS | KVM_REG_MIPS_CP0_PAGEGRAIN | 32
  MIPS | KVM_REG_MIPS_CP0_SEGCTL0 | 64
  MIPS | KVM_REG_MIPS_CP0_SEGCTL1 | 64
  MIPS | KVM_REG_MIPS_CP0_SEGCTL2 | 64
  MIPS | KVM_REG_MIPS_CP0_PWBASE | 64
  MIPS | KVM_REG_MIPS_CP0_PWFIELD | 64
  MIPS | KVM_REG_MIPS_CP0_PWSIZE | 64
  MIPS | KVM_REG_MIPS_CP0_WIRED | 32
  MIPS | KVM_REG_MIPS_CP0_PWCTL | 32
  MIPS | KVM_REG_MIPS_CP0_HWRENA | 32
  MIPS | KVM_REG_MIPS_CP0_BADVADDR | 64
  MIPS | KVM_REG_MIPS_CP0_BADINSTR | 32
  MIPS | KVM_REG_MIPS_CP0_BADINSTRP | 32
  MIPS | KVM_REG_MIPS_CP0_COUNT | 32
  MIPS | KVM_REG_MIPS_CP0_ENTRYHI | 64
  MIPS | KVM_REG_MIPS_CP0_COMPARE | 32
  MIPS | KVM_REG_MIPS_CP0_STATUS | 32
  MIPS | KVM_REG_MIPS_CP0_INTCTL | 32
  MIPS | KVM_REG_MIPS_CP0_CAUSE | 32
  MIPS | KVM_REG_MIPS_CP0_EPC | 64
  MIPS | KVM_REG_MIPS_CP0_PRID | 32
  MIPS | KVM_REG_MIPS_CP0_EBASE | 64
  MIPS | KVM_REG_MIPS_CP0_CONFIG | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG1 | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG2 | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG3 | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG4 | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG5 | 32
  MIPS | KVM_REG_MIPS_CP0_CONFIG7 | 32
  MIPS | KVM_REG_MIPS_CP0_XCONTEXT | 64
  MIPS | KVM_REG_MIPS_CP0_ERROREPC | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH1 | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH2 | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH3 | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH4 | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH5 | 64
  MIPS | KVM_REG_MIPS_CP0_KSCRATCH6 | 64
  MIPS | KVM_REG_MIPS_CP0_MAAR(0..63) | 64
  MIPS | KVM_REG_MIPS_COUNT_CTL | 64
  MIPS | KVM_REG_MIPS_COUNT_RESUME | 64
  MIPS | KVM_REG_MIPS_COUNT_HZ | 64
  MIPS | KVM_REG_MIPS_FPR_32(0..31) | 32
  MIPS | KVM_REG_MIPS_FPR_64(0..31) | 64
  MIPS | KVM_REG_MIPS_VEC_128(0..31) | 128
  MIPS | KVM_REG_MIPS_FCR_IR | 32
  MIPS | KVM_REG_MIPS_FCR_CSR | 32
  MIPS | KVM_REG_MIPS_MSA_IR | 32
  MIPS | KVM_REG_MIPS_MSA_CSR | 32

ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:

ARM core registers have the following id bit patterns:
  0x4020 0000 0010 <index into the kvm_regs struct:16>

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

ARM CCSIDR registers are demultiplexed by CSSELR value:
  0x4020 0000 0011 00 <csselr:8>

ARM 32-bit VFP control registers have the following id bit patterns:
  0x4020 0000 0012 1 <regno:12>

ARM 64-bit FP registers have the following id bit patterns:
  0x4030 0000 0012 0 <regno:12>

ARM firmware pseudo-registers have the following bit pattern:
  0x4030 0000 0014 <regno:16>


arm64 registers are mapped using the lower 32 bits. The upper 16 of
that is the register group type, or coprocessor number:

arm64 core/FP-SIMD registers have the following id bit patterns. Note
that the size of the access is variable, as the kvm_regs structure
contains elements ranging from 32 to 128 bits. The index is a 32-bit
value in the kvm_regs structure seen as a 32-bit array.
  0x60x0 0000 0010 <index into the kvm_regs struct:16>

Specifically:
    Encoding            Register  Bits  kvm_regs member
----------------------------------------------------------------
  0x6030 0000 0010 0000 X0          64  regs.regs[0]
  0x6030 0000 0010 0002 X1          64  regs.regs[1]
    ...
  0x6030 0000 0010 003c X30         64  regs.regs[30]
  0x6030 0000 0010 003e SP          64  regs.sp
  0x6030 0000 0010 0040 PC          64  regs.pc
  0x6030 0000 0010 0042 PSTATE      64  regs.pstate
  0x6030 0000 0010 0044 SP_EL1      64  sp_el1
  0x6030 0000 0010 0046 ELR_EL1     64  elr_el1
  0x6030 0000 0010 0048 SPSR_EL1    64  spsr[KVM_SPSR_EL1] (alias SPSR_SVC)
  0x6030 0000 0010 004a SPSR_ABT    64  spsr[KVM_SPSR_ABT]
  0x6030 0000 0010 004c SPSR_UND    64  spsr[KVM_SPSR_UND]
  0x6030 0000 0010 004e SPSR_IRQ    64  spsr[KVM_SPSR_IRQ]
  0x6030 0000 0010 0050 SPSR_FIQ    64  spsr[KVM_SPSR_FIQ]
  0x6040 0000 0010 0054 V0         128  fp_regs.vregs[0]    (*)
  0x6040 0000 0010 0058 V1         128  fp_regs.vregs[1]    (*)
    ...
  0x6040 0000 0010 00d0 V31        128  fp_regs.vregs[31]   (*)
  0x6020 0000 0010 00d4 FPSR        32  fp_regs.fpsr
  0x6020 0000 0010 00d5 FPCR        32  fp_regs.fpcr

(*) These encodings are not accepted for SVE-enabled vcpus. See
    KVM_ARM_VCPU_INIT.

    The equivalent register content can be accessed via bits [127:0] of
    the corresponding SVE Zn registers instead for vcpus that have SVE
    enabled (see below).

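To make the arm64 core register encoding concrete, the sketch below
builds an id from a size code and a byte offset into struct kvm_regs,
dividing the offset by four because the index counts 32-bit words. The
constants are copied from the bit patterns above; the macro and
function names are hypothetical, and the byte offsets used in practice
would normally come from offsetof() on the real structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the bit patterns above; names are local to this sketch. */
#define REG_ARM64      UINT64_C(0x6000000000000000)
#define REG_ARM_CORE   UINT64_C(0x0000000000100000)
#define REG_SIZE_U32   UINT64_C(0x0020000000000000)
#define REG_SIZE_U64   UINT64_C(0x0030000000000000)
#define REG_SIZE_U128  UINT64_C(0x0040000000000000)

/* Build a core register id from a byte offset into struct kvm_regs. */
static uint64_t arm64_core_reg_id(uint64_t size, size_t byte_offset)
{
    /* The index field counts 32-bit words, not bytes. */
    return REG_ARM64 | size | REG_ARM_CORE | (byte_offset / sizeof(uint32_t));
}
```

For example, X1 sits 8 bytes into the structure, which yields index 2
and the id 0x6030 0000 0010 0002 listed in the table.
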
arm64 CCSIDR registers are demultiplexed by CSSELR value:
  0x6020 0000 0011 00 <csselr:8>

arm64 system registers have the following id bit patterns:
  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>

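The system register pattern packs five fields into the low 16 bits of
the id. A minimal sketch of that packing follows; the shift amounts are
derived from the field widths shown above (op2 in bits 2:0, crm in
6:3, crn in 10:7, op1 in 13:11, op0 in 15:14), and the function name is
hypothetical.

```c
#include <stdint.h>

/* 64-bit arm64 system register group, from the pattern above. */
#define ARM64_SYSREG_BASE UINT64_C(0x6030000000130000)

/* Pack the op0/op1/crn/crm/op2 encoding of a system register into an id. */
static uint64_t arm64_sys_reg_id(unsigned op0, unsigned op1, unsigned crn,
                                 unsigned crm, unsigned op2)
{
    return ARM64_SYSREG_BASE |
           ((uint64_t)(op0 & 0x3) << 14) |
           ((uint64_t)(op1 & 0x7) << 11) |
           ((uint64_t)(crn & 0xf) << 7) |
           ((uint64_t)(crm & 0xf) << 3) |
           (uint64_t)(op2 & 0x7);
}
```

MPIDR_EL1, which is encoded as op0=3, op1=0, crn=0, crm=0, op2=5, maps
to 0x6030 0000 0013 c005 under this scheme.
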
arm64 firmware pseudo-registers have the following bit pattern:
  0x6030 0000 0014 <regno:16>

arm64 SVE registers have the following bit patterns:
  0x6080 0000 0015 00 <n:5> <slice:5>   Zn bits[2048*slice + 2047 : 2048*slice]
  0x6050 0000 0015 04 <n:4> <slice:5>   Pn bits[256*slice + 255 : 256*slice]
  0x6050 0000 0015 060 <slice:5>        FFR bits[256*slice + 255 : 256*slice]
  0x6060 0000 0015 ffff                 KVM_REG_ARM64_SVE_VLS pseudo-register

Access to slices beyond the maximum vector length configured for the
vcpu (i.e., where 16 * slice >= max_vq (**)) will fail with ENOENT.

These registers are only accessible on vcpus for which SVE is enabled.
See KVM_ARM_VCPU_INIT for details.

In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
accessible until the vcpu's SVE configuration has been finalized
using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT
and KVM_ARM_VCPU_FINALIZE for more information about this procedure.

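The SVE patterns can be sketched as id-building helpers. This is an
illustration under stated assumptions: the Pn and FFR groups are
assumed to start at offsets 0x400 and 0x600 within the SVE register
space, which is how the "04" and "060" prefixes above are read here,
and all names are hypothetical. The last helper restates the slice
visibility rule (16 * slice < max_vq).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed group bases: size code plus SVE group plus offset (0, 0x400, 0x600). */
#define SVE_ZREG_BASE UINT64_C(0x6080000000150000)  /* 2048-bit Zn slices */
#define SVE_PREG_BASE UINT64_C(0x6050000000150400)  /* 256-bit Pn slices */
#define SVE_FFR_BASE  UINT64_C(0x6050000000150600)  /* 256-bit FFR slices */

/* Id of slice `slice` of vector register Zn (n in 0..31). */
static uint64_t sve_zreg_id(unsigned n, unsigned slice)
{
    return SVE_ZREG_BASE | ((uint64_t)(n & 0x1f) << 5) | (slice & 0x1f);
}

/* Id of slice `slice` of predicate register Pn (n in 0..15). */
static uint64_t sve_preg_id(unsigned n, unsigned slice)
{
    return SVE_PREG_BASE | ((uint64_t)(n & 0xf) << 5) | (slice & 0x1f);
}

/* Id of slice `slice` of the first-fault register FFR. */
static uint64_t sve_ffr_id(unsigned slice)
{
    return SVE_FFR_BASE | (slice & 0x1f);
}

/* True if the slice is within reach of a vcpu whose maximum vq is max_vq. */
static bool sve_slice_visible(unsigned slice, unsigned max_vq)
{
    return 16 * slice < max_vq;
}
```

With a max_vq of 16 or less only slice 0 is visible, so a single
2048-bit transfer covers each Zn register.
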
KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
lengths supported by the vcpu to be discovered and configured by
userspace. When transferred to or from user memory via KVM_GET_ONE_REG
or KVM_SET_ONE_REG, the value of this register is of type
__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as
follows:

__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS];

if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
    ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
                ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
        /* Vector length vq * 16 bytes supported */
else
        /* Vector length vq * 16 bytes not supported */

(**) The maximum value vq for which the above condition is true is
max_vq. This is the maximum vector length available to the guest on
this vcpu, and determines which register slices are visible through
this ioctl interface.

(See Documentation/arm64/sve.txt for an explanation of the "vq"
nomenclature.)

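A self-contained version of the bitmap test above, together with a scan
for max_vq, might look like the following sketch. The limits VQ_MIN = 1
and VQ_MAX = 512 (and hence 8 words) are assumptions about the values
of SVE_VQ_MIN, SVE_VQ_MAX and KVM_ARM64_SVE_VLS_WORDS in the arm64
headers; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the arm64 uapi headers: vq ranges over 1..512. */
#define VQ_MIN    1u
#define VQ_MAX    512u
#define VLS_WORDS ((VQ_MAX - VQ_MIN) / 64 + 1)   /* 8 words of 64 bits */

/* True if vector length vq * 16 bytes is present in the set. */
static bool sve_vq_supported(const uint64_t vls[VLS_WORDS], unsigned vq)
{
    if (vq < VQ_MIN || vq > VQ_MAX)
        return false;
    return (vls[(vq - VQ_MIN) / 64] >> ((vq - VQ_MIN) % 64)) & 1;
}

/* Largest supported vq (max_vq), or 0 if the set is empty. */
static unsigned sve_max_vq(const uint64_t vls[VLS_WORDS])
{
    for (unsigned vq = VQ_MAX; vq >= VQ_MIN; vq--)
        if (sve_vq_supported(vls, vq))
            return vq;
    return 0;
}
```

A register value whose first word is 0xb and whose other words are zero
describes the set {1, 2, 4}, so max_vq is 4 (a 64-byte vector).
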
KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that
the host supports.

Userspace may subsequently modify it if desired until the vcpu's SVE
configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).

Apart from simply removing all vector lengths from the host set that
exceed some value, support for arbitrarily chosen sets of vector lengths
is hardware-dependent and may not be available. Attempting to configure
an invalid set of vector lengths via KVM_SET_ONE_REG will fail with
EINVAL.

After the vcpu's SVE configuration is finalized, further attempts to
write this register will fail with EPERM.


MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
the register group type:

MIPS core registers (see above) have the following id bit patterns:
  0x7030 0000 0000 <reg:16>

MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit
patterns depending on whether they're 32-bit or 64-bit registers:
  0x7020 0000 0001 00 <reg:5> <sel:3>   (32-bit)
  0x7030 0000 0001 00 <reg:5> <sel:3>   (64-bit)

Note: KVM_REG_MIPS_CP0_ENTRYLO0 and KVM_REG_MIPS_CP0_ENTRYLO1 are the MIPS64
versions of the EntryLo registers regardless of the word size of the host
hardware, host kernel, guest, and whether XPA is present in the guest, i.e.
with the RI and XI bits (if they exist) in bits 63 and 62 respectively, and
the PFNX field starting at bit 30.

MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit
patterns:
  0x7030 0000 0001 01 <reg:8>

MIPS KVM control registers (see above) have the following id bit patterns:
  0x7030 0000 0002 <reg:16>

MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following
id bit patterns depending on the size of the register being accessed. They are
always accessed according to the current guest FPU mode (Status.FR and
Config5.FRE), i.e. as the guest would see them, and they become unpredictable
if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector
registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they
overlap the FPU registers:
  0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)
  0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)
  0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)

MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 01 <0:3> <reg:5>

MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 02 <0:3> <reg:5>

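As an example of the MIPS patterns, the CP0 encoding above can be
sketched as a small helper that combines the size code with the
register number and select. The constants come from the CP0 bit
patterns; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the CP0 patterns above; names are local to this sketch. */
#define MIPS_CP0_BASE UINT64_C(0x7000000000010000)
#define MIPS_SIZE_U32 UINT64_C(0x0020000000000000)
#define MIPS_SIZE_U64 UINT64_C(0x0030000000000000)

/* Id of CP0 register `reg`, select `sel`, as a 32-bit or 64-bit register. */
static uint64_t mips_cp0_reg_id(unsigned reg, unsigned sel, bool is_64bit)
{
    return MIPS_CP0_BASE |
           (is_64bit ? MIPS_SIZE_U64 : MIPS_SIZE_U32) |
           ((uint64_t)(reg & 0x1f) << 3) | (sel & 0x7);
}
```

CP0 Status is register 12, select 0 and is listed as 32 bits wide, so
its id under this scheme is 0x7020 0000 0001 0060.
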

4.69 KVM_GET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in and out)
Returns: 0 on success, negative value on failure
Errors:
  ENOENT:   no such register
  EPERM:    register access forbidden for architecture-dependent reasons
  EINVAL:   other errors, such as bad size encoding for a known register

This ioctl reads the value of a single register implemented in a vcpu.
The register to read is indicated by the "id" field of the kvm_one_reg
struct passed in. On success, the register value can be found at the
memory location pointed to by "addr".

The list of registers accessible using this interface is identical to the
list in 4.68.

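The buffer that "addr" points to must be large enough for the register
being transferred. In all of the id patterns listed in 4.68 the size is
carried in the second hex digit pair of the id (0x..20 for 32 bits,
0x..30 for 64 bits, and so on). The sketch below decodes it, assuming
the field occupies bits 55:52 and holds log2 of the size in bytes; the
macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the size field: bits 55:52, log2 of the size in bytes. */
#define REG_SIZE_SHIFT 52
#define REG_SIZE_MASK  UINT64_C(0x00f0000000000000)

/* Number of bytes transferred for the register identified by id. */
static size_t reg_size_bytes(uint64_t id)
{
    return (size_t)1 << ((id & REG_SIZE_MASK) >> REG_SIZE_SHIFT);
}
```

Userspace could use such a helper to size the variable whose address is
stored in kvm_one_reg.addr before calling the ioctl.
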

4.70 KVM_KVMCLOCK_CTRL

Capability: KVM_CAP_KVMCLOCK_CTRL
Architectures: Any that implement pvclocks (currently x86 only)
Type: vcpu ioctl
Parameters: None
Returns: 0 on success, -1 on error

This signals to the host kernel that the specified guest is being paused by
userspace. The host will set a flag in the pvclock structure that is checked
from the soft lockup watchdog. The flag is part of the pvclock structure that
is shared between guest and host, specifically the second bit of the flags
field of the pvclock_vcpu_time_info structure. It will be set exclusively by
the host and read/cleared exclusively by the guest. The guest operation of
checking and clearing the flag must be an atomic operation, so
load-link/store-conditional or an equivalent must be used. There are two cases
where the guest will clear the flag: when the soft lockup watchdog timer resets
itself or when a soft lockup is detected. This ioctl can be called any time
after pausing the vcpu, but before it is resumed.

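On the guest side, the atomic check-and-clear described above could be
expressed with C11 atomics as in the sketch below. It assumes the flags
field is a single byte and takes "second bit" to mean bit 1; the macro
and function names are hypothetical, and a real guest kernel would use
its own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Second bit of the pvclock flags field, as described above. */
#define GUEST_STOPPED_FLAG (1u << 1)

/* Atomically test and clear the flag; returns true if it was set. */
static bool test_and_clear_guest_stopped(_Atomic uint8_t *flags)
{
    uint8_t old = atomic_fetch_and(flags, (uint8_t)~GUEST_STOPPED_FLAG);

    return (old & GUEST_STOPPED_FLAG) != 0;
}
```

Because the read and the clear happen in one atomic operation, a flag
set by the host between the two cannot be lost.
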

4.71 KVM_SIGNAL_MSI

Capability: KVM_CAP_SIGNAL_MSI
Architectures: x86 arm arm64
Type: vm ioctl
Parameters: struct kvm_msi (in)
Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error

Directly inject an MSI message. Only valid with in-kernel irqchip that handles
MSI messages.

struct kvm_msi {
        __u32 address_lo;
        __u32 address_hi;
        __u32 data;
        __u32 flags;
        __u32 devid;
        __u8  pad[12];
};

flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID. If this capability is not available, userspace
  should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message. For PCI, this is usually a
BDF (bus/device/function) identifier in the lower 16 bits.

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
address_hi must be zero.

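To illustrate how a 32-bit destination id is split across the two
address words on x86 when KVM_X2APIC_API_USE_32BIT_IDS is enabled, the
sketch below recombines it. The upper 24 bits come from address_hi as
described above; the lower 8 bits are assumed to sit in bits 19:12 of
address_lo, following the standard x86 MSI address format. The function
name is hypothetical.

```c
#include <stdint.h>

/* Recombine the 32-bit destination id from the two MSI address words. */
static uint32_t msi_x2apic_dest_id(uint32_t address_lo, uint32_t address_hi)
{
    return (address_hi & 0xffffff00u) | ((address_lo >> 12) & 0xffu);
}
```

For instance, address_lo 0xfee01000 with address_hi 0x00000100 targets
destination id 0x101.
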

4.71 KVM_CREATE_PIT2

Capability: KVM_CAP_PIT2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_config (in)
Returns: 0 on success, -1 on error

Creates an in-kernel device model for the i8254 PIT. This call is only valid
after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
parameters have to be passed:

struct kvm_pit_config {
        __u32 flags;
        __u32 pad[15];
};

Valid flags are:

#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */

PIT timer interrupts may use a per-VM kernel thread for injection. If it
exists, this thread will have a name of the following pattern:

kvm-pit/<owner-process-pid>

When running a guest with elevated priorities, the scheduling parameters of
this thread may have to be adjusted accordingly.

This IOCTL replaces the obsolete KVM_CREATE_PIT.


4.72 KVM_GET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (out)
Returns: 0 on success, -1 on error

Retrieves the state of the in-kernel PIT model. Only valid after
KVM_CREATE_PIT2. The state is returned in the following structure:

struct kvm_pit_state2 {
        struct kvm_pit_channel_state channels[3];
        __u32 flags;
        __u32 reserved[9];
};

Valid flags are:

/* disable PIT in HPET legacy mode */
#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001

This IOCTL replaces the obsolete KVM_GET_PIT.


4.73 KVM_SET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (in)
Returns: 0 on success, -1 on error

Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
See KVM_GET_PIT2 for details on struct kvm_pit_state2.

This IOCTL replaces the obsolete KVM_SET_PIT.


4.74 KVM_PPC_GET_SMMU_INFO

Capability: KVM_CAP_PPC_GET_SMMU_INFO
Architectures: powerpc
Type: vm ioctl
Parameters: None
Returns: 0 on success, -1 on error

This populates and returns a structure describing the features of
the "Server" class MMU emulation supported by KVM.
This can in turn be used by userspace to generate the appropriate
device-tree properties for the guest operating system.

The structure contains some global information, followed by an
array of supported segment page sizes:

      struct kvm_ppc_smmu_info {
             __u64 flags;
             __u32 slb_size;
             __u32 pad;
             struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
      };

The supported flags are:

    - KVM_PPC_PAGE_SIZES_REAL:
        When that flag is set, guest page sizes must "fit" the backing
        store page sizes. When not set, any page size in the list can
        be used regardless of how they are backed by userspace.

    - KVM_PPC_1T_SEGMENTS
        The emulated MMU supports 1T segments in addition to the
        standard 256M ones.

    - KVM_PPC_NO_HASH
        This flag indicates that HPT guests are not supported by KVM,
        thus all guests must use radix MMU mode.

The "slb_size" field indicates how many SLB entries are supported.

The "sps" array contains 8 entries indicating the supported base
page sizes for a segment in increasing order. Each entry is defined
as follows:

   struct kvm_ppc_one_seg_page_size {
        __u32 page_shift;       /* Base page shift of segment (or 0) */
        __u32 slb_enc;          /* SLB encoding for BookS */
        struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
   };

An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encountering
such an entry.

The "slb_enc" field provides the encoding to use in the SLB for the
page size. The bits are in positions such that the value can directly
be OR'ed into the "vsid" argument of the slbmte instruction.

The "enc" array lists, for each segment base page size, the supported
actual page sizes (which can only be larger than or equal to the base
page size), along with the corresponding encoding in the hash PTE.
Similarly, the array has 8 entries sorted by increasing size, and an
entry with a "0" shift is an empty entry and a terminator:

   struct kvm_ppc_one_page_size {
        __u32 page_shift;       /* Page shift (or 0) */
        __u32 pte_enc;          /* Encoding in the HPTE (>>12) */
   };

The "pte_enc" field provides a value that can be OR'ed into the hash
PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it
into the hash PTE second double word).

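The lookup rules for the "sps" and "enc" arrays (sorted entries, with a
zero page_shift acting as terminator) can be sketched as follows. The
structures are local mirrors of the ones above using <stdint.h> types,
the array length of 8 is taken from the text, and the helper names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZES_MAX_SZ 8   /* the text above states 8 entries per array */

/* Local mirrors of the structures above. */
struct one_page_size {
    uint32_t page_shift;      /* page shift (or 0) */
    uint32_t pte_enc;         /* encoding in the HPTE (>>12) */
};

struct one_seg_page_size {
    uint32_t page_shift;      /* base page shift of segment (or 0) */
    uint32_t slb_enc;         /* SLB encoding */
    struct one_page_size enc[PAGE_SIZES_MAX_SZ];
};

/* Find the segment entry for a base page shift; NULL if not supported. */
static const struct one_seg_page_size *
find_seg_page_size(const struct one_seg_page_size *sps, uint32_t base_shift)
{
    for (size_t i = 0; i < PAGE_SIZES_MAX_SZ; i++) {
        if (sps[i].page_shift == 0)
            break;            /* unused entry terminates the list */
        if (sps[i].page_shift == base_shift)
            return &sps[i];
    }
    return NULL;
}

/*
 * Look up the HPTE bits for an actual page size within a segment.
 * Returns 1 and stores pte_enc shifted left by 12 in *hpte_bits on
 * success, or 0 if the actual page size is not supported.
 */
static int lookup_pte_enc(const struct one_seg_page_size *seg,
                          uint32_t actual_shift, uint64_t *hpte_bits)
{
    for (size_t i = 0; i < PAGE_SIZES_MAX_SZ; i++) {
        if (seg->enc[i].page_shift == 0)
            break;            /* empty entry terminates the list */
        if (seg->enc[i].page_shift == actual_shift) {
            *hpte_bits = (uint64_t)seg->enc[i].pte_enc << 12;
            return 1;
        }
    }
    return 0;
}
```

The value stored in *hpte_bits is ready to be OR'ed into the second
doubleword of the hash PTE.
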

4.75 KVM_IRQFD

Capability: KVM_CAP_IRQFD
Architectures: x86 s390 arm arm64
Type: vm ioctl
Parameters: struct kvm_irqfd (in)
Returns: 0 on success, -1 on error

Allows setting an eventfd to directly trigger a guest interrupt.
kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
an event is triggered on the eventfd, an interrupt is injected into
the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
mechanism allowing emulation of level-triggered, irqfd-based
interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
additional eventfd in the kvm_irqfd.resamplefd field. When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
the specified gsi in the irqchip. When the irqchip is resampled, such
as from an EOI, the gsi is de-asserted and the user is notified via
kvm_irqfd.resamplefd. It is the user's responsibility to re-queue
the interrupt if the device making use of it still requires service.
Note that closing the resamplefd is not sufficient to disable the
irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.

On arm/arm64, gsi routing being supported, the following can happen:
- in case no routing entry is associated to this gsi, injection fails
- in case the gsi is associated to an irqchip routing entry,
  irqchip.pin + 32 corresponds to the injected SPI ID.
- in case the gsi is associated to an MSI routing entry, the MSI
  message and device ID are translated into an LPI (support restricted
  to GICv3 ITS in-kernel emulation).

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to u32 containing hash table order (in/out)
Returns: 0 on success, -1 on error

This requests the host kernel to allocate an MMU hash table for a
guest using the PAPR paravirtualization interface. This only does
anything if the kernel is configured to use the Book 3S HV style of
virtualization. Otherwise the capability doesn't exist and the ioctl
returns an ENOTTY error. The rest of this description assumes Book 3S
HV.

There must be no vcpus running when this ioctl is called; if there
are, it will do nothing and return an EBUSY error.

The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
with a different order from the existing hash table, the existing hash
table will be freed and a new one allocated. If this ioctl is
called when a hash table has already been allocated of the same order
as specified, the kernel will clear out the existing hash table (zero
all HPTEs). In either case, if the guest is using the virtualized
real-mode area (VRMA) facility, the kernel will re-create the VRMA
HPTEs on the next KVM_RUN of any vcpu.

416ad65f
CH
25514.77 KVM_S390_INTERRUPT
2552
2553Capability: basic
2554Architectures: s390
2555Type: vm ioctl, vcpu ioctl
2556Parameters: struct kvm_s390_interrupt (in)
2557Returns: 0 on success, -1 on error
2558
2559Injects an interrupt into the guest. Interrupts can be floating
2560(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.
2561
2562Interrupt parameters are passed via kvm_s390_interrupt:
2563
2564struct kvm_s390_interrupt {
2565 __u32 type;
2566 __u32 parm;
2567 __u64 parm64;
2568};
2569
2570type can be one of the following:
2571
2822545f 2572KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm
416ad65f
CH
2573KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
2574KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
2575KVM_S390_RESTART (vcpu) - restart
e029ae5b
TH
2576KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
2577KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
416ad65f
CH
2578KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
2579 parameters in parm and parm64
2580KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
2581KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
2582KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
d8346b7d
CH
2583KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
2584 I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
2585 I/O interruption parameters in parm (subchannel) and parm64 (intparm,
2586 interruption subclass)
48a3e950
CH
2587KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
2588 machine check interrupt code in parm64 (note that
2589 machine checks needing further payload are not
2590 supported by this ioctl)
416ad65f
CH
2591
2592Note that the vcpu ioctl is asynchronous to vcpu execution.
2593
a2932923
PM
25944.78 KVM_PPC_GET_HTAB_FD
2595
2596Capability: KVM_CAP_PPC_HTAB_FD
2597Architectures: powerpc
2598Type: vm ioctl
2599Parameters: Pointer to struct kvm_get_htab_fd (in)
2600Returns: file descriptor number (>= 0) on success, -1 on error
2601
2602This returns a file descriptor that can be used either to read out the
2603entries in the guest's hashed page table (HPT), or to write entries to
2604initialize the HPT. The returned fd can only be written to if the
2605KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
2606can only be read if that bit is clear. The argument struct looks like
2607this:
2608
2609/* For KVM_PPC_GET_HTAB_FD */
2610struct kvm_get_htab_fd {
2611 __u64 flags;
2612 __u64 start_index;
2613 __u64 reserved[2];
2614};
2615
2616/* Values for kvm_get_htab_fd.flags */
2617#define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1)
2618#define KVM_GET_HTAB_WRITE ((__u64)0x2)
2619
2620The `start_index' field gives the index in the HPT of the entry at
2621which to start reading. It is ignored when writing.
2622
2623Reads on the fd will initially supply information about all
2624"interesting" HPT entries. Interesting entries are those with the
2625bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
2626all entries. When the end of the HPT is reached, the read() will
2627return. If read() is called again on the fd, it will start again from
2628the beginning of the HPT, but will only return HPT entries that have
2629changed since they were last read.
2630
2631Data read or written is structured as a header (8 bytes) followed by a
2632series of valid HPT entries (16 bytes each). The header indicates how
2633many valid HPT entries there are and how many invalid entries follow
2634the valid entries. The invalid entries are not represented explicitly
2635in the stream. The header format is:
2636
2637struct kvm_get_htab_header {
2638 __u32 index;
2639 __u16 n_valid;
2640 __u16 n_invalid;
2641};
2642
2643Writes to the fd create HPT entries starting at the index given in the
2644header; first `n_valid' valid entries with contents from the data
2645written, then `n_invalid' invalid entries, invalidating any previously
2646valid entries found.
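
The stream layout described above can be walked with a few lines of C.
This is a minimal sketch, assuming the buffer holds whole records in host
byte order as obtained from read(); the struct and function names are
local mirrors for illustration, not kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 8-byte header layout shown above. */
struct htab_header {
    uint32_t index;
    uint16_t n_valid;
    uint16_t n_invalid;
};

#define HPTE_BYTES 16u  /* size of one valid HPT entry in the stream */

/* Totals gathered while walking a stream buffer. */
struct htab_totals {
    size_t records;
    uint64_t valid;
    uint64_t invalid;
};

/*
 * Walk every (header, valid entries) record in buf[0..len).
 * Returns 0 on success, -1 if a header or its entries are truncated.
 */
static inline int htab_stream_walk(const unsigned char *buf, size_t len,
                                   struct htab_totals *out)
{
    size_t pos = 0;

    memset(out, 0, sizeof(*out));
    while (pos < len) {
        struct htab_header hdr;
        size_t payload;

        if (len - pos < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + pos, sizeof(hdr));  /* avoid unaligned access */
        pos += sizeof(hdr);

        /* Only valid entries are present; invalid ones are implicit. */
        payload = (size_t)hdr.n_valid * HPTE_BYTES;
        if (len - pos < payload)
            return -1;
        pos += payload;

        out->records++;
        out->valid += hdr.n_valid;
        out->invalid += hdr.n_invalid;
    }
    return 0;
}
```

The same record arithmetic (8 bytes of header plus 16 bytes per valid
entry) applies when composing a buffer to write to the fd.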
2647
852b6d57
SW
26484.79 KVM_CREATE_DEVICE
2649
2650Capability: KVM_CAP_DEVICE_CTRL
2651Type: vm ioctl
2652Parameters: struct kvm_create_device (in/out)
2653Returns: 0 on success, -1 on error
2654Errors:
2655 ENODEV: The device type is unknown or unsupported
2656 EEXIST: Device already created, and this type of device may not
2657 be instantiated multiple times
2658
2659 Other error conditions may be defined by individual device types or
2660 have their standard meanings.
2661
2662Creates an emulated device in the kernel. The file descriptor returned
2663in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
2664
2665If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
2666device type is supported (not necessarily whether it can be created
2667in the current vm).
2668
2669Individual devices should not define flags. Attributes should be used
2670for specifying any behavior that is not implied by the device type
2671number.
2672
2673struct kvm_create_device {
2674 __u32 type; /* in: KVM_DEV_TYPE_xxx */
2675 __u32 fd; /* out: device handle */
2676 __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */
2677};
2678
26794.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
2680
f577f6c2
SZ
2681Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
2682 KVM_CAP_VCPU_ATTRIBUTES for vcpu device
2683Type: device ioctl, vm ioctl, vcpu ioctl
852b6d57
SW
2684Parameters: struct kvm_device_attr
2685Returns: 0 on success, -1 on error
2686Errors:
2687 ENXIO: The group or attribute is unknown/unsupported for this device
f9cbd9b0 2688 or hardware support is missing.
852b6d57
SW
2689 EPERM: The attribute cannot (currently) be accessed this way
2690 (e.g. read-only attribute, or attribute that only makes
2691 sense when the device is in a different state)
2692
2693 Other error conditions may be defined by individual device types.
2694
2695Gets/sets a specified piece of device configuration and/or state. The
2696semantics are device-specific. See individual device documentation in
2697the "devices" directory. As with ONE_REG, the size of the data
2698transferred is defined by the particular attribute.
2699
2700struct kvm_device_attr {
2701 __u32 flags; /* no flags currently defined */
2702 __u32 group; /* device-defined */
2703 __u64 attr; /* group-defined */
2704 __u64 addr; /* userspace address of attr data */
2705};
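
Since addr carries a userspace pointer in a 64-bit field, filling the
structure involves a pointer-to-integer conversion. The following is a
small sketch of that step only; struct device_attr and make_device_attr()
are names local to this example, and the group and attribute numbers used
by a caller would come from the individual device documentation.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_device_attr as shown above. */
struct device_attr {
    uint32_t flags;  /* no flags currently defined */
    uint32_t group;  /* device-defined */
    uint64_t attr;   /* group-defined */
    uint64_t addr;   /* userspace address of attr data */
};

/* Fill a request whose payload lives at 'data' in the caller's memory. */
static inline struct device_attr make_device_attr(uint32_t group,
                                                  uint64_t attr, void *data)
{
    struct device_attr a;

    memset(&a, 0, sizeof(a));            /* flags stays 0 */
    a.group = group;
    a.attr = attr;
    a.addr = (uint64_t)(uintptr_t)data;  /* pointer carried as an integer */
    return a;
}
```

Going through uintptr_t keeps the conversion well defined on both 32-bit
and 64-bit userspace.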
2706
27074.81 KVM_HAS_DEVICE_ATTR
2708
f577f6c2
SZ
2709Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
2710 KVM_CAP_VCPU_ATTRIBUTES for vcpu device
2711Type: device ioctl, vm ioctl, vcpu ioctl
852b6d57
SW
2712Parameters: struct kvm_device_attr
2713Returns: 0 on success, -1 on error
2714Errors:
2715 ENXIO: The group or attribute is unknown/unsupported for this device
f9cbd9b0 2716 or hardware support is missing.
852b6d57
SW
2717
2718Tests whether a device supports a particular attribute. A successful
2719return indicates the attribute is implemented. It does not necessarily
2720indicate that the attribute can be read or written in the device's
2721current state. "addr" is ignored.
f36992e3 2722
d8968f1f 27234.82 KVM_ARM_VCPU_INIT
749cf76c
CD
2724
2725Capability: basic
379e04c7 2726Architectures: arm, arm64
749cf76c 2727Type: vcpu ioctl
beb11fc7 2728Parameters: struct kvm_vcpu_init (in)
749cf76c
CD
2729Returns: 0 on success; -1 on error
2730Errors:
2731  EINVAL:    the target is unknown, or the combination of features is invalid.
2732  ENOENT:    a features bit specified is unknown.
2733
2734This tells KVM what type of CPU to present to the guest, and what
2735optional features it should have.  This will cause a reset of the cpu
2736registers to their initial values.  If this is not called, KVM_RUN will
2737return ENOEXEC for that vcpu.
2738
2739Note that because some registers reflect machine topology, all vcpus
2740should be created before this ioctl is invoked.
2741
f7fa034d
CD
2742Userspace can call this function multiple times for a given vcpu, including
2743after the vcpu has been run. This will reset the vcpu to its initial
2744state. All calls to this function after the initial call must use the same
2745target and same set of feature flags, otherwise EINVAL will be returned.
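
The rule for repeated calls can be expressed as a simple comparison. The
sketch below is illustrative only: struct vcpu_init is a simplified
stand-in for struct kvm_vcpu_init, and the number of bitmap words
(INIT_FEATURE_WORDS) is an assumption made for this example rather than a
value taken from this document.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define INIT_FEATURE_WORDS 7  /* assumed bitmap length, illustration only */

/* Simplified stand-in: a target plus a feature bitmap. */
struct vcpu_init {
    uint32_t target;
    uint32_t features[INIT_FEATURE_WORDS];
};

/* Set feature bit 'bit' (must be below 32 * INIT_FEATURE_WORDS). */
static inline void vcpu_init_set_feature(struct vcpu_init *init,
                                         unsigned int bit)
{
    init->features[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/* True if 'next' repeats the target and feature set of the first call. */
static inline bool vcpu_reinit_allowed(const struct vcpu_init *first,
                                       const struct vcpu_init *next)
{
    return first->target == next->target &&
           memcmp(first->features, next->features,
                  sizeof(first->features)) == 0;
}
```

Userspace that keeps a copy of the structure used for the first call can
reuse it unchanged for later resets, which satisfies this check trivially.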
2746
aa024c2f
MZ
2747Possible features:
2748 - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
3ad8b3de
CD
2749 Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on
2750 and execute guest code when KVM_RUN is called.
379e04c7
MZ
2751 - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
2752 Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
85bd0ba1
MZ
2753 - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
2754 backward compatible with v0.2) for the CPU.
50bb0c94 2755 Depends on KVM_CAP_ARM_PSCI_0_2.
808e7381
SZ
2756 - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
2757 Depends on KVM_CAP_ARM_PMU_V3.
aa024c2f 2758
50036ad0
DM
2759 - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only).
2760 Depends on KVM_CAP_ARM_SVE.
2761 Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
2762
2763 * After KVM_ARM_VCPU_INIT:
2764
2765 - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the
2766 initial value of this pseudo-register indicates the best set of
2767 vector lengths possible for a vcpu on this host.
2768
2769 * Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
2770
2771 - KVM_RUN and KVM_GET_REG_LIST are not available;
2772
2773 - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
2774 the scalable architectural SVE registers
2775 KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
2776 KVM_REG_ARM64_SVE_FFR;
2777
2778 - KVM_REG_ARM64_SVE_VLS may optionally be written using
2779 KVM_SET_ONE_REG, to modify the set of vector lengths available
2780 for the vcpu.
2781
2782 * After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
2783
2784 - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
2785 no longer be written using KVM_SET_ONE_REG.
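
The ordering constraints for KVM_ARM_VCPU_SVE can be summarised as a small
predicate. This is one reading of the list above, not kernel code: the
enum and function names are local to the sketch, and it assumes that
reading KVM_REG_ARM64_SVE_VLS stays possible after finalization and that
the SVE registers become accessible at that point, neither of which the
list states explicitly.

```c
#include <stdbool.h>

/* Operations mentioned in the list above (names local to this sketch). */
enum sve_op {
    SVE_OP_RUN,             /* KVM_RUN */
    SVE_OP_GET_REG_LIST,    /* KVM_GET_REG_LIST */
    SVE_OP_ACCESS_SVE_REGS, /* ZREG/PREG/FFR via KVM_GET/SET_ONE_REG */
    SVE_OP_READ_VLS,        /* KVM_GET_ONE_REG on KVM_REG_ARM64_SVE_VLS */
    SVE_OP_WRITE_VLS        /* KVM_SET_ONE_REG on KVM_REG_ARM64_SVE_VLS */
};

/* 'finalized' is true once KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE) is done. */
static inline bool sve_op_allowed(bool finalized, enum sve_op op)
{
    switch (op) {
    case SVE_OP_RUN:
    case SVE_OP_GET_REG_LIST:
    case SVE_OP_ACCESS_SVE_REGS:
        return finalized;       /* unavailable before finalization */
    case SVE_OP_READ_VLS:
        return true;            /* assumed readable at any time */
    case SVE_OP_WRITE_VLS:
        return !finalized;      /* immutable after finalization */
    }
    return false;
}
```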
749cf76c 2786
740edfc0
AP
27874.83 KVM_ARM_PREFERRED_TARGET
2788
2789Capability: basic
2790Architectures: arm, arm64
2791Type: vm ioctl
2792Parameters: struct kvm_vcpu_init (out)
2793Returns: 0 on success; -1 on error
2794Errors:
a7265fb1 2795 ENODEV: no preferred target available for the host
740edfc0
AP
2796
2797This queries KVM for the preferred CPU target type which can be emulated
2798by KVM on the underlying host.
2799
2800The ioctl returns a struct kvm_vcpu_init instance containing information
2801about the preferred CPU target type and recommended features for it. The
2802kvm_vcpu_init->features bitmap returned will have feature bits set if
2803the preferred target recommends setting these features, but this is
2804not mandatory.
2805
2806The information returned by this ioctl can be used to prepare an instance
2807of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
2808in a VCPU matching the underlying host.
2809
2810
28114.84 KVM_GET_REG_LIST
749cf76c
CD
2812
2813Capability: basic
c2d2c21b 2814Architectures: arm, arm64, mips
749cf76c
CD
2815Type: vcpu ioctl
2816Parameters: struct kvm_reg_list (in/out)
2817Returns: 0 on success; -1 on error
2818Errors:
2819  E2BIG:     the reg index list is too big to fit in the array specified by
2820             the user (the number required will be written into n).
2821
2822struct kvm_reg_list {
2823 __u64 n; /* number of registers in reg[] */
2824 __u64 reg[0];
2825};
2826
2827This ioctl returns the guest registers that are supported for the
2828KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
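
Because E2BIG reports the required count in n, a common pattern is to
probe with n = 0 and then retry with a buffer of the reported size. The
sketch below shows that pattern under stated assumptions: fake_query()
is a made-up stand-in for the ioctl so the example runs without a vcpu,
and the register ids it returns are invented. A real caller would wrap
ioctl(vcpu_fd, KVM_GET_REG_LIST, list) in a function of the same shape.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_reg_list (flexible array instead of reg[0]). */
struct reg_list {
    uint64_t n;
    uint64_t reg[];
};

/* Shape of the query step: returns 0, or -1 with errno set. */
typedef int (*reg_list_query_fn)(struct reg_list *list);

/* Stand-in backend with three invented ids, mimicking the E2BIG rule. */
static inline int fake_query(struct reg_list *list)
{
    static const uint64_t ids[] = { 0x10, 0x20, 0x30 };
    const uint64_t have = sizeof(ids) / sizeof(ids[0]);
    uint64_t i;

    if (list->n < have) {
        list->n = have;        /* report the number required */
        errno = E2BIG;
        return -1;
    }
    list->n = have;
    for (i = 0; i < have; i++)
        list->reg[i] = ids[i];
    return 0;
}

/* Probe with n = 0, then allocate the reported size and query again. */
static inline struct reg_list *fetch_reg_list(reg_list_query_fn query)
{
    struct reg_list probe;
    struct reg_list *list;

    probe.n = 0;
    if (query(&probe) != 0 && errno != E2BIG)
        return NULL;           /* unrelated failure */

    list = malloc(sizeof(*list) + probe.n * sizeof(uint64_t));
    if (!list)
        return NULL;
    list->n = probe.n;
    if (probe.n != 0 && query(list) != 0) {
        free(list);
        return NULL;
    }
    return list;               /* caller frees */
}
```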
2829
ce01e4e8
CD
2830
28314.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
3401d546
CD
2832
2833Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
379e04c7 2834Architectures: arm, arm64
3401d546
CD
2835Type: vm ioctl
2836Parameters: struct kvm_arm_device_addr (in)
2837Returns: 0 on success, -1 on error
2838Errors:
2839 ENODEV: The device id is unknown
2840 ENXIO: Device not supported on current system
2841 EEXIST: Address already set
2842 E2BIG: Address outside guest physical address space
330690cd 2843 EBUSY: Address overlaps with other device range
3401d546
CD
2844
2845struct kvm_arm_device_addr {
2846 __u64 id;
2847 __u64 addr;
2848};
2849
2850Specify a device address in the guest's physical address space where guests
2851can access emulated or directly exposed devices, which the host kernel needs
2852to know about. The id field is an architecture specific identifier for a
2853specific device.
2854
379e04c7
MZ
2855ARM/arm64 divides the id field into two parts, a device id and an
2856address type id specific to the individual device.
3401d546
CD
2857
2858  bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 |
2859 field: | 0x00000000 | device id | addr type id |
2860
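The bit layout above translates directly into shift and mask operations.
In the following sketch the macro and function names are local to the
example; only the field positions are taken from the diagram.

```c
#include <stdint.h>

/* Field positions from the layout above. */
#define DEV_ID_SHIFT   16
#define DEV_FIELD_MASK UINT64_C(0xffff)

/* Compose the 64-bit id; bits 63..32 remain zero. */
static inline uint64_t make_device_addr_id(uint16_t device_id,
                                           uint16_t addr_type_id)
{
    return ((uint64_t)device_id << DEV_ID_SHIFT) | addr_type_id;
}

/* Extract the device id (bits 31..16). */
static inline uint16_t device_id_of(uint64_t id)
{
    return (uint16_t)((id >> DEV_ID_SHIFT) & DEV_FIELD_MASK);
}

/* Extract the address type id (bits 15..0). */
static inline uint16_t addr_type_of(uint64_t id)
{
    return (uint16_t)(id & DEV_FIELD_MASK);
}
```
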
379e04c7
MZ
2861ARM/arm64 currently only require this when using the in-kernel GIC
2862support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
2863as the device id. When setting the base address for the guest's
2864mapping of the VGIC virtual CPU and distributor interface, the ioctl
2865must be called after calling KVM_CREATE_IRQCHIP, but before calling
2866KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
2867base addresses will return -EEXIST.
3401d546 2868
ce01e4e8
CD
2869Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
2870should be used instead.
2871
2872
740edfc0 28734.86 KVM_PPC_RTAS_DEFINE_TOKEN
8e591cb7
ME
2874
2875Capability: KVM_CAP_PPC_RTAS
2876Architectures: ppc
2877Type: vm ioctl
2878Parameters: struct kvm_rtas_token_args
2879Returns: 0 on success, -1 on error
2880
2881Defines a token value for a RTAS (Run Time Abstraction Services)
2882service in order to allow it to be handled in the kernel. The
2883argument struct gives the name of the service, which must be the name
2884of a service that has a kernel-side implementation. If the token
2885value is non-zero, it will be associated with that service, and
2886subsequent RTAS calls by the guest specifying that token will be
2887handled by the kernel. If the token value is 0, then any token
2888associated with the service will be forgotten, and subsequent RTAS
2889calls by the guest for that service will be passed to userspace to be
2890handled.
2891
4bd9d344
AB
28924.87 KVM_SET_GUEST_DEBUG
2893
2894Capability: KVM_CAP_SET_GUEST_DEBUG
0e6f07f2 2895Architectures: x86, s390, ppc, arm64
4bd9d344
AB
2896Type: vcpu ioctl
2897Parameters: struct kvm_guest_debug (in)
2898Returns: 0 on success; -1 on error
2899
2900struct kvm_guest_debug {
2901 __u32 control;
2902 __u32 pad;
2903 struct kvm_guest_debug_arch arch;
2904};
2905
2906Set up the processor specific debug registers and configure the vcpu for
2907handling guest debug events. There are two parts to the structure. The
2908first, a control bitfield, indicates the type of debug events to handle
2909when running. Common control bits are:
2910
2911 - KVM_GUESTDBG_ENABLE: guest debugging is enabled
2912 - KVM_GUESTDBG_SINGLESTEP: the next run should single-step
2913
2914The top 16 bits of the control field are architecture specific control
2915flags which can include the following:
2916
4bd611ca 2917 - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
834bf887 2918 - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
4bd9d344
AB
2919 - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
2920 - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
2921 - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
2922
2923For example, KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
2924are enabled in memory, so we need to ensure breakpoint exceptions are
2925correctly trapped and the KVM run loop exits at the breakpoint instead of
2926running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP
2927we need to ensure the guest vCPU's architecture specific registers are
2928updated to the correct (supplied) values.
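
Since the top 16 bits of the control field are architecture specific and
the rest hold the common bits, the two halves can be separated with masks.
This is a small sketch; the mask and function names are local to the
example and no particular flag values are assumed.

```c
#include <stdint.h>

/* Top 16 bits: architecture specific flags. Low 16 bits: common flags. */
#define GUESTDBG_ARCH_MASK   UINT32_C(0xffff0000)
#define GUESTDBG_COMMON_MASK UINT32_C(0x0000ffff)

/* Architecture specific part of a control word. */
static inline uint32_t guestdbg_arch_bits(uint32_t control)
{
    return control & GUESTDBG_ARCH_MASK;
}

/* Common part of a control word. */
static inline uint32_t guestdbg_common_bits(uint32_t control)
{
    return control & GUESTDBG_COMMON_MASK;
}
```

Such a split can help when logging or validating a control word, for
instance to reject architecture specific bits that the current host does
not list.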
2929
2930The second part of the structure is architecture specific and
2931typically contains a set of debug registers.
2932
834bf887
AB
2933For arm64 the number of debug registers is implementation defined and
2934can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
2935KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
2936indicating the number of supported registers.
2937
4bd9d344
AB
2938Debug events exit the main run loop with the reason
2939KVM_EXIT_DEBUG, with the kvm_debug_exit_arch part of the kvm_run
2940structure containing architecture specific debug information.
3401d546 2941
209cf19f
AB
29424.88 KVM_GET_EMULATED_CPUID
2943
2944Capability: KVM_CAP_EXT_EMUL_CPUID
2945Architectures: x86
2946Type: system ioctl
2947Parameters: struct kvm_cpuid2 (in/out)
2948Returns: 0 on success, -1 on error
2949
2950struct kvm_cpuid2 {
2951 __u32 nent;
2952 __u32 flags;
2953 struct kvm_cpuid_entry2 entries[0];
2954};
2955
2956The member 'flags' is used for passing flags from userspace.
2957
2958#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
2959#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
2960#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
2961
2962struct kvm_cpuid_entry2 {
2963 __u32 function;
2964 __u32 index;
2965 __u32 flags;
2966 __u32 eax;
2967 __u32 ebx;
2968 __u32 ecx;
2969 __u32 edx;
2970 __u32 padding[3];
2971};
2972
2973This ioctl returns x86 cpuid features which are emulated by
2974kvm. Userspace can use the information returned by this ioctl to query
2975which features are emulated by kvm instead of being present natively.
2976
2977Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
2978structure with the 'nent' field indicating the number of entries in
2979the variable-size array 'entries'. If the number of entries is too low
2980to describe the cpu capabilities, an error (E2BIG) is returned. If the
2981number is too high, the 'nent' field is adjusted and an error (ENOMEM)
2982is returned. If the number is just right, the 'nent' field is adjusted
2983to the number of valid entries in the 'entries' array, which is then
2984filled.
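
Because 'entries' is a variable-size array, the caller has to size the
allocation from 'nent'. The following sketch shows that computation with
local mirrors of the two structures above (struct cpuid_list stands in
for struct kvm_cpuid2, struct cpuid_entry for struct kvm_cpuid_entry2);
the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_cpuid_entry2: ten 32-bit words. */
struct cpuid_entry {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
    uint32_t padding[3];
};

/* Local mirror of struct kvm_cpuid2 (flexible array for entries[0]). */
struct cpuid_list {
    uint32_t nent;
    uint32_t flags;
    struct cpuid_entry entries[];
};

/* Bytes needed for a list with room for 'nent' entries. */
static inline size_t cpuid_list_bytes(uint32_t nent)
{
    return sizeof(struct cpuid_list) +
           (size_t)nent * sizeof(struct cpuid_entry);
}

/* Allocate a zeroed list and record its capacity in nent. */
static inline struct cpuid_list *cpuid_list_alloc(uint32_t nent)
{
    struct cpuid_list *list = calloc(1, cpuid_list_bytes(nent));

    if (list)
        list->nent = nent;
    return list;   /* caller frees */
}
```

If the ioctl fails with E2BIG, the caller can free the list and allocate a
larger one before retrying.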
2985
2986The entries returned are the set CPUID bits of the respective features
2987which kvm emulates, as returned by the CPUID instruction, with unknown
2988or unsupported feature bits cleared.
2989
2990Features like x2apic, for example, may not be present in the host cpu
2991but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
2992emulated efficiently and are thus not included here.
2993
2994The fields in each entry are defined as follows:
2995
2996 function: the eax value used to obtain the entry
2997 index: the ecx value used to obtain the entry (for entries that are
2998 affected by ecx)
2999 flags: an OR of zero or more of the following:
3000 KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
3001 if the index field is valid
3002 KVM_CPUID_FLAG_STATEFUL_FUNC:
3003 if cpuid for this function returns different values for successive
3004 invocations; there will be several entries with the same function,
3005 all with this flag set
3006 KVM_CPUID_FLAG_STATE_READ_NEXT:
3007 for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
3008 the first entry to be read by a cpu
3009 eax, ebx, ecx, edx: the values returned by the cpuid instruction for
3010 this function/index combination
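
Given the field definitions above, looking up an entry means matching on
function and, only when KVM_CPUID_FLAG_SIGNIFCANT_INDEX is set, on index
as well. The sketch below illustrates this; struct emu_cpuid_entry and
emu_cpuid_find() are names local to the example, and stateful entries
(KVM_CPUID_FLAG_STATEFUL_FUNC) are deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* BIT(0), spelled as in the define above. */
#define EMU_CPUID_FLAG_SIGNIFCANT_INDEX (UINT32_C(1) << 0)

/* Local mirror of struct kvm_cpuid_entry2. */
struct emu_cpuid_entry {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
    uint32_t padding[3];
};

/* Return the first entry matching function (and index, if significant). */
static inline const struct emu_cpuid_entry *
emu_cpuid_find(const struct emu_cpuid_entry *entries, uint32_t nent,
               uint32_t function, uint32_t index)
{
    uint32_t i;

    for (i = 0; i < nent; i++) {
        const struct emu_cpuid_entry *e = &entries[i];

        if (e->function != function)
            continue;
        if ((e->flags & EMU_CPUID_FLAG_SIGNIFCANT_INDEX) &&
            e->index != index)
            continue;          /* index matters and does not match */
        return e;
    }
    return NULL;
}
```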
3011
41408c28
TH
30124.89 KVM_S390_MEM_OP
3013
3014Capability: KVM_CAP_S390_MEM_OP
3015Architectures: s390
3016Type: vcpu ioctl
3017Parameters: struct kvm_s390_mem_op (in)
3018Returns: = 0 on success,
3019 < 0 on generic error (e.g. -EFAULT or -ENOMEM),
3020 > 0 if an exception occurred while walking the page tables
3021
5d4f6f3d 3022Read or write data from/to the logical (virtual) memory of a VCPU.
41408c28
TH
3023
3024Parameters are specified via the following structure:
3025
3026struct kvm_s390_mem_op {
3027 __u64 gaddr; /* the guest address */
3028 __u64 flags; /* flags */
3029 __u32 size; /* amount of bytes */
3030 __u32 op; /* type of operation */
3031 __u64 buf; /* buffer in userspace */
3032 __u8 ar; /* the access register number */
3033 __u8 reserved[31]; /* should be set to 0 */
3034};
3035
3036The type of operation is specified in the "op" field. It is either
3037KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
3038KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
3039KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check
3040whether the corresponding memory access would create an access exception
3041(without touching the data in the memory at the destination). In case an
3042access exception occurred while walking the MMU tables of the guest, the
3043ioctl returns a positive error number to indicate the type of exception.
3044This exception is also raised directly at the corresponding VCPU if the
3045flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.
3046
3047The start address of the memory region has to be specified in the "gaddr"
3048field, and the length of the region in the "size" field. "buf" is the buffer
3049supplied by the userspace application where the read data should be written
3050to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
3051is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL
3052when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access
3053register number to be used.
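
The request layout and the three-way return convention can be sketched as
follows. This is illustrative only: struct s390_mem_op is a local mirror
of the structure above, the helper names are hypothetical, and the values
passed for op and flags are expected to come from the kernel headers. How
a negative result reaches the caller (for example as -1 with errno set
when going through the C library) is not modelled here.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_mem_op (64 bytes). */
struct s390_mem_op {
    uint64_t gaddr;        /* the guest address */
    uint64_t flags;        /* flags */
    uint32_t size;         /* amount of bytes */
    uint32_t op;           /* type of operation */
    uint64_t buf;          /* buffer in userspace */
    uint8_t  ar;           /* the access register number */
    uint8_t  reserved[31]; /* should be set to 0 */
};

enum mem_op_outcome {
    MEM_OP_OK,              /* = 0 */
    MEM_OP_GENERIC_ERROR,   /* < 0 */
    MEM_OP_GUEST_EXCEPTION  /* > 0, exception during the table walk */
};

/* Map the documented return ranges onto an outcome. */
static inline enum mem_op_outcome mem_op_classify(long ret)
{
    if (ret == 0)
        return MEM_OP_OK;
    return ret < 0 ? MEM_OP_GENERIC_ERROR : MEM_OP_GUEST_EXCEPTION;
}

/* Build a request with the reserved bytes zeroed. */
static inline struct s390_mem_op mem_op_request(uint64_t gaddr, uint32_t size,
                                                uint32_t op, uint64_t flags,
                                                void *buf, uint8_t ar)
{
    struct s390_mem_op req;

    memset(&req, 0, sizeof(req));
    req.gaddr = gaddr;
    req.flags = flags;
    req.size = size;
    req.op = op;
    req.buf = (uint64_t)(uintptr_t)buf;  /* may be NULL for check-only */
    req.ar = ar;
    return req;
}
```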
3054
3055The "reserved" field is meant for future extensions. It is not used by
3056KVM with the currently defined set of flags.
3057
30ee2a98
JH
30584.90 KVM_S390_GET_SKEYS
3059
3060Capability: KVM_CAP_S390_SKEYS
3061Architectures: s390
3062Type: vm ioctl
3063Parameters: struct kvm_s390_skeys
3064Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage
3065 keys, negative value on error
3066
3067This ioctl is used to get guest storage key values on the s390
3068architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
3069
3070struct kvm_s390_skeys {
3071 __u64 start_gfn;
3072 __u64 count;
3073 __u64 skeydata_addr;
3074 __u32 flags;
3075 __u32 reserved[9];
3076};
3077
3078The start_gfn field is the number of the first guest frame whose storage keys
3079you want to get.
3080
3081The count field is the number of consecutive frames (starting from start_gfn)
3082whose storage keys to get. The count field must be at least 1 and the maximum
3083allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
3084will cause the ioctl to return -EINVAL.
3085
3086The skeydata_addr field is the address to a buffer large enough to hold count
3087bytes. This buffer will be filled with storage key data by the ioctl.
3088
30894.91 KVM_S390_SET_SKEYS
3090
3091Capability: KVM_CAP_S390_SKEYS
3092Architectures: s390
3093Type: vm ioctl
3094Parameters: struct kvm_s390_skeys
3095Returns: 0 on success, negative value on error
3096
3097This ioctl is used to set guest storage key values on the s390
3098architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
3099See section on KVM_S390_GET_SKEYS for struct definition.
3100
3101The start_gfn field is the number of the first guest frame whose storage keys
3102you want to set.
3103
3104The count field is the number of consecutive frames (starting from start_gfn)
3105whose storage keys to set. The count field must be at least 1 and the maximum
3106allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
3107will cause the ioctl to return -EINVAL.
3108
3109The skeydata_addr field is the address to a buffer containing count bytes of
3110storage keys. Each byte in the buffer will be set as the storage key for a
3111single frame starting at start_gfn for count frames.
3112
3113Note: If any architecturally invalid key value is found in the given data then
3114the ioctl will return -EINVAL.
3115
47b43c52
JF
31164.92 KVM_S390_IRQ
3117
3118Capability: KVM_CAP_S390_INJECT_IRQ
3119Architectures: s390
3120Type: vcpu ioctl
3121Parameters: struct kvm_s390_irq (in)
3122Returns: 0 on success, -1 on error
3123Errors:
3124 EINVAL: interrupt type is invalid
3125 type is KVM_S390_SIGP_STOP and flag parameter is invalid value
3126 type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
3127 than the maximum of VCPUs
3128 EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
3129 type is KVM_S390_SIGP_STOP and a stop irq is already pending
3130 type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
3131 is already pending
3132
3133Injects an interrupt into the guest.
3134
3135Using struct kvm_s390_irq as a parameter allows
3136injecting additional payload, which is not
3137possible via KVM_S390_INTERRUPT.
3138
3139Interrupt parameters are passed via kvm_s390_irq:
3140
3141struct kvm_s390_irq {
3142 __u64 type;
3143 union {
3144 struct kvm_s390_io_info io;
3145 struct kvm_s390_ext_info ext;
3146 struct kvm_s390_pgm_info pgm;
3147 struct kvm_s390_emerg_info emerg;
3148 struct kvm_s390_extcall_info extcall;
3149 struct kvm_s390_prefix_info prefix;
3150 struct kvm_s390_stop_info stop;
3151 struct kvm_s390_mchk_info mchk;
3152 char reserved[64];
3153 } u;
3154};
3155
3156type can be one of the following:
3157
3158KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
3159KVM_S390_PROGRAM_INT - program check; parameters in .pgm
3160KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
3161KVM_S390_RESTART - restart; no parameters
3162KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
3163KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
3164KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
3165KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
3166KVM_S390_MCHK - machine check interrupt; parameters in .mchk
3167
3168
3169Note that the vcpu ioctl is asynchronous to vcpu execution.
3170
816c7667
JF
31714.94 KVM_S390_GET_IRQ_STATE
3172
3173Capability: KVM_CAP_S390_IRQ_STATE
3174Architectures: s390
3175Type: vcpu ioctl
3176Parameters: struct kvm_s390_irq_state (out)
3177Returns: >= 0: number of bytes copied into buffer,
3178 -EINVAL if buffer size is 0,
3179 -ENOBUFS if buffer size is too small to fit all pending interrupts,
3180 -EFAULT if the buffer address was invalid
3181
3182This ioctl allows userspace to retrieve the complete state of all currently
3183pending interrupts in a single buffer. Use cases include migration
3184and introspection. The parameter structure contains the address of a
3185userspace buffer and its length:
3186
3187struct kvm_s390_irq_state {
3188 __u64 buf;
bb64da9a 3189 __u32 flags; /* will stay unused for compatibility reasons */
816c7667 3190 __u32 len;
bb64da9a 3191 __u32 reserved[4]; /* will stay unused for compatibility reasons */
816c7667
JF
3192};
3193
3194Userspace passes in the above struct and for each pending interrupt a
3195struct kvm_s390_irq is copied to the provided buffer.
3196
bb64da9a
CB
3197The structure contains a flags and a reserved field for future extensions. As
3198the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
3199reserved, these fields can not be used in the future without breaking
3200compatibility.
3201
816c7667
JF
3202If -ENOBUFS is returned the buffer provided was too small and userspace
3203may retry with a bigger buffer.
3204
32054.95 KVM_S390_SET_IRQ_STATE
3206
3207Capability: KVM_CAP_S390_IRQ_STATE
3208Architectures: s390
3209Type: vcpu ioctl
3210Parameters: struct kvm_s390_irq_state (in)
3211Returns: 0 on success,
3212 -EFAULT if the buffer address was invalid,
3213 -EINVAL for an invalid buffer length (see below),
3214 -EBUSY if there were already interrupts pending,
3215 errors occurring when actually injecting the
3216 interrupt. See KVM_S390_IRQ.
3217
3218This ioctl allows userspace to set the complete state of all cpu-local
3219interrupts currently pending for the vcpu. It is intended for restoring
3220interrupt state after a migration. The input parameter is a userspace buffer
3221containing a struct kvm_s390_irq_state:
3222
3223struct kvm_s390_irq_state {
3224 __u64 buf;
bb64da9a 3225 __u32 flags; /* will stay unused for compatibility reasons */
816c7667 3226 __u32 len;
bb64da9a 3227 __u32 reserved[4]; /* will stay unused for compatibility reasons */
816c7667
JF
3228};
3229
bb64da9a
CB
3230The restrictions for flags and reserved apply as well.
3231(see KVM_S390_GET_IRQ_STATE)
3232
816c7667
JF
3233The userspace memory referenced by buf contains a struct kvm_s390_irq
3234for each interrupt to be injected into the guest.
3235If one of the interrupts could not be injected for some reason the
3236ioctl aborts.
3237
3238len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
3239and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
3240which is the maximum number of possibly pending cpu-local interrupts.
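
The three constraints on len can be checked in userspace before issuing
the ioctl. The following is a minimal sketch: struct s390_irq_blob is a
size-only stand-in for struct kvm_s390_irq (a 64-bit type followed by a
union, assumed here to be exactly as large as its 64-byte reserved
member), and max_vcpus is a value the caller is assumed to know.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size-only stand-in for struct kvm_s390_irq: 8 + 64 = 72 bytes. */
struct s390_irq_blob {
    uint64_t type;
    char payload[64];
};

/* True if len satisfies the three documented constraints. */
static inline bool irq_state_len_valid(uint32_t len, uint32_t max_vcpus)
{
    const uint64_t unit = sizeof(struct s390_irq_blob);
    const uint64_t limit = ((uint64_t)max_vcpus + 32) * unit;

    return len > 0 && len % unit == 0 && len <= limit;
}
```

For a buffer obtained from KVM_S390_GET_IRQ_STATE, dividing the returned
byte count by the same unit gives the number of interrupts it holds.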
47b43c52 3241
ed8e5a24 32424.96 KVM_SMI
f077825a
PB
3243
3244Capability: KVM_CAP_X86_SMM
3245Architectures: x86
3246Type: vcpu ioctl
3247Parameters: none
3248Returns: 0 on success, -1 on error
3249
3250Queues an SMI on the thread's vcpu.
3251
d3695aa4
AK
32524.97 KVM_CAP_PPC_MULTITCE
3253
3254Capability: KVM_CAP_PPC_MULTITCE
3255Architectures: ppc
3256Type: vm
3257
3258This capability means the kernel is capable of handling hypercalls
3259H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
3260space. This significantly accelerates DMA operations for PPC KVM guests.
3261User space should expect that its handlers for these hypercalls
3262are not going to be called if user space previously registered LIOBN
3263in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
3264
3265In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
3266user space might have to advertise it for the guest. For example,
3267IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
3268present in the "ibm,hypertas-functions" device-tree property.
3269
3270The hypercalls mentioned above may or may not be processed successfully
3271in the kernel based fast path. If they can not be handled by the kernel,
3272they will get passed on to user space. So user space still has to have
3273an implementation for these despite the in kernel acceleration.
3274
3275This capability is always enabled.
3276
58ded420
AK
32774.98 KVM_CREATE_SPAPR_TCE_64
3278
3279Capability: KVM_CAP_SPAPR_TCE_64
3280Architectures: powerpc
3281Type: vm ioctl
3282Parameters: struct kvm_create_spapr_tce_64 (in)
3283Returns: file descriptor for manipulating the created TCE table
3284
3285This is an extension for KVM_CAP_SPAPR_TCE, which only supports 32bit
3286windows and is described in 4.62 KVM_CREATE_SPAPR_TCE.
3287
3288This capability uses an extended struct in the ioctl interface:
3289
3290/* for KVM_CAP_SPAPR_TCE_64 */
3291struct kvm_create_spapr_tce_64 {
3292 __u64 liobn;
3293 __u32 page_shift;
3294 __u32 flags;
3295 __u64 offset; /* in pages */
3296 __u64 size; /* in pages */
3297};
3298
3299The aim of the extension is to support an additional bigger DMA window with
3300a variable page size.
3301KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
3302a bus offset of the corresponding DMA window, @size and @offset are numbers
3303of IOMMU pages.
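
Since @size and @offset are counted in IOMMU pages, converting them to
bytes is a shift by page_shift. The sketch below shows that conversion
with a local mirror of the structure; the helper names are hypothetical
and overflow of the 64-bit result is not checked.

```c
#include <stdint.h>

/* Local mirror of struct kvm_create_spapr_tce_64. */
struct spapr_tce_64 {
    uint64_t liobn;
    uint32_t page_shift;
    uint32_t flags;
    uint64_t offset;  /* in pages */
    uint64_t size;    /* in pages */
};

/* Window length in bytes. */
static inline uint64_t tce_window_bytes(const struct spapr_tce_64 *args)
{
    return args->size << args->page_shift;
}

/* Bus address (in bytes) at which the window starts. */
static inline uint64_t tce_window_bus_start(const struct spapr_tce_64 *args)
{
    return args->offset << args->page_shift;
}
```

With page_shift = 16 (64 KiB IOMMU pages), a size of 0x1000 pages gives a
256 MiB window.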
3304
3305@flags are not used at the moment.
3306
3307The rest of the functionality is identical to KVM_CREATE_SPAPR_TCE.
3308
ccc4df4e 33094.99 KVM_REINJECT_CONTROL
107d44a2
RK
3310
3311Capability: KVM_CAP_REINJECT_CONTROL
3312Architectures: x86
3313Type: vm ioctl
3314Parameters: struct kvm_reinject_control (in)
3315Returns: 0 on success,
3316 -EFAULT if struct kvm_reinject_control cannot be read,
3317 -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
3318
3319i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
3320where KVM queues elapsed i8254 ticks and monitors completion of interrupt from
3321vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its
3322interrupt whenever there isn't a pending interrupt from i8254.
3323!reinject mode injects an interrupt as soon as a tick arrives.
3324
3325struct kvm_reinject_control {
3326 __u8 pit_reinject;
3327 __u8 reserved[31];
3328};
3329
3330pit_reinject = 0 (!reinject mode) is recommended, unless running an old
3331operating system that uses the PIT for timing (e.g. Linux 2.4.x).
3332
4.100 KVM_PPC_CONFIGURE_V3_MMU
3334
3335Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
3336Architectures: ppc
3337Type: vm ioctl
3338Parameters: struct kvm_ppc_mmuv3_cfg (in)
3339Returns: 0 on success,
3340 -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
3341 -EINVAL if the configuration is invalid
3342
3343This ioctl controls whether the guest will use radix or HPT (hashed
3344page table) translation, and sets the pointer to the process table for
3345the guest.
3346
3347struct kvm_ppc_mmuv3_cfg {
3348 __u64 flags;
3349 __u64 process_table;
3350};
3351
3352There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and
3353KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
3354to use radix tree translation, and if clear, to use HPT translation.
3355KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
3356to be able to use the global TLB and SLB invalidation instructions;
3357if clear, the guest may not use these instructions.
3358
3359The process_table field specifies the address and size of the guest
3360process table, which is in the guest's space. This field is formatted
3361as the second doubleword of the partition table entry, as defined in
3362the Power ISA V3.00, Book III section 5.7.6.1.
3363
4.101 KVM_PPC_GET_RMMU_INFO
3365
3366Capability: KVM_CAP_PPC_RADIX_MMU
3367Architectures: ppc
3368Type: vm ioctl
3369Parameters: struct kvm_ppc_rmmu_info (out)
3370Returns: 0 on success,
3371 -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
3372 -EINVAL if no useful information can be returned
3373
3374This ioctl returns a structure containing two things: (a) a list
3375containing supported radix tree geometries, and (b) a list that maps
3376page sizes to put in the "AP" (actual page size) field for the tlbie
3377(TLB invalidate entry) instruction.
3378
3379struct kvm_ppc_rmmu_info {
3380 struct kvm_ppc_radix_geom {
3381 __u8 page_shift;
3382 __u8 level_bits[4];
3383 __u8 pad[3];
3384 } geometries[8];
3385 __u32 ap_encodings[8];
3386};
3387
3388The geometries[] field gives up to 8 supported geometries for the
3389radix page table, in terms of the log base 2 of the smallest page
3390size, and the number of bits indexed at each level of the tree, from
3391the PTE level up to the PGD level in that order. Any unused entries
3392will have 0 in the page_shift field.
3393
The ap_encodings[] field gives the supported page sizes and their AP field
encodings. Each entry is encoded with the AP value in the top 3 bits and the
log base 2 of the page size in the bottom 6 bits.
3397
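The encoding described above can be unpacked with a couple of shifts and
masks. This is a minimal sketch: the structs are local mirrors of the layout
shown above, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Local mirrors of the structures documented above. */
struct rmmu_geom {
	uint8_t page_shift;
	uint8_t level_bits[4];
	uint8_t pad[3];
};

struct rmmu_info {
	struct rmmu_geom geometries[8];
	uint32_t ap_encodings[8];
};

/* AP value lives in the top 3 bits of a 32-bit entry. */
static unsigned int rmmu_ap_value(uint32_t enc)
{
	return enc >> 29;
}

/* log2 of the page size lives in the bottom 6 bits. */
static unsigned int rmmu_ap_page_shift(uint32_t enc)
{
	return enc & 0x3f;
}

/* Unused geometry entries have 0 in page_shift. */
static int rmmu_count_geometries(const struct rmmu_info *info)
{
	int i, n = 0;

	for (i = 0; i < 8; i++)
		if (info->geometries[i].page_shift != 0)
			n++;
	return n;
}
```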
33984.102 KVM_PPC_RESIZE_HPT_PREPARE
3399
3400Capability: KVM_CAP_SPAPR_RESIZE_HPT
3401Architectures: powerpc
3402Type: vm ioctl
3403Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         >0 if a new HPT is being prepared, the value is an estimated
            number of milliseconds until preparation is complete,
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
         -EINVAL if the supplied shift or flags are invalid,
         -ENOMEM if unable to allocate the new HPT,
         -ENOSPC if there was a hash collision when moving existing
                 HPT entries to the new HPT,
         -EIO on other error conditions
3413
3414Used to implement the PAPR extension for runtime resizing of a guest's
3415Hashed Page Table (HPT). Specifically this starts, stops or monitors
3416the preparation of a new potential HPT for the guest, essentially
3417implementing the H_RESIZE_HPT_PREPARE hypercall.
3418
3419If called with shift > 0 when there is no pending HPT for the guest,
3420this begins preparation of a new pending HPT of size 2^(shift) bytes.
3421It then returns a positive integer with the estimated number of
3422milliseconds until preparation is complete.
3423
3424If called when there is a pending HPT whose size does not match that
3425requested in the parameters, discards the existing pending HPT and
3426creates a new one as above.
3427
3428If called when there is a pending HPT of the size requested, will:
3429 * If preparation of the pending HPT is already complete, return 0
3430 * If preparation of the pending HPT has failed, return an error
3431 code, then discard the pending HPT.
3432 * If preparation of the pending HPT is still in progress, return an
3433 estimated number of milliseconds until preparation is complete.
3434
3435If called with shift == 0, discards any currently pending HPT and
3436returns 0 (i.e. cancels any in-progress preparation).
3437
3438flags is reserved for future expansion, currently setting any bits in
3439flags will result in an -EINVAL.
3440
3441Normally this will be called repeatedly with the same parameters until
3442it returns <= 0. The first call will initiate preparation, subsequent
3443ones will monitor preparation until it completes or fails.
3444
3445struct kvm_ppc_resize_hpt {
3446 __u64 flags;
3447 __u32 shift;
3448 __u32 pad;
3449};
3450
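The "call repeatedly until it returns <= 0" pattern can be sketched as a
small loop. To keep the example self-contained, the ioctl is abstracted
behind a callback; hpt_prepare_poll, prepare_fn and the fake_prepare stand-in
are all hypothetical names. A real caller would issue
ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &args) in the callback and sleep
between polls.

```c
#include <stdint.h>

/* Local mirror of struct kvm_ppc_resize_hpt as documented above. */
struct hpt_resize_args {
	uint64_t flags;
	uint32_t shift;
	uint32_t pad;
};

/* Stand-in for the ioctl: >0 is an estimate in ms, 0 is done, <0 is error. */
typedef int (*prepare_fn)(const struct hpt_resize_args *args, void *ctx);

/*
 * Poll with identical parameters until preparation completes or fails.
 * The summed estimates are reported through waited_ms (may be NULL).
 */
static int hpt_prepare_poll(prepare_fn prepare, void *ctx, uint32_t shift,
			    long *waited_ms)
{
	struct hpt_resize_args args = { .flags = 0, .shift = shift, .pad = 0 };
	long waited = 0;
	int ret;

	while ((ret = prepare(&args, ctx)) > 0)
		waited += ret;	/* a real caller would sleep for ret ms here */

	if (waited_ms)
		*waited_ms = waited;
	return ret;
}

/* Fake backend for illustration: reports 10 ms for a few polls, then 0. */
struct fake_hpt {
	int polls_left;
	uint32_t last_shift;
};

static int fake_prepare(const struct hpt_resize_args *args, void *ctx)
{
	struct fake_hpt *f = ctx;

	f->last_shift = args->shift;
	return f->polls_left-- > 0 ? 10 : 0;
}
```

Only once the loop has returned 0 would the caller go on to
KVM_PPC_RESIZE_HPT_COMMIT with the same flags and shift.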
34514.103 KVM_PPC_RESIZE_HPT_COMMIT
3452
3453Capability: KVM_CAP_SPAPR_RESIZE_HPT
3454Architectures: powerpc
3455Type: vm ioctl
3456Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
         -EINVAL if the supplied shift or flags are invalid,
         -ENXIO if there is no pending HPT, or the pending HPT doesn't
                have the requested size,
         -EBUSY if the pending HPT is not fully prepared,
         -ENOSPC if there was a hash collision when moving existing
                 HPT entries to the new HPT,
         -EIO on other error conditions
3466
3467Used to implement the PAPR extension for runtime resizing of a guest's
3468Hashed Page Table (HPT). Specifically this requests that the guest be
3469transferred to working with the new HPT, essentially implementing the
3470H_RESIZE_HPT_COMMIT hypercall.
3471
3472This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
3473returned 0 with the same parameters. In other cases
3474KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
3475-EBUSY, though others may be possible if the preparation was started,
3476but failed).
3477
3478This will have undefined effects on the guest if it has not already
3479placed itself in a quiescent state where no vcpu will make MMU enabled
3480memory accesses.
3481
On successful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.
3484
3485On failure, the guest will still be operating on its previous HPT.
3486
3487struct kvm_ppc_resize_hpt {
3488 __u64 flags;
3489 __u32 shift;
3490 __u32 pad;
3491};
3492
34934.104 KVM_X86_GET_MCE_CAP_SUPPORTED
3494
3495Capability: KVM_CAP_MCE
3496Architectures: x86
3497Type: system ioctl
3498Parameters: u64 mce_cap (out)
3499Returns: 0 on success, -1 on error
3500
3501Returns supported MCE capabilities. The u64 mce_cap parameter
3502has the same format as the MSR_IA32_MCG_CAP register. Supported
3503capabilities will have the corresponding bits set.
3504
35054.105 KVM_X86_SETUP_MCE
3506
3507Capability: KVM_CAP_MCE
3508Architectures: x86
3509Type: vcpu ioctl
3510Parameters: u64 mcg_cap (in)
3511Returns: 0 on success,
3512 -EFAULT if u64 mcg_cap cannot be read,
3513 -EINVAL if the requested number of banks is invalid,
3514 -EINVAL if requested MCE capability is not supported.
3515
3516Initializes MCE support for use. The u64 mcg_cap parameter
3517has the same format as the MSR_IA32_MCG_CAP register and
3518specifies which capabilities should be enabled. The maximum
3519supported number of error-reporting banks can be retrieved when
3520checking for KVM_CAP_MCE. The supported capabilities can be
3521retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
3522
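One way to assemble the mcg_cap argument is sketched below. It assumes the
MSR_IA32_MCG_CAP layout in which bits 7:0 hold the bank count; the helper
name and the clamping policy are illustrative choices, not something KVM
mandates. supported_caps would come from KVM_X86_GET_MCE_CAP_SUPPORTED and
max_banks from checking KVM_CAP_MCE.

```c
#include <stdint.h>

#define MCG_BANKCNT_MASK 0xffULL	/* MCG_CAP bits 7:0: number of banks */

/*
 * Hypothetical helper: combine the supported capability bits with a bank
 * count, clamping the requested count to the maximum KVM reported.
 */
static uint64_t mce_build_mcg_cap(uint64_t supported_caps,
				  unsigned int max_banks,
				  unsigned int wanted_banks)
{
	unsigned int banks = wanted_banks < max_banks ? wanted_banks : max_banks;

	return (supported_caps & ~MCG_BANKCNT_MASK) |
	       ((uint64_t)banks & MCG_BANKCNT_MASK);
}
```

The resulting value is what would be passed by pointer to KVM_X86_SETUP_MCE
on each vcpu fd.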
35234.106 KVM_X86_SET_MCE
3524
3525Capability: KVM_CAP_MCE
3526Architectures: x86
3527Type: vcpu ioctl
3528Parameters: struct kvm_x86_mce (in)
3529Returns: 0 on success,
3530 -EFAULT if struct kvm_x86_mce cannot be read,
3531 -EINVAL if the bank number is invalid,
3532 -EINVAL if VAL bit is not set in status field.
3533
3534Inject a machine check error (MCE) into the guest. The input
3535parameter is:
3536
3537struct kvm_x86_mce {
3538 __u64 status;
3539 __u64 addr;
3540 __u64 misc;
3541 __u64 mcg_status;
3542 __u8 bank;
3543 __u8 pad1[7];
3544 __u64 pad2[3];
3545};
3546
If the MCE being reported is an uncorrected error, KVM will
inject it as an MCE exception into the guest. If the guest
MCG_STATUS register reports that an MCE is in progress, KVM
causes a KVM_EXIT_SHUTDOWN vmexit.
3551
3552Otherwise, if the MCE is a corrected error, KVM will just
3553store it in the corresponding bank (provided this bank is
3554not holding a previously reported uncorrected error).
3555
35564.107 KVM_S390_GET_CMMA_BITS
3557
3558Capability: KVM_CAP_S390_CMMA_MIGRATION
3559Architectures: s390
3560Type: vm ioctl
3561Parameters: struct kvm_s390_cmma_log (in, out)
3562Returns: 0 on success, a negative value on error
3563
3564This ioctl is used to get the values of the CMMA bits on the s390
3565architecture. It is meant to be used in two scenarios:
3566- During live migration to save the CMMA values. Live migration needs
3567 to be enabled via the KVM_REQ_START_MIGRATION VM property.
3568- To non-destructively peek at the CMMA values, with the flag
3569 KVM_S390_CMMA_PEEK set.
3570
3571The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
3572values are written to a buffer whose location is indicated via the "values"
3573member in the kvm_s390_cmma_log struct. The values in the input struct are
3574also updated as needed.
3575Each CMMA value takes up one byte.
3576
3577struct kvm_s390_cmma_log {
3578 __u64 start_gfn;
3579 __u32 count;
3580 __u32 flags;
3581 union {
3582 __u64 remaining;
3583 __u64 mask;
3584 };
3585 __u64 values;
3586};
3587
3588start_gfn is the number of the first guest frame whose CMMA values are
3589to be retrieved,
3590
3591count is the length of the buffer in bytes,
3592
3593values points to the buffer where the result will be written to.
3594
3595If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
3596KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
3597other ioctls.
3598
3599The result is written in the buffer pointed to by the field values, and
3600the values of the input parameter are updated as follows.
3601
3602Depending on the flags, different actions are performed. The only
3603supported flag so far is KVM_S390_CMMA_PEEK.
3604
3605The default behaviour if KVM_S390_CMMA_PEEK is not set is:
3606start_gfn will indicate the first page frame whose CMMA bits were dirty.
3607It is not necessarily the same as the one passed as input, as clean pages
3608are skipped.
3609
count will indicate the number of bytes actually written in the buffer.
It can (and very often will) be smaller than the input value, since the
buffer is only filled until 16 bytes of clean values are found (which
are then not copied in the buffer). Since a CMMA migration block needs
the base address and the length, for a total of 16 bytes, we will send
back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
minimizes the amount of data to be saved or transferred over
the network at the expense of more roundtrips to userspace. The next
invocation of the ioctl will skip over all the clean values, saving
potentially more than just the 16 bytes we found.
3621
3622If KVM_S390_CMMA_PEEK is set:
3623the existing storage attributes are read even when not in migration
3624mode, and no other action is performed;
3625
3626the output start_gfn will be equal to the input start_gfn,
3627
3628the output count will be equal to the input count, except if the end of
3629memory has been reached.
3630
3631In both cases:
3632the field "remaining" will indicate the total number of dirty CMMA values
3633still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
3634not enabled.
3635
3636mask is unused.
3637
3638values points to the userspace buffer where the result will be stored.
3639
3640This ioctl can fail with -ENOMEM if not enough memory can be allocated to
3641complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
3642KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
3643-EFAULT if the userspace address is invalid or if no page table is
3644present for the addresses (e.g. when using hugepages).
3645
36464.108 KVM_S390_SET_CMMA_BITS
3647
3648Capability: KVM_CAP_S390_CMMA_MIGRATION
3649Architectures: s390
3650Type: vm ioctl
3651Parameters: struct kvm_s390_cmma_log (in)
3652Returns: 0 on success, a negative value on error
3653
This ioctl is used to set the values of the CMMA bits on the s390
architecture. It is meant to be used during live migration to restore
the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_log struct.
Each CMMA value takes up one byte.
3659
3660struct kvm_s390_cmma_log {
3661 __u64 start_gfn;
3662 __u32 count;
3663 __u32 flags;
3664 union {
3665 __u64 remaining;
3666 __u64 mask;
3667 };
3668 __u64 values;
3669};
3670
3671start_gfn indicates the starting guest frame number,
3672
3673count indicates how many values are to be considered in the buffer,
3674
3675flags is not used and must be 0.
3676
3677mask indicates which PGSTE bits are to be considered.
3678
3679remaining is not used.
3680
values points to the userspace buffer that holds the values to be set.
3682
3683This ioctl can fail with -ENOMEM if not enough memory can be allocated to
3684complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
3685the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
3686if the flags field was not 0, with -EFAULT if the userspace address is
3687invalid, if invalid pages are written to (e.g. after the end of memory)
3688or if no page table is present for the addresses (e.g. when using
3689hugepages).
3690
4.109 KVM_PPC_GET_CPU_CHAR
3692
3693Capability: KVM_CAP_PPC_GET_CPU_CHAR
3694Architectures: powerpc
3695Type: vm ioctl
3696Parameters: struct kvm_ppc_cpu_char (out)
3697Returns: 0 on successful completion
3698 -EFAULT if struct kvm_ppc_cpu_char cannot be written
3699
3700This ioctl gives userspace information about certain characteristics
3701of the CPU relating to speculative execution of instructions and
3702possible information leakage resulting from speculative execution (see
3703CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
3704returned in struct kvm_ppc_cpu_char, which looks like this:
3705
3706struct kvm_ppc_cpu_char {
3707 __u64 character; /* characteristics of the CPU */
3708 __u64 behaviour; /* recommended software behaviour */
3709 __u64 character_mask; /* valid bits in character */
3710 __u64 behaviour_mask; /* valid bits in behaviour */
3711};
3712
3713For extensibility, the character_mask and behaviour_mask fields
3714indicate which bits of character and behaviour have been filled in by
3715the kernel. If the set of defined bits is extended in future then
3716userspace will be able to tell whether it is running on a kernel that
3717knows about the new bits.
3718
3719The character field describes attributes of the CPU which can help
3720with preventing inadvertent information disclosure - specifically,
3721whether there is an instruction to flash-invalidate the L1 data cache
3722(ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set
3723to a mode where entries can only be used by the thread that created
3724them, whether the bcctr[l] instruction prevents speculation, and
3725whether a speculation barrier instruction (ori 31,31,0) is provided.
3726
3727The behaviour field describes actions that software should take to
3728prevent inadvertent information disclosure, and thus describes which
3729vulnerabilities the hardware is subject to; specifically whether the
3730L1 data cache should be flushed when returning to user mode from the
3731kernel, and whether a speculation barrier should be placed between an
3732array bounds check and the array access.
3733
3734These fields use the same bit definitions as the new
3735H_GET_CPU_CHARACTERISTICS hypercall.
3736
4.110 KVM_MEMORY_ENCRYPT_OP
3738
3739Capability: basic
3740Architectures: x86
Type: vm ioctl
3742Parameters: an opaque platform specific structure (in/out)
3743Returns: 0 on success; -1 on error
3744
3745If the platform supports creating encrypted VMs then this ioctl can be used
3746for issuing platform-specific memory encryption commands to manage those
3747encrypted VMs.
3748
Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.rst.

4.111 KVM_MEMORY_ENCRYPT_REG_REGION
3754
3755Capability: basic
3756Architectures: x86
Type: vm ioctl
3758Parameters: struct kvm_enc_region (in)
3759Returns: 0 on success; -1 on error
3760
3761This ioctl can be used to register a guest memory region which may
3762contain encrypted data (e.g. guest RAM, SMRAM etc).
3763
It is used in SEV-enabled guests. When encryption is enabled, a guest
memory region may contain encrypted data. The SEV memory encryption
engine uses a tweak such that two identical plaintext pages at
different locations will have differing ciphertexts, so swapping or
moving the ciphertext of those pages will not result in the plaintext
being swapped. Relocating (or migrating) physical backing pages for an
SEV guest therefore requires some additional steps.
3771
3772Note: The current SEV key management spec does not provide commands to
3773swap or migrate (move) ciphertext pages. Hence, for now we pin the guest
3774memory region registered with the ioctl.
3775
4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION
3777
3778Capability: basic
3779Architectures: x86
Type: vm ioctl
3781Parameters: struct kvm_enc_region (in)
3782Returns: 0 on success; -1 on error
3783
3784This ioctl can be used to unregister the guest memory region registered
3785with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
3786
37874.113 KVM_HYPERV_EVENTFD
3788
3789Capability: KVM_CAP_HYPERV_EVENTFD
3790Architectures: x86
3791Type: vm ioctl
3792Parameters: struct kvm_hyperv_eventfd (in)
3793
3794This ioctl (un)registers an eventfd to receive notifications from the guest on
3795the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
3796causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
3797(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
3798
3799struct kvm_hyperv_eventfd {
3800 __u32 conn_id;
3801 __s32 fd;
3802 __u32 flags;
3803 __u32 padding[3];
3804};
3805
3806The conn_id field should fit within 24 bits:
3807
3808#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
3809
3810The acceptable values for the flags field are:
3811
3812#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
3813
3814Returns: 0 on success,
3815 -EINVAL if conn_id or flags is outside the allowed range
3816 -ENOENT on deassign if the conn_id isn't registered
3817 -EEXIST on assign if the conn_id is already registered
3818
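The constraints on conn_id and flags can be checked in userspace before the
ioctl is issued, mirroring the -EINVAL condition listed above. This is only
a sketch: the struct and constants are local copies of the definitions shown
in this section, and hv_eventfd_args_valid is a hypothetical helper.

```c
#include <stdint.h>

#define HV_CONN_ID_MASK		0x00ffffffu	/* KVM_HYPERV_CONN_ID_MASK */
#define HV_EVENTFD_DEASSIGN	(1u << 0)	/* KVM_HYPERV_EVENTFD_DEASSIGN */

/* Local mirror of struct kvm_hyperv_eventfd as documented above. */
struct hv_eventfd_args {
	uint32_t conn_id;
	int32_t fd;
	uint32_t flags;
	uint32_t padding[3];
};

/* Returns 1 if conn_id fits in 24 bits and only known flags are set. */
static int hv_eventfd_args_valid(const struct hv_eventfd_args *a)
{
	if (a->conn_id & ~HV_CONN_ID_MASK)
		return 0;
	if (a->flags & ~HV_EVENTFD_DEASSIGN)
		return 0;
	return 1;
}
```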
38194.114 KVM_GET_NESTED_STATE
3820
3821Capability: KVM_CAP_NESTED_STATE
3822Architectures: x86
3823Type: vcpu ioctl
3824Parameters: struct kvm_nested_state (in/out)
3825Returns: 0 on success, -1 on error
3826Errors:
3827 E2BIG: the total state size (including the fixed-size part of struct
3828 kvm_nested_state) exceeds the value of 'size' specified by
3829 the user; the size required will be written into size.
3830
3831struct kvm_nested_state {
3832 __u16 flags;
3833 __u16 format;
3834 __u32 size;
3835 union {
3836 struct kvm_vmx_nested_state vmx;
3837 struct kvm_svm_nested_state svm;
3838 __u8 pad[120];
3839 };
3840 __u8 data[0];
3841};
3842
3843#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
3844#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
3845
3846#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001
3847#define KVM_STATE_NESTED_SMM_VMXON 0x00000002
3848
3849struct kvm_vmx_nested_state {
3850 __u64 vmxon_pa;
3851 __u64 vmcs_pa;
3852
3853 struct {
3854 __u16 flags;
3855 } smm;
3856};
3857
3858This ioctl copies the vcpu's nested virtualization state from the kernel to
3859userspace.
3860
3861The maximum size of the state, including the fixed-size part of struct
3862kvm_nested_state, can be retrieved by passing KVM_CAP_NESTED_STATE to
3863the KVM_CHECK_EXTENSION ioctl().
3864
38654.115 KVM_SET_NESTED_STATE
3866
3867Capability: KVM_CAP_NESTED_STATE
3868Architectures: x86
3869Type: vcpu ioctl
3870Parameters: struct kvm_nested_state (in)
3871Returns: 0 on success, -1 on error
3872
This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For
the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.

4.116 KVM_(UN)REGISTER_COALESCED_MMIO

Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio)
            KVM_CAP_COALESCED_PIO (for coalesced pio)
Architectures: all
Type: vm ioctl
Parameters: struct kvm_coalesced_mmio_zone
Returns: 0 on success, < 0 on error

Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
hardware registers.

When a hardware register is configured for coalesced I/O, write accesses
do not exit to userspace and their value is recorded in a ring buffer
that is shared between kernel and userspace.

Coalesced I/O is used if one or more write accesses to a hardware
register can be deferred until a read or a write to another hardware
register on the same device. This last access will cause a vmexit and
userspace will process accesses from the ring buffer before emulating
it. That will avoid exiting to userspace on repeated writes.

Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.

4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
3905
3906Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
3907Architectures: x86
3908Type: vm ioctl
Parameters: struct kvm_clear_dirty_log (in)
3910Returns: 0 on success, -1 on error
3911
3912/* for KVM_CLEAR_DIRTY_LOG */
3913struct kvm_clear_dirty_log {
3914 __u32 slot;
3915 __u32 num_pages;
3916 __u64 first_page;
3917 union {
3918 void __user *dirty_bitmap; /* one bit per page */
3919 __u64 padding;
3920 };
3921};
3922
3923The ioctl clears the dirty status of pages in a memory slot, according to
3924the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
3925field. Bit 0 of the bitmap corresponds to page "first_page" in the
3926memory slot, and num_pages is the size in bits of the input bitmap.
3927Both first_page and num_pages must be a multiple of 64. For each bit
3928that is set in the input bitmap, the corresponding page is marked "clean"
3929in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
3930(for example via write-protection, or by clearing the dirty bit in
3931a page table entry).
3932
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space for which you want to clear the dirty status.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
3937
3938This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
3939is enabled; for more information, see the description of the capability.
3940However, it can always be used as long as KVM_CHECK_EXTENSION confirms
3941that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
3942
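The alignment rule and the slot encoding can be captured in a few small
helpers. This is a sketch under the constraints stated in this section
(first_page and num_pages multiples of 64, one bit per page, address space
id in bits 16-31 of slot); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack an address space id (bits 16-31) and a slot id (bits 0-15). */
static uint32_t dirty_log_slot(uint16_t as_id, uint16_t slot_id)
{
	return ((uint32_t)as_id << 16) | slot_id;
}

/* Both first_page and num_pages must be a multiple of 64. */
static int clear_dirty_range_valid(uint64_t first_page, uint32_t num_pages)
{
	return (first_page % 64 == 0) && (num_pages % 64 == 0);
}

/* One bit per page, so a valid range needs num_pages / 8 bytes of bitmap. */
static size_t clear_dirty_bitmap_bytes(uint32_t num_pages)
{
	return (size_t)num_pages / 8;
}
```

For example, clearing 256 pages starting at page 128 needs a 32-byte input
bitmap whose bit 0 refers to page 128 of the slot.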
39434.118 KVM_GET_SUPPORTED_HV_CPUID
3944
3945Capability: KVM_CAP_HYPERV_CPUID
3946Architectures: x86
3947Type: vcpu ioctl
3948Parameters: struct kvm_cpuid2 (in/out)
3949Returns: 0 on success, -1 on error
3950
3951struct kvm_cpuid2 {
3952 __u32 nent;
3953 __u32 padding;
3954 struct kvm_cpuid_entry2 entries[0];
3955};
3956
3957struct kvm_cpuid_entry2 {
3958 __u32 function;
3959 __u32 index;
3960 __u32 flags;
3961 __u32 eax;
3962 __u32 ebx;
3963 __u32 ecx;
3964 __u32 edx;
3965 __u32 padding[3];
3966};
3967
This ioctl returns x86 CPUID feature leaves related to Hyper-V emulation in
KVM. Userspace can use the information returned by this ioctl to construct
the CPUID information presented to guests consuming Hyper-V enlightenments (e.g.
Windows or Hyper-V guests).
3972
3973CPUID feature leaves returned by this ioctl are defined by Hyper-V Top Level
3974Functional Specification (TLFS). These leaves can't be obtained with
3975KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM feature
3976leaves (0x40000000, 0x40000001).
3977
Currently, the following CPUID leaves are returned:
3979 HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
3980 HYPERV_CPUID_INTERFACE
3981 HYPERV_CPUID_VERSION
3982 HYPERV_CPUID_FEATURES
3983 HYPERV_CPUID_ENLIGHTMENT_INFO
3984 HYPERV_CPUID_IMPLEMENT_LIMITS
3985 HYPERV_CPUID_NESTED_FEATURES
3986
3987HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
3988enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
3989
Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe all Hyper-V
feature leaves, an error (E2BIG) is returned. If the number is greater than or
equal to the number of Hyper-V feature leaves, the 'nent' field is adjusted to
the number of valid entries in the 'entries' array, which is then filled.
3996
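Because 'entries' is a variable-size array, the buffer has to be sized from
'nent' before the call. The following is a minimal allocation sketch; the
structs are local mirrors of the definitions above (using a C99 flexible
array member in place of entries[0]) and cpuid2_alloc is a hypothetical
helper.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of struct kvm_cpuid_entry2 and struct kvm_cpuid2. */
struct cpuid_entry2 {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
	uint32_t padding[3];
};

struct cpuid2 {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry2 entries[];
};

/* Allocate a zeroed header plus room for nent entries; NULL on failure. */
static struct cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2 *c;

	c = calloc(1, sizeof(*c) + (size_t)nent * sizeof(c->entries[0]));
	if (c)
		c->nent = nent;
	return c;
}
```

A caller that receives E2BIG could free the buffer and retry with a larger
nent.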
'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
userspace should not expect to get any particular value there.

4.119 KVM_ARM_VCPU_FINALIZE
4001
4002Capability: KVM_CAP_ARM_SVE
4003Architectures: arm, arm64
4004Type: vcpu ioctl
4005Parameters: int feature (in)
4006Returns: 0 on success, -1 on error
4007Errors:
4008 EPERM: feature not enabled, needs configuration, or already finalized
4009 EINVAL: unknown feature
4010
4011Recognised values for feature:
4012 arm64 KVM_ARM_VCPU_SVE
4013
4014Finalizes the configuration of the specified vcpu feature.
4015
4016The vcpu must already have been initialised, enabling the affected feature, by
4017means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
4018features[].
4019
4020For affected vcpu features, this is a mandatory step that must be performed
4021before the vcpu is fully usable.
4022
Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
that should be performed and how to do it are feature-dependent.
4026
4027Other calls that depend on a particular feature being finalized, such as
4028KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
4029-EPERM unless the feature has already been finalized by means of a
4030KVM_ARM_VCPU_FINALIZE call.
4031
4032See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization
4033using this ioctl.
4034
5. The kvm_run structure
------------------------
4037
4038Application code obtains a pointer to the kvm_run structure by
4039mmap()ing a vcpu fd. From that point, application code can control
4040execution by changing fields in kvm_run prior to calling the KVM_RUN
4041ioctl, and obtain information about the reason KVM_RUN returned by
4042looking up structure members.
4043
4044struct kvm_run {
4045 /* in */
4046 __u8 request_interrupt_window;
4047
4048Request that KVM_RUN return when it becomes possible to inject external
4049interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
4050
4051 __u8 immediate_exit;
4052
4053This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
4054exits immediately, returning -EINTR. In the common scenario where a
4055signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
4056to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
4057Rather than blocking the signal outside KVM_RUN, userspace can set up
4058a signal handler that sets run->immediate_exit to a non-zero value.
4059
4060This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
4061
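The signal-handler approach described above might look roughly like this.
The sketch uses a stub in place of the mmap()ed struct kvm_run and a single
global pointer, both of which are simplifying assumptions; a real VMM would
keep one pointer per vcpu thread and would clear immediate_exit again before
the next KVM_RUN.

```c
#include <signal.h>
#include <stdint.h>

/* Stand-in for the mmap()ed struct kvm_run; only the fields used here. */
struct run_stub {
	uint8_t request_interrupt_window;
	volatile uint8_t immediate_exit;
};

/* Hypothetical: the run area of the vcpu that the signal should kick. */
static struct run_stub *kick_target;

/* Signal handler: ask the next KVM_RUN to return -EINTR immediately. */
static void kick_handler(int sig)
{
	(void)sig;
	if (kick_target)
		kick_target->immediate_exit = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on failure. */
static int install_kick_handler(struct run_stub *run, int signo)
{
	kick_target = run;
	return signal(signo, kick_handler) == SIG_ERR ? -1 : 0;
}
```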
4062 __u8 padding1[6];
4063
4064 /* out */
4065 __u32 exit_reason;
4066
4067When KVM_RUN has returned successfully (return value 0), this informs
4068application code why KVM_RUN has returned. Allowable values for this
4069field are detailed below.
4070
4071 __u8 ready_for_interrupt_injection;
4072
4073If request_interrupt_window has been specified, this field indicates
4074an interrupt can be injected now with KVM_INTERRUPT.
4075
4076 __u8 if_flag;
4077
4078The value of the current interrupt flag. Only valid if in-kernel
4079local APIC is not used.
4080
4081 __u16 flags;
4082
4083More architecture-specific flags detailing state of the VCPU that may
4084affect the device's behavior. The only currently defined flag is
4085KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
4086VCPU is in system management mode.
4087
4088 /* in (pre_kvm_run), out (post_kvm_run) */
4089 __u64 cr8;
4090
4091The value of the cr8 register. Only valid if in-kernel local APIC is
4092not used. Both input and output.
4093
4094 __u64 apic_base;
4095
4096The value of the APIC BASE msr. Only valid if in-kernel local
4097APIC is not used. Both input and output.
4098
4099 union {
4100 /* KVM_EXIT_UNKNOWN */
4101 struct {
4102 __u64 hardware_exit_reason;
4103 } hw;
4104
4105If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
4106reasons. Further architecture-specific information is available in
4107hardware_exit_reason.
4108
4109 /* KVM_EXIT_FAIL_ENTRY */
4110 struct {
4111 __u64 hardware_entry_failure_reason;
4112 } fail_entry;
4113
4114If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
4115to unknown reasons. Further architecture-specific information is
4116available in hardware_entry_failure_reason.
4117
4118 /* KVM_EXIT_EXCEPTION */
4119 struct {
4120 __u32 exception;
4121 __u32 error_code;
4122 } ex;
4123
4124Unused.
4125
4126 /* KVM_EXIT_IO */
4127 struct {
4128#define KVM_EXIT_IO_IN 0
4129#define KVM_EXIT_IO_OUT 1
4130 __u8 direction;
4131 __u8 size; /* bytes */
4132 __u16 port;
4133 __u32 count;
4134 __u64 data_offset; /* relative to kvm_run start */
4135 } io;
4136
If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
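Since data_offset is relative to the start of kvm_run, locating the packed
array is a matter of pointer arithmetic on the mmap()ed region. This is a
minimal sketch: io_exit_stub mirrors the io member shown above, and the
helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the io member of struct kvm_run. */
struct io_exit_stub {
	uint8_t direction;
	uint8_t size;		/* bytes per element */
	uint16_t port;
	uint32_t count;		/* number of elements */
	uint64_t data_offset;	/* relative to kvm_run start */
};

/* Address of the packed data array inside the mmap()ed kvm_run region. */
static uint8_t *io_exit_data(void *run_base, const struct io_exit_stub *io)
{
	return (uint8_t *)run_base + io->data_offset;
}

/* Total number of bytes in the packed array: count elements of size bytes. */
static size_t io_exit_len(const struct io_exit_stub *io)
{
	return (size_t)io->size * io->count;
}
```

For KVM_EXIT_IO_OUT the application reads io_exit_len() bytes from that
address; for KVM_EXIT_IO_IN it writes them there before calling KVM_RUN
again.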

		/* KVM_EXIT_DEBUG */
4144 struct {
4145 struct kvm_debug_exit_arch arch;
4146 } debug;
4147
If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
for which architecture specific information is returned.
4150
4151 /* KVM_EXIT_MMIO */
4152 struct {
4153 __u64 phys_addr;
4154 __u8 data[8];
4155 __u32 len;
4156 __u8 is_write;
4157 } mmio;
4158
2044892d 4159If exit_reason is KVM_EXIT_MMIO, then the vcpu has
9c1b96e3
AK
4160executed a memory-mapped I/O instruction which could not be satisfied
4161by kvm. The 'data' member contains the written data if 'is_write' is
4162true, and should be filled by application code otherwise.
4163
6acdb160
CD
4164The 'data' member contains, in its first 'len' bytes, the value as it would
4165appear if the VCPU performed a load or store of the appropriate width directly
4166to the byte array.
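
The handling described above can be sketched as follows. This is only an
illustration under stated assumptions: struct mmio_exit is a local mirror
of the mmio struct, and the "device" is a hypothetical flat byte array of
register state mapped at guest physical address 'base'. mmio_handle() is
not a KVM function.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the mmio exit struct above (illustration only). */
struct mmio_exit {
	uint64_t phys_addr;
	uint8_t  data[8];
	uint32_t len;
	uint8_t  is_write;
};

/*
 * Hypothetical device: 'backing' holds 'backing_len' bytes of register
 * state mapped at guest physical address 'base'. Returns 0 if handled,
 * -1 if the access falls outside the device or len exceeds data[].
 */
static int mmio_handle(struct mmio_exit *m, uint64_t base,
		       uint8_t *backing, size_t backing_len)
{
	uint64_t off;

	if (m->len > sizeof(m->data) || m->phys_addr < base)
		return -1;
	off = m->phys_addr - base;
	if (off > backing_len || m->len > backing_len - off)
		return -1;

	if (m->is_write)
		memcpy(backing + off, m->data, m->len);	/* guest store */
	else
		memcpy(m->data, backing + off, m->len);	/* guest load */
	return 0;
}
```

Only the first 'len' bytes of 'data' are touched, matching the byte-array
semantics described above.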

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
KVM_EXIT_EPR the corresponding operations are complete (and guest state is
consistent) only after userspace has re-entered the kernel with KVM_RUN. The
kernel side will first finish incomplete operations and then check for pending
signals. Userspace can re-enter the guest with an unmasked signal pending to
complete pending operations.

		/* KVM_EXIT_HYPERCALL */
		struct {
			__u64 nr;
			__u64 args[6];
			__u64 ret;
			__u32 longmode;
			__u32 pad;
		} hypercall;

Unused. This was once used for 'hypercall to userspace'. To implement
such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.

		/* KVM_EXIT_TPR_ACCESS */
		struct {
			__u64 rip;
			__u32 is_write;
			__u32 pad;
		} tpr_access;

To be documented (KVM_TPR_ACCESS_REPORTING).

		/* KVM_EXIT_S390_SIEIC */
		struct {
			__u8 icptcode;
			__u64 mask; /* psw upper half */
			__u64 addr; /* psw lower half */
			__u16 ipa;
			__u32 ipb;
		} s390_sieic;

s390 specific.

		/* KVM_EXIT_S390_RESET */
#define KVM_S390_RESET_POR       1
#define KVM_S390_RESET_CLEAR     2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT  8
#define KVM_S390_RESET_IPL       16
		__u64 s390_reset_flags;

s390 specific.

		/* KVM_EXIT_S390_UCONTROL */
		struct {
			__u64 trans_exc_code;
			__u32 pgm_code;
		} s390_ucontrol;

s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z/Architecture
Principles of Operation in the chapter on Dynamic Address Translation
(DAT).

		/* KVM_EXIT_DCR */
		struct {
			__u32 dcrn;
			__u32 data;
			__u8 is_write;
		} dcr;

Deprecated - was used for 440 KVM.

		/* KVM_EXIT_OSI */
		struct {
			__u64 gprs[32];
		} osi;

MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
hypercalls and exit with this exit struct that contains all the guest gprs.

If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.

		/* KVM_EXIT_PAPR_HCALL */
		struct {
			__u64 nr;
			__u64 ret;
			__u64 args[9];
		} papr_hcall;

This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu. It occurs when the
guest does a hypercall using the 'sc 1' instruction. The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12). Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).

		/* KVM_EXIT_S390_TSCH */
		struct {
			__u16 subchannel_id;
			__u16 subchannel_nr;
			__u32 io_int_parm;
			__u32 io_int_word;
			__u32 ipb;
			__u8 dequeued;
		} s390_tsch;

s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
interrupt for the target subchannel has been dequeued and subchannel_id,
subchannel_nr, io_int_parm and io_int_word contain the parameters for that
interrupt. ipb is needed for instruction parameter decoding.

		/* KVM_EXIT_EPR */
		struct {
			__u32 epr;
		} epr;

On FSL BookE PowerPC chips, the interrupt controller has a fast
interrupt acknowledge path to the core. When the core successfully
delivers an interrupt, it automatically populates the EPR register with
the interrupt vector number and acknowledges the interrupt inside
the interrupt controller.

In case the interrupt controller lives in user space, we need to do
the interrupt acknowledge cycle through it to fetch the next to be
delivered interrupt vector using this exit.

It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.

		/* KVM_EXIT_SYSTEM_EVENT */
		struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN       1
#define KVM_SYSTEM_EVENT_RESET          2
#define KVM_SYSTEM_EVENT_CRASH          3
			__u32 type;
			__u64 flags;
		} system_event;

If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
a system-level event using some architecture specific mechanism (hypercall
or some special instruction). In case of ARM/ARM64, this is triggered using
HVC instruction based PSCI call from the vcpu. The 'type' field describes
the system-level event type. The 'flags' field describes architecture
specific flags for the system-level event.

Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (i.e. it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace can choose to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again.
  KVM_SYSTEM_EVENT_CRASH -- the guest has crashed and has requested
   crash condition maintenance. Userspace can choose to ignore the
   request, or to gather a VM memory core dump and/or reset or shut
   down the VM.
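
A minimal sketch of how a VMM might dispatch on 'type' is given below. The
enum vmm_action values and system_event_action() are hypothetical and only
encode one possible policy; as noted above, userspace is free to ignore or
defer any of these requests.

```c
#include <stdint.h>

#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET    2
#define KVM_SYSTEM_EVENT_CRASH    3

/* Hypothetical VMM-side actions; not part of the KVM API. */
enum vmm_action {
	VMM_CONTINUE,
	VMM_POWER_OFF,
	VMM_RESET,
	VMM_DUMP_AND_STOP,
};

/* Map a system_event.type value to the action this example VMM takes. */
static enum vmm_action system_event_action(uint32_t type)
{
	switch (type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return VMM_POWER_OFF;
	case KVM_SYSTEM_EVENT_RESET:
		return VMM_RESET;
	case KVM_SYSTEM_EVENT_CRASH:
		return VMM_DUMP_AND_STOP;
	default:
		/* Unknown type: keep running (a policy choice, not a rule). */
		return VMM_CONTINUE;
	}
}
```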

		/* KVM_EXIT_IOAPIC_EOI */
		struct {
			__u8 vector;
		} eoi;

Indicates that the VCPU's in-kernel local APIC received an EOI for a
level-triggered IOAPIC interrupt. This exit only triggers when the
IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
the userspace IOAPIC should process the EOI and retrigger the interrupt if
it is still asserted. Vector is the LAPIC interrupt vector for which the
EOI was received.

		struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC          1
#define KVM_EXIT_HYPERV_HCALL          2
			__u32 type;
			union {
				struct {
					__u32 msr;
					__u64 control;
					__u64 evt_page;
					__u64 msg_page;
				} synic;
				struct {
					__u64 input;
					__u64 result;
					__u64 params[2];
				} hcall;
			} u;
		};
		/* KVM_EXIT_HYPERV */
		struct kvm_hyperv_exit hyperv;

Indicates that the VCPU exits into userspace to process some tasks
related to Hyper-V emulation.
Valid values for 'type' are:
  KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
   Hyper-V SynIC state change. Notification is used to remap SynIC
   event/message pages and to enable/disable SynIC messages/events
   processing in userspace.

		/* Fix the size of the union. */
		char padding[256];
	};

	/*
	 * shared registers between kvm and userspace.
	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
	 * struct kvm_sync_regs is architecture specific, as well as the
	 * bits for kvm_valid_regs and kvm_dirty_regs
	 */
	__u64 kvm_valid_regs;
	__u64 kvm_dirty_regs;
	union {
		struct kvm_sync_regs regs;
		char padding[SYNC_REGS_SIZE_BYTES];
	} s;

If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).

Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

};


6. Capabilities that can be enabled on vCPUs
--------------------------------------------

There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this capability.
      x86 includes both i386 and x86_64.

  Target: whether this is a per-vcpu or per-vm capability.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.
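
As a hedged sketch of what "enabling" looks like from userspace, the code
below prepares the request that KVM_ENABLE_CAP (section 4.37) takes. The
struct is a local mirror of struct kvm_enable_cap so the example stands
alone; real code would include <linux/kvm.h>. enable_cap_init() is a
hypothetical helper, not part of the API.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_enable_cap (section 4.37); illustration only. */
struct enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/* Zero the whole request, then fill in the capability and up to 4 args. */
static int enable_cap_init(struct enable_cap *req, uint32_t cap,
			   const uint64_t *args, unsigned int nargs)
{
	if (nargs > 4)
		return -1;
	memset(req, 0, sizeof(*req));
	req->cap = cap;
	for (unsigned int i = 0; i < nargs; i++)
		req->args[i] = args[i];
	return 0;
}
```

The filled request would then be passed to ioctl(fd, KVM_ENABLE_CAP, &req),
where fd is the vcpu fd or the VM fd depending on the capability's Target.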


6.1 KVM_CAP_PPC_OSI

Architectures: ppc
Target: vcpu
Parameters: none
Returns: 0 on success; -1 on error

This capability enables interception of OSI hypercalls that otherwise would
be treated as normal system calls to be injected into the guest. OSI hypercalls
were invented by Mac-on-Linux to have a standardized communication mechanism
between the guest and the host.

When this capability is enabled, KVM_EXIT_OSI can occur.


6.2 KVM_CAP_PPC_PAPR

Architectures: ppc
Target: vcpu
Parameters: none
Returns: 0 on success; -1 on error

This capability enables interception of PAPR hypercalls. PAPR hypercalls are
done using the hypercall instruction "sc 1".

It also sets the guest privilege level to "supervisor" mode. Usually the guest
runs in "hypervisor" privilege mode with a few missing features.

In addition to the above, it changes the semantics of SDR1. In this mode, the
HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
HTAB invisible to the guest.

When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.


6.3 KVM_CAP_SW_TLB

Architectures: ppc
Target: vcpu
Parameters: args[0] is the address of a struct kvm_config_tlb
Returns: 0 on success; -1 on error

struct kvm_config_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
	__u32 array_len;
};

Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM. The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures. The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array. It must be at least the size dictated
by "mmu_type" and "params".

While KVM_RUN is active, the shared region is under control of KVM. Its
contents are undefined, and any modification by userspace results in
boundedly undefined behavior.

On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.

For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
 - The "array" field points to an array of type "struct
   kvm_book3e_206_tlb_entry".
 - The array consists of all entries in the first TLB, followed by all
   entries in the second TLB.
 - Within a TLB, entries are ordered first by increasing set number. Within a
   set, entries are ordered by way (increasing ESEL).
 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
   hardware ignores this value for TLB0.
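
The TLB0 ordering rules above can be sketched as a small index
computation. This is an illustration only: tlb0_set() and tlb0_index() are
hypothetical helpers, tlb0_size and tlb0_ways stand for tlb_sizes[0] and
tlb_ways[0], and num_sets is assumed to be a power of two so that the mask
in the hash is meaningful.

```c
#include <stdint.h>

/* Set number in TLB0 for a given MAS2 value, per the hash above. */
static uint32_t tlb0_set(uint64_t mas2, uint32_t tlb0_size, uint32_t tlb0_ways)
{
	uint32_t num_sets = tlb0_size / tlb0_ways;

	return (uint32_t)(mas2 >> 12) & (num_sets - 1);
}

/*
 * Index into the shared array for a TLB0 entry: entries are ordered by
 * increasing set number, and by way within a set.
 */
static uint32_t tlb0_index(uint64_t mas2, uint32_t way,
			   uint32_t tlb0_size, uint32_t tlb0_ways)
{
	return tlb0_set(mas2, tlb0_size, tlb0_ways) * tlb0_ways + way;
}
```

Entries of the second TLB follow all TLB0 entries, so their indices would
start at tlb0_size.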

6.4 KVM_CAP_S390_CSS_SUPPORT

Architectures: s390
Target: vcpu
Parameters: none
Returns: 0 on success; -1 on error

This capability enables support for handling of channel I/O instructions.

TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
handled in-kernel, while the other I/O instructions are passed to userspace.

When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
SUBCHANNEL intercepts.

Note that even though this capability is enabled per-vcpu, the complete
virtual machine is affected.

6.5 KVM_CAP_PPC_EPR

Architectures: ppc
Target: vcpu
Parameters: args[0] defines whether the proxy facility is active
Returns: 0 on success; -1 on error

This capability enables or disables the delivery of interrupts through the
external proxy facility.

When enabled (args[0] != 0), every time the guest gets an external interrupt
delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
to receive the topmost interrupt vector.

When disabled (args[0] == 0), behavior is as if this facility is unsupported.

When this capability is enabled, KVM_EXIT_EPR can occur.

6.6 KVM_CAP_IRQ_MPIC

Architectures: ppc
Parameters: args[0] is the MPIC device fd
            args[1] is the MPIC CPU number for this vcpu

This capability connects the vcpu to an in-kernel MPIC device.

6.7 KVM_CAP_IRQ_XICS

Architectures: ppc
Target: vcpu
Parameters: args[0] is the XICS device fd
            args[1] is the XICS CPU number (server ID) for this vcpu

This capability connects the vcpu to an in-kernel XICS device.

6.8 KVM_CAP_S390_IRQCHIP

Architectures: s390
Target: vm
Parameters: none

This capability enables the in-kernel irqchip for s390. Please refer to
"4.24 KVM_CREATE_IRQCHIP" for details.

6.9 KVM_CAP_MIPS_FPU

Architectures: mips
Target: vcpu
Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the host Floating Point Unit by the guest. It
allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
(depending on the current guest FPU register mode), and the Status.FR,
Config5.FRE bits are accessible via the KVM API and also from the guest,
depending on them being supported by the FPU.

6.10 KVM_CAP_MIPS_MSA

Architectures: mips
Target: vcpu
Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
the guest.

6.74 KVM_CAP_SYNC_REGS

Architectures: s390, x86
Target: s390: always enabled, x86: vcpu
Parameters: none
Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).

As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
particularly important when userspace is making synchronous guest state
modifications, e.g. when emulating and/or intercepting instructions in
userspace.

For s390 specifics, please refer to the source code.

For x86:
- the register sets to be copied out to kvm_run are selectable
  by userspace (rather than all sets being copied out for every exit).
- vcpu_events are available in addition to regs and sregs.

For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
function as an input bit-array field set by userspace to indicate the
specific register sets to be copied out on the next exit.

To indicate when userspace has modified values that should be copied into
the vCPU, the architecture-independent bit-array field 'kvm_dirty_regs' must
be set. This is done using the same bitflags as for the 'kvm_valid_regs' field.
If the dirty bit is not set, then the register set values will not be copied
into the vCPU even if they've been modified.

Unused bitfields in the bitarrays must be set to zero.

struct kvm_sync_regs {
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_vcpu_events events;
};
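
The x86 usage of the two bit-array fields can be sketched as below. The
KVM_SYNC_X86_* values are the ones defined in
arch/x86/include/uapi/asm/kvm.h and should be verified against the
installed headers. struct sync_fields is a local stand-in for the two
kvm_run fields, and sync_request()/sync_mark_dirty() are hypothetical
helpers; 'supported' is assumed to be the value returned by
KVM_CHECK_EXTENSION for KVM_CAP_SYNC_REGS.

```c
#include <stdint.h>

/* Bit values as defined in arch/x86/include/uapi/asm/kvm.h. */
#define KVM_SYNC_X86_REGS   (1UL << 0)
#define KVM_SYNC_X86_SREGS  (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)

/* Local stand-in for the two kvm_run bit-array fields (illustration only). */
struct sync_fields {
	uint64_t kvm_valid_regs;
	uint64_t kvm_dirty_regs;
};

/* Select the sets to copy out on the next exit; -1 on unsupported bits. */
static int sync_request(struct sync_fields *f, uint64_t supported,
			uint64_t want)
{
	if (want & ~supported)
		return -1;
	f->kvm_valid_regs = want;
	return 0;
}

/* Mark sets that userspace modified and that KVM must load on entry. */
static int sync_mark_dirty(struct sync_fields *f, uint64_t supported,
			   uint64_t sets)
{
	if (sets & ~supported)
		return -1;
	f->kvm_dirty_regs |= sets;
	return 0;
}
```

Rejecting bits outside 'supported' keeps unused bitfields at zero, as
required above.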

7. Capabilities that can be enabled on VMs
------------------------------------------

There are certain capabilities that change the behavior of the virtual
machine when enabled. To enable them, please see section 4.37. Below
you can find a list of capabilities and what their effect on the VM
is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this capability.
      x86 includes both i386 and x86_64.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


7.1 KVM_CAP_PPC_ENABLE_HCALL

Architectures: ppc
Parameters: args[0] is the sPAPR hcall number
            args[1] is 0 to disable, 1 to enable in-kernel handling

This capability controls whether individual sPAPR hypercalls (hcalls)
get handled by the kernel or not. Enabling or disabling in-kernel
handling of an hcall is effective across the VM. On creation, an
initial set of hcalls are enabled for in-kernel handling, which
consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented. If disabled, the kernel will
not attempt to handle the hcall, but will always exit to userspace
to handle it. Note that it may not make sense to enable some and
disable others of a group of related hcalls, but KVM does not prevent
userspace from doing that.

If the hcall number specified is not one that has an in-kernel
implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
error.

7.2 KVM_CAP_S390_USER_SIGP

Architectures: s390
Parameters: none

This capability controls which SIGP orders will be handled completely in user
space. With this capability enabled, all fast orders will be handled completely
in the kernel:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL

All other orders will be handled completely in user space.

Only privileged operation exceptions will be checked for in the kernel (or even
in the hardware prior to interception). If this capability is not enabled, the
old way of handling SIGP orders is used (partially in kernel and user space).

7.3 KVM_CAP_S390_VECTOR_REGISTERS

Architectures: s390
Parameters: none
Returns: 0 on success, negative value on error

Allows use of the vector registers introduced with z13 processor, and
provides for the synchronization between host and user space. Will
return -EINVAL if the machine does not support vectors.

7.4 KVM_CAP_S390_USER_STSI

Architectures: s390
Parameters: none

This capability allows post-handlers for the STSI instruction. After
initial handling in the kernel, KVM exits to user space with
KVM_EXIT_S390_STSI to allow user space to insert further data.

Before exiting to userspace, kvm handlers should fill in s390_stsi field of
vcpu->run:
struct {
	__u64 addr;
	__u8 ar;
	__u8 reserved;
	__u8 fc;
	__u8 sel1;
	__u16 sel2;
} s390_stsi;

@addr - guest address of STSI SYSIB
@fc - function code
@sel1 - selector 1
@sel2 - selector 2
@ar - access register number

KVM handlers should exit to userspace with rc = -EREMOTE.

7.5 KVM_CAP_SPLIT_IRQCHIP

Architectures: x86
Parameters: args[0] - number of routes reserved for userspace IOAPICs
Returns: 0 on success, -1 on error

Create a local apic for each processor in the kernel. This can be used
instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
IOAPIC and PIC (and also the PIT, even though this has to be enabled
separately).

This capability also enables in-kernel routing of interrupt requests;
when KVM_CAP_SPLIT_IRQCHIP is enabled, only routes of KVM_IRQ_ROUTING_MSI
type are used in the IRQ routing table. The first args[0] MSI routes are
reserved for the IOAPIC pins. Whenever the LAPIC receives an EOI for these
routes, a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.

Fails if a VCPU has already been created, or if the irqchip is already in the
kernel (i.e. KVM_CREATE_IRQCHIP has already been called).

7.6 KVM_CAP_S390_RI

Architectures: s390
Parameters: none

Allows use of runtime-instrumentation introduced with zEC12 processor.
Will return -EINVAL if the machine does not support runtime-instrumentation.
Will return -EBUSY if a VCPU has already been created.

7.7 KVM_CAP_X2APIC_API

Architectures: x86
Parameters: args[0] - features that should be enabled
Returns: 0 on success, -EINVAL when args[0] contains invalid features

Valid feature flags in args[0] are

#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)

Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
allowing the use of 32-bit APIC IDs. See KVM_CAP_X2APIC_API in their
respective sections.

KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work
in logical mode or with more than 255 VCPUs. Otherwise, KVM treats 0xff
as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
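
One way to derive args[0] from the rule above is sketched below.
x2apic_api_flags() and its parameters are hypothetical; whether to request
32-bit IDs is left to the caller, since the text above does not tie that
flag to a particular configuration.

```c
#include <stdint.h>

#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)

/*
 * Build args[0] for KVM_CAP_X2APIC_API. The broadcast quirk is disabled
 * when the guest uses x2APIC logical mode or has more than 255 VCPUs.
 */
static uint64_t x2apic_api_flags(int want_32bit_ids, int logical_mode,
				 unsigned int nr_vcpus)
{
	uint64_t flags = 0;

	if (want_32bit_ids)
		flags |= KVM_X2APIC_API_USE_32BIT_IDS;
	if (logical_mode || nr_vcpus > 255)
		flags |= KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK;
	return flags;
}
```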

7.8 KVM_CAP_S390_USER_INSTR0

Architectures: s390
Parameters: none

With this capability enabled, all illegal instructions 0x0000 (2 bytes) will
be intercepted and forwarded to user space. User space can use this
mechanism e.g. to realize 2-byte software breakpoints. The kernel will
not inject an operation exception for these instructions; user space has
to take care of that.

This capability can be enabled dynamically even if VCPUs were already
created and are running.

7.9 KVM_CAP_S390_GS

Architectures: s390
Parameters: none
Returns: 0 on success; -EINVAL if the machine does not support
         guarded storage; -EBUSY if a VCPU has already been created.

Allows use of guarded storage for the KVM guest.

7.10 KVM_CAP_S390_AIS

Architectures: s390
Parameters: none

Allow use of adapter-interruption suppression.
Returns: 0 on success; -EBUSY if a VCPU has already been created.

7.11 KVM_CAP_PPC_SMT

Architectures: ppc
Parameters: vsmt_mode, flags

Enabling this capability on a VM provides userspace with a way to set
the desired virtual SMT mode (i.e. the number of virtual CPUs per
virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
between 1 and 8. On POWER8, vsmt_mode must also be no greater than
the number of threads per subcore for the host. Currently flags must
be 0. A successful call to enable this capability will result in
vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
subsequently queried for the VM. This capability is only supported by
HV KVM, and can only be set before any VCPUs have been created.
The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
modes are available.
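
The parameter constraints above can be pre-checked in userspace with a
sketch like the following. vsmt_args_valid() is a hypothetical helper; it
covers only the power-of-2 range and the flags rule, not the POWER8
threads-per-subcore limit, which depends on the host.

```c
#include <stdint.h>

/*
 * Returns 1 if (vsmt_mode, flags) satisfy the generic rules stated for
 * KVM_CAP_PPC_SMT: flags == 0 and vsmt_mode is a power of 2 in [1, 8].
 */
static int vsmt_args_valid(uint64_t vsmt_mode, uint64_t flags)
{
	if (flags != 0)
		return 0;
	if (vsmt_mode < 1 || vsmt_mode > 8)
		return 0;
	/* A power of two has exactly one bit set. */
	return (vsmt_mode & (vsmt_mode - 1)) == 0;
}
```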

7.12 KVM_CAP_PPC_FWNMI

Architectures: ppc
Parameters: none

With this capability a machine check exception in the guest address
space will cause KVM to exit the guest with NMI exit reason. This
enables QEMU to build an error log and branch to the guest kernel's
registered machine check handling routine. Without this capability KVM
will branch to the guest's 0x200 interrupt vector.

7.13 KVM_CAP_X86_DISABLE_EXITS

Architectures: x86
Parameters: args[0] defines which exits are disabled
Returns: 0 on success, -EINVAL when args[0] contains invalid exits

Valid bits in args[0] are

#define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)

Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
workloads, and is suggested when vCPUs are associated with dedicated
physical CPUs. More bits can be added in the future; userspace can
just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
all such vmexits.

Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
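
A sketch of computing args[0] from the KVM_CHECK_EXTENSION result follows.
disable_exits_arg() is a hypothetical helper, and keeping HLT exits when
the VMM exposes KVM_FEATURE_PV_UNHALT is just one way to honour the rule
above; the alternative is to disable HLT exits and not expose PV_UNHALT.

```c
#include <stdint.h>

#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT   (1 << 1)

/*
 * check_ext_result: value returned by KVM_CHECK_EXTENSION for
 * KVM_CAP_X86_DISABLE_EXITS. uses_pv_unhalt: nonzero if the VMM exposes
 * KVM_FEATURE_PV_UNHALT to the guest.
 */
static uint64_t disable_exits_arg(uint64_t check_ext_result,
				  int uses_pv_unhalt)
{
	uint64_t arg = check_ext_result;

	if (uses_pv_unhalt)
		arg &= ~(uint64_t)KVM_X86_DISABLE_EXITS_HLT;
	return arg;
}
```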

7.14 KVM_CAP_S390_HPAGE_1M

Architectures: s390
Parameters: none
Returns: 0 on success, -EINVAL if hpage module parameter was not set
         or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL
         flag set

With this capability the KVM support for memory backing with 1m pages
through hugetlbfs can be enabled for a VM. After the capability is
enabled, cmma can't be enabled anymore and pfmfi and the storage key
interpretation are disabled. If cmma has already been enabled or the
hpage module parameter is not set to 1, -EINVAL is returned.

While it is generally possible to create a huge page backed VM without
this capability, the VM will not be able to run.

7.15 KVM_CAP_MSR_PLATFORM_INFO

Architectures: x86
Parameters: args[0] whether feature should be enabled or not

With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
a #GP would be raised when the guest tries to access it. Currently, this
capability does not enable write permissions of this MSR for the guest.

7.16 KVM_CAP_PPC_NESTED_HV

Architectures: ppc
Parameters: none
Returns: 0 on success, -EINVAL when the implementation doesn't support
         nested-HV virtualization.

HV-KVM on POWER9 and later systems allows for "nested-HV"
virtualization, which provides a way for a guest VM to run guests that
can run using the CPU's supervisor mode (privileged non-hypervisor
state). Enabling this capability on a VM depends on the CPU having
the necessary functionality and on the facility being enabled with a
kvm-hv module parameter.

7.17 KVM_CAP_EXCEPTION_PAYLOAD

Architectures: x86
Parameters: args[0] whether feature should be enabled or not

With this capability enabled, CR2 will not be modified prior to the
emulated VM-exit when L1 intercepts a #PF exception that occurs in
L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
the emulated VM-exit when L1 intercepts a #DB exception that occurs in
L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
#DB) exception for L2, exception.has_payload will be set and the
faulting address (or the new DR6 bits*) will be reported in the
exception_payload field. Similarly, when userspace injects a #PF (or
#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
exception.has_payload and to put the faulting address (or the new DR6
bits*) in the exception_payload field.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.


* For the new DR6 bits, note that bit 16 is set iff the #DB exception
  will clear DR6.RTM.
4928
2a31b9db
PB
49297.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
4930
4931Architectures: all
4932Parameters: args[0] whether feature should be enabled or not
4933
4934With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
4935clear and write-protect all pages that are returned as dirty.
4936Rather, userspace will have to do this operation separately using
4937KVM_CLEAR_DIRTY_LOG.
4938
4939At the cost of a slightly more complicated operation, this provides better
4940scalability and responsiveness for two reasons. First, the
4941KVM_CLEAR_DIRTY_LOG ioctl can operate at a 64-page granularity rather
4942than requiring a full memslot to be synced; this ensures that KVM does not
4943take spinlocks for an extended period of time. Second, in some cases a
4944large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
4945userspace actually using the data in the page. Pages can be modified
4946during this time, which is inefficient for both the guest and userspace:
4947the guest will incur a higher penalty due to write protection faults,
4948while userspace can see false reports of dirty pages. Manual reprotection
4949helps reduce this time, improving guest performance and reducing the
4950number of dirty log false positives.
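
Because clearing works at a 64-page granularity, userspace typically has to
widen an arbitrary page range before handing it to KVM_CLEAR_DIRTY_LOG. The
following is a sketch of that arithmetic only; it issues no ioctl. The names
struct page_range and align_clear_range are hypothetical, and the rule it
assumes (start is a multiple of 64, length is a multiple of 64 unless the
range runs to the end of the memslot) should be checked against the
KVM_CLEAR_DIRTY_LOG description.

```c
#include <stdint.h>

#define CLEAR_GRANULE 64u /* pages per clearing unit, as stated above */

/* Hypothetical container for an aligned range of guest pages. */
struct page_range {
    uint64_t first_page;
    uint64_t num_pages;
};

/*
 * Widen [first, first + count) to 64-page boundaries, clamped to a memslot
 * of slot_pages pages. Returns an empty range if nothing can be cleared.
 */
static struct page_range align_clear_range(uint64_t first, uint64_t count,
                                           uint64_t slot_pages)
{
    struct page_range r = { 0, 0 };
    uint64_t start, end;

    if (count == 0 || first >= slot_pages)
        return r;

    end = first + count;
    if (end > slot_pages)
        end = slot_pages;

    start = first - (first % CLEAR_GRANULE);                       /* round down */
    end = (end + CLEAR_GRANULE - 1) / CLEAR_GRANULE * CLEAR_GRANULE; /* round up */
    if (end > slot_pages)
        end = slot_pages; /* the last unit may be short at the slot end */

    r.first_page = start;
    r.num_pages = end - start;
    return r;
}
```

Note that widening means more pages are covered than were asked for, so the
bitmap passed along with the range has to describe the whole widened range.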
4951
4952
49538. Other capabilities.
4954----------------------
4955
4956This section lists capabilities that give information about other
4957features of the KVM implementation.
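
Most entries in this section are probed with KVM_CHECK_EXTENSION. A minimal
sketch of such a probe is given here; the helper names kvm_check_cap and
kvm_has_cap are hypothetical, and fd is assumed to be either the handle
obtained from open("/dev/kvm") or a VM fd, depending on the capability.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: return the raw KVM_CHECK_EXTENSION result for cap.
 * 0 means unsupported, a positive value means supported (some capabilities
 * encode extra information in the value), and -1 means the ioctl failed.
 */
static int kvm_check_cap(int fd, long cap)
{
    return ioctl(fd, KVM_CHECK_EXTENSION, cap);
}

/* Convenience wrapper for capabilities that are simply present or absent. */
static int kvm_has_cap(int fd, long cap)
{
    return kvm_check_cap(fd, cap) > 0;
}
```

For capabilities whose return value carries meaning, such as a version or a
bitmap, the raw result of kvm_check_cap should be inspected rather than the
boolean wrapper.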
4958
49598.1 KVM_CAP_PPC_HWRNG
4960
4961Architectures: ppc
4962
4963This capability, if KVM_CHECK_EXTENSION indicates that it is
4964available, means that the kernel has an implementation of the
4965H_RANDOM hypercall backed by a hardware random-number generator.
4966If present, the kernel H_RANDOM handler can be enabled for guest use
4967with the KVM_CAP_PPC_ENABLE_HCALL capability.
4968
49698.2 KVM_CAP_HYPERV_SYNIC
4970
4971Architectures: x86
4972This capability, if KVM_CHECK_EXTENSION indicates that it is
4973available, means that the kernel has an implementation of the
4974Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
4975used to support Windows Hyper-V based guest paravirt drivers (VMBus).
4976
4977In order to use SynIC, it has to be activated by setting this
4978capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
4979will disable the use of APIC hardware virtualization even if supported
4980by the CPU, as it's incompatible with SynIC auto-EOI behavior.
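
Activating SynIC as described could look roughly like the sketch below. The
helper name enable_vcpu_cap is hypothetical, and passing no flags or
arguments is an assumption of this sketch.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: enable a capability on a vcpu fd without passing
 * any flags or arguments. Returns the ioctl result (0 on success, -1 on
 * failure with errno set).
 */
static int enable_vcpu_cap(int vcpu_fd, unsigned int cap)
{
    struct kvm_enable_cap ec;

    memset(&ec, 0, sizeof(ec)); /* flags, args and padding left at zero */
    ec.cap = cap;
    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &ec);
}
```

A caller would use enable_vcpu_cap(vcpu_fd, KVM_CAP_HYPERV_SYNIC) once
KVM_CHECK_EXTENSION has reported the capability as available.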
4981
49828.3 KVM_CAP_PPC_RADIX_MMU
4983
4984Architectures: ppc
4985
4986This capability, if KVM_CHECK_EXTENSION indicates that it is
4987available, means that the kernel can support guests using the
4988radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
4989processor).
4990
49918.4 KVM_CAP_PPC_HASH_MMU_V3
4992
4993Architectures: ppc
4994
4995This capability, if KVM_CHECK_EXTENSION indicates that it is
4996available, means that the kernel can support guests using the
4997hashed page table MMU defined in Power ISA V3.00 (as implemented in
4998the POWER9 processor), including in-memory segment tables.
4999
50008.5 KVM_CAP_MIPS_VZ
5001
5002Architectures: mips
5003
5004This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
5005it is available, means that full hardware assisted virtualization capabilities
5006of the hardware are available for use through KVM. An appropriate
5007KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
5008utilises it.
5009
5010If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
5011available, it means that the VM is using full hardware assisted virtualization
5012capabilities of the hardware. This is useful to check after creating a VM with
5013KVM_VM_MIPS_DEFAULT.
5014
5015The value returned by KVM_CHECK_EXTENSION should be compared against known
5016values (see below). All other values are reserved. This is to allow for the
5017possibility of other hardware assisted virtualization implementations which
5018may be incompatible with the MIPS VZ ASE.
5019
5020 0: The trap & emulate implementation is in use to run guest code in user
5021 mode. Guest virtual memory segments are rearranged to fit the guest in the
5022 user mode address space.
5023
5024 1: The MIPS VZ ASE is in use, providing full hardware assisted
5025 virtualization, including standard guest virtual memory segments.
5026
50278.6 KVM_CAP_MIPS_TE
5028
5029Architectures: mips
5030
5031This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
5032it is available, means that the trap & emulate implementation is available to
5033run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
5034assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
5035to KVM_CREATE_VM to create a VM which utilises it.
5036
5037If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
5038available, it means that the VM is using trap & emulate.
5039
50408.7 KVM_CAP_MIPS_64BIT
5041
5042Architectures: mips
5043
5044This capability indicates the supported architecture type of the guest, i.e. the
5045supported register and address width.
5046
5047The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
5048kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
5049be checked specifically against known values (see below). All other values are
5050reserved.
5051
5052 0: MIPS32 or microMIPS32.
5053 Both registers and addresses are 32-bits wide.
5054 It will only be possible to run 32-bit guest code.
5055
5056 1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
5057 Registers are 64-bits wide, but addresses are 32-bits wide.
5058 64-bit guest code may run but cannot access MIPS64 memory segments.
5059 It will also be possible to run 32-bit guest code.
5060
5061 2: MIPS64 or microMIPS64 with access to all address segments.
5062 Both registers and addresses are 64-bits wide.
5063 It will be possible to run 64-bit or 32-bit guest code.
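
The value table above can be turned into a small decoder. This is only a
sketch: struct mips_guest_width and decode_mips_64bit are hypothetical names,
and any value outside the table is reported as unknown because it is
reserved.

```c
#include <stdbool.h>

/* Hypothetical summary of what a KVM_CAP_MIPS_64BIT value implies. */
struct mips_guest_width {
    bool known;         /* false for reserved values */
    int  reg_bits;      /* register width in bits */
    int  addr_bits;     /* address width in bits */
    bool can_run_64bit; /* whether 64-bit guest code may run */
};

static struct mips_guest_width decode_mips_64bit(int value)
{
    switch (value) {
    case 0: /* MIPS32 or microMIPS32 */
        return (struct mips_guest_width){ true, 32, 32, false };
    case 1: /* MIPS64, 32-bit compatibility segments only */
        return (struct mips_guest_width){ true, 64, 32, true };
    case 2: /* MIPS64, all address segments */
        return (struct mips_guest_width){ true, 64, 64, true };
    default: /* reserved: make no assumptions */
        return (struct mips_guest_width){ false, 0, 0, false };
    }
}
```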
5064
50658.9 KVM_CAP_ARM_USER_IRQ
5066
5067Architectures: arm, arm64
5068This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
5069that if userspace creates a VM without an in-kernel interrupt controller, it
5070will be notified of changes to the output level of in-kernel emulated devices
5071that can generate virtual interrupts presented to the VM.
5072For such VMs, on every return to userspace, the kernel
5073updates the vcpu's run->s.regs.device_irq_level field to represent the actual
5074output level of the device.
5075
5076Whenever kvm detects a change in the device output level, kvm guarantees at
5077least one return to userspace before running the VM. This exit could either
5078be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
5079userspace can always sample the device output level and re-compute the state of
5080the userspace interrupt controller. Userspace should always check the state
5081of run->s.regs.device_irq_level on every kvm exit.
5082The value in run->s.regs.device_irq_level can represent both level and edge
5083triggered interrupt signals, depending on the device. Edge triggered interrupt
5084signals will exit to userspace with the bit in run->s.regs.device_irq_level
5085set exactly once per edge signal.
5086
5087The field run->s.regs.device_irq_level is available independent of
5088run->kvm_valid_regs or run->kvm_dirty_regs bits.
5089
5090If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
5091number larger than 0 indicating the version of this capability that is implemented
5092and thereby which bits in run->s.regs.device_irq_level can signal values.
5093
5094Currently the following bits are defined for the device_irq_level bitmap:
5095
5096 KVM_CAP_ARM_USER_IRQ >= 1:
5097
5098 KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
5099 KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
5100 KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
5101
5102Future versions of kvm may implement additional events. These will get
5103indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
5104listed above.
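
For level-type lines, a userspace interrupt controller can compare the
bitmap seen on the previous exit with the current one. The sketch below does
only that comparison. The bit values are mirrored locally as an assumption so
the example builds on any host (on arm/arm64 they come from the UAPI
headers), and struct irq_level_change and diff_device_irq_level are
hypothetical names. Edge-type signals are not covered: since such a bit is
set once per edge, it would presumably be treated as an event on each exit
where it is set rather than diffed.

```c
#include <stdint.h>

/* Assumed bit values, mirrored from the arm/arm64 UAPI headers. */
#ifndef KVM_ARM_DEV_EL1_VTIMER
#define KVM_ARM_DEV_EL1_VTIMER (1u << 0)
#define KVM_ARM_DEV_EL1_PTIMER (1u << 1)
#define KVM_ARM_DEV_PMU        (1u << 2)
#endif

/* Hypothetical result type: which known lines went up or down. */
struct irq_level_change {
    uint64_t raised;
    uint64_t lowered;
};

/*
 * Compare two samples of run->s.regs.device_irq_level, ignoring bits that
 * this sketch does not know about.
 */
static struct irq_level_change diff_device_irq_level(uint64_t prev,
                                                     uint64_t cur)
{
    const uint64_t known = KVM_ARM_DEV_EL1_VTIMER |
                           KVM_ARM_DEV_EL1_PTIMER |
                           KVM_ARM_DEV_PMU;
    const uint64_t changed = (prev ^ cur) & known;
    struct irq_level_change c = { changed & cur, changed & prev };

    return c;
}
```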
5105
51068.10 KVM_CAP_PPC_SMT_POSSIBLE
5107
5108Architectures: ppc
5109
5110Querying this capability returns a bitmap indicating the possible
5111virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
5112(counting from the right) is set, then a virtual SMT mode of 2^N is
5113available.
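
The bitmap encoding can be tested with a few lines of C. This is a sketch
under the reading that bit N corresponds to a mode of exactly 2^N threads;
the helper name smt_mode_possible is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether virtual SMT mode `mode` (a thread
 * count such as 1, 2, 4 or 8) is marked possible in the bitmap returned
 * for KVM_CAP_PPC_SMT_POSSIBLE.
 */
static bool smt_mode_possible(uint32_t bitmap, unsigned int mode)
{
    unsigned int n = 0;

    /* Only powers of two can be expressed as 2^N. */
    if (mode == 0 || (mode & (mode - 1)) != 0)
        return false;

    while ((mode >> n) != 1) /* n = log2(mode) */
        n++;

    return n < 32 && ((bitmap >> n) & 1u) != 0;
}
```

For example, a bitmap of 0xB (bits 0, 1 and 3 set) would allow modes 1, 2
and 8 but not 4.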
5114
51158.11 KVM_CAP_HYPERV_SYNIC2
5116
5117Architectures: x86
5118
5119This capability enables a newer version of Hyper-V Synthetic interrupt
5120controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
5121doesn't clear SynIC message and event flags pages when they are enabled by
5122writing to the respective MSRs.
5123
51248.12 KVM_CAP_HYPERV_VP_INDEX
5125
5126Architectures: x86
5127
5128This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. Its
5129value is used to denote the target vcpu for a SynIC interrupt. For
5130compatibility, KVM initializes this msr to KVM's internal vcpu index. When this
5131capability is absent, userspace can still query this msr's value.
5132
51338.13 KVM_CAP_S390_AIS_MIGRATION
5134
5135Architectures: s390
5136Parameters: none
5137
5138This capability indicates whether the flic device will be able to get/set the
5139AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute, and allows
5140userspace to discover this without having to create a flic device.
5141
51428.14 KVM_CAP_S390_PSW
5143
5144Architectures: s390
5145
5146This capability indicates that the PSW is exposed via the kvm_run structure.
5147
51488.15 KVM_CAP_S390_GMAP
5149
5150Architectures: s390
5151
5152This capability indicates that the user space memory used as guest mapping can
5153be anywhere in the user memory address space, as long as the memory slots are
5154aligned and sized to a segment (1MB) boundary.
5155
51568.16 KVM_CAP_S390_COW
5157
5158Architectures: s390
5159
5160This capability indicates that the user space memory used as guest mapping can
5161use copy-on-write semantics as well as dirty pages tracking via read-only page
5162tables.
5163
51648.17 KVM_CAP_S390_BPB
5165
5166Architectures: s390
5167
5168This capability indicates that kvm will implement the interfaces to handle
5169reset, migration and nested KVM for branch prediction blocking. The stfle
5170facility 82 should not be provided to the guest without this capability.
5171
51728.18 KVM_CAP_HYPERV_TLBFLUSH
5173
5174Architectures: x86
5175
5176This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
5177hypercalls:
5178HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
5179HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
5180
51818.19 KVM_CAP_ARM_INJECT_SERROR_ESR
5182
5183Architectures: arm, arm64
5184
5185This capability indicates that userspace can specify (via the
5186KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
5187takes a virtual SError interrupt exception.
5188If KVM advertises this capability, userspace can only specify the ISS field for
5189the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
5190CPU when the exception is taken. If this virtual SError is taken to EL1 using
5191AArch64, this value will be reported in the ISS field of ESR_ELx.
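
Since only the ISS field can be specified, userspace may want to check a
syndrome value before passing it on. The sketch below assumes the usual
ESR_ELx layout in which ISS occupies bits [24:0]; the macro ESR_ISS_MASK and
the helper serror_esr_is_iss_only are hypothetical names, and the kernel's
own validation may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: ISS is ESR_ELx bits [24:0]. */
#define ESR_ISS_MASK 0x1ffffffULL

/* True if esr sets no bits outside the ISS field. */
static bool serror_esr_is_iss_only(uint64_t esr)
{
    return (esr & ~ESR_ISS_MASK) == 0;
}
```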
5192
5193See KVM_CAP_VCPU_EVENTS for more details.
51948.20 KVM_CAP_HYPERV_SEND_IPI
5195
5196Architectures: x86
5197
5198This capability indicates that KVM supports paravirtualized Hyper-V IPI send
5199hypercalls:
5200HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.