X-Git-Url: https://git.proxmox.com/?p=mirror_qemu.git;a=blobdiff_plain;f=qemu-tech.texi;h=3451cfaa5be412b3d0ffd840a44dd52390178a76;hp=659cd203ece5bb2c171fa1edc2cb2a7f64b7531b;hb=c4107e8208d0222f9b328691b519aaee4101db87;hpb=ad6a4837f88756fbc11d779fb81d96c3e35280bf diff --git a/qemu-tech.texi b/qemu-tech.texi index 659cd203ec..3451cfaa5b 100644 --- a/qemu-tech.texi +++ b/qemu-tech.texi @@ -1,116 +1,63 @@ -\input texinfo @c -*- texinfo -*- - -@iftex -@settitle QEMU Internals -@titlepage -@sp 7 -@center @titlefont{QEMU Internals} -@sp 3 -@end titlepage -@end iftex - -@chapter Introduction - -@section Features - -QEMU is a FAST! processor emulator using a portable dynamic -translator. - -QEMU has two operating modes: - -@itemize @minus - -@item -Full system emulation. In this mode, QEMU emulates a full system -(usually a PC), including a processor and various peripherials. It can -be used to launch an different Operating System without rebooting the -PC or to debug system code. - -@item -User mode emulation (Linux host only). In this mode, QEMU can launch -Linux processes compiled for one CPU on another CPU. It can be used to -launch the Wine Windows API emulator (@url{http://www.winehq.org}) or -to ease cross-compilation and cross-debugging. - -@end itemize - -As QEMU requires no host kernel driver to run, it is very safe and -easy to use. - -QEMU generic features: - -@itemize - -@item User space only or full system emulation. - -@item Using dynamic translation to native code for reasonnable speed. - -@item Working on x86 and PowerPC hosts. Being tested on ARM, Sparc32, Alpha and S390. - -@item Self-modifying code support. - -@item Precise exceptions support. - -@item The virtual CPU is a library (@code{libqemu}) which can be used -in other projects (look at @file{qemu/tests/qruncom.c} to have an -example of user mode @code{libqemu} usage). - -@end itemize - -QEMU user mode emulation features: -@itemize -@item Generic Linux system call converter, including most ioctls. - -@item clone() emulation using native CPU clone() to use Linux scheduler for threads. - -@item Accurate signal handling by remapping host signals to target signals. -@end itemize -@end itemize - -QEMU full system emulation features: -@itemize -@item QEMU can either use a full software MMU for maximum portability or use the host system call mmap() to simulate the target MMU. -@end itemize - -@section x86 emulation +@node Implementation notes +@appendix Implementation notes + +@menu +* CPU emulation:: +* Translator Internals:: +* QEMU compared to other emulators:: +* Managed start up options:: +* Bibliography:: +@end menu + +@node CPU emulation +@section CPU emulation + +@menu +* x86:: x86 and x86-64 emulation +* ARM:: ARM emulation +* MIPS:: MIPS emulation +* PPC:: PowerPC emulation +* SPARC:: Sparc32 and Sparc64 emulation +* Xtensa:: Xtensa emulation +@end menu + +@node x86 +@subsection x86 and x86-64 emulation QEMU x86 target features: -@itemize +@itemize -@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. -LDT/GDT and IDT are emulated. VM86 mode is also supported to run DOSEMU. +@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. +LDT/GDT and IDT are emulated. VM86 mode is also supported to run +DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, +and SSE4 as well as x86-64 SVM. @item Support of host page sizes bigger than 4KB in user mode emulation. @item QEMU can emulate itself on x86. -@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}. +@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}. It can be used to test other x86 virtual CPUs. @end itemize Current QEMU limitations: -@itemize - -@item No SSE/MMX support (yet). +@itemize -@item No x86-64 support. +@item Limited x86-64 support. @item IPC syscalls are missing. -@item The x86 segment limits and access rights are not tested at every +@item The x86 segment limits and access rights are not tested at every memory access (yet). Hopefully, very few OSes seem to rely on that for normal use. -@item On non x86 host CPUs, @code{double}s are used instead of the non standard -10 byte @code{long double}s of x86 for floating point emulation to get -maximum performances. - @end itemize -@section ARM emulation +@node ARM +@subsection ARM emulation @itemize @@ -122,386 +69,130 @@ maximum performances. @end itemize -@section PowerPC emulation +@node MIPS +@subsection MIPS emulation @itemize -@item Full PowerPC 32 bit emulation, including priviledged instructions, -FPU and MMU. +@item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, +including privileged instructions, FPU and MMU, in both little and big +endian modes. -@item Can run most PowerPC Linux binaries. +@item The Linux userland emulation can run many 32 bit MIPS Linux binaries. @end itemize -@section SPARC emulation +Current QEMU limitations: @itemize -@item SPARC V8 user support, except FPU instructions. - -@item Can run some SPARC Linux binaries. +@item Self-modifying code is not always handled correctly. -@end itemize - -@chapter QEMU Internals - -@section QEMU compared to other emulators - -Like bochs [3], QEMU emulates an x86 CPU. But QEMU is much faster than -bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC -emulation while QEMU can emulate several processors. - -Like Valgrind [2], QEMU does user space emulation and dynamic -translation. Valgrind is mainly a memory debugger while QEMU has no -support for it (QEMU could be used to detect out of bound memory -accesses as Valgrind, but it has no support to track uninitialised data -as Valgrind does). The Valgrind dynamic translator generates better code -than QEMU (in particular it does register allocation) but it is closely -tied to an x86 host and target and has no support for precise exceptions -and system emulation. - -EM86 [4] is the closest project to user space QEMU (and QEMU still uses -some of its code, in particular the ELF file loader). EM86 was limited -to an alpha host and used a proprietary and slow interpreter (the -interpreter part of the FX!32 Digital Win32 code translator [5]). - -TWIN [6] is a Windows API emulator like Wine. It is less accurate than -Wine but includes a protected mode x86 interpreter to launch x86 Windows -executables. Such an approach as greater potential because most of the -Windows API is executed natively but it is far more difficult to develop -because all the data structures and function parameters exchanged -between the API and the x86 code must be converted. - -User mode Linux [7] was the only solution before QEMU to launch a -Linux kernel as a process while not needing any host kernel -patches. However, user mode Linux requires heavy kernel patches while -QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is -slower. - -The new Plex86 [8] PC virtualizer is done in the same spirit as the -qemu-fast system emulator. It requires a patched Linux kernel to work -(you cannot launch the same kernel on your PC), but the patches are -really small. As it is a PC virtualizer (no emulation is done except -for some priveledged instructions), it has the potential of being -faster than QEMU. The downside is that a complicated (and potentially -unsafe) host kernel patch is needed. - -The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo -[11]) are faster than QEMU, but they all need specific, proprietary -and potentially unsafe host drivers. Moreover, they are unable to -provide cycle exact simulation as an emulator can. - -@section Portable dynamic translation - -QEMU is a dynamic translator. When it first encounters a piece of code, -it converts it to the host instruction set. Usually dynamic translators -are very complicated and highly CPU dependent. QEMU uses some tricks -which make it relatively easily portable and simple while achieving good -performances. - -The basic idea is to split every x86 instruction into fewer simpler -instructions. Each simple instruction is implemented by a piece of C -code (see @file{target-i386/op.c}). Then a compile time tool -(@file{dyngen}) takes the corresponding object file (@file{op.o}) -to generate a dynamic code generator which concatenates the simple -instructions to build a function (see @file{op.h:dyngen_code()}). - -In essence, the process is similar to [1], but more work is done at -compile time. - -A key idea to get optimal performances is that constant parameters can -be passed to the simple operations. For that purpose, dummy ELF -relocations are generated with gcc for each constant parameter. Then, -the tool (@file{dyngen}) can locate the relocations and generate the -appriopriate C code to resolve them when building the dynamic code. - -That way, QEMU is no more difficult to port than a dynamic linker. - -To go even faster, GCC static register variables are used to keep the -state of the virtual CPU. - -@section Register allocation - -Since QEMU uses fixed simple instructions, no efficient register -allocation can be done. However, because RISC CPUs have a lot of -register, most of the virtual CPU state can be put in registers without -doing complicated register allocation. - -@section Condition code optimisations - -Good CPU condition codes emulation (@code{EFLAGS} register on x86) is a -critical point to get good performances. QEMU uses lazy condition code -evaluation: instead of computing the condition codes after each x86 -instruction, it just stores one operand (called @code{CC_SRC}), the -result (called @code{CC_DST}) and the type of operation (called -@code{CC_OP}). - -@code{CC_OP} is almost never explicitely set in the generated code -because it is known at translation time. - -In order to increase performances, a backward pass is performed on the -generated simple instructions (see -@code{target-i386/translate.c:optimize_flags()}). When it can be proved that -the condition codes are not needed by the next instructions, no -condition codes are computed at all. - -@section CPU state optimisations - -The x86 CPU has many internal states which change the way it evaluates -instructions. In order to achieve a good speed, the translation phase -considers that some state information of the virtual x86 CPU cannot -change in it. For example, if the SS, DS and ES segments have a zero -base, then the translator does not even generate an addition for the -segment base. - -[The FPU stack pointer register is not handled that way yet]. - -@section Translation cache - -A 2MByte cache holds the most recently used translations. For -simplicity, it is completely flushed when it is full. A translation unit -contains just a single basic block (a block of x86 instructions -terminated by a jump or by a virtual CPU state change which the -translator cannot deduce statically). - -@section Direct block chaining - -After each translated basic block is executed, QEMU uses the simulated -Program Counter (PC) and other cpu state informations (such as the CS -segment base value) to find the next basic block. - -In order to accelerate the most common cases where the new simulated PC -is known, QEMU can patch a basic block so that it jumps directly to the -next one. - -The most portable code uses an indirect jump. An indirect jump makes -it easier to make the jump target modification atomic. On some host -architectures (such as x86 or PowerPC), the @code{JUMP} opcode is -directly patched so that the block chaining has no overhead. - -@section Self-modifying code and translated code invalidation - -Self-modifying code is a special challenge in x86 emulation because no -instruction cache invalidation is signaled by the application when code -is modified. - -When translated code is generated for a basic block, the corresponding -host page is write protected if it is not already read-only (with the -system call @code{mprotect()}). Then, if a write access is done to the -page, Linux raises a SEGV signal. QEMU then invalidates all the -translated code in the page and enables write accesses to the page. - -Correct translated code invalidation is done efficiently by maintaining -a linked list of every translated block contained in a given page. Other -linked lists are also maintained to undo direct block chaining. - -Although the overhead of doing @code{mprotect()} calls is important, -most MSDOS programs can be emulated at reasonnable speed with QEMU and -DOSEMU. +@item 64 bit userland emulation is not implemented. -Note that QEMU also invalidates pages of translated code when it detects -that memory mappings are modified with @code{mmap()} or @code{munmap()}. +@item The system emulation is not complete enough to run real firmware. -When using a software MMU, the code invalidation is more efficient: if -a given code page is invalidated too often because of write accesses, -then a bitmap representing all the code inside the page is -built. Every store into that page checks the bitmap to see if the code -really needs to be invalidated. It avoids invalidating the code when -only data is modified in the page. +@item The watchpoint debug facility is not implemented. -@section Exception support - -longjmp() is used when an exception such as division by zero is -encountered. +@end itemize -The host SIGSEGV and SIGBUS signal handlers are used to get invalid -memory accesses. The exact CPU state can be retrieved because all the -x86 registers are stored in fixed host registers. The simulated program -counter is found by retranslating the corresponding basic block and by -looking where the host program counter was at the exception point. +@node PPC +@subsection PowerPC emulation -The virtual CPU cannot retrieve the exact @code{EFLAGS} register because -in some cases it is not computed because of condition code -optimisations. It is not a big concern because the emulated code can -still be restarted in any cases. +@itemize -@section MMU emulation +@item Full PowerPC 32 bit emulation, including privileged instructions, +FPU and MMU. -For system emulation, QEMU uses the mmap() system call to emulate the -target CPU MMU. It works as long the emulated OS does not use an area -reserved by the host OS (such as the area above 0xc0000000 on x86 -Linux). +@item Can run most PowerPC Linux binaries. -In order to be able to launch any OS, QEMU also supports a soft -MMU. In that mode, the MMU virtual to physical address translation is -done at every memory access. QEMU uses an address translation cache to -speed up the translation. +@end itemize -In order to avoid flushing the translated code each time the MMU -mappings change, QEMU uses a physically indexed translation cache. It -means that each basic block is indexed with its physical address. +@node SPARC +@subsection Sparc32 and Sparc64 emulation -When MMU mappings change, only the chaining of the basic blocks is -reset (i.e. a basic block can no longer jump directly to another one). +@itemize -@section Hardware interrupts +@item Full SPARC V8 emulation, including privileged +instructions, FPU and MMU. SPARC V9 emulation includes most privileged +and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. -In order to be faster, QEMU does not check at every basic block if an -hardware interrupt is pending. Instead, the user must asynchrously -call a specific function to tell that an interrupt is pending. This -function resets the chaining of the currently executing basic -block. It ensures that the execution will return soon in the main loop -of the CPU emulator. Then the main loop can test if the interrupt is -pending and handle it. +@item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and +some 64-bit SPARC Linux binaries. -@section User emulation specific details +@end itemize -@subsection Linux system call translation +Current QEMU limitations: -QEMU includes a generic system call translator for Linux. It means that -the parameters of the system calls can be converted to fix the -endianness and 32/64 bit issues. The IOCTLs are converted with a generic -type description system (see @file{ioctls.h} and @file{thunk.c}). +@itemize -QEMU supports host CPUs which have pages bigger than 4KB. It records all -the mappings the process does and try to emulated the @code{mmap()} -system calls in cases where the host @code{mmap()} call would fail -because of bad page alignment. +@item IPC syscalls are missing. -@subsection Linux signals +@item Floating point exception support is buggy. -Normal and real-time signals are queued along with their information -(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt -request is done to the virtual CPU. When it is interrupted, one queued -signal is handled by generating a stack frame in the virtual CPU as the -Linux kernel does. The @code{sigreturn()} system call is emulated to return -from the virtual signal handler. +@item Atomic instructions are not correctly implemented. -Some signals (such as SIGALRM) directly come from the host. Other -signals are synthetized from the virtual CPU exceptions such as SIGFPE -when a division by zero is done (see @code{main.c:cpu_loop()}). +@item There are still some problems with Sparc64 emulators. -The blocked signal mask is still handled by the host Linux kernel so -that most signal system calls can be redirected directly to the host -Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system -calls need to be fully emulated (see @file{signal.c}). +@end itemize -@subsection clone() system call and threads +@node Xtensa +@subsection Xtensa emulation -The Linux clone() system call is usually used to create a thread. QEMU -uses the host clone() system call so that real host threads are created -for each emulated thread. One virtual CPU instance is created for each -thread. +@itemize -The virtual x86 CPU atomic operations are emulated with a global lock so -that their semantic is preserved. +@item Core Xtensa ISA emulation, including most options: code density, +loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, +MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor +context, debug, multiprocessor synchronization, +conditional store, exceptions, relocatable vectors, unaligned exception, +interrupts (including high priority and timer), hardware alignment, +region protection, region translation, MMU, windowed registers, thread +pointer, processor ID. -Note that currently there are still some locking issues in QEMU. In -particular, the translated cache flush is not protected yet against -reentrancy. +@item Not implemented options: data/instruction cache (including cache +prefetch and locking), XLMI, processor interface. Also options not +covered by the core ISA (e.g. FLIX, wide branches) are not implemented. -@subsection Self-virtualization +@item Can run most Xtensa Linux binaries. -QEMU was conceived so that ultimately it can emulate itself. Although -it is not very useful, it is an important test to show the power of the -emulator. +@item New core configuration that requires no additional instructions +may be created from overlay with minimal amount of hand-written code. -Achieving self-virtualization is not easy because there may be address -space conflicts. QEMU solves this problem by being an executable ELF -shared object as the ld-linux.so ELF interpreter. That way, it can be -relocated at load time. +@end itemize -@section Bibliography +@node Managed start up options +@section Managed start up options +In system mode emulation, it's possible to create a VM in a paused state using +the -S command line option. In this state the machine is completely initialized +according to command line options and ready to execute VM code but VCPU threads +are not executing any code. The VM state in this paused state depends on the way +QEMU was started. It could be in: @table @asis - -@item [1] -@url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing -direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio -Riccardi. - -@item [2] -@url{http://developer.kde.org/~sewardj/}, Valgrind, an open-source -memory debugger for x86-GNU/Linux, by Julian Seward. - -@item [3] -@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project, -by Kevin Lawton et al. - -@item [4] -@url{http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html}, the EM86 -x86 emulator on Alpha-Linux. - -@item [5] -@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf}, -DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton -Chernoff and Ray Hookway. - -@item [6] -@url{http://www.willows.com/}, Windows API library emulation from -Willows Software. - -@item [7] -@url{http://user-mode-linux.sourceforge.net/}, -The User-mode Linux Kernel. - -@item [8] -@url{http://www.plex86.org/}, -The new Plex86 project. - -@item [9] -@url{http://www.vmware.com/}, -The VMWare PC virtualizer. - -@item [10] -@url{http://www.microsoft.com/windowsxp/virtualpc/}, -The VirtualPC PC virtualizer. - -@item [11] -@url{http://www.twoostwo.org/}, -The TwoOStwo PC virtualizer. - +@item initial state (after reset/power on state) +@item with direct kernel loading, the initial state could be amended to execute +code loaded by QEMU in the VM's RAM and with incoming migration +@item with incoming migration, initial state will by amended with the migrated +machine state after migration completes. @end table -@chapter Regression Tests - -In the directory @file{tests/}, various interesting testing programs -are available. There are used for regression testing. - -@section @file{test-i386} - -This program executes most of the 16 bit and 32 bit x86 instructions and -generates a text output. It can be compared with the output obtained with -a real CPU or another emulator. The target @code{make test} runs this -program and a @code{diff} on the generated output. - -The Linux system call @code{modify_ldt()} is used to create x86 selectors -to test some 16 bit addressing and 32 bit with segmentation cases. - -The Linux system call @code{vm86()} is used to test vm86 emulation. - -Various exceptions are raised to test most of the x86 user space -exception reporting. - -@section @file{linux-test} - -This program tests various Linux system calls. It is used to verify -that the system call parameters are correctly converted between target -and host CPUs. - -@section @file{hello-i386} - -Very simple statically linked x86 program, just to test QEMU during a -port to a new host CPU. - -@section @file{hello-arm} - -Very simple statically linked ARM program, just to test QEMU during a -port to a new host CPU. - -@section @file{sha1} - -It is a simple benchmark. Care must be taken to interpret the results -because it mostly tests the ability of the virtual CPU to optimize the -@code{rol} x86 instruction and the condition code computations. - +This paused state is typically used by users to query machine state and/or +additionally configure the machine (by hotplugging devices) in runtime before +allowing VM code to run. + +However, at the -S pause point, it's impossible to configure options that affect +initial VM creation (like: -smp/-m/-numa ...) or cold plug devices. The +experimental --preconfig command line option allows pausing QEMU +before the initial VM creation, in a ``preconfig'' state, where additional +queries and configuration can be performed via QMP before moving on to +the resulting configuration startup. In the preconfig state, QEMU only allows +a limited set of commands over the QMP monitor, where the commands do not +depend on an initialized machine, including but not limited to: +@table @asis +@item qmp_capabilities +@item query-qmp-schema +@item query-commands +@item query-status +@item x-exit-preconfig +@end table