\input texinfo @c -*- texinfo -*-

@iftex
@settitle QEMU Internals
@titlepage
@sp 7
@center @titlefont{QEMU Internals}
@sp 3
@end titlepage
@end iftex

@chapter Introduction

@section Features

QEMU is a FAST! processor emulator using a portable dynamic
translator.

QEMU has two operating modes:

@itemize @minus

@item
Full system emulation. In this mode, QEMU emulates a full system
(usually a PC), including a processor and various peripherals. It can
be used to launch a different operating system without rebooting the
PC or to debug system code.

@item
User mode emulation (Linux host only). In this mode, QEMU can launch
Linux processes compiled for one CPU on another CPU. It can be used to
launch the Wine Windows API emulator (@url{http://www.winehq.org}) or
to ease cross-compilation and cross-debugging.

@end itemize

As QEMU requires no host kernel driver to run, it is very safe and
easy to use.

QEMU generic features:

@itemize

@item User space only or full system emulation.

@item Dynamic translation to native code for reasonable speed.

@item Works on x86 and PowerPC hosts. Being tested on ARM, Sparc32, Alpha and S390.

@item Self-modifying code support.

@item Precise exceptions support.

@item The virtual CPU is a library (@code{libqemu}) which can be used
in other projects.

@end itemize

QEMU user mode emulation features:
@itemize
@item Generic Linux system call converter, including most ioctls.

@item clone() emulation using the native clone() system call, so that the Linux scheduler is used for threads.

@item Accurate signal handling by remapping host signals to target signals.
@end itemize

QEMU full system emulation features:
@itemize
@item QEMU can either use a full software MMU for maximum portability or use the host system call mmap() to simulate the target MMU.
@end itemize

@section x86 emulation

QEMU x86 target features:

@itemize

@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
LDT/GDT and IDT are emulated. VM86 mode is also supported to run DOSEMU.

@item Support of host page sizes bigger than 4KB in user mode emulation.

@item QEMU can emulate itself on x86.

@item An extensive Linux x86 CPU test program is included (@file{tests/test-i386}).
It can be used to test other x86 virtual CPUs.

@end itemize

Current QEMU limitations:

@itemize

@item No SSE/MMX support (yet).

@item No x86-64 support.

@item IPC syscalls are missing.

@item The x86 segment limits and access rights are not tested at every
memory access (yet). Fortunately, very few OSes seem to rely on that for
normal use.

@item On non-x86 host CPUs, @code{double}s are used instead of the nonstandard
10 byte @code{long double}s of x86 for floating point emulation to get
maximum performance.

@end itemize

@section ARM emulation

@itemize

@item Full ARM 7 user emulation.

@item NWFPE FPU support included in user Linux emulation.

@item Can run most ARM Linux binaries.

@end itemize

@section PowerPC emulation

@itemize

@item Full PowerPC 32 bit emulation, including privileged instructions,
FPU and MMU.

@item Can run most PowerPC Linux binaries.

@end itemize

@section SPARC emulation

@itemize

@item SPARC V8 user support, except FPU instructions.

@item Can run some SPARC Linux binaries.

@end itemize

@chapter QEMU Internals

@section QEMU compared to other emulators

Like bochs [3], QEMU emulates an x86 CPU. But QEMU is much faster than
bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
emulation while QEMU can emulate several processors.

Like Valgrind [2], QEMU does user space emulation and dynamic
translation. Valgrind is mainly a memory debugger while QEMU has no
support for memory debugging (QEMU could be used to detect out of bounds
memory accesses as Valgrind does, but it has no support to track
uninitialised data as Valgrind does). The Valgrind dynamic translator
generates better code than QEMU (in particular it does register
allocation) but it is closely tied to an x86 host and target and has no
support for precise exceptions and system emulation.

EM86 [4] is the closest project to user space QEMU (and QEMU still uses
some of its code, in particular the ELF file loader). EM86 was limited
to an Alpha host and used a proprietary and slow interpreter (the
interpreter part of the FX!32 Digital Win32 code translator [5]).

TWIN [6] is a Windows API emulator like Wine. It is less accurate than
Wine but includes a protected mode x86 interpreter to launch x86 Windows
executables. Such an approach has greater potential because most of the
Windows API is executed natively, but it is far more difficult to develop
because all the data structures and function parameters exchanged
between the API and the x86 code must be converted.

User mode Linux [7] was the only solution before QEMU to launch a
Linux kernel as a process while not needing any host kernel
patches. However, user mode Linux requires heavy patches to the kernel
it launches, while QEMU accepts unpatched Linux kernels. The price to
pay is that QEMU is slower.

The new Plex86 [8] PC virtualizer is done in the same spirit as the
qemu-fast system emulator. It requires a patched Linux kernel to work
(you cannot launch the same kernel on your PC), but the patches are
really small. As it is a PC virtualizer (no emulation is done except
for some privileged instructions), it has the potential of being
faster than QEMU. The downside is that a complicated (and potentially
unsafe) host kernel patch is needed.

The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo
[11]) are faster than QEMU, but they all need specific, proprietary
and potentially unsafe host drivers. Moreover, they are unable to
provide cycle exact simulation as an emulator can.

@section Portable dynamic translation

QEMU is a dynamic translator. When it first encounters a piece of code,
it converts it to the host instruction set. Usually dynamic translators
are very complicated and highly CPU dependent. QEMU uses some tricks
which make it relatively easily portable and simple while achieving good
performance.

The basic idea is to split every x86 instruction into a few simpler
instructions. Each simple instruction is implemented by a piece of C
code (see @file{target-i386/op.c}). Then a compile time tool
(@file{dyngen}) takes the corresponding object file (@file{op.o})
to generate a dynamic code generator which concatenates the simple
instructions to build a function (see @file{op.h:dyngen_code()}).

In essence, the process is similar to [1], but more work is done at
compile time.

A key idea to get optimal performance is that constant parameters can
be passed to the simple operations. For that purpose, dummy ELF
relocations are generated with gcc for each constant parameter. Then,
the tool (@file{dyngen}) can locate the relocations and generate the
appropriate C code to resolve them when building the dynamic code.
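
As a rough illustration of the idea only, the sketch below models a
translated block as a sequence of simple operations, each carrying a
constant parameter fixed at translation time. It is not how
@file{dyngen} works internally: @file{dyngen} copies machine code out of
@file{op.o} and patches relocations, whereas this sketch calls C
functions through pointers. All names (@code{MicroOp},
@code{op_movl_T0_im}, @code{translate_add_eax_imm}, ...) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical virtual CPU state: two temporaries and one guest register. */
typedef struct {
    uint32_t T0, T1;
    uint32_t eax;
} CPUState;

/* A simple operation: a small piece of C code plus one constant parameter. */
typedef void (*OpFunc)(CPUState *env, uint32_t param);

typedef struct {
    OpFunc func;
    uint32_t param; /* fixed at translation time, like a resolved relocation */
} MicroOp;

static void op_movl_T0_im(CPUState *env, uint32_t param)  { env->T0 = param; }
static void op_movl_T1_eax(CPUState *env, uint32_t param) { (void)param; env->T1 = env->eax; }
static void op_addl_T0_T1(CPUState *env, uint32_t param)  { (void)param; env->T0 += env->T1; }
static void op_movl_eax_T0(CPUState *env, uint32_t param) { (void)param; env->eax = env->T0; }

/* "Translate" the guest instruction `add eax, imm` into simple operations.
   Returns the number of operations written to buf (buf needs room for 4). */
static size_t translate_add_eax_imm(MicroOp *buf, uint32_t imm)
{
    size_t n = 0;
    buf[n++] = (MicroOp){ op_movl_T0_im, imm };
    buf[n++] = (MicroOp){ op_movl_T1_eax, 0 };
    buf[n++] = (MicroOp){ op_addl_T0_T1, 0 };
    buf[n++] = (MicroOp){ op_movl_eax_T0, 0 };
    return n;
}

/* Run the concatenated operations in order. */
static void run_block(CPUState *env, const MicroOp *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i].func(env, buf[i].param);
}
```

The immediate value is decided once, when the block is built, and is
never decoded again while the block runs. That is the property the
dummy relocations provide in the real generator.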

That way, QEMU is no more difficult to port than a dynamic linker.

To go even faster, GCC static register variables are used to keep the
state of the virtual CPU.

@section Register allocation

Since QEMU uses fixed simple instructions, no efficient register
allocation can be done. However, because RISC CPUs have a lot of
registers, most of the virtual CPU state can be put in registers without
doing complicated register allocation.

@section Condition code optimisations

Good emulation of the CPU condition codes (@code{EFLAGS} register on
x86) is critical for good performance. QEMU uses lazy condition code
evaluation: instead of computing the condition codes after each x86
instruction, it just stores one operand (called @code{CC_SRC}), the
result (called @code{CC_DST}) and the type of operation (called
@code{CC_OP}).

@code{CC_OP} is almost never explicitly set in the generated code
because it is known at translation time.
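
A minimal sketch of lazy condition code evaluation follows, assuming
only 32 bit ADD and SUB and only the carry, zero and sign flags. The
structure and function names (@code{LazyFlags}, @code{do_addl},
@code{get_cf}, ...) are hypothetical and do not come from the QEMU
sources. For simplicity the sketch stores the operation type at run
time, whereas the text above notes that QEMU usually knows it at
translation time.

```c
#include <stdint.h>

enum { CC_OP_ADDL, CC_OP_SUBL };

typedef struct {
    uint32_t cc_src; /* second operand of the last flag-setting operation */
    uint32_t cc_dst; /* result of that operation */
    int cc_op;       /* which operation it was */
} LazyFlags;

/* Arithmetic only records what is needed to rebuild the flags later. */
static uint32_t do_addl(LazyFlags *f, uint32_t a, uint32_t b)
{
    uint32_t res = a + b;
    f->cc_src = b;
    f->cc_dst = res;
    f->cc_op = CC_OP_ADDL;
    return res;
}

static uint32_t do_subl(LazyFlags *f, uint32_t a, uint32_t b)
{
    uint32_t res = a - b;
    f->cc_src = b;
    f->cc_dst = res;
    f->cc_op = CC_OP_SUBL;
    return res;
}

/* Flags are computed only when something actually asks for them. */
static int get_cf(const LazyFlags *f)
{
    switch (f->cc_op) {
    case CC_OP_ADDL:
        /* a + b wrapped around iff the result is smaller than b */
        return f->cc_dst < f->cc_src;
    case CC_OP_SUBL:
        /* recover a = result + b; a borrow occurred iff a < b */
        return (uint32_t)(f->cc_dst + f->cc_src) < f->cc_src;
    }
    return 0;
}

static int get_zf(const LazyFlags *f) { return f->cc_dst == 0; }
static int get_sf(const LazyFlags *f) { return (int)(f->cc_dst >> 31); }
```

With this layout an arithmetic operation costs three stores, and the
flag computation is paid only by the instructions that read a flag.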

In order to increase performance, a backward pass is performed on the
generated simple instructions (see
@code{target-i386/translate.c:optimize_flags()}). When it can be proved that
the condition codes are not needed by the next instructions, no
condition codes are computed at all.
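
The backward pass can be sketched as a simple liveness analysis. The
version below is an illustration under strong assumptions, not the
actual @code{optimize_flags()} code: it treats the condition codes as a
single unit (every writer overwrites all of them) and assumes they are
needed at the end of the block. The @code{Op} structure and its fields
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    bool reads_cc;   /* the operation consumes the condition codes */
    bool writes_cc;  /* the operation overwrites the condition codes */
    bool compute_cc; /* output: keep the condition code update or not */
} Op;

/* Walk the block backwards and drop condition code updates that are
   overwritten before anything reads them. Returns the number dropped. */
static size_t optimize_flags(Op *ops, size_t n)
{
    bool live = true; /* assumed: flags may be needed after the block */
    size_t removed = 0;

    for (size_t i = n; i-- > 0; ) {
        if (ops[i].writes_cc) {
            ops[i].compute_cc = live;
            if (!live)
                removed++;
            live = false; /* earlier values are dead past this point */
        }
        if (ops[i].reads_cc)
            live = true;  /* the value flowing into this op is needed */
    }
    return removed;
}
```

For the sequence add, sub, conditional jump, only the sub needs to
update the condition codes, because the result of the add is
overwritten before anything reads it.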

@section CPU state optimisations

The x86 CPU has many internal states which change the way it evaluates
instructions. In order to achieve a good speed, the translation phase
considers that some state information of the virtual x86 CPU cannot
change within the translated code. For example, if the SS, DS and ES
segments have a zero base, then the translator does not even generate
an addition for the segment base.

[The FPU stack pointer register is not handled that way yet].

@section Translation cache

A 2MByte cache holds the most recently used translations. For
simplicity, it is completely flushed when it is full. A translation unit
contains just a single basic block (a block of x86 instructions
terminated by a jump or by a virtual CPU state change which the
translator cannot deduce statically).
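
The flush-when-full policy can be sketched as a bump allocator over a
fixed code buffer. The sizes are tiny for illustration (the text above
describes a 2MByte cache), and the names (@code{TransCache},
@code{tc_alloc}, @code{tc_find}) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CODE_BUF_SIZE 256 /* illustration only */
#define MAX_BLOCKS 16

typedef struct {
    uint32_t pc;   /* guest address of the basic block */
    size_t offset; /* where its translation starts in the code buffer */
    size_t size;
} TransBlock;

typedef struct {
    uint8_t code[CODE_BUF_SIZE];
    size_t code_used;
    TransBlock blocks[MAX_BLOCKS];
    size_t nb_blocks;
    unsigned flush_count;
} TransCache;

/* Throw away every translation at once. */
static void tc_flush(TransCache *tc)
{
    tc->code_used = 0;
    tc->nb_blocks = 0;
    tc->flush_count++;
}

static TransBlock *tc_find(TransCache *tc, uint32_t pc)
{
    for (size_t i = 0; i < tc->nb_blocks; i++)
        if (tc->blocks[i].pc == pc)
            return &tc->blocks[i];
    return NULL;
}

/* Reserve room for a new translation, flushing everything if needed. */
static TransBlock *tc_alloc(TransCache *tc, uint32_t pc, size_t size)
{
    if (size > CODE_BUF_SIZE)
        return NULL;
    if (tc->code_used + size > CODE_BUF_SIZE || tc->nb_blocks == MAX_BLOCKS)
        tc_flush(tc);

    TransBlock *tb = &tc->blocks[tc->nb_blocks++];
    tb->pc = pc;
    tb->offset = tc->code_used;
    tb->size = size;
    tc->code_used += size;
    return tb;
}
```

A complete flush avoids any bookkeeping about which translations are
old; the cost is that blocks still in use have to be translated again.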

@section Direct block chaining

After each translated basic block is executed, QEMU uses the simulated
Program Counter (PC) and other CPU state information (such as the CS
segment base value) to find the next basic block.

In order to accelerate the most common cases where the new simulated PC
is known, QEMU can patch a basic block so that it jumps directly to the
next one.

The most portable code uses an indirect jump. An indirect jump makes
it easier to make the jump target modification atomic. On some host
architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
directly patched so that the block chaining has no overhead.
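
The effect of chaining can be sketched with a pointer per block that
plays the role of the patched jump. This is a model only: the real
implementation patches generated machine code, and the names
(@code{TB}, @code{run_blocks}, @code{tb_unchain_all}) and the single
static successor per block are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct TB {
    uint32_t pc;         /* guest PC where this block starts */
    uint32_t next_pc;    /* guest PC reached when the block ends */
    struct TB *jmp_next; /* direct link to the next block, NULL if unchained */
} TB;

/* Slow path: search the block table by guest PC. */
static TB *tb_find(TB *tbs, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (tbs[i].pc == pc)
            return &tbs[i];
    return NULL;
}

/* "Execute" up to `steps` blocks starting at start_pc and return how
   many slow lookups were needed. */
static unsigned run_blocks(TB *tbs, size_t n, uint32_t start_pc, unsigned steps)
{
    unsigned lookups = 1;
    TB *tb = tb_find(tbs, n, start_pc);

    for (unsigned i = 0; tb != NULL && i < steps; i++) {
        /* ... the translated code of tb would run here ... */
        if (tb->jmp_next == NULL) {
            lookups++;
            tb->jmp_next = tb_find(tbs, n, tb->next_pc); /* chain it */
        }
        tb = tb->jmp_next;
    }
    return lookups;
}

/* Undo every chain so that execution goes through the slow path again. */
static void tb_unchain_all(TB *tbs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tbs[i].jmp_next = NULL;
}
```

Once two blocks that jump to each other are chained, further iterations
need no lookup at all; resetting the links forces execution back
through the lookup path.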

@section Self-modifying code and translated code invalidation

Self-modifying code is a special challenge in x86 emulation because no
instruction cache invalidation is signaled by the application when code
is modified.

When translated code is generated for a basic block, the corresponding
host page is write protected if it is not already read-only (with the
system call @code{mprotect()}). Then, if a write access is done to the
page, Linux raises a SEGV signal. QEMU then invalidates all the
translated code in the page and enables write accesses to the page.
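
The protect, fault, invalidate and unprotect cycle can be sketched on a
Linux host as follows. This is a simplified illustration, not QEMU
code: it tracks a single page, "invalidation" is just a counter, and
the names (@code{smc_setup}, @code{smc_protect_page},
@code{invalidations}) are hypothetical. Calling @code{mprotect()} from
a signal handler is assumed to be acceptable on the host, as it is on
Linux in practice.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile unsigned char *code_page;   /* stand-in for a guest code page */
static size_t page_size;
static volatile sig_atomic_t invalidations; /* translated code dropped N times */

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)code_page;

    (void)ctx;
    if (addr >= base && addr < base + page_size) {
        /* A write hit the protected page: drop its translations and
           allow writes again. The faulting store is then restarted. */
        invalidations++;
        mprotect((void *)base, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    signal(sig, SIG_DFL); /* unrelated fault: let the default action run */
}

/* Map one writable page and install the SIGSEGV handler. */
static int smc_setup(void)
{
    struct sigaction sa;
    void *p;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    code_page = p;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Called after "translating" code from the page: make it read-only. */
static int smc_protect_page(void)
{
    return mprotect((void *)code_page, page_size, PROT_READ);
}
```

Only the first write after protection pays for the fault; once the page
is writable again, later stores run at full speed until code from the
page is translated and the page is protected again.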

Correct translated code invalidation is done efficiently by maintaining
a linked list of every translated block contained in a given page. Other
linked lists are also maintained to undo direct block chaining.

Although the overhead of doing @code{mprotect()} calls is significant,
most MSDOS programs can be emulated at reasonable speed with QEMU and
DOSEMU.

Note that QEMU also invalidates pages of translated code when it detects
that memory mappings are modified with @code{mmap()} or @code{munmap()}.

When using a software MMU, the code invalidation is more efficient: if
a given code page is invalidated too often because of write accesses,
then a bitmap representing all the code inside the page is
built. Every store into that page checks the bitmap to see if the code
really needs to be invalidated. It avoids invalidating the code when
only data is modified in the page.

@section Exception support

longjmp() is used when an exception such as division by zero is
encountered.
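
A minimal sketch of this use of @code{longjmp()}, assuming a
hypothetical @code{CPUState} with a @code{jmp_buf} and a single
division operation (none of these names are taken from the QEMU
sources):

```c
#include <setjmp.h>
#include <stdint.h>

enum { EXCP_NONE = -1, EXCP_DIVZ = 0 };

typedef struct {
    jmp_buf jmp_env;     /* where to resume when an exception is raised */
    int exception_index; /* which exception was raised, or EXCP_NONE */
    uint32_t eax;
} CPUState;

/* Abandon the current operation and return to the execution loop. */
static void raise_exception(CPUState *env, int index)
{
    env->exception_index = index;
    longjmp(env->jmp_env, 1);
}

/* A simple operation that can fault in the middle of a block. */
static void op_divl_eax(CPUState *env, uint32_t divisor)
{
    if (divisor == 0)
        raise_exception(env, EXCP_DIVZ); /* does not return */
    env->eax /= divisor;
}

/* Run one operation; report the exception raised, if any. */
static int cpu_exec_div(CPUState *env, uint32_t divisor)
{
    env->exception_index = EXCP_NONE;
    if (setjmp(env->jmp_env) == 0)
        op_divl_eax(env, divisor);
    return env->exception_index;
}
```

The operation that detects the fault does not have to return an error
code through every caller: control goes straight back to the loop,
which can then deliver the exception to the emulated program.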

The host SIGSEGV and SIGBUS signal handlers are used to catch invalid
memory accesses. The exact CPU state can be retrieved because all the
x86 registers are stored in fixed host registers. The simulated program
counter is found by retranslating the corresponding basic block and by
looking where the host program counter was at the exception point.

The virtual CPU cannot retrieve the exact @code{EFLAGS} register because
in some cases it is not computed because of condition code
optimisations. It is not a big concern because the emulated code can
still be restarted in any case.

@section MMU emulation

For system emulation, QEMU uses the mmap() system call to emulate the
target CPU MMU. It works as long as the emulated OS does not use an area
reserved by the host OS (such as the area above 0xc0000000 on x86
Linux).

In order to be able to launch any OS, QEMU also supports a soft
MMU. In that mode, the MMU virtual to physical address translation is
done at every memory access. QEMU uses an address translation cache to
speed up the translation.
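
An address translation cache of this kind can be sketched as a small
direct-mapped table in front of a slow page table walk. The layout
below is an assumption made for illustration (4KB pages, 256 entries,
and a fake @code{walk_page_table()} with a fixed mapping), not the
structure used by QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_MASK ((1u << PAGE_BITS) - 1)
#define TLB_SIZE 256

typedef struct {
    uint32_t vpage; /* virtual page number cached in this slot */
    uint32_t ppage; /* physical page number it maps to */
    bool valid;
} TLBEntry;

typedef struct {
    TLBEntry tlb[TLB_SIZE];
    unsigned misses; /* number of slow page table walks */
} SoftMMU;

/* Stand-in for the real page table walk: a fixed mapping for the demo. */
static uint32_t walk_page_table(uint32_t vpage)
{
    return vpage + 0x100;
}

/* Translate a virtual address, using the cache when possible. */
static uint32_t mmu_translate(SoftMMU *mmu, uint32_t vaddr)
{
    uint32_t vpage = vaddr >> PAGE_BITS;
    TLBEntry *e = &mmu->tlb[vpage % TLB_SIZE];

    if (!e->valid || e->vpage != vpage) {
        mmu->misses++;
        e->vpage = vpage;
        e->ppage = walk_page_table(vpage);
        e->valid = true;
    }
    return (e->ppage << PAGE_BITS) | (vaddr & PAGE_MASK);
}

/* Called when the emulated OS changes its mappings. */
static void mmu_flush(SoftMMU *mmu)
{
    for (unsigned i = 0; i < TLB_SIZE; i++)
        mmu->tlb[i].valid = false;
}
```

Accesses that stay within a recently used page cost one table lookup
and one comparison; the full walk is only done on a miss or after a
flush.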

In order to avoid flushing the translated code each time the MMU
mappings change, QEMU uses a physically indexed translation cache. It
means that each basic block is indexed with its physical address.

When MMU mappings change, only the chaining of the basic blocks is
reset (i.e. a basic block can no longer jump directly to another one).

@section Hardware interrupts

In order to be faster, QEMU does not check at every basic block if a
hardware interrupt is pending. Instead, the user must asynchronously
call a specific function to tell that an interrupt is pending. This
function resets the chaining of the currently executing basic
block. It ensures that the execution will return soon to the main loop
of the CPU emulator. Then the main loop can test if the interrupt is
pending and handle it.

@section User emulation specific details

@subsection Linux system call translation

QEMU includes a generic system call translator for Linux. It means that
the parameters of the system calls can be converted to fix the
endianness and 32/64 bit issues. The IOCTLs are converted with a generic
type description system (see @file{ioctls.h} and @file{thunk.c}).
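
As a small illustration of the kind of conversion involved, the sketch
below reads a structure laid out for a hypothetical 32 bit big-endian
target and widens it to a host-side structure with 64 bit fields. It is
hand-written for one structure and is not the generic type description
mechanism of @file{thunk.c}; the names (@code{HostTimeval},
@code{load_be32}, @code{target_to_host_timeval}) are assumptions made
for the example.

```c
#include <stdint.h>

/* Host-side view of the structure, with 64 bit fields. */
typedef struct {
    int64_t tv_sec;
    int64_t tv_usec;
} HostTimeval;

/* Read a 32 bit big-endian value regardless of host endianness. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Convert a target structure made of two 32 bit signed big-endian
   fields into the host layout, sign-extending each field. */
static void target_to_host_timeval(HostTimeval *host, const uint8_t *target)
{
    host->tv_sec = (int32_t)load_be32(target);
    host->tv_usec = (int32_t)load_be32(target + 4);
}
```

Reading the fields byte by byte makes the conversion independent of the
host byte order, and the cast through @code{int32_t} preserves negative
values when the field is widened.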

QEMU supports host CPUs which have pages bigger than 4KB. It records all
the mappings the process does and tries to emulate the @code{mmap()}
system calls in cases where the host @code{mmap()} call would fail
because of bad page alignment.

@subsection Linux signals

Normal and real-time signals are queued along with their information
(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
request is done to the virtual CPU. When it is interrupted, one queued
signal is handled by generating a stack frame in the virtual CPU as the
Linux kernel does. The @code{sigreturn()} system call is emulated to return
from the virtual signal handler.

Some signals (such as SIGALRM) directly come from the host. Other
signals are synthesized from the virtual CPU exceptions such as SIGFPE
when a division by zero is done (see @code{main.c:cpu_loop()}).

The blocked signal mask is still handled by the host Linux kernel so
that most signal system calls can be redirected directly to the host
Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
calls need to be fully emulated (see @file{signal.c}).

@subsection clone() system call and threads

The Linux clone() system call is usually used to create a thread. QEMU
uses the host clone() system call so that real host threads are created
for each emulated thread. One virtual CPU instance is created for each
thread.

The virtual x86 CPU atomic operations are emulated with a global lock so
that their semantics are preserved.
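
The global lock approach can be sketched with a POSIX mutex shared by
all emulated threads. The helper names (@code{emu_lock_xaddl},
@code{emu_lock_cmpxchgl}) and the use of @code{pthread_mutex_t} are
assumptions made for the example; only the idea of one lock around
every emulated atomic operation comes from the text above.

```c
#include <pthread.h>
#include <stdint.h>

/* One lock shared by every virtual CPU. */
static pthread_mutex_t global_cpu_lock = PTHREAD_MUTEX_INITIALIZER;

/* Emulates `lock xadd`: add val to *addr and return the old value. */
static uint32_t emu_lock_xaddl(uint32_t *addr, uint32_t val)
{
    uint32_t old;

    pthread_mutex_lock(&global_cpu_lock);
    old = *addr;
    *addr = old + val;
    pthread_mutex_unlock(&global_cpu_lock);
    return old;
}

/* Emulates `lock cmpxchg`: store new_val only if *addr == expected.
   Returns the value found in memory, so the caller can tell whether
   the store happened. */
static uint32_t emu_lock_cmpxchgl(uint32_t *addr, uint32_t expected,
                                  uint32_t new_val)
{
    uint32_t old;

    pthread_mutex_lock(&global_cpu_lock);
    old = *addr;
    if (old == expected)
        *addr = new_val;
    pthread_mutex_unlock(&global_cpu_lock);
    return old;
}
```

A single lock is simple and correct as long as every emulated atomic
operation takes it, at the price of serialising atomic operations that
touch unrelated addresses.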

Note that currently there are still some locking issues in QEMU. In
particular, the translation cache flush is not protected yet against
reentrancy.

@subsection Self-virtualization

QEMU was conceived so that ultimately it can emulate itself. Although
it is not very useful, it is an important test to show the power of the
emulator.

Achieving self-virtualization is not easy because there may be address
space conflicts. QEMU solves this problem by being an executable ELF
shared object, like the ld-linux.so ELF interpreter. That way, it can be
relocated at load time.

@section Bibliography

@table @asis

@item [1]
@url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing
direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio
Riccardi.

@item [2]
@url{http://developer.kde.org/~sewardj/}, Valgrind, an open-source
memory debugger for x86-GNU/Linux, by Julian Seward.

@item [3]
@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
by Kevin Lawton et al.

@item [4]
@url{http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html}, the EM86
x86 emulator on Alpha-Linux.

@item [5]
@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf},
DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
Chernoff and Ray Hookway.

@item [6]
@url{http://www.willows.com/}, Windows API library emulation from
Willows Software.

@item [7]
@url{http://user-mode-linux.sourceforge.net/},
The User-mode Linux Kernel.

@item [8]
@url{http://www.plex86.org/},
The new Plex86 project.

@item [9]
@url{http://www.vmware.com/},
The VMWare PC virtualizer.

@item [10]
@url{http://www.microsoft.com/windowsxp/virtualpc/},
The VirtualPC PC virtualizer.

@item [11]
@url{http://www.twoostwo.org/},
The TwoOStwo PC virtualizer.

@end table

@chapter Regression Tests

In the directory @file{tests/}, various interesting testing programs
are available. They are used for regression testing.

@section @file{test-i386}

This program executes most of the 16 bit and 32 bit x86 instructions and
generates a text output. It can be compared with the output obtained with
a real CPU or another emulator. The target @code{make test} runs this
program and a @code{diff} on the generated output.

The Linux system call @code{modify_ldt()} is used to create x86 selectors
to test some 16 bit addressing and 32 bit with segmentation cases.

The Linux system call @code{vm86()} is used to test vm86 emulation.

Various exceptions are raised to test most of the x86 user space
exception reporting.

@section @file{linux-test}

This program tests various Linux system calls. It is used to verify
that the system call parameters are correctly converted between target
and host CPUs.

@section @file{hello-i386}

Very simple statically linked x86 program, just to test QEMU during a
port to a new host CPU.

@section @file{hello-arm}

Very simple statically linked ARM program, just to test QEMU during a
port to a new host CPU.

@section @file{sha1}

It is a simple benchmark. Care must be taken to interpret the results
because it mostly tests the ability of the virtual CPU to optimize the
@code{rol} x86 instruction and the condition code computations.
