\input texinfo @c -*- texinfo -*-

@settitle QEMU CPU Emulator Reference Documentation
@titlepage
@sp 7
@center @titlefont{QEMU CPU Emulator Reference Documentation}
@sp 3
@end titlepage

@chapter Introduction

@section Features

QEMU is a FAST! processor emulator. Its purpose is to run Linux executables
compiled for one architecture on another. For example, x86 Linux
processes can be run on PowerPC Linux hosts. By using dynamic
translation it achieves a reasonable speed while being easy to port to
new host CPUs. Its main goal is to be able to launch the @code{Wine}
Windows API emulator (@url{http://www.winehq.org}) or @code{DOSEMU}
(@url{http://www.dosemu.org}) on non-x86 CPUs.

QEMU generic features:

@itemize 

@item User space only emulation.

@item Working on x86 and PowerPC hosts. Being tested on ARM, Sparc32, Alpha and S390.

@item Using dynamic translation to native code for reasonable speed.

@item Generic Linux system call converter, including most ioctls.

@item @code{clone()} emulation using the native @code{clone()} so that the Linux scheduler handles threads.

@item Accurate signal handling by remapping host signals to target signals. 

@item Self-modifying code support.

@item The virtual CPU is a library (@code{libqemu}) which can be used 
in other projects.

@end itemize

@section x86 emulation

QEMU x86 target features:

@itemize 

@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. 
User space LDT and GDT are emulated. VM86 mode is also supported to run DOSEMU.

@item Precise user space x86 exceptions.

@item Support of host page sizes bigger than 4KB.

@item QEMU can emulate itself on x86.

@item An extensive Linux x86 CPU test program is included (@file{tests/test-i386}).
It can be used to test other x86 virtual CPUs.

@end itemize

Current QEMU limitations:

@itemize 

@item No SSE/MMX support (yet).

@item No x86-64 support.

@item IPC syscalls are missing.

@item The x86 segment limits and access rights are not tested at every
memory access (and never will be, in order to keep good performance).

@item On non-x86 host CPUs, @code{double}s are used instead of the non-standard
10 byte @code{long double}s of x86 for floating point emulation to get
maximum performance.

@end itemize

@section ARM emulation

@itemize

@item ARM emulation can currently launch small programs while using the
generic dynamic code generation architecture of QEMU.

@item No FPU support (yet).

@item No automatic regression testing (yet).

@end itemize

@chapter Invocation

@section Quick Start

If you need to compile QEMU, please read the @file{README} which gives
the related information.

In order to launch a Linux process, QEMU needs the process executable
itself and all the target (x86) dynamic libraries used by it. 

@itemize

@item On x86, you can just try to launch any process by using the native
libraries:

@example 
qemu -L / /bin/ls
@end example

@code{-L /} tells QEMU to search for the x86 dynamic linker with a
@file{/} prefix.

@item Since QEMU is also a Linux process, you can launch QEMU with QEMU:

@example 
qemu -L / qemu -L / /bin/ls
@end example

@item On non-x86 CPUs, you first need to download at least an x86 glibc
(@file{qemu-XXX-i386-glibc21.tar.gz} on the QEMU web page). Ensure that
@code{LD_LIBRARY_PATH} is not set:

@example
unset LD_LIBRARY_PATH 
@end example

Then you can launch the precompiled @file{ls} x86 executable:

@example
qemu /usr/local/qemu-i386/bin/ls-i386
@end example

Look at @file{/usr/local/qemu-i386/bin/qemu-conf.sh} to see how to have
QEMU launched automatically by the Linux kernel when you try to
launch x86 executables. It requires the @code{binfmt_misc} module in the
Linux kernel.

@item The x86 version of QEMU is also included. You can try weird things such as:
@example
qemu /usr/local/qemu-i386/bin/qemu-i386 /usr/local/qemu-i386/bin/ls-i386
@end example

@end itemize

@section Wine launch

@itemize

@item Ensure that you have a working QEMU with the x86 glibc
distribution (see previous section). To verify it, you must be
able to run:

@example
qemu /usr/local/qemu-i386/bin/ls-i386
@end example

@item Download the binary x86 Wine install
(@file{qemu-XXX-i386-wine.tar.gz} on the QEMU web page). 

@item Configure Wine on your account. Look at the provided script
@file{/usr/local/qemu-i386/bin/wine-conf.sh}. Your previous
@code{$@{HOME@}/.wine} directory is saved to @code{$@{HOME@}/.wine.org}.

@item Then you can try the example @file{putty.exe}:

@example
qemu /usr/local/qemu-i386/wine/bin/wine /usr/local/qemu-i386/wine/c/Program\ Files/putty.exe
@end example

@end itemize

@section Command line options

@example
usage: qemu [-h] [-d] [-L path] [-s size] program [arguments...]
@end example

@table @option
@item -h
Print the help
@item -L path   
Set the x86 ELF interpreter prefix (default=/usr/local/qemu-i386)
@item -s size
Set the x86 stack size in bytes (default=524288)
@end table

Debug options:

@table @option
@item -d
Activate log (logfile=/tmp/qemu.log)
@item -p pagesize
Act as if the host page size was 'pagesize' bytes
@end table

@chapter QEMU Internals

@section QEMU compared to other emulators

Unlike Bochs [3], QEMU emulates only a user space x86 CPU. It means that
you cannot launch an operating system with it. The benefit is that it is
simpler and faster because some of the low level CPU state
can be ignored (in particular, no virtual memory needs to be emulated).

Like Valgrind [2], QEMU does user space emulation and dynamic
translation. Valgrind is mainly a memory debugger, while QEMU has no
support for memory debugging (QEMU could be used to detect out-of-bounds
memory accesses as Valgrind does, but it has no support for tracking
uninitialised data). The Valgrind dynamic translator generates better code than
QEMU (in particular it does register allocation) but it is closely tied
to an x86 host and target.

EM86 [4] is the closest project to QEMU (and QEMU still uses some of its
code, in particular the ELF file loader). EM86 was limited to an Alpha
host and used a proprietary and slow interpreter (the interpreter part
of the FX!32 Digital Win32 code translator [5]).

TWIN [6] is a Windows API emulator like Wine. It is less accurate than
Wine but includes a protected mode x86 interpreter to launch x86 Windows
executables. Such an approach has greater potential because most of the
Windows API is executed natively, but it is far more difficult to develop
because all the data structures and function parameters exchanged
between the API and the x86 code must be converted.

@section Portable dynamic translation

QEMU is a dynamic translator. When it first encounters a piece of code,
it converts it to the host instruction set. Usually dynamic translators
are very complicated and highly CPU dependent. QEMU uses some tricks
which make it relatively easily portable and simple while achieving good
performance.

The basic idea is to split every x86 instruction into a few simpler
instructions. Each simple instruction is implemented by a piece of C
code (see @file{op-i386.c}). Then a compile time tool (@file{dyngen})
takes the corresponding object file (@file{op-i386.o}) to generate a
dynamic code generator which concatenates the simple instructions to
build a function (see @file{op-i386.h:dyngen_code()}).

In essence, the process is similar to [1], but more work is done at
compile time. 

A key idea to get optimal performance is that constant parameters can
be passed to the simple operations. For that purpose, dummy ELF
relocations are generated with gcc for each constant parameter. Then,
the tool (@file{dyngen}) can locate the relocations and generate the
appropriate C code to resolve them when building the dynamic code.

That way, QEMU is no more difficult to port than a dynamic linker.
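
As a rough illustration of this idea, the following Python sketch
concatenates precompiled templates and patches constant parameters into
them. It is a simplified model, not the code that @file{dyngen}
produces: the template names, the byte patterns and the @code{generate}
function are assumptions made for this example, and the relocations are
reduced to a list of byte offsets.

```python
import struct

# Hypothetical templates: name -> (code bytes, offsets of 32-bit holes).
# The byte patterns are illustrative only.
TEMPLATES = {
    "movl_T0_im": (b"\xb8\x00\x00\x00\x00", [1]),
    "addl_T0_im": (b"\x05\x00\x00\x00\x00", [1]),
    "exit": (b"\xc3", []),
}


def generate(ops):
    """Concatenate templates and patch each constant into its hole."""
    out = bytearray()
    for name, params in ops:
        code, holes = TEMPLATES[name]
        start = len(out)
        out += code
        # Resolve the "relocations": write each constant, little-endian.
        for hole, value in zip(holes, params):
            struct.pack_into("<I", out, start + hole, value & 0xFFFFFFFF)
    return bytes(out)
```

Generating code is then only a copy plus a few stores per simple
instruction, which is why the work resembles what a dynamic linker does.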

To go even faster, GCC static register variables are used to keep the
state of the virtual CPU.

@section Register allocation

Since QEMU uses fixed simple instructions, no efficient register
allocation can be done. However, because RISC CPUs have many
registers, most of the virtual CPU state can be put in registers without
doing complicated register allocation.

@section Condition code optimisations

Good emulation of the CPU condition codes (the @code{EFLAGS} register on
x86) is critical to good performance. QEMU uses lazy condition code
evaluation: instead of computing the condition codes after each x86
instruction, it just stores one operand (called @code{CC_SRC}), the
result (called @code{CC_DST}) and the type of operation (called
@code{CC_OP}).

@code{CC_OP} is almost never explicitly set in the generated code
because it is known at translation time.
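
A minimal Python sketch of lazy evaluation for 32 bit additions and
subtractions is given here. It assumes that the stored operand is the
second one; the class and method names are hypothetical and only the
zero and carry flags are modelled.

```python
CC_OP_ADD, CC_OP_SUB = 0, 1
MASK = 0xFFFFFFFF


class LazyFlags:
    """Records CC_SRC, CC_DST and CC_OP; flags are derived on demand."""

    def __init__(self):
        self.cc_src = 0
        self.cc_dst = 0
        self.cc_op = CC_OP_ADD

    def add(self, a, b):
        res = (a + b) & MASK
        self.cc_src, self.cc_dst, self.cc_op = b, res, CC_OP_ADD
        return res

    def sub(self, a, b):
        res = (a - b) & MASK
        self.cc_src, self.cc_dst, self.cc_op = b, res, CC_OP_SUB
        return res

    def zero_flag(self):
        return self.cc_dst == 0

    def carry_flag(self):
        if self.cc_op == CC_OP_ADD:
            # An unsigned add wrapped around iff the result is below an operand.
            return self.cc_dst < self.cc_src
        # For SUB, rebuild the first operand and test for a borrow.
        first = (self.cc_dst + self.cc_src) & MASK
        return first < self.cc_src
```

The arithmetic operations only perform three stores; the cost of
deriving a flag is paid only by the instructions that read it.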

In order to increase performance, a backward pass is performed on the
generated simple instructions (see
@code{translate-i386.c:optimize_flags()}). When it can be proved that
the condition codes are not needed by the next instructions, no
condition codes are computed at all.
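
The backward pass can be modelled as a simple liveness analysis. The
sketch below is not the code of @code{optimize_flags()}: the function
name @code{mark_needed_flags} and the tuple layout are assumptions, each
flag-writing operation is assumed to overwrite all flags, and the flags
are conservatively treated as needed at the end of the block.

```python
def mark_needed_flags(ops):
    """ops: list of (name, writes_flags, reads_flags).

    Returns a list of (name, compute_flags) in the original order.
    """
    result = []
    flags_needed = True  # conservative assumption at block exit
    for name, writes, reads in reversed(ops):
        # Flags must be computed only if a later op may read them.
        result.append((name, writes and flags_needed))
        if writes:
            flags_needed = False  # earlier values are overwritten here
        if reads:
            flags_needed = True  # this op needs the incoming flags
    result.reverse()
    return result
```

For the sequence add, sub, conditional jump, only the flags of the
subtraction are marked as needed, because the addition's flags are
overwritten before anything reads them.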

@section CPU state optimisations

The x86 CPU has many internal states which change the way it evaluates
instructions. In order to achieve a good speed, the translation phase
assumes that some state information of the virtual x86 CPU cannot
change within the code being translated. For example, if the SS, DS and
ES segments have a zero base, then the translator does not even generate
an addition for the segment base.

[The FPU stack pointer register is not handled that way yet].

@section Translation cache

A 2MByte cache holds the most recently used translations. For
simplicity, it is completely flushed when it is full. A translation unit
contains just a single basic block (a block of x86 instructions
terminated by a jump or by a virtual CPU state change which the
translator cannot deduce statically).
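
The flush-when-full policy can be sketched in a few lines of Python.
The class below is a hypothetical model (its name, fields and the
dictionary keyed by simulated PC are assumptions); it only shows that no
per-entry eviction bookkeeping is needed.

```python
class TranslationCache:
    """Holds translated blocks; drops everything when capacity is reached."""

    def __init__(self, capacity_bytes=2 * 1024 * 1024):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = {}  # simulated PC -> translated code bytes
        self.flushes = 0

    def lookup(self, pc):
        return self.blocks.get(pc)

    def insert(self, pc, code):
        if self.used + len(code) > self.capacity:
            # No room left: flush the whole cache rather than evict entries.
            self.blocks.clear()
            self.used = 0
            self.flushes += 1
        self.blocks[pc] = code
        self.used += len(code)
```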

@section Direct block chaining

After each translated basic block is executed, QEMU uses the simulated
Program Counter (PC) and other CPU state information (such as the CS
segment base value) to find the next basic block.

In order to accelerate the most common cases where the new simulated PC
is known, QEMU can patch a basic block so that it jumps directly to the
next one.

The most portable code uses an indirect jump. An indirect jump makes it
easier to make the jump target modification atomic. On some
architectures (such as PowerPC), the @code{JUMP} opcode is directly
patched so that the block chaining has no overhead.
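
The effect of chaining can be shown with a small Python model. It
assumes that every block has a single, statically known successor, and
it replaces the patched jump with a reference stored in the block; the
@code{Block} class and the @code{run} function are hypothetical.

```python
class Block:
    def __init__(self, pc, next_pc):
        self.pc = pc
        self.next_pc = next_pc
        self.chained = None  # filled in once the successor is known


def run(blocks, pc, steps):
    """Execute `steps` blocks and return how many table lookups were done."""
    lookups = 0
    block = None
    for _ in range(steps):
        if block is not None and block.chained is not None:
            block = block.chained  # direct jump, no lookup
        else:
            prev = block
            block = blocks[pc]  # slow path: find the block by PC
            lookups += 1
            if prev is not None:
                prev.chained = block  # "patch" the previous block
        pc = block.next_pc
    return lookups
```

In a two-block loop, the lookups stop as soon as both blocks have been
patched; later iterations follow the chained references only.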

@section Self-modifying code and translated code invalidation

Self-modifying code is a special challenge in x86 emulation because no
instruction cache invalidation is signaled by the application when code
is modified.

When translated code is generated for a basic block, the corresponding
host page is write protected if it is not already read-only (with the
system call @code{mprotect()}). Then, if a write access is done to the
page, Linux raises a SEGV signal. QEMU then invalidates all the
translated code in the page and enables write accesses to the page.

Correct translated code invalidation is done efficiently by maintaining
a linked list of every translated block contained in a given page. Other
linked lists are also maintained to undo direct block chaining. 
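
The bookkeeping can be sketched as follows. This Python model replaces
@code{mprotect()} and the SEGV handler with a set of protected pages and
an explicit @code{write} method; the names are hypothetical, a 4KB page
is assumed, and blocks spanning two pages are ignored.

```python
PAGE_SIZE = 4096


class CodePages:
    def __init__(self):
        self.blocks_by_page = {}  # page number -> list of block start PCs
        self.protected = set()  # pages currently write protected

    def add_block(self, pc):
        page = pc // PAGE_SIZE
        self.blocks_by_page.setdefault(page, []).append(pc)
        self.protected.add(page)  # stands in for mprotect()

    def write(self, addr):
        """Return the block PCs invalidated by a write to addr."""
        page = addr // PAGE_SIZE
        if page not in self.protected:
            return []  # ordinary write, nothing to do
        self.protected.discard(page)  # enable write accesses again
        return self.blocks_by_page.pop(page, [])
```

Only the first write to a page pays for the invalidation; further writes
to the same page proceed normally until new code is translated from it.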

Although the overhead of doing @code{mprotect()} calls is significant,
most MSDOS programs can be emulated at reasonable speed with QEMU and
DOSEMU.

Note that QEMU also invalidates pages of translated code when it detects
that memory mappings are modified with @code{mmap()} or @code{munmap()}.

@section Exception support

@code{longjmp()} is used when an exception such as division by zero is
encountered.

The host SIGSEGV and SIGBUS signal handlers are used to catch invalid
memory accesses. The exact CPU state can be retrieved because all the
x86 registers are stored in fixed host registers. The simulated program
counter is found by retranslating the corresponding basic block and by
looking where the host program counter was at the exception point.
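
The lookup of the simulated program counter can be pictured with the
following Python sketch. It assumes that retranslating a block yields,
for each x86 instruction, its address and the size of the host code
generated for it; the function name and this representation are
hypothetical.

```python
def find_guest_pc(block, host_fault_offset):
    """block: list of (guest_pc, host_code_size) in translation order.

    Returns the guest PC whose host code contains the faulting offset,
    or None if the offset lies outside the block.
    """
    offset = 0
    for guest_pc, size in block:
        if offset <= host_fault_offset < offset + size:
            return guest_pc
        offset += size
    return None
```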

The virtual CPU cannot retrieve the exact @code{EFLAGS} register because
in some cases it is not computed, as a result of the condition code
optimisations. It is not a big concern because the emulated code can
still be restarted in any case.

@section Linux system call translation

QEMU includes a generic system call translator for Linux. It means that
the parameters of the system calls can be converted to fix the
endianness and 32/64 bit issues. The IOCTLs are converted with a generic
type description system (see @file{ioctls.h} and @file{thunk.c}).

QEMU supports host CPUs which have pages bigger than 4KB. It records all
the mappings the process does and tries to emulate the @code{mmap()}
system calls in cases where the host @code{mmap()} call would fail
because of bad page alignment.
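
The alignment problem itself is plain arithmetic, shown here in Python.
The two helper functions are hypothetical and do not describe how QEMU
emulates the call; they only test whether a request aligned on 4KB is
also aligned for the host, and compute the host page range it touches.

```python
def host_mmap_ok(addr, offset, host_page_size):
    """True if the address and file offset are aligned for the host."""
    return addr % host_page_size == 0 and offset % host_page_size == 0


def host_range(addr, length, host_page_size):
    """Smallest host-page-aligned range covering [addr, addr + length)."""
    start = addr - addr % host_page_size
    end = addr + length
    end += -end % host_page_size  # round up to the next host page
    return start, end
```

With an 8KB host page, a mapping at address 0x1000 is valid for the x86
process but cannot be passed unchanged to the host @code{mmap()}.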

@section Linux signals

Normal and real-time signals are queued along with their information
(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
request is done to the virtual CPU. When it is interrupted, one queued
signal is handled by generating a stack frame in the virtual CPU as the
Linux kernel does. The @code{sigreturn()} system call is emulated to return
from the virtual signal handler.

Some signals (such as SIGALRM) come directly from the host. Other
signals are synthesized from the virtual CPU exceptions such as SIGFPE
when a division by zero is done (see @code{main.c:cpu_loop()}).

The blocked signal mask is still handled by the host Linux kernel so
that most signal system calls can be redirected directly to the host
Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
calls need to be fully emulated (see @file{signal.c}).

@section clone() system call and threads

The Linux clone() system call is usually used to create a thread. QEMU
uses the host clone() system call so that real host threads are created
for each emulated thread. One virtual CPU instance is created for each
thread.

The virtual x86 CPU atomic operations are emulated with a global lock so
that their semantics are preserved.
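
A minimal Python sketch of the global lock approach follows, using an
atomic exchange-and-add as the example operation. The function
@code{atomic_xadd} and the dictionary used as memory are assumptions
made for this illustration.

```python
import threading

# One lock shared by every emulated atomic operation.
_global_lock = threading.Lock()


def atomic_xadd(memory, addr, value):
    """Add value to memory[addr] atomically and return the old value."""
    with _global_lock:
        old = memory[addr]
        memory[addr] = (old + value) & 0xFFFFFFFF
        return old
```

A single lock is simple and correct, at the price of serialising all
atomic operations of all threads.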

Note that there are currently still some locking issues in QEMU. In
particular, the translation cache flush is not yet protected against
reentrancy.

@section Self-virtualization

QEMU was conceived so that ultimately it can emulate itself. Although
it is not very useful, it is an important test to show the power of the
emulator.

Achieving self-virtualization is not easy because there may be address
space conflicts. QEMU solves this problem by being an executable ELF
shared object, like the @file{ld-linux.so} ELF interpreter. That way, it
can be relocated at load time.

@section Bibliography

@table @asis

@item [1] 
@url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing
direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio
Riccardi.

@item [2]
@url{http://developer.kde.org/~sewardj/}, Valgrind, an open-source
memory debugger for x86-GNU/Linux, by Julian Seward.

@item [3]
@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
by Kevin Lawton et al.

@item [4]
@url{http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html}, the EM86
x86 emulator on Alpha-Linux.

@item [5]
@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf},
DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
Chernoff and Ray Hookway.

@item [6]
@url{http://www.willows.com/}, Windows API library emulation from
Willows Software.

@end table

@chapter Regression Tests

In the directory @file{tests/}, various interesting testing programs
are available. They are used for regression testing.

@section @file{hello-i386}

Very simple statically linked x86 program, just to test QEMU during a
port to a new host CPU.

@section @file{hello-arm}

Very simple statically linked ARM program, just to test QEMU during a
port to a new host CPU.

@section @file{test-i386}

This program executes most of the 16 bit and 32 bit x86 instructions and
generates a text output. It can be compared with the output obtained with
a real CPU or another emulator. The target @code{make test} runs this
program and a @code{diff} on the generated output.

The Linux system call @code{modify_ldt()} is used to create x86 selectors
to test some cases of 16 bit addressing and of 32 bit addressing with
segmentation.

The Linux system call @code{vm86()} is used to test vm86 emulation.

Various exceptions are raised to test most of the x86 user space
exception reporting.

@section @file{sha1}

It is a simple benchmark. Care must be taken to interpret the results
because it mostly tests the ability of the virtual CPU to optimize the
@code{rol} x86 instruction and the condition code computations.