Copyright (c) 2010-2015 Institute for System Programming
                        of the Russian Academy of Sciences.

This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.

Record/replay
-------------

Record/replay functions are used for reverse execution and deterministic
replay of QEMU execution. This implementation of deterministic replay can
be used for deterministic debugging of guest code through a gdb remote
interface.

Execution recording writes a log of non-deterministic events, which can later
be used to replay the execution anywhere and an unlimited number of times.
It also supports checkpointing for faster rewinding during reverse debugging.
Execution replaying reads the log and replays all non-deterministic events,
including external input, hardware clocks, and interrupts.

Deterministic replay has the following features:
 * Deterministically replays whole system execution and all contents of
   the memory, state of the hardware devices, clocks, and screen of the VM.
 * Writes the execution log into a file that can be replayed multiple times
   on different machines.
 * Supports i386, x86_64, and ARM hardware platforms.
 * Performs deterministic replay of all operations with keyboard and mouse
   input devices.

Usage of the record/replay:
 * First, record the execution by adding the following arguments to the
   command line:
   '-icount shift=7,rr=record,rrfile=replay.bin -net none'.
   Block device images are not actually changed in recording mode,
   because all changes are written to a temporary overlay file.
 * Then replay it by using another command line option:
   '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
 * The '-net none' option should also be specified if the network replay
   patches are not applied.

Papers with description of deterministic replay implementation:
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
http://dl.acm.org/citation.cfm?id=2786805.2803179

Modifications of qemu include:
 * wrappers for clock and time functions to save their return values in the log
 * saving different asynchronous events (e.g. system shutdown) into the log
 * synchronization of the bottom halves execution
 * synchronization of the threads from thread pool
 * recording/replaying user input (mouse and keyboard)
 * adding internal checkpoints for cpu and io synchronization

Non-deterministic events
------------------------

Our record/replay system is based on saving and replaying non-deterministic
events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
from HDD or memory of the VM). Saving only non-deterministic events makes
the log file smaller and the simulation faster, and allows using reverse
debugging even for realtime applications.

The following non-deterministic data from peripheral devices is saved into
the log: mouse and keyboard input, network packets, audio controller input,
USB packets, serial port input, and hardware clocks (they are non-deterministic
too, because their values are taken from the host machine). Inputs from
simulated hardware, memory of the VM, software interrupts, and execution of
instructions are not saved into the log, because they are deterministic and
can be replayed by simulating the behavior of the virtual machine starting
from the initial state.

We had to solve three tasks to implement deterministic replay: recording
non-deterministic events, replaying non-deterministic events, and checking
that there is no divergence between record and replay modes.

We changed several parts of QEMU to record and replay the event log.
Device models that take non-deterministic input from external devices were
changed to write every external event into the execution log immediately.
E.g. network packets are written into the log when they arrive at the virtual
network adapter.

All non-deterministic events come from these devices. But to
replay them we need to know at which moments they occur. We specify
these moments by counting the number of instructions executed between
every pair of consecutive events.

Instruction counting
--------------------

QEMU should run in icount mode to use the record/replay feature. icount was
designed to allow deterministic execution in the absence of external inputs
to the virtual machine. We also use icount to control the occurrence of
non-deterministic events. The number of instructions elapsed since the last
event is written to the log while recording the execution. In replay mode we
can predict when to inject that event using the instruction counter.
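
The use of instruction counts can be illustrated with a minimal Python sketch.
It is not QEMU code: the Recorder class, the replay_schedule function, and
the log layout are hypothetical names chosen for this example. Each log entry
stores the number of instructions executed since the previous event, and
replay turns these deltas back into the instruction counts at which the
events have to be injected.

```python
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    delta: int    # instructions executed since the previous logged event
    payload: str  # the non-deterministic data (e.g. a key press)


class Recorder:
    """Hypothetical recorder: counts instructions and logs events."""

    def __init__(self):
        self.icount = 0       # total instructions executed so far
        self._last_event = 0  # icount value at the previous event
        self.log = []

    def execute(self, instructions):
        # Deterministic guest execution only advances the counter.
        self.icount += instructions

    def event(self, payload):
        # Store the distance from the previous event, not an absolute time.
        self.log.append(LoggedEvent(self.icount - self._last_event, payload))
        self._last_event = self.icount


def replay_schedule(log):
    """Return (icount, payload) pairs telling when to inject each event."""
    schedule = []
    icount = 0
    for entry in log:
        icount += entry.delta
        schedule.append((icount, entry.payload))
    return schedule


rec = Recorder()
rec.execute(100)
rec.event("key press A")
rec.execute(50)
rec.event("key press B")
schedule = replay_schedule(rec.log)
```

In this sketch the log holds the deltas 100 and 50, and the replay side
injects the two events after 100 and 150 executed instructions.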

Timers
------

Timers are used to execute callbacks from different subsystems of QEMU
at the specified moments of time. There are several kinds of timers:
 * Real time clock. Based on host time and used only for callbacks that
   do not change the virtual machine state. For this reason the real time
   clock and its timers do not affect deterministic replay at all.
 * Virtual clock. These timers run only during the emulation. In icount
   mode the virtual clock value is calculated from the executed instructions
   counter. That is why it is completely deterministic and does not have to
   be recorded.
 * Host clock. This clock is used by device models that simulate real time
   sources (e.g. real time clock chip). The host clock is one of the sources
   of non-determinism. Host clock read operations should be logged to
   make the execution deterministic.
 * Virtual real time clock. This clock is similar to the real time clock but
   it is used only for increasing the virtual clock while the virtual machine
   is sleeping. Due to its nature it is also non-deterministic, like the host
   clock, and has to be logged too.
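
The logging of non-deterministic clock reads can be sketched as follows. This
is an illustrative Python model, not QEMU's implementation: the ReplayClock
class and its read_host parameter are hypothetical names. In record mode every
read returns the host value and appends it to the log; in replay mode the host
is never consulted and the logged values are returned in the same order.

```python
import time
from collections import deque


class ReplayClock:
    """Hypothetical wrapper around a non-deterministic clock source."""

    def __init__(self, mode, log=None, read_host=time.time_ns):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.log = deque(log or [])
        self._read_host = read_host

    def read(self):
        if self.mode == "record":
            value = self._read_host()  # non-deterministic host value
            self.log.append(value)     # saved so that replay can reuse it
            return value
        if not self.log:
            raise RuntimeError("replay log has no more clock reads")
        return self.log.popleft()      # replay: never touch the host clock


# A fake host clock keeps the example reproducible.
fake_host = iter([10, 25, 40])
recorder = ReplayClock("record", read_host=lambda: next(fake_host))
recorded = [recorder.read() for _ in range(3)]

replayer = ReplayClock("replay", log=recorder.log)
replayed = [replayer.read() for _ in range(3)]
```

A read beyond the end of the log raises an error in this sketch, which is one
simple way to notice that replay has diverged from the recording.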

Checkpoints
-----------

Replaying the execution of a virtual machine is bound by sources of
non-determinism. These are inputs from clocks and peripheral devices,
and QEMU thread scheduling. Thread scheduling affects the processing of
events from timers, asynchronous input-output, and bottom halves.

Invocations of timers are coupled with clock reads and with changes to the
state of the virtual machine. Reads produce non-deterministic data taken from
the host clock, and VM state changes must preserve their order: their relative
order in replay mode must replicate the order of callbacks in record mode.
To preserve this order we use checkpoints. When a specific clock is processed
in record mode we save a special "checkpoint" event to the log.
Checkpoints here do not refer to virtual machine snapshots. They are just
record/replay events used for synchronization.

QEMU in replay mode will try to invoke timer processing at arbitrary moments
of time. That is why we do not process a group of timers until the checkpoint
event is read from the log. Such an event allows synchronizing CPU
execution and timer events.
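
The gating of timer processing by checkpoint events can be sketched in Python
as follows. This is a simplified model with hypothetical names
(CheckpointReplayer, the tuple layout of log entries, and the
"virtual_clock" label); it only shows that a group of timers is skipped until
its checkpoint is the next event in the log.

```python
from collections import deque


class CheckpointReplayer:
    """Hypothetical replay-side view of the event log."""

    def __init__(self, log):
        # Entries are (kind, data) tuples, e.g. ("checkpoint", "virtual_clock").
        self.log = deque(log)

    def checkpoint(self, name):
        """Return True if the timer group guarded by `name` may run now."""
        if self.log and self.log[0] == ("checkpoint", name):
            self.log.popleft()
            return True
        return False  # not our turn yet: the caller skips timer processing

    def next_event(self):
        """Consume and return the event at the head of the log."""
        return self.log.popleft()


replayer = CheckpointReplayer([("input", "key press"),
                               ("checkpoint", "virtual_clock")])
too_early = replayer.checkpoint("virtual_clock")  # input comes first in the log
replayed = replayer.next_event()
on_time = replayer.checkpoint("virtual_clock")    # checkpoint is now at the head
```

Here the first attempt to process the timers is refused because the recorded
input event has to be replayed first; the second attempt succeeds.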

Two other checkpoints govern the "warping" of the virtual clock.
While the virtual machine is idle, the virtual clock increments at
1 ns per *real time* nanosecond. This is done by setting up a timer
(called the warp timer) on the virtual real time clock, so that the
timer fires at the next deadline of the virtual clock; the virtual clock
is then incremented (which is called "warping" the virtual clock) as
soon as the timer fires or the CPUs need to go out of the idle state.
Two functions are used for this purpose; because these actions change
virtual machine state and must be deterministic, each of them creates a
checkpoint. qemu_start_warp_timer checks if the CPUs are idle and if so
starts accounting real time to virtual clock. qemu_account_warp_timer
is called when the CPUs get an interrupt or when the warp timer fires,
and it warps the virtual clock by the amount of real time that has passed
since qemu_start_warp_timer.
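
The accounting performed by these two functions can be modelled with a short
Python sketch. It is a deliberately simplified illustration: the WarpingClock
class and its method names are hypothetical, the real time values are passed
in explicitly, and the checkpoints and the warp timer deadline are left out.

```python
class WarpingClock:
    """Hypothetical model of virtual clock warping while the CPUs are idle."""

    def __init__(self):
        self.virtual_ns = 0
        self._warp_start = None  # real time at which idle accounting began

    def start_warp(self, real_now_ns, cpus_idle):
        # Plays the role of qemu_start_warp_timer in this sketch.
        if cpus_idle and self._warp_start is None:
            self._warp_start = real_now_ns

    def account_warp(self, real_now_ns):
        # Plays the role of qemu_account_warp_timer in this sketch.
        if self._warp_start is not None:
            self.virtual_ns += real_now_ns - self._warp_start
            self._warp_start = None


clock = WarpingClock()
clock.start_warp(1_000, cpus_idle=True)   # CPUs go idle at real time 1000 ns
clock.account_warp(1_500)                 # interrupt arrives 500 ns later
after_idle = clock.virtual_ns

clock.start_warp(3_000, cpus_idle=False)  # CPUs are busy: nothing is accounted
clock.account_warp(4_000)
after_busy = clock.virtual_ns
```

In the sketch the virtual clock advances by the 500 ns spent idle and is left
unchanged by the second pair of calls, because the CPUs were not idle.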

Bottom halves
-------------

Disk I/O events are completely deterministic in our model, because
in both record and replay modes we start the virtual machine from the same
disk state. But callbacks that the virtual disk controller uses for reading
and writing the disk may occur at different moments of time in record and
replay modes.

Reading and writing requests are created by the CPU thread of QEMU. Later these
requests proceed to the block layer, which creates "bottom halves". A bottom
half consists of a callback and its parameters. Bottom halves are processed
when the main loop locks the global mutex. These locks are not synchronized
with the replaying process because the main loop also processes events that
do not affect the virtual machine state (like user interaction with the
monitor).

That is why we had to implement saving and replaying bottom half callbacks
synchronously with the CPU execution. When a callback is about to execute
it is added to the queue in the replay module. This queue is written to the
log when its callbacks are executed. In replay mode callbacks are not processed
until the corresponding event is read from the events log file.

Sometimes the block layer uses asynchronous callbacks for its internal purposes
(like reading or writing VM snapshots or disk image cluster tables). In this
case bottom halves are not marked as "replayable" and are not saved
into the log.
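
The ordering of bottom halves can be illustrated with the following Python
sketch. It is not the replay module itself: the BottomHalfQueue class, the
string identifiers, and the replayable argument are hypothetical, and
instruction counts are ignored. The sketch only shows that replayable
callbacks run in the recorded order, while non-replayable ones run
immediately and never reach the log.

```python
class BottomHalfQueue:
    """Hypothetical queue that replays callbacks in their recorded order."""

    def __init__(self, mode, log=None):
        self.mode = mode            # "record" or "replay"
        self.log = list(log or [])  # ids of replayable callbacks, in run order
        self._next = 0              # index of the next log entry to replay
        self._pending = {}          # replay mode: id -> waiting callback

    def schedule(self, bh_id, callback, replayable=True):
        if not replayable:
            callback()              # internal block-layer work: never logged
        elif self.mode == "record":
            self.log.append(bh_id)  # remember the order of execution
            callback()
        else:
            self._pending[bh_id] = callback
            self._run_ready()

    def _run_ready(self):
        # Run waiting callbacks only in the order stored in the log.
        while (self._next < len(self.log)
               and self.log[self._next] in self._pending):
            callback = self._pending.pop(self.log[self._next])
            self._next += 1
            callback()


order = []
rec = BottomHalfQueue("record")
rec.schedule("read#1", lambda: order.append("read#1"))
rec.schedule("flush", lambda: order.append("flush"), replayable=False)
rec.schedule("write#2", lambda: order.append("write#2"))

replayed = []
rep = BottomHalfQueue("replay", log=rec.log)
rep.schedule("write#2", lambda: replayed.append("write#2"))  # early: must wait
rep.schedule("read#1", lambda: replayed.append("read#1"))    # unblocks both
```

Although "write#2" becomes ready first during replay, it is held back until
"read#1" has run, so both runs execute the replayable callbacks in the same
order.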

Block devices
-------------

The block devices record/replay module intercepts calls of
bdrv coroutine functions at the top of the block drivers stack.
To record and replay block operations the drive must be configured
as follows:
 -drive file=disk.qcow,if=none,id=img-direct
 -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
 -device ide-hd,drive=img-blkreplay

The blkreplay driver should be inserted between the disk image and the
virtual disk controller, so that all disk requests can be recorded and
replayed.

All block completion operations are added to the queue in the coroutines.
The queue is flushed at checkpoints and information about processed requests
is recorded to the log. In the replay phase the queue is matched with
events read from the log. Therefore block device requests are processed
deterministically.

Snapshotting
------------

New VM snapshots may be created in replay mode. They can be used later
to recover the desired VM state. All VM states created in replay mode
are associated with a moment of time in the replay scenario.
After recovering the VM state, replay will start from that position.

The default starting snapshot name may be specified with the icount field
rrsnapshot as follows:
 -icount shift=7,rr=record,rrfile=replay.bin,rrsnapshot=snapshot_name

This snapshot is created at the start of recording and restored at the start
of replaying. It can also be loaded while replaying to roll back
the execution.

Network devices
---------------

Record and replay for network interactions is performed with the network filter.
Each backend must have its own instance of the replay filter as follows:
 -netdev user,id=net1 -device rtl8139,netdev=net1
 -object filter-replay,id=replay,netdev=net1

The replay network filter is used to record and replay network packets. While
recording the virtual machine this filter puts all packets coming from
the outer world into the log. In replay mode packets from the log are
injected into the network device. All interactions with the network backend
in replay mode are disabled.

Audio devices
-------------

Audio data is recorded and replayed automatically. The command line for
recording and replaying must contain identical specifications of audio
hardware, e.g.:
 -soundhw ac97