]>
Commit | Line | Data |
---|---|---|
81a97d9d SH |
1 | = Tracing = |
2 | ||
3 | == Introduction == | |
4 | ||
5 | This document describes the tracing infrastructure in QEMU and how to use it | |
6 | for debugging, profiling, and observing execution. | |
7 | ||
8 | == Quickstart == | |
9 | ||
10 | 1. Build with the 'simple' trace backend: | |
11 | ||
5b808275 | 12 | ./configure --enable-trace-backends=simple |
81a97d9d SH |
13 | make |
14 | ||
03727e6a | 15 | 2. Create a file with the events you want to trace: |
81a97d9d | 16 | |
304187c5 | 17 | echo memory_region_ops_read >/tmp/events |
81a97d9d | 18 | |
03727e6a LV |
19 | 3. Run the virtual machine to produce a trace file: |
20 | ||
db817b8c | 21 | qemu --trace events=/tmp/events ... # your normal QEMU invocation |
03727e6a LV |
22 | |
23 | 4. Pretty-print the binary trace file: | |
81a97d9d | 24 | |
1412cf58 | 25 | ./scripts/simpletrace.py trace-events-all trace-* # Override * with QEMU <pid> |
81a97d9d SH |
26 | |
27 | == Trace events == | |
28 | ||
d4fa8436 | 29 | === Sub-directory setup === |
81a97d9d | 30 | |
d4fa8436 DB |
31 | Each directory in the source tree can declare a set of static trace events |
32 | in a local "trace-events" file. All directories which contain "trace-events" | |
33 | files must be listed in the "trace-events-subdirs" make variable in the top | |
34 | level Makefile.objs. During build, the "trace-events" file in each listed | |
35 | subdirectory will be processed by the "tracetool" script to generate code for | |
36 | the trace events. | |
37 | ||
38 | The individual "trace-events" files are merged into a "trace-events-all" file, | |
39 | which is also installed into "/usr/share/qemu" with the name "trace-events". | |
40 | This merged file is to be used by the "simpletrace.py" script to later analyse | |
41 | traces in the simpletrace data format. | |
42 | ||
43 | In the sub-directory the following files will be automatically generated | |
44 | ||
45 | - trace.c - the trace event state declarations | |
46 | - trace.h - the trace event enums and probe functions | |
47 | - trace-dtrace.h - DTrace event probe specification | |
48 | - trace-dtrace.dtrace - DTrace event probe helper declaration | |
49 | - trace-dtrace.o - binary DTrace provider (generated by dtrace) | |
50 | - trace-ust.h - UST event probe helper declarations | |
51 | ||
52 | Source files in the sub-directory should #include the local 'trace.h' file, | |
53 | without any sub-directory path prefix. eg io/channel-buffer.c would do | |
54 | ||
55 | #include "trace.h" | |
56 | ||
57 | To access the 'io/trace.h' file. While it is possible to include a trace.h | |
58 | file from outside a source files' own sub-directory, this is discouraged in | |
59 | general. It is strongly preferred that all events be declared directly in | |
60 | the sub-directory that uses them. The only exception is where there are some | |
61 | shared trace events defined in the top level directory trace-events file. | |
62 | The top level directory generates trace files with a filename prefix of | |
63 | "trace-root" instead of just "trace". This is to avoid ambiguity between | |
64 | a trace.h in the current directory, vs the top level directory. | |
65 | ||
66 | === Using trace events === | |
1412cf58 DB |
67 | |
68 | Trace events are invoked directly from source code like this: | |
81a97d9d SH |
69 | |
70 | #include "trace.h" /* needed for trace event prototype */ | |
49926043 | 71 | |
4b710a3c | 72 | void *qemu_vmalloc(size_t size) |
81a97d9d SH |
73 | { |
74 | void *ptr; | |
4b710a3c LV |
75 | size_t align = QEMU_VMALLOC_ALIGN; |
76 | ||
77 | if (size < align) { | |
78 | align = getpagesize(); | |
81a97d9d | 79 | } |
4b710a3c LV |
80 | ptr = qemu_memalign(align, size); |
81 | trace_qemu_vmalloc(size, ptr); | |
81a97d9d SH |
82 | return ptr; |
83 | } | |
84 | ||
85 | === Declaring trace events === | |
86 | ||
7b92e5bc | 87 | The "tracetool" script produces the trace.h header file which is included by |
81a97d9d | 88 | every source file that uses trace events. Since many source files include |
7b92e5bc LV |
89 | trace.h, it uses a minimum of types and other header files included to keep the |
90 | namespace clean and compile times and dependencies down. | |
81a97d9d SH |
91 | |
92 | Trace events should use types as follows: | |
93 | ||
94 | * Use stdint.h types for fixed-size types. Most offsets and guest memory | |
95 | addresses are best represented with uint32_t or uint64_t. Use fixed-size | |
96 | types over primitive types whose size may change depending on the host | |
97 | (32-bit versus 64-bit) so trace events don't truncate values or break | |
98 | the build. | |
99 | ||
100 | * Use void * for pointers to structs or for arrays. The trace.h header | |
101 | cannot include all user-defined struct declarations and it is therefore | |
102 | necessary to use void * for pointers to structs. | |
103 | ||
104 | * For everything else, use primitive scalar types (char, int, long) with the | |
105 | appropriate signedness. | |
106 | ||
ec09f877 SH |
107 | * Avoid floating point types (float and double) because SystemTap does not |
108 | support them. In most cases it is possible to round to an integer type | |
109 | instead. This may require scaling the value first by multiplying it by 1000 | |
110 | or the like when digits after the decimal point need to be preserved. | |
111 | ||
9a85d394 SH |
112 | Format strings should reflect the types defined in the trace event. Take |
113 | special care to use PRId64 and PRIu64 for int64_t and uint64_t types, | |
913540a3 | 114 | respectively. This ensures portability between 32- and 64-bit platforms. |
9f7ad79c PMD |
115 | Format strings must not end with a newline character. It is the responsibility |
116 | of backends to adapt line ending for proper logging. | |
9a85d394 | 117 | |
d4fa8436 DB |
118 | Each event declaration will start with the event name, then its arguments, |
119 | finally a format string for pretty-printing. For example: | |
120 | ||
121 | qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" | |
122 | qemu_vfree(void *ptr) "ptr %p" | |
123 | ||
124 | ||
81a97d9d SH |
125 | === Hints for adding new trace events === |
126 | ||
127 | 1. Trace state changes in the code. Interesting points in the code usually | |
128 | involve a state change like starting, stopping, allocating, freeing. State | |
129 | changes are good trace events because they can be used to understand the | |
130 | execution of the system. | |
131 | ||
132 | 2. Trace guest operations. Guest I/O accesses like reading device registers | |
133 | are good trace events because they can be used to understand guest | |
134 | interactions. | |
135 | ||
136 | 3. Use correlator fields so the context of an individual line of trace output | |
137 | can be understood. For example, trace the pointer returned by malloc and | |
138 | used as an argument to free. This way mallocs and frees can be matched up. | |
139 | Trace events with no context are not very useful. | |
140 | ||
141 | 4. Name trace events after their function. If there are multiple trace events | |
142 | in one function, append a unique distinguisher at the end of the name. | |
143 | ||
31965ae2 LV |
144 | == Generic interface and monitor commands == |
145 | ||
b1bae816 LV |
146 | You can programmatically query and control the state of trace events through a |
147 | backend-agnostic interface provided by the header "trace/control.h". | |
31965ae2 | 148 | |
b1bae816 LV |
149 | Note that some of the backends do not provide an implementation for some parts |
150 | of this interface, in which case QEMU will just print a warning (please refer to | |
151 | header "trace/control.h" to see which routines are backend-dependent). | |
31965ae2 | 152 | |
b1bae816 | 153 | The state of events can also be queried and modified through monitor commands: |
31965ae2 LV |
154 | |
155 | * info trace-events | |
156 | View available trace events and their state. State 1 means enabled, state 0 | |
157 | means disabled. | |
158 | ||
159 | * trace-event NAME on|off | |
b1bae816 | 160 | Enable/disable a given trace event or a group of events (using wildcards). |
31965ae2 | 161 | |
db817b8c | 162 | The "--trace events=<file>" command line argument can be used to enable the |
23d15e86 LV |
163 | events listed in <file> from the very beginning of the program. This file must |
164 | contain one event name per line. | |
165 | ||
db817b8c | 166 | If a line in the "--trace events=<file>" file begins with a '-', the trace event |
8f5a0fb1 SH |
167 | will be disabled instead of enabled. This is useful when a wildcard was used |
168 | to enable an entire family of events but one noisy event needs to be disabled. | |
169 | ||
b1bae816 LV |
170 | Wildcard matching is supported in both the monitor command "trace-event" and the |
171 | events list file. That means you can enable/disable the events having a common | |
172 | prefix in a batch. For example, virtio-blk trace events could be enabled using | |
173 | the following monitor command: | |
174 | ||
175 | trace-event virtio_blk_* on | |
176 | ||
81a97d9d SH |
177 | == Trace backends == |
178 | ||
7b92e5bc | 179 | The "tracetool" script automates tedious trace event code generation and also |
81a97d9d SH |
180 | keeps the trace event declarations independent of the trace backend. The trace |
181 | events are not tightly coupled to a specific trace backend, such as LTTng or | |
7b92e5bc | 182 | SystemTap. Support for trace backends can be added by extending the "tracetool" |
81a97d9d SH |
183 | script. |
184 | ||
b73e8bd4 | 185 | The trace backends are chosen at configure time: |
81a97d9d | 186 | |
b73e8bd4 | 187 | ./configure --enable-trace-backends=simple |
81a97d9d SH |
188 | |
189 | For a list of supported trace backends, try ./configure --help or see below. | |
b73e8bd4 | 190 | If multiple backends are enabled, the trace is sent to them all. |
81a97d9d | 191 | |
3b0fc80d PM |
192 | If no backends are explicitly selected, configure will default to the |
193 | "log" backend. | |
194 | ||
81a97d9d SH |
195 | The following subsections describe the supported trace backends. |
196 | ||
197 | === Nop === | |
198 | ||
199 | The "nop" backend generates empty trace event functions so that the compiler | |
3b0fc80d PM |
200 | can optimize out trace events completely. This imposes no performance |
201 | penalty. | |
81a97d9d | 202 | |
dd215f64 LV |
203 | Note that regardless of the selected trace backend, events with the "disable" |
204 | property will be generated with the "nop" backend. | |
205 | ||
ab8eb29c | 206 | === Log === |
b48c20f7 | 207 | |
ab8eb29c | 208 | The "log" backend sends trace events directly to standard error. This |
b48c20f7 SH |
209 | effectively turns trace events into debug printfs. |
210 | ||
211 | This is the simplest backend and can be used together with existing code that | |
212 | uses DPRINTF(). | |
213 | ||
81a97d9d SH |
214 | === Simpletrace === |
215 | ||
216 | The "simple" backend supports common use cases and comes as part of the QEMU | |
217 | source tree. It may not be as powerful as platform-specific or third-party | |
218 | trace backends but it is portable. This is the recommended trace backend | |
219 | unless you have specific needs for more advanced backends. | |
220 | ||
e64dd5ef ET |
221 | === Ftrace === |
222 | ||
223 | The "ftrace" backend writes trace data to ftrace marker. This effectively | |
224 | sends trace events to ftrace ring buffer, and you can compare qemu trace | |
225 | data and kernel(especially kvm.ko when using KVM) trace data. | |
226 | ||
227 | if you use KVM, enable kvm events in ftrace: | |
228 | ||
229 | # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable | |
230 | ||
231 | After running qemu by root user, you can get the trace: | |
232 | ||
233 | # cat /sys/kernel/debug/tracing/trace | |
234 | ||
235 | Restriction: "ftrace" backend is restricted to Linux only. | |
236 | ||
0a852417 PD |
237 | === Syslog === |
238 | ||
239 | The "syslog" backend sends trace events using the POSIX syslog API. The log | |
240 | is opened specifying the LOG_DAEMON facility and LOG_PID option (so events | |
241 | are tagged with the pid of the particular QEMU process that generated | |
242 | them). All events are logged at LOG_INFO level. | |
243 | ||
244 | NOTE: syslog may squash duplicate consecutive trace events and apply rate | |
245 | limiting. | |
246 | ||
247 | Restriction: "syslog" backend is restricted to POSIX compliant OS. | |
248 | ||
81a97d9d SH |
249 | ==== Monitor commands ==== |
250 | ||
81a97d9d SH |
251 | * trace-file on|off|flush|set <path> |
252 | Enable/disable/flush the trace file or set the trace file name. | |
253 | ||
81a97d9d SH |
254 | ==== Analyzing trace files ==== |
255 | ||
256 | The "simple" backend produces binary trace files that can be formatted with the | |
1412cf58 DB |
257 | simpletrace.py script. The script takes the "trace-events-all" file and the |
258 | binary trace: | |
81a97d9d | 259 | |
1412cf58 | 260 | ./scripts/simpletrace.py trace-events-all trace-12345 |
81a97d9d | 261 | |
1412cf58 | 262 | You must ensure that the same "trace-events-all" file was used to build QEMU, |
81a97d9d SH |
263 | otherwise trace event declarations may have changed and output will not be |
264 | consistent. | |
265 | ||
266 | === LTTng Userspace Tracer === | |
267 | ||
268 | The "ust" backend uses the LTTng Userspace Tracer library. There are no | |
269 | monitor commands built into QEMU, instead UST utilities should be used to list, | |
270 | enable/disable, and dump traces. | |
b48c20f7 | 271 | |
ef3ef4a0 MG |
272 | Package lttng-tools is required for userspace tracing. You must ensure that the |
273 | current user belongs to the "tracing" group, or manually launch the | |
274 | lttng-sessiond daemon for the current user prior to running any instance of | |
275 | QEMU. | |
276 | ||
277 | While running an instrumented QEMU, LTTng should be able to list all available | |
278 | events: | |
279 | ||
280 | lttng list -u | |
281 | ||
282 | Create tracing session: | |
283 | ||
284 | lttng create mysession | |
285 | ||
286 | Enable events: | |
287 | ||
288 | lttng enable-event qemu:g_malloc -u | |
289 | ||
290 | Where the events can either be a comma-separated list of events, or "-a" to | |
291 | enable all tracepoint events. Start and stop tracing as needed: | |
292 | ||
293 | lttng start | |
294 | lttng stop | |
295 | ||
296 | View the trace: | |
297 | ||
298 | lttng view | |
299 | ||
300 | Destroy tracing session: | |
301 | ||
302 | lttng destroy | |
303 | ||
304 | Babeltrace can be used at any later time to view the trace: | |
305 | ||
306 | babeltrace $HOME/lttng-traces/mysession-<date>-<time> | |
307 | ||
b48c20f7 SH |
308 | === SystemTap === |
309 | ||
310 | The "dtrace" backend uses DTrace sdt probes but has only been tested with | |
311 | SystemTap. When SystemTap support is detected a .stp file with wrapper probes | |
312 | is generated to make use in scripts more convenient. This step can also be | |
313 | performed manually after a build in order to change the binary name in the .stp | |
314 | probes: | |
315 | ||
2e4ccbbc LM |
316 | scripts/tracetool.py --backends=dtrace --format=stap \ |
317 | --binary path/to/qemu-binary \ | |
318 | --target-type system \ | |
319 | --target-name x86_64 \ | |
bd200384 SH |
320 | --group=all \ |
321 | trace-events-all >qemu.stp | |
b7d66a76 | 322 | |
62dd1048 DB |
323 | To facilitate simple usage of systemtap where there merely needs to be printf |
324 | logging of certain probes, a helper script "qemu-trace-stap" is provided. | |
325 | Consult its manual page for guidance on its usage. | |
326 | ||
b7d66a76 LV |
327 | == Trace event properties == |
328 | ||
1412cf58 | 329 | Each event in the "trace-events-all" file can be prefixed with a space-separated |
b7d66a76 LV |
330 | list of zero or more of the following event properties. |
331 | ||
332 | === "disable" === | |
333 | ||
334 | If a specific trace event is going to be invoked a huge number of times, this | |
335 | might have a noticeable performance impact even when the event is | |
336 | programmatically disabled. | |
337 | ||
338 | In this case you should declare such event with the "disable" property. This | |
339 | will effectively disable the event at compile time (by using the "nop" backend), | |
340 | thus having no performance impact at all on regular builds (i.e., unless you | |
1412cf58 | 341 | edit the "trace-events-all" file). |
b7d66a76 LV |
342 | |
343 | In addition, there might be cases where relatively complex computations must be | |
344 | performed to generate values that are only used as arguments for a trace | |
345 | function. In these cases you can use the macro 'TRACE_${EVENT_NAME}_ENABLED' to | |
346 | guard such computations and avoid its compilation when the event is disabled: | |
347 | ||
348 | #include "trace.h" /* needed for trace event prototype */ | |
349 | ||
350 | void *qemu_vmalloc(size_t size) | |
351 | { | |
352 | void *ptr; | |
353 | size_t align = QEMU_VMALLOC_ALIGN; | |
354 | ||
355 | if (size < align) { | |
356 | align = getpagesize(); | |
357 | } | |
358 | ptr = qemu_memalign(align, size); | |
359 | if (TRACE_QEMU_VMALLOC_ENABLED) { /* preprocessor macro */ | |
360 | void *complex; | |
361 | /* some complex computations to produce the 'complex' value */ | |
362 | trace_qemu_vmalloc(size, ptr, complex); | |
363 | } | |
364 | return ptr; | |
365 | } | |
b1bae816 LV |
366 | |
367 | You can check both if the event has been disabled and is dynamically enabled at | |
d87aa138 | 368 | the same time using the 'trace_event_get_state_backends' routine (see header |
b1bae816 | 369 | "trace/control.h" for more information). |
0bb403b0 LV |
370 | |
371 | === "tcg" === | |
372 | ||
373 | Guest code generated by TCG can be traced by defining an event with the "tcg" | |
374 | event property. Internally, this property generates two events: | |
375 | "<eventname>_trans" to trace the event at translation time, and | |
376 | "<eventname>_exec" to trace the event at execution time. | |
377 | ||
378 | Instead of using these two events, you should instead use the function | |
379 | "trace_<eventname>_tcg" during translation (TCG code generation). This function | |
380 | will automatically call "trace_<eventname>_trans", and will generate the | |
381 | necessary TCG code to call "trace_<eventname>_exec" during guest code execution. | |
382 | ||
383 | Events with the "tcg" property can be declared in the "trace-events" file with a | |
384 | mix of native and TCG types, and "trace_<eventname>_tcg" will gracefully forward | |
385 | them to the "<eventname>_trans" and "<eventname>_exec" events. Since TCG values | |
386 | are not known at translation time, these are ignored by the "<eventname>_trans" | |
387 | event. Because of this, the entry in the "trace-events" file needs two printing | |
388 | formats (separated by a comma): | |
389 | ||
390 | tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d", "a1=%d a2=%d" | |
391 | ||
392 | For example: | |
393 | ||
394 | #include "trace-tcg.h" | |
395 | ||
396 | void some_disassembly_func (...) | |
397 | { | |
398 | uint8_t a1 = ...; | |
399 | TCGv_i32 a2 = ...; | |
400 | trace_foo_tcg(a1, a2); | |
401 | } | |
402 | ||
403 | This will immediately call: | |
404 | ||
405 | void trace_foo_trans(uint8_t a1); | |
406 | ||
407 | and will generate the TCG code to call: | |
408 | ||
409 | void trace_foo(uint8_t a1, uint32_t a2); | |
3d211d9f LV |
410 | |
411 | === "vcpu" === | |
412 | ||
413 | Identifies events that trace vCPU-specific information. It implicitly adds a | |
414 | "CPUState*" argument, and extends the tracing print format to show the vCPU | |
415 | information. If used together with the "tcg" property, it adds a second | |
416 | "TCGv_env" argument that must point to the per-target global TCG register that | |
417 | points to the vCPU when guest code is executed (usually the "cpu_env" variable). | |
418 | ||
7609ffb9 SH |
419 | The "tcg" and "vcpu" properties are currently only honored in the root |
420 | ./trace-events file. | |
421 | ||
3d211d9f LV |
422 | The following example events: |
423 | ||
424 | foo(uint32_t a) "a=%x" | |
425 | vcpu bar(uint32_t a) "a=%x" | |
426 | tcg vcpu baz(uint32_t a) "a=%x", "a=%x" | |
427 | ||
428 | Can be used as: | |
429 | ||
430 | #include "trace-tcg.h" | |
431 | ||
432 | CPUArchState *env; | |
433 | TCGv_ptr cpu_env; | |
434 | ||
435 | void some_disassembly_func(...) | |
436 | { | |
437 | /* trace emitted at this point */ | |
438 | trace_foo(0xd1); | |
439 | /* trace emitted at this point */ | |
29a0af61 | 440 | trace_bar(env_cpu(env), 0xd2); |
3d211d9f | 441 | /* trace emitted at this point (env) and when guest code is executed (cpu_env) */ |
29a0af61 | 442 | trace_baz_tcg(env_cpu(env), cpu_env, 0xd3); |
3d211d9f LV |
443 | } |
444 | ||
445 | If the translating vCPU has address 0xc1 and code is later executed by vCPU | |
446 | 0xc2, this would be an example output: | |
447 | ||
448 | // at guest code translation | |
449 | foo a=0xd1 | |
450 | bar cpu=0xc1 a=0xd2 | |
451 | baz_trans cpu=0xc1 a=0xd3 | |
452 | // at guest code execution | |
453 | baz_exec cpu=0xc2 a=0xd3 |