========
Fuzzing
========

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

Basics
------

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by each input. Based on this
coverage feedback, the fuzzer mutates the input and repeats the process.

To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
is an *in-process* fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing runs.
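
The coverage-guided loop described above can be shown with a small, self-contained toy sketch. This is not how libFuzzer works internally. libFuzzer collects coverage through compiler instrumentation and applies many kinds of mutation. Here, the hypothetical ``toy_target()`` marks its branch "edges" by hand, and the only mutation is overwriting one random byte. Note that ``toy_target()`` keeps no state of its own between calls. A QEMU fuzz target does keep state, which is why that state must be reset between runs.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define INPUT_LEN 8
    #define MAX_CORPUS 64
    #define NUM_EDGES 4

    static uint8_t coverage[NUM_EDGES]; /* edges hit by the current input */
    static uint8_t seen[NUM_EDGES];     /* edges hit by any input so far */
    static uint8_t corpus[MAX_CORPUS][INPUT_LEN];
    static size_t corpus_size;

    /* Hypothetical target: each nested branch counts as one "edge". */
    static void toy_target(const uint8_t *data)
    {
        coverage[0] = 1;
        if (data[0] == 'Q') {
            coverage[1] = 1;
            if (data[1] == 'E') {
                coverage[2] = 1;
                if (data[2] == 'M') {
                    coverage[3] = 1;
                }
            }
        }
    }

    /* xorshift32: a tiny deterministic PRNG so runs are reproducible. */
    static uint32_t rng_state = 2463534242u;
    static uint32_t next_rand(void)
    {
        rng_state ^= rng_state << 13;
        rng_state ^= rng_state >> 17;
        rng_state ^= rng_state << 5;
        return rng_state;
    }

    /* Run one input and report whether it reached a never-seen edge. */
    static int run_and_check(const uint8_t *data)
    {
        int found_new = 0;
        memset(coverage, 0, sizeof(coverage));
        toy_target(data);
        for (int i = 0; i < NUM_EDGES; i++) {
            if (coverage[i] && !seen[i]) {
                seen[i] = 1;
                found_new = 1;
            }
        }
        return found_new;
    }

    /* Mutate corpus entries and keep mutants that add coverage.
     * Returns how many distinct edges were covered. */
    static int fuzz_loop(unsigned iterations)
    {
        memset(seen, 0, sizeof(seen));
        memset(corpus, 0, sizeof(corpus));
        corpus_size = 1;                 /* seed: one all-zero input */
        run_and_check(corpus[0]);

        for (unsigned it = 0; it < iterations; it++) {
            uint8_t input[INPUT_LEN];
            memcpy(input, corpus[next_rand() % corpus_size], INPUT_LEN);
            input[next_rand() % INPUT_LEN] = (uint8_t)next_rand(); /* mutate */
            if (run_and_check(input) && corpus_size < MAX_CORPUS) {
                memcpy(corpus[corpus_size++], input, INPUT_LEN);
            }
        }

        int covered = 0;
        for (int i = 0; i < NUM_EDGES; i++) {
            covered += seen[i];
        }
        return covered;
    }

``fuzz_loop(0)`` runs only the all-zero seed, so it reports one covered edge. With more iterations, any input that reaches a new branch is kept and reused as a base for later mutations. This lets the toy fuzzer work past the ``'Q'``, ``'E'`` and ``'M'`` checks one byte at a time.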

Building the fuzzers
--------------------

*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
much faster, since its page-map is smaller. This is because AddressSanitizer
maps ~20TB of memory as part of its detection, which results in a large
page-map and a much slower ``fork()``.

To build the fuzzers, install a recent version of clang. Then configure QEMU as
shown below, replacing ``clang-8``/``clang++-8`` with the version you installed.
``--enable-sanitizers`` is optional, but it allows us to reliably detect bugs
such as out-of-bounds accesses, use-after-frees, and double-frees::

    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
                                                --enable-sanitizers

Fuzz targets are built similarly to system targets::

    make qemu-fuzz-i386

This builds ``./qemu-fuzz-i386``.

The first argument to this binary is ``--fuzz-target=FUZZ_NAME``.
To list all of the available fuzzers, run ``qemu-fuzz-i386`` with no arguments.

For example::

    ./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with ``"--"``.
Information about these is available by passing ``-help=1``.

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.

Useful libFuzzer flags
----------------------

As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
list the available arguments. In particular, these arguments might be helpful:

* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
  libFuzzer stores each "interesting" input in this corpus directory. The next
  time you run libFuzzer, it will read all of the inputs from the corpus and
  continue fuzzing from there. You can also specify multiple directories.
  libFuzzer loads existing inputs from all specified directories, but will only
  write new ones to the first one specified.

* ``-max_len=4096`` : Specify the maximum byte-length of the inputs libFuzzer
  will generate.

* ``-close_fd_mask={1,2,3}`` : Close stdout (1), stderr (2), or both (3). Useful
  for targets that trigger many debug/error messages, or create output on the
  serial console.

* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers
  in parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or
  equal to half the available CPU cores. Replace 4 with a number appropriate for
  your machine. Make sure to specify a ``CORPUS_DIR``, which will allow the
  parallel fuzzers to share information about the interesting inputs they find.

* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
  coverage table. Useful for targets with "magic" constants. If Arg1 came from
  the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
  distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
  corpus.

* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
  better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.
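
The ``-use_value_profile=1`` formula above can be written out directly. This is a minimal sketch of the arithmetic only, assuming 64-bit operands. It is not libFuzzer's code, and ``value_profile_feature()`` is a hypothetical name. ``__builtin_popcountll`` is a GCC/Clang builtin.

.. code-block:: c

    #include <stdint.h>

    /* Feature value for one comparison, following the formula above:
     * (caller_pc & 4095) | (popcnt(arg1 ^ arg2) << 12). */
    static uint64_t value_profile_feature(uint64_t caller_pc,
                                          uint64_t arg1, uint64_t arg2)
    {
        /* Number of differing bits, i.e. the Hamming distance. */
        uint64_t hamming = (uint64_t)__builtin_popcountll(arg1 ^ arg2);
        return (caller_pc & 4095) | (hamming << 12);
    }

Take a comparison site whose ``caller_pc & 4095`` is 0, comparing against the magic constant ``0xCAFE``. An input operand of ``0`` differs in 11 bits and yields ``0xB000``, while ``0xCAF0`` differs in only 3 bits and yields ``0x3000``. These are different features, so an input that moves closer to the constant counts as new coverage.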

Generating Coverage Reports
---------------------------

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output includes a "cov: " column showing the total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis, we
can use Clang coverage:

1. Configure libFuzzer to store a corpus of all interesting inputs (see
   ``CORPUS_DIR`` above).
2. ``./configure`` the QEMU build with::

      --enable-fuzzing \
      --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"

3. Re-run the fuzzer. Specify ``$CORPUS_DIR/*`` as an argument, telling libFuzzer
   to execute all of the inputs in ``$CORPUS_DIR`` and exit. Once the process
   exits, you should find a file, ``default.profraw``, in the working directory.
4. Execute these commands to generate a detailed HTML coverage-report::

      llvm-profdata merge -output=default.profdata default.profraw
      llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
      --format html -output-dir=/path/to/output/report

Adding a new fuzzer
-------------------

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
``tests/qtest/fuzz/meson.build``.

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
   for reference.

3. Add the fuzzer to ``tests/qtest/fuzz/meson.build``.

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and its length. Commonly,
the fuzzer loops over the byte array, interpreting it as a list of qtest
commands, addresses, or values.
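
Here is a minimal sketch of the byte-array interpretation described above. It assumes a hypothetical fixed-size encoding: 1 opcode byte, 4 address bytes and 4 value bytes. A real fuzzer would pass the decoded values to libqtest functions such as ``qtest_outl()`` or ``qtest_writel()``. To keep the example self-contained, the hypothetical ``fake_qtest_op()`` only counts the commands it receives.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical opcodes; each fuzzer may define its own encoding. */
    enum { OP_OUTL, OP_INL, OP_WRITEL, OP_READL, OP_COUNT };

    /* Stand-in for libqtest calls: only counts what would have been sent. */
    static unsigned op_counts[OP_COUNT];

    static void fake_qtest_op(int op, uint64_t addr, uint32_t value)
    {
        (void)addr;   /* a real fuzzer would send these to the device */
        (void)value;  /* (value is ignored by read-type commands) */
        op_counts[op]++;
    }

    /* Interpret the input as fixed-size commands. A trailing partial
     * command is ignored. Returns the number of commands executed. */
    static size_t run_commands(const uint8_t *data, size_t size)
    {
        const size_t cmd_len = 1 + 4 + 4;
        size_t executed = 0;

        memset(op_counts, 0, sizeof(op_counts));
        while (size >= cmd_len) {
            uint32_t addr, value;
            int op = data[0] % OP_COUNT;   /* map any byte onto a valid opcode */
            memcpy(&addr, data + 1, sizeof(addr));   /* host byte order */
            memcpy(&value, data + 5, sizeof(value));
            fake_qtest_op(op, addr, value);
            executed++;
            data += cmd_len;
            size -= cmd_len;
        }
        return executed;
    }

Mapping every opcode byte onto a valid command with ``%`` means no input is wasted on undecodable commands. The existing fuzzers in ``tests/qtest/fuzz/`` may use quite different encodings.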

The Generic Fuzzer
------------------

Writing a fuzz target can be a lot of effort (especially if a device driver has
not been built out within libqos). Many devices can be fuzzed to some degree,
without any device-specific code, using the generic-fuzz target.

The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
and DMA input-spaces. To apply the generic-fuzz target to a device, we need to
define at least two environment variables:

* ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine,
  with the device attached. For example, if we want to fuzz the virtio-net
  device attached to a pc-i440fx machine, we can specify::

    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
    -device virtio-net,netdev=user0"

* ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
  the MemoryRegions that will be fuzzed. These strings are compared against
  MemoryRegion names and MemoryRegion owner names to decide whether each
  MemoryRegion should be fuzzed. These strings support globbing. For the
  virtio-net example, we could use one of::

    QEMU_FUZZ_OBJECTS='virtio-net'
    QEMU_FUZZ_OBJECTS='virtio*'
    QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
    QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine

The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
useful for identifying the ``MemoryRegion`` and ``Object`` names used for
matching.
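
The matching rule described for ``QEMU_FUZZ_OBJECTS`` can be sketched as follows. Patterns are space-delimited globs, each checked against both the MemoryRegion name and its owner's name. This sketch uses POSIX ``fnmatch()``. QEMU's generic-fuzz code may use a different matcher, and the region names used here are made-up examples.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <fnmatch.h>
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return true if 'name' matches any space-delimited glob in 'patterns'. */
    static bool matches_any_pattern(const char *patterns, const char *name)
    {
        bool matched = false;
        char *saveptr = NULL;
        char *copy;

        if (!patterns || !name) {
            return false;
        }
        copy = strdup(patterns);   /* strtok_r() modifies its input */
        if (!copy) {
            return false;
        }
        for (char *tok = strtok_r(copy, " ", &saveptr); tok && !matched;
             tok = strtok_r(NULL, " ", &saveptr)) {
            matched = fnmatch(tok, name, 0) == 0;
        }
        free(copy);
        return matched;
    }

    /* A region is selected if its own name or its owner's name matches. */
    static bool should_fuzz_region(const char *patterns, const char *region_name,
                                   const char *owner_name)
    {
        return matches_any_pattern(patterns, region_name) ||
               matches_any_pattern(patterns, owner_name);
    }

With ``'virtio*'``, a region named ``virtio-pci-common`` is selected and one named ``pcspk`` is not. A pattern without wildcards, such as ``virtio-net``, selects a region only if the region name or owner name is exactly ``virtio-net``.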

As a generic rule-of-thumb, the more ``MemoryRegions``/Devices we match, the
greater the input-space, and the smaller the probability of finding crashing
inputs for individual devices. As such, it is usually a good idea to limit the
fuzzer to only a few ``MemoryRegions``.

To ensure that these environment variables have been configured correctly, we
can use::

    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0

The output should contain a complete list of matched MemoryRegions.

OSS-Fuzz
--------

QEMU is continuously fuzzed on `OSS-Fuzz
<https://github.com/google/oss-fuzz>`_. By default, the OSS-Fuzz build
will try to fuzz every fuzz-target. Since the generic-fuzz target
requires additional information provided in environment variables, we
pre-define some generic-fuzz configs in
``tests/qtest/fuzz/generic_fuzz_configs.h``. Each config must specify:

- ``.name``: To identify the fuzzer config.

- ``.args`` OR ``.argfunc``: A string or a pointer to a function returning a
  string. These strings are used to specify the ``QEMU_FUZZ_ARGS``
  environment variable. ``argfunc`` is useful when the config relies on, e.g.,
  a dynamically created temp directory or a free TCP/UDP port.

- ``.objects``: A string that specifies the ``QEMU_FUZZ_OBJECTS`` environment
  variable.
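
Here is a simplified sketch of such a config, assuming plain C types. The field names follow the list above, but the real definitions in ``tests/qtest/fuzz/generic_fuzz_configs.h`` may differ, for example in their string types. The config names and ``example_argfunc()`` are hypothetical. ``apply_config()`` only illustrates how each field maps onto an environment variable.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Simplified stand-in for a generic-fuzz config. */
    typedef struct {
        const char *name;
        const char *args;          /* fixed QEMU_FUZZ_ARGS, or NULL */
        char *(*argfunc)(void);    /* builds QEMU_FUZZ_ARGS at runtime, or NULL */
        const char *objects;       /* QEMU_FUZZ_OBJECTS */
    } fuzz_config;

    /* Hypothetical argfunc: a real one might insert a freshly created temp
     * directory or a free port; this one uses a fixed example path. */
    static char *example_argfunc(void)
    {
        static char buf[256];
        const char *dir = "/tmp/fuzz-example";
        snprintf(buf, sizeof(buf),
                 "-M pc -nodefaults -netdev user,id=user0,tftp=%s "
                 "-device virtio-net,netdev=user0", dir);
        return buf;
    }

    static const fuzz_config example_configs[] = {
        {
            .name = "virtio-net-example",          /* hypothetical name */
            .args = "-M pc -nodefaults -netdev user,id=user0 "
                    "-device virtio-net,netdev=user0",
            .objects = "virtio*",
        },
        {
            .name = "virtio-net-tftp-example",     /* hypothetical name */
            .argfunc = example_argfunc,
            .objects = "virtio*",
        },
    };

    /* Export a config as the two environment variables generic-fuzz reads. */
    static int apply_config(const fuzz_config *c)
    {
        const char *args = c->argfunc ? c->argfunc() : c->args;
        if (!args || !c->objects) {
            return -1;
        }
        if (setenv("QEMU_FUZZ_ARGS", args, 1) != 0) {
            return -1;
        }
        return setenv("QEMU_FUZZ_OBJECTS", c->objects, 1);
    }

``argfunc`` takes precedence when it is set, which is what makes runtime-generated arguments possible. Static ``args`` strings cover the common case.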

To fuzz additional devices/device configurations on OSS-Fuzz, send patches for
either a new device-specific fuzzer or a new generic-fuzz config.

Build details:

- The Dockerfile that sets up the environment for building QEMU's
  fuzzers on OSS-Fuzz can be found in the OSS-Fuzz repository
  `here <https://github.com/google/oss-fuzz/blob/master/projects/qemu/Dockerfile>`__.

- The script responsible for building the fuzzers can be found in the
  QEMU source tree at ``scripts/oss-fuzz/build.sh``.

Building Crash Reproducers
--------------------------

When we find a crash, we should try to create an independent reproducer that
can be used on a non-fuzzer build of QEMU. This filters out any potential
false-positives and improves the debugging experience for developers.
Here are the steps for building a reproducer for a crash found by the
generic-fuzz target.

- Ensure the crash reproduces::

    qemu-fuzz-i386 --fuzz-target... ./crash-...

- Gather the QTest output for the crash::

    QEMU_FUZZ_TIMEOUT=0 QTEST_LOG=1 FUZZ_SERIALIZE_QTEST=1 \
    qemu-fuzz-i386 --fuzz-target... ./crash-... &> /tmp/trace

- Reorder and clean up the resulting trace::

    scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py /tmp/trace > /tmp/reproducer

- Get the arguments needed to start qemu, and provide a path to qemu::

    less /tmp/trace # The args should be logged at the top of this file
    export QEMU_ARGS="-machine ..."
    export QEMU_PATH="path/to/qemu-system"

- Ensure the crash reproduces in qemu-system::

    $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer

- From the crash output, obtain some string that identifies the crash. This
  can be a line in the stack-trace, for example::

    export CRASH_TOKEN="hw/usb/hcd-xhci.c:1865"

- Minimize the reproducer::

    scripts/oss-fuzz/minimize_qtest_trace.py -M1 -M2 \
      /tmp/reproducer /tmp/reproducer-minimized

- Confirm that the minimized reproducer still crashes::

    $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer-minimized

- Create a one-liner reproducer that can be sent over email::

    ./scripts/oss-fuzz/output_reproducer.py -bash /tmp/reproducer-minimized

- Output the C source code for a test case that will reproduce the bug::

    ./scripts/oss-fuzz/output_reproducer.py -owner "John Smith <john@smith.com>" \
      -name "test_function_name" /tmp/reproducer-minimized

- Report the bug and send a patch with the C reproducer upstream.
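
To illustrate what minimizing a trace means, here is a generic greedy sketch. It deletes one trace line at a time and keeps each deletion only if the crash still reproduces. This shows the general technique and is not a description of ``minimize_qtest_trace.py`` or its ``-M1``/``-M2`` options. In practice, the predicate would run ``$QEMU_PATH $QEMU_ARGS -qtest stdio`` on the candidate trace and search the output for ``$CRASH_TOKEN``. The hypothetical ``demo_still_crashes()`` simply pretends that two specific commands are required.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Does this candidate trace still trigger the crash? */
    typedef bool (*still_crashes_fn)(const char *const *lines, size_t n);

    /* Hypothetical predicate for demonstration only: the "crash" needs
     * these two commands to be present, in any order. */
    static bool demo_still_crashes(const char *const *lines, size_t n)
    {
        bool has_out = false, has_write = false;
        for (size_t i = 0; i < n; i++) {
            has_out |= strcmp(lines[i], "outl 0xcf8 0x80001010") == 0;
            has_write |= strcmp(lines[i], "writel 0xe0000000 0x1") == 0;
        }
        return has_out && has_write;
    }

    /* Greedy one-line-at-a-time minimization, in place.
     * Returns the new number of lines. */
    static size_t minimize_trace(const char **lines, size_t n,
                                 still_crashes_fn still_crashes)
    {
        size_t i = 0;
        while (i < n) {
            const char *removed = lines[i];
            /* Tentatively drop line i by shifting the tail left. */
            memmove(&lines[i], &lines[i + 1], (n - i - 1) * sizeof(*lines));
            if (still_crashes(lines, n - 1)) {
                n--;        /* keep the deletion; retry the same index */
            } else {
                /* Line i is needed: shift the tail back and restore it. */
                memmove(&lines[i + 1], &lines[i], (n - i - 1) * sizeof(*lines));
                lines[i] = removed;
                i++;
            }
        }
        return n;
    }

Each pass of this sketch runs the predicate once per remaining line. This is why a cheap, reliable crash check such as ``CRASH_TOKEN`` matters when traces are long.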

Implementation Details / Fuzzer Lifecycle
-----------------------------------------

The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
own ``main()``, which performs some setup and calls the entrypoints:

``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
necessary state.

``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
resets the state at the end of each run.
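
Here is a minimal sketch of the two entrypoints, using the signatures libFuzzer expects. Everything else is hypothetical: ``device_state`` stands in for guest state, and the ``--fuzz-target=`` parsing is a simplification of what QEMU's fuzzer does. When the code is linked with ``-fsanitize=fuzzer``, libFuzzer supplies ``main()`` and calls these functions.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static const char *fuzz_target_name; /* from --fuzz-target=NAME */
    static unsigned device_state;        /* stand-in for guest/device state */
    static unsigned last_observed;       /* result of the most recent run */

    /* Called once by libFuzzer's main() before any input is run. */
    int LLVMFuzzerInitialize(int *argc, char ***argv)
    {
        static const char prefix[] = "--fuzz-target=";
        for (int i = 1; i < *argc; i++) {
            if (strncmp((*argv)[i], prefix, sizeof(prefix) - 1) == 0) {
                fuzz_target_name = (*argv)[i] + sizeof(prefix) - 1;
            }
        }
        /* A real QEMU target would set up qtest/qos and start the guest here. */
        return 0;
    }

    /* Called once per input; must leave the process ready for the next one. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        for (size_t i = 0; i < size; i++) {
            device_state += data[i];  /* pretend the input drives the device */
        }
        last_observed = device_state;
        device_state = 0;             /* reset so runs do not affect each other */
        return 0;
    }

``LLVMFuzzerTestOneInput()`` resets ``device_state`` before returning, so running the same input twice gives the same result. This is the property that the reset strategies described below aim to provide.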

In more detail:

``LLVMFuzzerInitialize`` parses the arguments to the fuzzer. These must start
with two dashes so that libFuzzer's ``main()`` ignores them. Currently, the
arguments select the fuzz target. Then, the qtest client is initialized. If the
target requires qos, qgraph is set up and the QOM/LIBQOS modules are
initialized. Then the QGraph is walked, and the QEMU cmd_line is determined and
saved.

After this, ``vl.c:main`` is called to set up the guest. There are
target-specific hooks that can be called before and after main, for
additional setup (e.g. PCI setup, or VM snapshotting).

``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling ``main_loop_wait`` to ensure
that bottom halves are executed and that any cleanup required before the next
input is performed.

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. There are currently two implemented
options for resetting state:

- Reboot the guest between runs.

  - *Pros*: Straightforward and fast for simple fuzz targets.

  - *Cons*: Depending on the device, does not reset all device state. If the
    device requires some initialization prior to being ready for fuzzing (common
    for QOS-based targets), this initialization needs to be done after each
    reboot.

  - *Example target*: ``i440fx-qtest-reboot-fuzz``

- Run each test case in a separate forked process and copy the coverage
  information back to the parent. This is fairly similar to AFL's "deferred"
  fork-server mode [3].

  - *Pros*: Relatively fast. Devices only need to be initialized once. No need
    to do slow reboots or vmloads.

  - *Cons*: Not officially supported by libFuzzer. Does not work well for
    devices that rely on dedicated threads.

  - *Example target*: ``virtio-net-fork-fuzz``
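
The fork-based strategy can be sketched with POSIX ``fork()`` and ``waitpid()``, plus a ``MAP_SHARED`` anonymous mapping for the coverage map. This sketch only illustrates the idea and is not how QEMU's fork-based targets are implemented. ``device_regs`` and ``run_input()`` are hypothetical.

.. code-block:: c

    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define COV_SIZE 64

    static uint8_t *shared_cov;       /* coverage map shared by parent and child */
    static unsigned device_regs[4];   /* hypothetical mutable device state */

    /* Map the coverage table MAP_SHARED so the child's writes reach the parent. */
    static int setup_shared_coverage(void)
    {
        void *p = mmap(NULL, COV_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return -1;
        }
        shared_cov = p;   /* anonymous mappings start zero-filled */
        return 0;
    }

    /* Hypothetical target: mutates device state and records "coverage". */
    static void run_input(const uint8_t *data, size_t size)
    {
        for (size_t i = 0; i < size; i++) {
            device_regs[data[i] % 4] += 1;
            shared_cov[data[i] % COV_SIZE] = 1;
        }
    }

    /* Run one input in a child process; returns its exit status, or -1. */
    static int fuzz_one_forked(const uint8_t *data, size_t size)
    {
        int status;
        pid_t pid = fork();

        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            run_input(data, size);
            _exit(0);              /* child's copy of device_regs is discarded */
        }
        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

Writes to ``device_regs`` happen in the child's copy of memory and vanish when the child exits, so the parent never needs to reset device state. Only the shared coverage map survives. ``fork()`` copies only the calling thread into the child, which is one reason this approach does not suit devices that rely on dedicated threads.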