Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the Big QEMU Lock (BQL).

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
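To make the event loop idea concrete, here is a minimal sketch of a toy poll()-based loop that watches file descriptors and invokes a callback when one becomes readable. It is not QEMU's main-loop.c: ToyLoop, toy_loop_add_fd(), toy_loop_iterate() and count_bytes() are hypothetical names, and the sketch assumes a POSIX host with poll(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical toy event loop, not QEMU's API. */
typedef void ToyFdHandler(int fd, void *opaque);

typedef struct {
    int fd;
    ToyFdHandler *on_readable;
    void *opaque;
} ToyWatch;

#define TOY_MAX_WATCHES 16

typedef struct {
    ToyWatch watches[TOY_MAX_WATCHES];
    size_t n;
} ToyLoop;

/* Register a callback to run whenever fd becomes readable. */
static int toy_loop_add_fd(ToyLoop *loop, int fd, ToyFdHandler *on_readable,
                           void *opaque)
{
    if (loop->n == TOY_MAX_WATCHES) {
        return -1;
    }
    loop->watches[loop->n++] = (ToyWatch){ fd, on_readable, opaque };
    return 0;
}

/* One loop iteration: wait up to timeout_ms for any watched fd to become
 * readable, then invoke the callbacks. Returns the number dispatched. */
static int toy_loop_iterate(ToyLoop *loop, int timeout_ms)
{
    struct pollfd pfds[TOY_MAX_WATCHES];
    int dispatched = 0;

    for (size_t i = 0; i < loop->n; i++) {
        pfds[i] = (struct pollfd){ .fd = loop->watches[i].fd, .events = POLLIN };
    }
    if (poll(pfds, (nfds_t)loop->n, timeout_ms) <= 0) {
        return 0; /* timeout or error: nothing to dispatch */
    }
    for (size_t i = 0; i < loop->n; i++) {
        if (pfds[i].revents & POLLIN) {
            loop->watches[i].on_readable(pfds[i].fd, loop->watches[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: read from fd and add the byte count to *(size_t *)opaque. */
static void count_bytes(int fd, void *opaque)
{
    char buf[64];
    ssize_t r = read(fd, buf, sizeof(buf));

    if (r > 0) {
        *(size_t *)opaque += (size_t)r;
    }
}
```

The sketch only watches for readability; the other AioContext services (write/error monitoring, event notifiers, timers, bottom halves) are deliberately left out.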

The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: the main loop and IOThreads are both event loops, but their code is
not completely shared. Although they are conceptually similar, they are
currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly, this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the BQL, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the BQL
to serialize execution of QEMU code. This mutex is necessary because much of
QEMU's code was historically not thread-safe.

Because all I/O processing would otherwise be done in a single main loop, and
because the BQL is contended by all vCPU threads and the main loop, it is
desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is that the new code deals explicitly with the event loop object,
AioContext (see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running in
IOThreads must be aware of its AioContext.

AioContext supports the following services:
* File descriptor monitoring (read/write/error on POSIX hosts)
* Event notifiers (inter-thread signalling)
* Timers
* Bottom halves (BHs): deferred callbacks

There are several old APIs that implicitly use the main loop AioContext:
* LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
* LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
* LEGACY timer_new_ms() - create a timer
* LEGACY qemu_bh_new() - create a BH
* LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
* LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop, they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread, since the BQL is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
* aio_set_fd_handler() - monitor a file descriptor
* aio_set_event_notifier() - monitor an event notifier
* aio_timer_new() - create a timer
* aio_bh_new() - create a BH
* aio_bh_new_guarded() - create a BH with a device re-entrancy guard
* aio_poll() - run an event loop iteration
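The difference between the two API families can be modelled with a toy bottom-half queue. In this sketch ToyContext stands in for an AioContext and toy_main_ctx for the main loop's context; toy_bh_schedule() takes the context explicitly (like the aio_*() functions), while toy_legacy_bh_schedule() hard-wires the main context (like the LEGACY functions). All names are hypothetical, and the queue is deliberately not thread-safe.

```c
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: a fixed-size queue of BHs. */
typedef void ToyBHFunc(void *opaque);

#define TOY_MAX_BHS 8

typedef struct {
    ToyBHFunc *cb[TOY_MAX_BHS];
    void *opaque[TOY_MAX_BHS];
    size_t n;
} ToyContext;

/* Plays the role of the main loop's context (cf. qemu_get_aio_context()). */
static ToyContext toy_main_ctx;

/* Explicit-context API: queues cb in whichever context the caller passes. */
static int toy_bh_schedule(ToyContext *ctx, ToyBHFunc *cb, void *opaque)
{
    if (ctx->n == TOY_MAX_BHS) {
        return -1;
    }
    ctx->cb[ctx->n] = cb;
    ctx->opaque[ctx->n] = opaque;
    ctx->n++;
    return 0;
}

/* Legacy-style API: silently targets the main context, whoever calls it. */
static int toy_legacy_bh_schedule(ToyBHFunc *cb, void *opaque)
{
    return toy_bh_schedule(&toy_main_ctx, cb, opaque);
}

/* One iteration of ctx's loop: run the BHs queued so far, in FIFO order.
 * BHs scheduled by those callbacks wait for the next iteration. */
static size_t toy_ctx_run(ToyContext *ctx)
{
    ToyBHFunc *cb[TOY_MAX_BHS];
    void *opaque[TOY_MAX_BHS];
    size_t n = ctx->n;

    for (size_t i = 0; i < n; i++) {
        cb[i] = ctx->cb[i];
        opaque[i] = ctx->opaque[i];
    }
    ctx->n = 0;
    for (size_t i = 0; i < n; i++) {
        cb[i](opaque[i]);
    }
    return n;
}

/* Example BH: increment the int that opaque points to. */
static void bump(void *opaque)
{
    ++*(int *)opaque;
}
```

A caller running in an IOThread that used toy_legacy_bh_schedule() would see its callback run in the main loop instead, which is the trap the LEGACY functions set.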

The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
argument, which is used to check for and prevent re-entrancy problems. For
BHs associated with devices, the re-entrancy guard is contained in the
corresponding DeviceState and named "mem_reentrancy_guard".
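The idea behind a re-entrancy guard can be sketched as a flag that is set while a device's callback runs: a nested attempt to enter the same device while the flag is set is refused instead of recursing. ToyGuard, ToyDevice and toy_guarded_call() are hypothetical, and this sketch does not reproduce how QEMU actually checks a MemReentrancyGuard.

```c
#include <stdbool.h>

/* Hypothetical guard: true while the owning device's callback is running. */
typedef struct {
    bool engaged;
} ToyGuard;

typedef struct ToyDevice {
    ToyGuard guard;
    int handled;      /* handler invocations that actually ran */
    int rejected;     /* nested entries refused by the guard */
    bool recurse;     /* make the handler try to re-enter itself once */
} ToyDevice;

static void toy_device_handler(ToyDevice *dev);

/* Run the device's handler unless it is already running. */
static bool toy_guarded_call(ToyDevice *dev)
{
    if (dev->guard.engaged) {
        dev->rejected++;
        return false;
    }
    dev->guard.engaged = true;
    toy_device_handler(dev);
    dev->guard.engaged = false;
    return true;
}

/* Example handler that (if asked) triggers a nested call into itself,
 * e.g. a BH whose side effects loop back into the same device. */
static void toy_device_handler(ToyDevice *dev)
{
    dev->handled++;
    if (dev->recurse) {
        dev->recurse = false;
        toy_guarded_call(dev);   /* refused: the guard is engaged */
    }
}
```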

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works in both IOThreads and the main
loop, depending on which AioContext instance the caller passes in.

How to synchronize with an IOThread
-----------------------------------
Variables that can be accessed by multiple threads require some form of
synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
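As a generic illustration of why such synchronization is needed, this sketch uses plain POSIX pthread_mutex_lock() as a stand-in for qemu_mutex_lock(): several threads increment a shared counter, and the mutex ensures that no increments are lost. SharedCounter, counter_worker() and run_counter_workers() are hypothetical names.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;   /* protects value */
    long value;
    int iterations;         /* increments performed by each thread */
} SharedCounter;

static void *counter_worker(void *opaque)
{
    SharedCounter *c = opaque;

    for (int i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;             /* read-modify-write under the lock */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

#define MAX_COUNTER_THREADS 8

/* Start nthreads workers, wait for them, return the final count (-1 on error). */
static long run_counter_workers(int nthreads, int iterations)
{
    SharedCounter c = { .value = 0, .iterations = iterations };
    pthread_t tids[MAX_COUNTER_THREADS];
    int started = 0;

    if (nthreads < 0 || nthreads > MAX_COUNTER_THREADS) {
        return -1;
    }
    pthread_mutex_init(&c.lock, NULL);
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, counter_worker, &c) != 0) {
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Without the lock, concurrent c->value++ operations could overwrite each other and the final count could come out lower than nthreads * iterations.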

AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
activity in an IOThread.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot().
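A minimal sketch of cross-thread one-shot scheduling, assuming POSIX threads: any thread can hand a (function, opaque) pair to a toy IOThread, which runs it exactly once in its own thread. A mutex-protected queue and a condition variable loosely stand in for the BH list and wakeup mechanism of an AioContext; ToyIOThread, toy_schedule_oneshot() and the other names are hypothetical, not QEMU APIs.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy, not QEMU's API. */
typedef void ToyFunc(void *opaque);

typedef struct ToyWork {
    ToyFunc *fn;
    void *opaque;
    struct ToyWork *next;
} ToyWork;

typedef struct {
    pthread_mutex_t lock;   /* protects head, tail and stop */
    pthread_cond_t cond;    /* signalled when work arrives or stop is set */
    ToyWork *head, *tail;
    bool stop;
    pthread_t thread;
} ToyIOThread;

/* Thread-safe: may be called from any thread. fn runs once, in t's thread. */
static int toy_schedule_oneshot(ToyIOThread *t, ToyFunc *fn, void *opaque)
{
    ToyWork *w = malloc(sizeof(*w));

    if (!w) {
        return -1;
    }
    *w = (ToyWork){ .fn = fn, .opaque = opaque, .next = NULL };
    pthread_mutex_lock(&t->lock);
    if (t->tail) {
        t->tail->next = w;
    } else {
        t->head = w;
    }
    t->tail = w;
    pthread_cond_signal(&t->cond);   /* wake the IOThread */
    pthread_mutex_unlock(&t->lock);
    return 0;
}

static void *toy_iothread_run(void *opaque)
{
    ToyIOThread *t = opaque;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (!t->head && !t->stop) {
            pthread_cond_wait(&t->cond, &t->lock);
        }
        if (!t->head) {
            break;               /* stop requested and queue drained */
        }
        ToyWork *w = t->head;
        t->head = w->next;
        if (!t->head) {
            t->tail = NULL;
        }
        pthread_mutex_unlock(&t->lock);
        w->fn(w->opaque);        /* run the callback without the lock held */
        free(w);
        pthread_mutex_lock(&t->lock);
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

static int toy_iothread_start(ToyIOThread *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->head = t->tail = NULL;
    t->stop = false;
    return pthread_create(&t->thread, NULL, toy_iothread_run, t);
}

/* Ask the thread to exit once its queue is empty, then wait for it. */
static void toy_iothread_stop(ToyIOThread *t)
{
    pthread_mutex_lock(&t->lock);
    t->stop = true;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
    pthread_join(t->thread, NULL);
    pthread_cond_destroy(&t->cond);
    pthread_mutex_destroy(&t->lock);
}

/* Example callback: record which thread ran it. */
static void record_thread(void *opaque)
{
    *(pthread_t *)opaque = pthread_self();
}
```

Because the toy thread drains its queue before exiting, every callback scheduled before toy_iothread_stop() has run by the time that function returns.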

The main loop thread can wait synchronously for a condition using
AIO_WAIT_WHILE().
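The essential behaviour of a wait-while loop is that the waiting thread keeps running event loop iterations instead of blocking, since the completions it waits for may need the loop to make progress. A single-threaded toy version, where each iteration completes one pending request; TOY_WAIT_WHILE(), ToyLoopState and toy_poll_once() are hypothetical and do not reproduce the real macro's signature or locking.

```c
/* Hypothetical state: requests still pending, and iterations run so far. */
typedef struct {
    int in_flight;
    int iterations;
} ToyLoopState;

/* One toy event loop iteration: pretend one pending request completes. */
static void toy_poll_once(ToyLoopState *s)
{
    s->iterations++;
    if (s->in_flight > 0) {
        s->in_flight--;
    }
}

/* Re-evaluate cond after every iteration; stop as soon as it is false. */
#define TOY_WAIT_WHILE(state, cond)     \
    do {                                \
        while (cond) {                  \
            toy_poll_once(state);       \
        }                               \
    } while (0)
```

Usage: TOY_WAIT_WHILE(&s, s.in_flight > 0) returns only once every pending request has completed. Making it a macro lets the condition be an arbitrary expression re-evaluated in the caller's scope.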

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has integrated AioContext support. Each BlockDriverState
is associated with an AioContext: bdrv_try_change_aio_context() changes the
association and bdrv_get_aio_context() queries it. This allows block layer
code to process I/O inside the right AioContext. Other subsystems may wish to
follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" section above for information on how to do that.

Code running in the monitor typically needs to ensure that past
requests from the guest have completed. When a block device is running
in an IOThread, the IOThread can also process new requests from the guest
(via ioeventfd) at the same time. To both wait for in-flight requests and
keep new ones from being processed, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section".
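A single-threaded toy model of what a drained section provides: the begin call waits until every in-flight request has completed, and new guest requests are not started until the matching end call. The names are hypothetical, and holding back requests in a counter is a simplification of how QEMU actually pauses request processing.

```c
/* Hypothetical single-threaded model of a block device's request state. */
typedef struct {
    int in_flight;        /* requests started but not yet completed */
    int quiesce_counter;  /* > 0 while inside a drained section */
    int held_back;        /* guest requests waiting for the section to end */
} ToyBlockDev;

/* A guest request arrives (e.g. an ioeventfd kick). */
static void toy_guest_request(ToyBlockDev *bs)
{
    if (bs->quiesce_counter > 0) {
        bs->held_back++;      /* not processed while drained */
    } else {
        bs->in_flight++;
    }
}

/* Stands in for running the event loop until one request completes. */
static void toy_complete_one(ToyBlockDev *bs)
{
    if (bs->in_flight > 0) {
        bs->in_flight--;
    }
}

static void toy_drained_begin(ToyBlockDev *bs)
{
    bs->quiesce_counter++;          /* stop starting new guest requests */
    while (bs->in_flight > 0) {     /* wait for past requests to finish */
        toy_complete_one(bs);
    }
}

static void toy_drained_end(ToyBlockDev *bs)
{
    if (--bs->quiesce_counter == 0) {
        bs->in_flight += bs->held_back;   /* resume processing */
        bs->held_back = 0;
    }
}
```

Using a counter rather than a flag lets toy drained sections nest: only the outermost toy_drained_end() resumes processing.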

Long-running jobs (usually in the form of coroutines) are often scheduled in
the BlockDriverState's AioContext. The functions
bdrv_add/remove_aio_context_notifier, or alternatively
blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
get a notification whenever bdrv_try_change_aio_context() moves a
BlockDriverState to a different AioContext.
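A toy model of the notifier pattern: a job registers a "detach" callback, called before its BlockDriverState leaves the old context, and an "attached" callback, called with the new context afterwards, so the job can move its own work along. ToyBDS, toy_add_ctx_notifier() and toy_change_ctx() are hypothetical; the pairing of an attached and a detach callback is assumed to mirror what bdrv_add_aio_context_notifier() registers.

```c
#include <stddef.h>

typedef struct {
    const char *name;
} ToyCtx;

typedef void ToyAttachedFn(ToyCtx *new_ctx, void *opaque);
typedef void ToyDetachFn(void *opaque);

typedef struct {
    ToyAttachedFn *attached;
    ToyDetachFn *detach;
    void *opaque;
} ToyCtxNotifier;

#define TOY_MAX_NOTIFIERS 4

typedef struct {
    ToyCtx *ctx;                                  /* current context */
    ToyCtxNotifier notifiers[TOY_MAX_NOTIFIERS];
    size_t n_notifiers;
} ToyBDS;

static int toy_add_ctx_notifier(ToyBDS *bs, ToyAttachedFn *attached,
                                ToyDetachFn *detach, void *opaque)
{
    if (bs->n_notifiers == TOY_MAX_NOTIFIERS) {
        return -1;
    }
    bs->notifiers[bs->n_notifiers++] = (ToyCtxNotifier){ attached, detach, opaque };
    return 0;
}

/* Move bs: detach every listener from the old context, switch, re-attach. */
static void toy_change_ctx(ToyBDS *bs, ToyCtx *new_ctx)
{
    for (size_t i = 0; i < bs->n_notifiers; i++) {
        bs->notifiers[i].detach(bs->notifiers[i].opaque);
    }
    bs->ctx = new_ctx;
    for (size_t i = 0; i < bs->n_notifiers; i++) {
        bs->notifiers[i].attached(new_ctx, bs->notifiers[i].opaque);
    }
}

/* Example listener: a job that tracks where its work should be scheduled. */
typedef struct {
    ToyCtx *home;
    int moves;
} ToyJob;

static void job_detach(void *opaque)
{
    ((ToyJob *)opaque)->home = NULL;   /* stop using the old context */
}

static void job_attached(ToyCtx *new_ctx, void *opaque)
{
    ToyJob *job = opaque;

    job->home = new_ctx;
    job->moves++;
}
```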