]> git.proxmox.com Git - mirror_qemu.git/blob - docs/lockcnt.txt
qemu-thread: introduce QemuLockCnt
[mirror_qemu.git] / docs / lockcnt.txt
1 DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
2 ===================================================
3
4 QEMU often uses reference counts to track data structures that are being
5 accessed and should not be freed. For example, a loop that invoke
6 callbacks like this is not safe:
7
8 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
9 if (ioh->revents & G_IO_OUT) {
10 ioh->fd_write(ioh->opaque);
11 }
12 }
13
14 QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
15 by stashing away its "next" pointer. However, ioh->fd_write could
16 actually delete the next node from the list. The simplest way to
17 avoid this is to mark the node as deleted, and remove it from the
18 list in the above loop:
19
20 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
21 if (ioh->deleted) {
22 QLIST_REMOVE(ioh, next);
23 g_free(ioh);
24 } else {
25 if (ioh->revents & G_IO_OUT) {
26 ioh->fd_write(ioh->opaque);
27 }
28 }
29 }
30
31 If however this loop must also be reentrant, i.e. it is possible that
32 ioh->fd_write invokes the loop again, some kind of counting is needed:
33
34 walking_handlers++;
35 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
36 if (ioh->deleted) {
37 if (walking_handlers == 1) {
38 QLIST_REMOVE(ioh, next);
39 g_free(ioh);
40 }
41 } else {
42 if (ioh->revents & G_IO_OUT) {
43 ioh->fd_write(ioh->opaque);
44 }
45 }
46 }
47 walking_handlers--;
48
49 One may think of using the RCU primitives, rcu_read_lock() and
50 rcu_read_unlock(); effectively, the RCU nesting count would take
51 the place of the walking_handlers global variable. Indeed,
52 reference counting and RCU have similar purposes, but their usage in
53 general is complementary:
54
55 - reference counting is fine-grained and limited to a single data
56 structure; RCU delays reclamation of *all* RCU-protected data
57 structures;
58
59 - reference counting works even in the presence of code that keeps
60 a reference for a long time; RCU critical sections in principle
61 should be kept short;
62
63 - reference counting is often applied to code that is not thread-safe
64 but is reentrant; in fact, usage of reference counting in QEMU predates
65 the introduction of threads by many years. RCU is generally used to
66 protect readers from other threads freeing memory after concurrent
67 modifications to a data structure.
68
69 - reclaiming data can be done by a separate thread in the case of RCU;
70 this can improve performance, but also delay reclamation undesirably.
71 With reference counting, reclamation is deterministic.
72
73 This file documents QemuLockCnt, an abstraction for using reference
74 counting in code that has to be both thread-safe and reentrant.
75
76
77 QemuLockCnt concepts
78 --------------------
79
80 A QemuLockCnt comprises both a counter and a mutex; it has primitives
81 to increment and decrement the counter, and to take and release the
82 mutex. The counter notes how many visits to the data structures are
83 taking place (the visits could be from different threads, or there could
84 be multiple reentrant visits from the same thread). The basic rules
85 governing the counter/mutex pair then are the following:
86
87 - Data protected by the QemuLockCnt must not be freed unless the
88 counter is zero and the mutex is taken.
89
90 - A new visit cannot be started while the counter is zero and the
91 mutex is taken.
92
93 Most of the time, the mutex protects all writes to the data structure,
94 not just frees, though there could be cases where this is not necessary.
95
96 Reads, instead, can be done without taking the mutex, as long as the
97 readers and writers use the same macros that are used for RCU, for
98 example atomic_rcu_read, atomic_rcu_set, QLIST_FOREACH_RCU, etc. This is
99 because the reads are done outside a lock and a set or QLIST_INSERT_HEAD
100 can happen concurrently with the read. The RCU API ensures that the
101 processor and the compiler see all required memory barriers.
102
103 This could be implemented simply by protecting the counter with the
104 mutex, for example:
105
106 // (1)
107 qemu_mutex_lock(&walking_handlers_mutex);
108 walking_handlers++;
109 qemu_mutex_unlock(&walking_handlers_mutex);
110
111 ...
112
113 // (2)
114 qemu_mutex_lock(&walking_handlers_mutex);
115 if (--walking_handlers == 0) {
116 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
117 if (ioh->deleted) {
118 QLIST_REMOVE(ioh, next);
119 g_free(ioh);
120 }
121 }
122 }
123 qemu_mutex_unlock(&walking_handlers_mutex);
124
125 Here, no frees can happen in the code represented by the ellipsis.
126 If another thread is executing critical section (2), that part of
127 the code cannot be entered, because the thread will not be able
128 to increment the walking_handlers variable. And of course
129 during the visit any other thread will see a nonzero value for
130 walking_handlers, as in the single-threaded code.
131
132 Note that it is possible for multiple concurrent accesses to delay
133 the cleanup arbitrarily; in other words, for the walking_handlers
134 counter to never become zero. For this reason, this technique is
135 more easily applicable if concurrent access to the structure is rare.
136
137 However, critical sections are easy to forget since you have to do
138 them for each modification of the counter. QemuLockCnt ensures that
139 all modifications of the counter take the lock appropriately, and it
140 can also be more efficient in two ways:
141
142 - it avoids taking the lock for many operations (for example
143 incrementing the counter while it is non-zero);
144
145 - on some platforms, one could implement QemuLockCnt to hold the
146 lock and the mutex in a single word, making it no more expensive
147 than simply managing a counter using atomic operations (see
148 docs/atomics.txt). This is not implemented yet, but can be
149 very helpful if concurrent access to the data structure is
150 expected to be rare.
151
152
153 Using the same mutex for frees and writes can still incur some small
154 inefficiencies; for example, a visit can never start if the counter is
155 zero and the mutex is taken---even if the mutex is taken by a write,
156 which in principle need not block a visit of the data structure.
157 However, these are usually not a problem if any of the following
158 assumptions are valid:
159
160 - concurrent access is possible but rare
161
162 - writes are rare
163
164 - writes are frequent, but this kind of write (e.g. appending to a
165 list) has a very small critical section.
166
167 For example, QEMU uses QemuLockCnt to manage an AioContext's list of
168 bottom halves and file descriptor handlers. Modifications to the list
169 of file descriptor handlers are rare. Creation of a new bottom half is
170 frequent and can happen on a fast path; however: 1) it is almost never
171 concurrent with a visit to the list of bottom halves; 2) it only has
172 three instructions in the critical path, two assignments and a smp_wmb().
173
174
175 QemuLockCnt API
176 ---------------
177
178 The QemuLockCnt API is described in include/qemu/thread.h.
179
180
181 QemuLockCnt usage
182 -----------------
183
184 This section explains the typical usage patterns for QemuLockCnt functions.
185
186 Setting a variable to a non-NULL value can be done between
187 qemu_lockcnt_lock and qemu_lockcnt_unlock:
188
189 qemu_lockcnt_lock(&xyz_lockcnt);
190 if (!xyz) {
191 new_xyz = g_new(XYZ, 1);
192 ...
193 atomic_rcu_set(&xyz, new_xyz);
194 }
195 qemu_lockcnt_unlock(&xyz_lockcnt);
196
197 Accessing the value can be done between qemu_lockcnt_inc and
198 qemu_lockcnt_dec:
199
200 qemu_lockcnt_inc(&xyz_lockcnt);
201 if (xyz) {
202 XYZ *p = atomic_rcu_read(&xyz);
203 ...
204 /* Accesses can now be done through "p". */
205 }
206 qemu_lockcnt_dec(&xyz_lockcnt);
207
208 Freeing the object can similarly use qemu_lockcnt_lock and
209 qemu_lockcnt_unlock, but you also need to ensure that the count
210 is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc
211 takes the QemuLockCnt's lock, the count cannot become non-zero while
212 the object is being freed. Freeing an object looks like this:
213
214 qemu_lockcnt_lock(&xyz_lockcnt);
215 if (!qemu_lockcnt_count(&xyz_lockcnt)) {
216 g_free(xyz);
217 xyz = NULL;
218 }
219 qemu_lockcnt_unlock(&xyz_lockcnt);
220
221 If an object has to be freed right after a visit, you can combine
222 the decrement, the locking and the check on count as follows:
223
224 qemu_lockcnt_inc(&xyz_lockcnt);
225 if (xyz) {
226 XYZ *p = atomic_rcu_read(&xyz);
227 ...
228 /* Accesses can now be done through "p". */
229 }
230 if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
231 g_free(xyz);
232 xyz = NULL;
233 qemu_lockcnt_unlock(&xyz_lockcnt);
234 }
235
236 QemuLockCnt can also be used to access a list as follows:
237
238 qemu_lockcnt_inc(&io_handlers_lockcnt);
239 QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
240 if (ioh->revents & G_IO_OUT) {
241 ioh->fd_write(ioh->opaque);
242 }
243 }
244
245 if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
246 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
247 if (ioh->deleted) {
248 QLIST_REMOVE(ioh, next);
249 g_free(ioh);
250 }
251 }
252 qemu_lockcnt_unlock(&io_handlers_lockcnt);
253 }
254
255 Again, the RCU primitives are used because new items can be added to the
256 list during the walk. QLIST_FOREACH_RCU ensures that the processor and
257 the compiler see the appropriate memory barriers.
258
259 An alternative pattern uses qemu_lockcnt_dec_if_lock:
260
261 qemu_lockcnt_inc(&io_handlers_lockcnt);
262 QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
263 if (ioh->deleted) {
264 if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
265 QLIST_REMOVE(ioh, next);
266 g_free(ioh);
267 qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
268 }
269 } else {
270 if (ioh->revents & G_IO_OUT) {
271 ioh->fd_write(ioh->opaque);
272 }
273 }
274 }
275 qemu_lockcnt_dec(&io_handlers_lockcnt);
276
277 Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock,
278 because there is no special task to do if the count goes from 1 to 0.