]>
Commit | Line | Data |
---|---|---|
ebb477cb PM |
1 | CONTROL DEPENDENCIES |
2 | ==================== | |
3 | ||
4 | A major difficulty with control dependencies is that current compilers | |
5 | do not support them. One purpose of this document is therefore to | |
6 | help you prevent your compiler from breaking your code. However, | |
7 | control dependencies also pose other challenges, which leads to the | |
8 | second purpose of this document, namely to help you to avoid breaking | |
9 | your own code, even in the absence of help from your compiler. | |
10 | ||
11 | One such challenge is that control dependencies order only later stores. | |
12 | Therefore, a load-load control dependency will not preserve ordering | |
13 | unless a read memory barrier is provided. Consider the following code: | |
14 | ||
15 | q = READ_ONCE(a); | |
16 | if (q) | |
17 | p = READ_ONCE(b); | |
18 | ||
19 | This is not guaranteed to provide any ordering because some types of CPUs | |
20 | are permitted to predict the result of the load from "b". This prediction | |
21 | can cause other CPUs to see this load as having happened before the load | |
22 | from "a". This means that an explicit read barrier is required, for example | |
23 | as follows: | |
24 | ||
25 | q = READ_ONCE(a); | |
26 | if (q) { | |
27 | smp_rmb(); | |
28 | p = READ_ONCE(b); | |
29 | } | |
30 | ||
31 | However, stores are not speculated. This means that ordering is | |
32 | (usually) guaranteed for load-store control dependencies, as in the | |
33 | following example: | |
34 | ||
35 | q = READ_ONCE(a); | |
36 | if (q) | |
37 | WRITE_ONCE(b, 1); | |
38 | ||
39 | Control dependencies can pair with each other and with other types | |
40 | of ordering. But please note that neither the READ_ONCE() nor the | |
41 | WRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might | |
42 | fuse the load from "a" with other loads. Without the WRITE_ONCE(), | |
43 | the compiler might fuse the store to "b" with other stores. Worse yet, | |
44 | the compiler might convert the store into a load and a check followed | |
45 | by a store, and this compiler-generated load would not be ordered by | |
46 | the control dependency. | |
47 | ||
48 | Furthermore, if the compiler is able to prove that the value of variable | |
49 | "a" is always non-zero, it would be well within its rights to optimize | |
50 | the original example by eliminating the "if" statement as follows: | |
51 | ||
52 | q = a; | |
53 | b = 1; /* BUG: Compiler and CPU can both reorder!!! */ | |
54 | ||
55 | So don't leave out either the READ_ONCE() or the WRITE_ONCE(). | |
56 | In particular, although READ_ONCE() does force the compiler to emit a | |
57 | load, it does *not* force the compiler to actually use the loaded value. | |
58 | ||
59 | It is tempting to try use control dependencies to enforce ordering on | |
60 | identical stores on both branches of the "if" statement as follows: | |
61 | ||
62 | q = READ_ONCE(a); | |
63 | if (q) { | |
64 | barrier(); | |
65 | WRITE_ONCE(b, 1); | |
66 | do_something(); | |
67 | } else { | |
68 | barrier(); | |
69 | WRITE_ONCE(b, 1); | |
70 | do_something_else(); | |
71 | } | |
72 | ||
73 | Unfortunately, current compilers will transform this as follows at high | |
74 | optimization levels: | |
75 | ||
76 | q = READ_ONCE(a); | |
77 | barrier(); | |
78 | WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */ | |
79 | if (q) { | |
80 | /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ | |
81 | do_something(); | |
82 | } else { | |
83 | /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ | |
84 | do_something_else(); | |
85 | } | |
86 | ||
87 | Now there is no conditional between the load from "a" and the store to | |
88 | "b", which means that the CPU is within its rights to reorder them: The | |
89 | conditional is absolutely required, and must be present in the final | |
90 | assembly code, after all of the compiler and link-time optimizations | |
91 | have been applied. Therefore, if you need ordering in this example, | |
92 | you must use explicit memory ordering, for example, smp_store_release(): | |
93 | ||
94 | q = READ_ONCE(a); | |
95 | if (q) { | |
96 | smp_store_release(&b, 1); | |
97 | do_something(); | |
98 | } else { | |
99 | smp_store_release(&b, 1); | |
100 | do_something_else(); | |
101 | } | |
102 | ||
103 | Without explicit memory ordering, control-dependency-based ordering is | |
104 | guaranteed only when the stores differ, for example: | |
105 | ||
106 | q = READ_ONCE(a); | |
107 | if (q) { | |
108 | WRITE_ONCE(b, 1); | |
109 | do_something(); | |
110 | } else { | |
111 | WRITE_ONCE(b, 2); | |
112 | do_something_else(); | |
113 | } | |
114 | ||
115 | The initial READ_ONCE() is still required to prevent the compiler from | |
116 | knowing too much about the value of "a". | |
117 | ||
118 | But please note that you need to be careful what you do with the local | |
119 | variable "q", otherwise the compiler might be able to guess the value | |
120 | and again remove the conditional branch that is absolutely required to | |
121 | preserve ordering. For example: | |
122 | ||
123 | q = READ_ONCE(a); | |
124 | if (q % MAX) { | |
125 | WRITE_ONCE(b, 1); | |
126 | do_something(); | |
127 | } else { | |
128 | WRITE_ONCE(b, 2); | |
129 | do_something_else(); | |
130 | } | |
131 | ||
132 | If MAX is compile-time defined to be 1, then the compiler knows that | |
133 | (q % MAX) must be equal to zero, regardless of the value of "q". | |
134 | The compiler is therefore within its rights to transform the above code | |
135 | into the following: | |
136 | ||
137 | q = READ_ONCE(a); | |
138 | WRITE_ONCE(b, 2); | |
139 | do_something_else(); | |
140 | ||
141 | Given this transformation, the CPU is not required to respect the ordering | |
142 | between the load from variable "a" and the store to variable "b". It is | |
143 | tempting to add a barrier(), but this does not help. The conditional | |
144 | is gone, and the barrier won't bring it back. Therefore, if you need | |
145 | to relying on control dependencies to produce this ordering, you should | |
146 | make sure that MAX is greater than one, perhaps as follows: | |
147 | ||
148 | q = READ_ONCE(a); | |
149 | BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ | |
150 | if (q % MAX) { | |
151 | WRITE_ONCE(b, 1); | |
152 | do_something(); | |
153 | } else { | |
154 | WRITE_ONCE(b, 2); | |
155 | do_something_else(); | |
156 | } | |
157 | ||
158 | Please note once again that each leg of the "if" statement absolutely | |
159 | must store different values to "b". As in previous examples, if the two | |
160 | values were identical, the compiler could pull this store outside of the | |
161 | "if" statement, destroying the control dependency's ordering properties. | |
162 | ||
163 | You must also be careful avoid relying too much on boolean short-circuit | |
164 | evaluation. Consider this example: | |
165 | ||
166 | q = READ_ONCE(a); | |
167 | if (q || 1 > 0) | |
168 | WRITE_ONCE(b, 1); | |
169 | ||
170 | Because the first condition cannot fault and the second condition is | |
171 | always true, the compiler can transform this example as follows, again | |
172 | destroying the control dependency's ordering: | |
173 | ||
174 | q = READ_ONCE(a); | |
175 | WRITE_ONCE(b, 1); | |
176 | ||
177 | This is yet another example showing the importance of preventing the | |
178 | compiler from out-guessing your code. Again, although READ_ONCE() really | |
179 | does force the compiler to emit code for a given load, the compiler is | |
180 | within its rights to discard the loaded value. | |
181 | ||
182 | In addition, control dependencies apply only to the then-clause and | |
183 | else-clause of the "if" statement in question. In particular, they do | |
184 | not necessarily order the code following the entire "if" statement: | |
185 | ||
186 | q = READ_ONCE(a); | |
187 | if (q) { | |
188 | WRITE_ONCE(b, 1); | |
189 | } else { | |
190 | WRITE_ONCE(b, 2); | |
191 | } | |
192 | WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */ | |
193 | ||
194 | It is tempting to argue that there in fact is ordering because the | |
195 | compiler cannot reorder volatile accesses and also cannot reorder | |
196 | the writes to "b" with the condition. Unfortunately for this line | |
197 | of reasoning, the compiler might compile the two writes to "b" as | |
198 | conditional-move instructions, as in this fanciful pseudo-assembly | |
199 | language: | |
200 | ||
201 | ld r1,a | |
202 | cmp r1,$0 | |
203 | cmov,ne r4,$1 | |
204 | cmov,eq r4,$2 | |
205 | st r4,b | |
206 | st $1,c | |
207 | ||
208 | The control dependencies would then extend only to the pair of cmov | |
209 | instructions and the store depending on them. This means that a weakly | |
210 | ordered CPU would have no dependency of any sort between the load from | |
211 | "a" and the store to "c". In short, control dependencies provide ordering | |
212 | only to the stores in the then-clause and else-clause of the "if" statement | |
213 | in question (including functions invoked by those two clauses), and not | |
214 | to code following that "if" statement. | |
215 | ||
216 | ||
217 | In summary: | |
218 | ||
219 | (*) Control dependencies can order prior loads against later stores. | |
220 | However, they do *not* guarantee any other sort of ordering: | |
221 | Not prior loads against later loads, nor prior stores against | |
222 | later anything. If you need these other forms of ordering, use | |
223 | smp_load_acquire(), smp_store_release(), or, in the case of prior | |
224 | stores and later loads, smp_mb(). | |
225 | ||
226 | (*) If both legs of the "if" statement contain identical stores to | |
227 | the same variable, then you must explicitly order those stores, | |
228 | either by preceding both of them with smp_mb() or by using | |
229 | smp_store_release(). Please note that it is *not* sufficient to use | |
230 | barrier() at beginning and end of each leg of the "if" statement | |
231 | because, as shown by the example above, optimizing compilers can | |
232 | destroy the control dependency while respecting the letter of the | |
233 | barrier() law. | |
234 | ||
235 | (*) Control dependencies require at least one run-time conditional | |
236 | between the prior load and the subsequent store, and this | |
237 | conditional must involve the prior load. If the compiler is able | |
238 | to optimize the conditional away, it will have also optimized | |
239 | away the ordering. Careful use of READ_ONCE() and WRITE_ONCE() | |
240 | can help to preserve the needed conditional. | |
241 | ||
242 | (*) Control dependencies require that the compiler avoid reordering the | |
243 | dependency into nonexistence. Careful use of READ_ONCE() or | |
244 | atomic{,64}_read() can help to preserve your control dependency. | |
245 | ||
246 | (*) Control dependencies apply only to the then-clause and else-clause | |
247 | of the "if" statement containing the control dependency, including | |
248 | any functions that these two clauses call. Control dependencies | |
249 | do *not* apply to code beyond the end of that "if" statement. | |
250 | ||
251 | (*) Control dependencies pair normally with other types of barriers. | |
252 | ||
253 | (*) Control dependencies do *not* provide multicopy atomicity. If you | |
254 | need all the CPUs to agree on the ordering of a given store against | |
255 | all other accesses, use smp_mb(). | |
256 | ||
257 | (*) Compilers do not understand control dependencies. It is therefore | |
258 | your job to ensure that they do not break your code. |