]>
Commit | Line | Data |
---|---|---|
2c00a5a8 XL |
1 | # Introduction: the sleep module |
2 | ||
3 | The code in this module governs when worker threads should go to | |
1b1a35ee XL |
4 | sleep. The system used in this code was introduced in [Rayon RFC #5]. |
5 | There is also a [video walkthrough] available. Both of those may be | |
6 | valuable resources to understanding the code, though naturally they | |
7 | will also grow stale over time. The comments in this file are | |
8 | extracted from the RFC and meant to be kept up to date. | |
9 | ||
10 | [Rayon RFC #5]: https://github.com/rayon-rs/rfcs/pull/5 | |
11 | [video walkthrough]: https://youtu.be/HvmQsE5M4cY | |
12 | ||
13 | # The `Sleep` struct | |
14 | ||
15 | The `Sleep` struct is embedded into each registry. It performs several functions: | |
16 | ||
17 | * It tracks when workers are awake or asleep. | |
18 | * It decides how long a worker should look for work before it goes to sleep, | |
19 | via a callback that is invoked periodically from the worker's search loop. | |
20 | * It is notified when latches are set, jobs are published, or other | |
21 | events occur, and it will go and wake the appropriate threads if | |
22 | they are sleeping. | |
23 | ||
24 | # Thread states | |
25 | ||
26 | There are three main thread states: | |
27 | ||
28 | * An **active** thread is one that is actively executing a job. | |
29 | * An **idle** thread is one that is searching for work to do. It will be | |
30 | trying to steal work or pop work from the global injector queue. | |
31 | * A **sleeping** thread is one that is blocked on a condition variable, | |
32 | waiting to be awoken. | |
33 | ||
34 | We sometimes refer to the final two states collectively as **inactive**. | |
35 | Threads begin as idle but transition to idle and finally sleeping when | |
36 | they're unable to find work to do. | |
37 | ||
38 | ## Sleepy threads | |
39 | ||
40 | There is one other special state worth mentioning. During the idle state, | |
41 | threads can get **sleepy**. A sleepy thread is still idle, in that it is still | |
42 | searching for work, but it is *about* to go to sleep after it does one more | |
43 | search (or some other number, potentially). When a thread enters the sleepy | |
44 | state, it signals (via the **jobs event counter**, described below) that it is | |
45 | about to go to sleep. If new work is published, this will lead to the counter | |
46 | being adjusted. When the thread actually goes to sleep, it will (hopefully, but | |
47 | not guaranteed) see that the counter has changed and elect not to sleep, but | |
48 | instead to search again. See the section on the **jobs event counter** for more | |
49 | details. | |
50 | ||
51 | # The counters | |
52 | ||
53 | One of the key structs in the sleep module is `AtomicCounters`, found in | |
54 | `counters.rs`. It packs three counters into one atomically managed value: | |
55 | ||
56 | * Two **thread counters**, which track the number of threads in a particular state. | |
57 | * The **jobs event counter**, which is used to signal when new work is available. | |
58 | It (sort of) tracks the number of jobs posted, but not quite, and it can rollover. | |
59 | ||
60 | ## Thread counters | |
61 | ||
62 | There are two thread counters, one that tracks **inactive** threads and one that | |
63 | tracks **sleeping** threads. From this, one can deduce the number of threads | |
64 | that are idle by subtracting sleeping threads from inactive threads. We track | |
65 | the counters in this way because it permits simpler atomic operations. One can | |
66 | increment the number of sleeping threads (and thus decrease the number of idle | |
67 | threads) simply by doing one atomic increment, for example. Similarly, one can | |
68 | decrease the number of sleeping threads (and increase the number of idle | |
69 | threads) through one atomic decrement. | |
70 | ||
71 | These counters are adjusted as follows: | |
72 | ||
73 | * When a thread enters the idle state: increment the inactive thread counter. | |
74 | * When a thread enters the sleeping state: increment the sleeping thread counter. | |
75 | * When a thread awakens a sleeping thread: decrement the sleeping thread counter. | |
76 | * Subtle point: the thread that *awakens* the sleeping thread decrements the | |
77 | counter, not the thread that is *sleeping*. This is because there is a delay | |
78 | between siganling a thread to wake and the thread actually waking: | |
79 | decrementing the counter when awakening the thread means that other threads | |
80 | that may be posting work will see the up-to-date value that much faster. | |
81 | * When a thread finds work, exiting the idle state: decrement the inactive | |
82 | thread counter. | |
83 | ||
84 | ## Jobs event counter | |
85 | ||
86 | The final counter is the **jobs event counter**. The role of this counter is to | |
87 | help sleepy threads detect when new work is posted in a lightweight fashion. In | |
88 | its simplest form, we would simply have a counter that gets incremented each | |
89 | time a new job is posted. This way, when a thread gets sleepy, it could read the | |
90 | counter, and then compare to see if the value has changed before it actually | |
91 | goes to sleep. But this [turns out to be too expensive] in practice, so we use a | |
92 | somewhat more complex scheme. | |
93 | ||
94 | [turns out to be too expensive]: https://github.com/rayon-rs/rayon/pull/746#issuecomment-624802747 | |
95 | ||
96 | The idea is that the counter toggles between two states, depending on whether | |
97 | its value is even or odd (or, equivalently, on the value of its low bit): | |
98 | ||
99 | * Even -- If the low bit is zero, then it means that there has been no new work | |
100 | since the last thread got sleepy. | |
101 | * Odd -- If the low bit is one, then it means that new work was posted since | |
102 | the last thread got sleepy. | |
103 | ||
104 | ### New work is posted | |
105 | ||
106 | When new work is posted, we check the value of the counter: if it is even, | |
107 | then we increment it by one, so that it becomes odd. | |
108 | ||
109 | ### Worker thread gets sleepy | |
110 | ||
111 | When a worker thread gets sleepy, it will read the value of the counter. If the | |
112 | counter is odd, it will increment the counter so that it is even. Either way, it | |
113 | remembers the final value of the counter. The final value will be used later, | |
114 | when the thread is going to sleep. If at that time the counter has not changed, | |
115 | then we can assume no new jobs have been posted (though note the remote | |
116 | possibility of rollover, discussed in detail below). | |
117 | ||
118 | # Protocol for a worker thread to post work | |
119 | ||
120 | The full protocol for a thread to post work is as follows | |
121 | ||
122 | * If the work is posted into the injection queue, then execute a seq-cst fence (see below). | |
123 | * Load the counters, incrementing the JEC if it is even so that it is odd. | |
124 | * Check if there are idle threads available to handle this new job. If not, | |
125 | and there are sleeping threads, then wake one or more threads. | |
126 | ||
127 | # Protocol for a worker thread to fall asleep | |
128 | ||
129 | The full protocol for a thread to fall asleep is as follows: | |
130 | ||
131 | * After completing all its jobs, the worker goes idle and begins to | |
132 | search for work. As it searches, it counts "rounds". In each round, | |
133 | it searches all other work threads' queues, plus the 'injector queue' for | |
134 | work injected from the outside. If work is found in this search, the thread | |
135 | becomes active again and hence restarts this protocol from the top. | |
136 | * After a certain number of rounds, the thread "gets sleepy" and executes `get_sleepy` | |
137 | above, remembering the `final_value` of the JEC. It does one more search for work. | |
138 | * If no work is found, the thread atomically: | |
139 | * Checks the JEC to see that it has not changed from `final_value`. | |
140 | * If it has, then the thread goes back to searchin for work. We reset to | |
141 | just before we got sleepy, so that we will do one more search | |
142 | before attending to sleep again (rather than searching for many rounds). | |
143 | * Increments the number of sleeping threads by 1. | |
144 | * The thread then executes a seq-cst fence operation (see below). | |
145 | * The thread then does one final check for injected jobs (see below). If any | |
146 | are available, it returns to the 'pre-sleepy' state as if the JEC had changed. | |
147 | * The thread waits to be signaled. Once signaled, it returns to the idle state. | |
148 | ||
149 | # The jobs event counter and deadlock | |
150 | ||
151 | As described in the section on the JEC, the main concern around going to sleep | |
152 | is avoiding a race condition wherein: | |
153 | ||
154 | * Thread A looks for work, finds none. | |
155 | * Thread B posts work but sees no sleeping threads. | |
156 | * Thread A goes to sleep. | |
157 | ||
158 | The JEC protocol largely prevents this, but due to rollover, this prevention is | |
159 | not complete. It is possible -- if unlikely -- that enough activity occurs for | |
160 | Thread A to observe the same JEC value that it saw when getting sleepy. If the | |
161 | new work being published came from *inside* the thread-pool, then this race | |
162 | condition isn't too harmful. It means that we have fewer workers processing the | |
163 | work then we should, but we won't deadlock. This seems like an acceptable risk | |
164 | given that this is unlikely in practice. | |
165 | ||
166 | However, if the work was posted as an *external* job, that is a problem. In that | |
167 | case, it's possible that all of our workers could go to sleep, and the external | |
168 | job would never get processed. To prevent that, the sleeping protocol includes | |
169 | one final check to see if the injector queue is empty before fully falling | |
170 | asleep. Note that this final check occurs **after** the number of sleeping | |
171 | threads has been incremented. We are not concerned therefore with races against | |
172 | injections that occur after that increment, only before. | |
173 | ||
174 | Unfortunately, there is one rather subtle point concerning this final check: | |
175 | we wish to avoid the possibility that: | |
176 | ||
177 | * work is pushed into the injection queue by an outside thread X, | |
178 | * the sleepy thread S sees the JEC but it has rolled over and is equal | |
179 | * the sleepy thread S reads the injection queue but does not see the work posted by X. | |
180 | ||
181 | This is possible because the C++ memory model typically offers guarantees of the | |
182 | form "if you see the access A, then you must see those other accesses" -- but it | |
183 | doesn't guarantee that you will see the access A (i.e., if you think of | |
184 | processors with independent caches, you may be operating on very out of date | |
185 | cache state). | |
186 | ||
187 | ## Using seq-cst fences to prevent deadlock | |
188 | ||
189 | To overcome this problem, we have inserted two sequentially consistent fence | |
190 | operations into the protocols above: | |
191 | ||
192 | * One fence occurs after work is posted into the injection queue, but before the | |
193 | counters are read (including the number of sleeping threads). | |
194 | * Note that no fence is needed for work posted to internal queues, since it is ok | |
195 | to overlook work in that case. | |
196 | * One fence occurs after the number of sleeping threads is incremented, but | |
197 | before the injection queue is read. | |
198 | ||
199 | ### Proof sketch | |
200 | ||
201 | What follows is a "proof sketch" that the protocol is deadlock free. We model | |
202 | two relevant bits of memory, the job injector queue J and the atomic counters C. | |
203 | ||
204 | Consider the actions of the injecting thread: | |
205 | ||
206 | * PushJob: Job is injected, which can be modeled as an atomic write to J with release semantics. | |
207 | * PushFence: A sequentially consistent fence is executed. | |
208 | * ReadSleepers: The counters C are read (they may also be incremented, but we just consider the read that comes first). | |
209 | ||
210 | Meanwhile, the sleepy thread does the following: | |
211 | ||
212 | * IncSleepers: The number of sleeping threads is incremented, which is atomic exchange to C. | |
213 | * SleepFence: A sequentially consistent fence is executed. | |
214 | * ReadJob: We look to see if the queue is empty, which is a read of J with acquire semantics. | |
215 | ||
216 | Either PushFence or SleepFence must come first: | |
217 | ||
218 | * If PushFence comes first, then PushJob must be visible to ReadJob. | |
219 | * If SleepFence comes first, then IncSleepers is visible to ReadSleepers. |