Commit | Line | Data |
---|---|---|
5e928f77 RW |
1 | Run-time Power Management Framework for I/O Devices |
2 | ||
3 | (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. | |
4 | ||
5 | 1. Introduction | |
6 | ||
7 | Support for run-time power management (run-time PM) of I/O devices is provided | |
8 | at the power management core (PM core) level by means of: | |
9 | ||
10 | * The power management workqueue pm_wq in which bus types and device drivers can | |
11 | put their PM-related work items. It is strongly recommended that pm_wq be | |
12 | used for queuing all work items related to run-time PM, because this allows | |
13 | them to be synchronized with system-wide power transitions (suspend to RAM, | |
14 | hibernation and resume from system sleep states). pm_wq is declared in | |
15 | include/linux/pm_runtime.h and defined in kernel/power/main.c. | |
16 | ||
17 | * A number of run-time PM fields in the 'power' member of 'struct device' (which | |
18 | is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can | |
19 | be used for synchronizing run-time PM operations with one another. | |
20 | ||
21 | * Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in | |
22 | include/linux/pm.h). | |
23 | ||
24 | * A set of helper functions defined in drivers/base/power/runtime.c that can be | |
25 | used for carrying out run-time PM operations in such a way that the | |
26 | synchronization between them is taken care of by the PM core. Bus types and | |
27 | device drivers are encouraged to use these functions. | |
28 | ||
29 | The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM | |
30 | fields of 'struct dev_pm_info' and the core helper functions provided for | |
31 | run-time PM are described below. | |
32 | ||
33 | 2. Device Run-time PM Callbacks | |
34 | ||
35 | There are three device run-time PM callbacks defined in 'struct dev_pm_ops': | |
36 | ||
37 | struct dev_pm_ops { | |
38 | ... | |
39 | int (*runtime_suspend)(struct device *dev); | |
40 | int (*runtime_resume)(struct device *dev); | |
e1b1903e | 41 | int (*runtime_idle)(struct device *dev); |
5e928f77 RW |
42 | ... |
43 | }; | |
44 | ||
a6ab7aa9 RW |
45 | The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are |
46 | executed by the PM core for either the bus type, or device type (if the bus | |
47 | type's callback is not defined), or device class (if the bus type's and device | |
48 | type's callbacks are not defined) of the given device. The bus type, device type | |
49 | and device class callbacks are referred to as subsystem-level callbacks in what | |
50 | follows. | |
51 | ||
52 | The subsystem-level suspend callback is _entirely_ _responsible_ for handling | |
53 | the suspend of the device as appropriate, which may, but need not include | |
54 | executing the device driver's own ->runtime_suspend() callback (from the | |
5e928f77 | 55 | PM core's point of view it is not necessary to implement a ->runtime_suspend() |
a6ab7aa9 RW |
56 | callback in a device driver as long as the subsystem-level suspend callback |
57 | knows what to do to handle the device). | |
5e928f77 | 58 | |
a6ab7aa9 | 59 | * Once the subsystem-level suspend callback has completed successfully |
5e928f77 RW |
60 | for the given device, the PM core regards the device as suspended, which need |
61 | not mean that the device has been put into a low power state. It is | |
62 | supposed to mean, however, that the device will not process data and will | |
a6ab7aa9 RW |
63 | not communicate with the CPU(s) and RAM until the subsystem-level resume |
64 | callback is executed for it. The run-time PM status of a device after | |
65 | successful execution of the subsystem-level suspend callback is 'suspended'. | |
66 | ||
67 | * If the subsystem-level suspend callback returns -EBUSY or -EAGAIN, | |
68 | the device's run-time PM status is 'active', which means that the device | |
69 | _must_ be fully operational afterwards. | |
70 | ||
71 | * If the subsystem-level suspend callback returns an error code different | |
72 | from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will | |
73 | refuse to run the helper functions described in Section 4 for the device, | |
74 | until the status of it is directly set either to 'active', or to 'suspended' | |
75 | (the PM core provides special helper functions for this purpose). | |
76 | ||
77 | In particular, if the driver requires remote wake-up capability (i.e. hardware | |
78 | mechanism allowing the device to request a change of its power state, such as | |
79 | PCI PME) for proper functioning and device_run_wake() returns 'false' for the | |
80 | device, then ->runtime_suspend() should return -EBUSY. On the other hand, if | |
81 | device_run_wake() returns 'true' for the device and the device is put into a low | |
82 | power state during the execution of the subsystem-level suspend callback, it is | |
83 | expected that remote wake-up will be enabled for the device. Generally, remote | |
84 | wake-up should be enabled for all input devices put into a low power state at | |
85 | run time. | |
86 | ||
87 | The subsystem-level resume callback is _entirely_ _responsible_ for handling the | |
88 | resume of the device as appropriate, which may, but need not include executing | |
89 | the device driver's own ->runtime_resume() callback (from the PM core's point of | |
90 | view it is not necessary to implement a ->runtime_resume() callback in a device | |
91 | driver as long as the subsystem-level resume callback knows what to do to handle | |
92 | the device). | |
93 | ||
94 | * Once the subsystem-level resume callback has completed successfully, the PM | |
95 | core regards the device as fully operational, which means that the device | |
96 | _must_ be able to complete I/O operations as needed. The run-time PM status | |
97 | of the device is then 'active'. | |
98 | ||
99 | * If the subsystem-level resume callback returns an error code, the PM core | |
100 | regards this as a fatal error and will refuse to run the helper functions | |
101 | described in Section 4 for the device, until its status is directly set | |
102 | either to 'active' or to 'suspended' (the PM core provides special helper | |
103 | functions for this purpose). | |
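The status rules stated above for the suspend and resume callbacks can be summarized in a small user-space model. This is only a sketch and not kernel code: 'struct status_model', the 'sm_' functions and the SM_ACTIVE/SM_SUSPENDED names are all hypothetical, and leaving the status untouched after a fatal error is an assumption of the model, since the text above only says that the helper functions refuse to run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum sm_status { SM_ACTIVE, SM_SUSPENDED };

struct status_model {
	enum sm_status status;
	int runtime_error;	/* nonzero: helper functions refuse to run */
};

/* Apply the return value of the subsystem-level suspend callback. */
void sm_after_suspend(struct status_model *m, int retval)
{
	assert(m != NULL);
	if (retval == 0)
		m->status = SM_SUSPENDED;	/* regarded as suspended */
	else if (retval == -EBUSY || retval == -EAGAIN)
		m->status = SM_ACTIVE;		/* must stay fully operational */
	else
		m->runtime_error = retval;	/* fatal; status left as-is (assumption) */
}

/* Apply the return value of the subsystem-level resume callback. */
void sm_after_resume(struct status_model *m, int retval)
{
	assert(m != NULL);
	if (retval == 0)
		m->status = SM_ACTIVE;
	else
		m->runtime_error = retval;	/* any resume error is fatal */
}

/* The helper functions only work while no fatal error is recorded. */
int sm_helpers_usable(const struct status_model *m)
{
	assert(m != NULL);
	return m->runtime_error == 0;
}

/* Setting the status directly clears the recorded error. */
void sm_set_status(struct status_model *m, enum sm_status status)
{
	assert(m != NULL);
	m->runtime_error = 0;
	m->status = status;
}
```

In this model, sm_set_status() stands in for the special helper functions mentioned above that set the status directly to 'active' or 'suspended'.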
104 | ||
105 | The subsystem-level idle callback is executed by the PM core whenever the device | |
106 | appears to be idle, which is indicated to the PM core by two counters, the | |
107 | device's usage counter and the counter of 'active' children of the device. | |
5e928f77 RW |
108 | |
109 | * If any of these counters is decreased using a helper function provided by | |
110 | the PM core and it turns out to be equal to zero, the other counter is | |
111 | checked. If that counter also is equal to zero, the PM core executes the | |
a6ab7aa9 | 112 | subsystem-level idle callback with the device as an argument. |
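The two-counter check described above can be illustrated with a user-space sketch. It is not the PM core's implementation: 'struct counter_model' and the 'cm_' functions are hypothetical names, and the idle callback is reduced to a call counter so that the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct counter_model {
	int usage_count;	/* the device's usage counter */
	int child_count;	/* counter of 'active' children */
	int idle_calls;		/* times the idle callback would have run */
};

/* Drop one usage reference; check the other counter if this one hits zero. */
void cm_put(struct counter_model *m)
{
	assert(m != NULL && m->usage_count > 0);
	if (--m->usage_count == 0 && m->child_count == 0)
		m->idle_calls++;	/* device appears to be idle */
}

/* One 'active' child went away; check the usage counter if none are left. */
void cm_child_inactive(struct counter_model *m)
{
	assert(m != NULL && m->child_count > 0);
	if (--m->child_count == 0 && m->usage_count == 0)
		m->idle_calls++;	/* device appears to be idle */
}
```

With one usage reference and one active child, cm_put() alone does not trigger the idle callback in this model; it runs only once cm_child_inactive() brings the second counter to zero as well.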
5e928f77 | 113 | |
a6ab7aa9 RW |
114 | The action performed by a subsystem-level idle callback is totally dependent on |
115 | the subsystem in question, but the expected and recommended action is to check | |
116 | if the device can be suspended (i.e. if all of the conditions necessary for | |
117 | suspending the device are satisfied) and to queue up a suspend request for the | |
118 | device in that case. The value returned by this callback is ignored by the PM | |
119 | core. | |
5e928f77 RW |
120 | |
121 | The helper functions provided by the PM core, described in Section 4, guarantee | |
122 | that the following constraints are met with respect to the bus type's run-time | |
123 | PM callbacks: | |
124 | ||
125 | (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute | |
126 | ->runtime_suspend() in parallel with ->runtime_resume() or with another | |
127 | instance of ->runtime_suspend() for the same device) with the exception that | |
128 | ->runtime_suspend() or ->runtime_resume() can be executed in parallel with | |
129 | ->runtime_idle() (although ->runtime_idle() will not be started while any | |
130 | of the other callbacks is being executed for the same device). | |
131 | ||
132 | (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active' | |
133 | devices (i.e. the PM core will only execute ->runtime_idle() or | |
134 | ->runtime_suspend() for the devices the run-time PM status of which is | |
135 | 'active'). | |
136 | ||
137 | (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device | |
138 | the usage counter of which is equal to zero _and_ either the counter of | |
139 | 'active' children of which is equal to zero, or the 'power.ignore_children' | |
140 | flag of which is set. | |
141 | ||
142 | (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the | |
143 | PM core will only execute ->runtime_resume() for the devices the run-time | |
144 | PM status of which is 'suspended'). | |
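Constraints (2), (3) and (4) amount to simple predicates on the device's run-time PM state. The following user-space sketch spells them out. 'struct constraint_model' and the 'rc_' names are hypothetical, and the mutual exclusion required by constraint (1) is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum rc_status { RC_ACTIVE, RC_SUSPENDED };

struct constraint_model {
	enum rc_status status;
	int usage_count;
	int child_count;
	bool ignore_children;
};

/* Constraints (2) and (3): the device is 'active' and nobody is using it. */
bool rc_may_idle_or_suspend(const struct constraint_model *p)
{
	assert(p != NULL);
	return p->status == RC_ACTIVE &&
	       p->usage_count == 0 &&
	       (p->child_count == 0 || p->ignore_children);
}

/* Constraint (4): only 'suspended' devices can be resumed. */
bool rc_may_resume(const struct constraint_model *p)
{
	assert(p != NULL);
	return p->status == RC_SUSPENDED;
}
```

Note that an active child blocks ->runtime_idle() and ->runtime_suspend() in this model only while 'ignore_children' is unset.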
145 | ||
146 | Additionally, the helper functions provided by the PM core obey the following | |
147 | rules: | |
148 | ||
149 | * If ->runtime_suspend() is about to be executed or there's a pending request | |
150 | to execute it, ->runtime_idle() will not be executed for the same device. | |
151 | ||
152 | * A request to execute or to schedule the execution of ->runtime_suspend() | |
153 | will cancel any pending requests to execute ->runtime_idle() for the same | |
154 | device. | |
155 | ||
156 | * If ->runtime_resume() is about to be executed or there's a pending request | |
157 | to execute it, the other callbacks will not be executed for the same device. | |
158 | ||
159 | * A request to execute ->runtime_resume() will cancel any pending or | |
160 | scheduled requests to execute the other callbacks for the same device. | |
161 | ||
162 | 3. Run-time PM Device Fields | |
163 | ||
164 | The following device run-time PM fields are present in 'struct dev_pm_info', as | |
165 | defined in include/linux/pm.h: | |
166 | ||
167 | struct timer_list suspend_timer; | |
168 | - timer used for scheduling (delayed) suspend request | |
169 | ||
170 | unsigned long timer_expires; | |
171 | - timer expiration time, in jiffies (if this is different from zero, the | |
172 | timer is running and will expire at that time, otherwise the timer is not | |
173 | running) | |
174 | ||
175 | struct work_struct work; | |
176 | - work structure used for queuing up requests (i.e. work items in pm_wq) | |
177 | ||
178 | wait_queue_head_t wait_queue; | |
179 | - wait queue used if any of the helper functions needs to wait for another | |
180 | one to complete | |
181 | ||
182 | spinlock_t lock; | |
183 | - lock used for synchronisation | |
184 | ||
185 | atomic_t usage_count; | |
186 | - the usage counter of the device | |
187 | ||
188 | atomic_t child_count; | |
189 | - the count of 'active' children of the device | |
190 | ||
191 | unsigned int ignore_children; | |
192 | - if set, the value of child_count is ignored (but still updated) | |
193 | ||
194 | unsigned int disable_depth; | |
195 | - used for disabling the helper functions (they work normally if this is | |
196 | equal to zero); the initial value of it is 1 (i.e. run-time PM is | |
197 | initially disabled for all devices) | |
198 | ||
199 | unsigned int runtime_error; | |
200 | - if set, there was a fatal error (one of the callbacks returned an error code | |
201 | as described in Section 2), so the helper functions will not work until | |
202 | this flag is cleared; this is the error code returned by the failing | |
203 | callback | |
204 | ||
205 | unsigned int idle_notification; | |
206 | - if set, ->runtime_idle() is being executed | |
207 | ||
208 | unsigned int request_pending; | |
209 | - if set, there's a pending request (i.e. a work item queued up into pm_wq) | |
210 | ||
211 | enum rpm_request request; | |
212 | - type of request that's pending (valid if request_pending is set) | |
213 | ||
214 | unsigned int deferred_resume; | |
215 | - set if ->runtime_resume() is about to be run while ->runtime_suspend() is | |
216 | being executed for that device and it is not practical to wait for the | |
217 | suspend to complete; means "start a resume as soon as you've suspended" | |
218 | ||
7a1a8eb5 RW |
219 | unsigned int run_wake; |
220 | - set if the device is capable of generating run-time wake-up events | |
221 | ||
5e928f77 RW |
222 | enum rpm_status runtime_status; |
223 | - the run-time PM status of the device; this field's initial value is | |
224 | RPM_SUSPENDED, which means that each device is initially regarded by the | |
225 | PM core as 'suspended', regardless of its real hardware status | |
226 | ||
227 | All of the above fields are members of the 'power' member of 'struct device'. | |
228 | ||
229 | 4. Run-time PM Device Helper Functions | |
230 | ||
231 | The following run-time PM helper functions are defined in | |
232 | drivers/base/power/runtime.c and include/linux/pm_runtime.h: | |
233 | ||
234 | void pm_runtime_init(struct device *dev); | |
235 | - initialize the device run-time PM fields in 'struct dev_pm_info' | |
236 | ||
237 | void pm_runtime_remove(struct device *dev); | |
238 | - make sure that the run-time PM of the device will be disabled after | |
239 | removing the device from the device hierarchy | |
240 | ||
241 | int pm_runtime_idle(struct device *dev); | |
a6ab7aa9 RW |
242 | - execute the subsystem-level idle callback for the device; returns 0 on |
243 | success or error code on failure, where -EINPROGRESS means that | |
244 | ->runtime_idle() is already being executed | |
5e928f77 RW |
245 | |
246 | int pm_runtime_suspend(struct device *dev); | |
a6ab7aa9 | 247 | - execute the subsystem-level suspend callback for the device; returns 0 on |
5e928f77 RW |
248 | success, 1 if the device's run-time PM status was already 'suspended', or |
249 | error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt | |
250 | to suspend the device again in future | |
251 | ||
252 | int pm_runtime_resume(struct device *dev); | |
a6ab7aa9 | 253 | - execute the subsystem-level resume callback for the device; returns 0 on |
5e928f77 RW |
254 | success, 1 if the device's run-time PM status was already 'active' or |
255 | error code on failure, where -EAGAIN means it may be safe to attempt to | |
256 | resume the device again in future, but 'power.runtime_error' should be | |
257 | checked additionally | |
258 | ||
259 | int pm_request_idle(struct device *dev); | |
a6ab7aa9 RW |
260 | - submit a request to execute the subsystem-level idle callback for the |
261 | device (the request is represented by a work item in pm_wq); returns 0 on | |
262 | success or error code if the request has not been queued up | |
5e928f77 RW |
263 | |
264 | int pm_schedule_suspend(struct device *dev, unsigned int delay); | |
a6ab7aa9 RW |
265 | - schedule the execution of the subsystem-level suspend callback for the |
266 | device in future, where 'delay' is the time to wait before queuing up a | |
267 | suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work | |
268 | item is queued up immediately); returns 0 on success, 1 if the device's | |
5e928f77 RW |
269 | run-time PM status was already 'suspended', or error code if the request |
270 | hasn't been scheduled (or queued up if 'delay' is 0); if the execution of | |
271 | ->runtime_suspend() is already scheduled and not yet expired, the new | |
272 | value of 'delay' will be used as the time to wait | |
273 | ||
274 | int pm_request_resume(struct device *dev); | |
a6ab7aa9 RW |
275 | - submit a request to execute the subsystem-level resume callback for the |
276 | device (the request is represented by a work item in pm_wq); returns 0 on | |
5e928f77 RW |
277 | success, 1 if the device's run-time PM status was already 'active', or |
278 | error code if the request hasn't been queued up | |
279 | ||
280 | void pm_runtime_get_noresume(struct device *dev); | |
281 | - increment the device's usage counter | |
282 | ||
283 | int pm_runtime_get(struct device *dev); | |
284 | - increment the device's usage counter, run pm_request_resume(dev) and | |
285 | return its result | |
286 | ||
287 | int pm_runtime_get_sync(struct device *dev); | |
288 | - increment the device's usage counter, run pm_runtime_resume(dev) and | |
289 | return its result | |
290 | ||
291 | void pm_runtime_put_noidle(struct device *dev); | |
292 | - decrement the device's usage counter | |
293 | ||
294 | int pm_runtime_put(struct device *dev); | |
295 | - decrement the device's usage counter, run pm_request_idle(dev) and return | |
296 | its result | |
297 | ||
298 | int pm_runtime_put_sync(struct device *dev); | |
299 | - decrement the device's usage counter, run pm_runtime_idle(dev) and return | |
300 | its result | |
301 | ||
302 | void pm_runtime_enable(struct device *dev); | |
303 | - enable the run-time PM helper functions to run the device bus type's | |
304 | run-time PM callbacks described in Section 2 | |
305 | ||
306 | int pm_runtime_disable(struct device *dev); | |
a6ab7aa9 RW |
307 | - prevent the run-time PM helper functions from running subsystem-level |
308 | run-time PM callbacks for the device, make sure that all of the pending | |
309 | run-time PM operations on the device are either completed or canceled; | |
310 | returns 1 if there was a resume request pending and it was necessary to | |
311 | execute the subsystem-level resume callback for the device to satisfy that | |
312 | request, otherwise 0 is returned | |
5e928f77 RW |
313 | |
314 | void pm_suspend_ignore_children(struct device *dev, bool enable); | |
315 | - set/unset the power.ignore_children flag of the device | |
316 | ||
317 | int pm_runtime_set_active(struct device *dev); | |
318 | - clear the device's 'power.runtime_error' flag, set the device's run-time | |
319 | PM status to 'active' and update its parent's counter of 'active' | |
320 | children as appropriate (it is only valid to use this function if | |
321 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | |
322 | zero); it will fail and return an error code if the device has a parent | |
323 | which is not active and the 'power.ignore_children' flag of which is unset | |
324 | ||
325 | void pm_runtime_set_suspended(struct device *dev); | |
326 | - clear the device's 'power.runtime_error' flag, set the device's run-time | |
327 | PM status to 'suspended' and update its parent's counter of 'active' | |
328 | children as appropriate (it is only valid to use this function if | |
329 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | |
330 | zero) | |
331 | ||
332 | It is safe to execute the following helper functions from interrupt context: | |
333 | ||
334 | pm_request_idle() | |
335 | pm_schedule_suspend() | |
336 | pm_request_resume() | |
337 | pm_runtime_get_noresume() | |
338 | pm_runtime_get() | |
339 | pm_runtime_put_noidle() | |
340 | pm_runtime_put() | |
341 | pm_suspend_ignore_children() | |
342 | pm_runtime_set_active() | |
343 | pm_runtime_set_suspended() | |
344 | pm_runtime_enable() | |
345 | ||
346 | 5. Run-time PM Initialization, Device Probing and Removal | |
347 | ||
348 | Initially, the run-time PM is disabled for all devices, which means that the | |
349 | majority of the run-time PM helper functions described in Section 4 will return | |
350 | -EAGAIN until pm_runtime_enable() is called for the device. | |
351 | ||
352 | In addition to that, the initial run-time PM status of all devices is | |
353 | 'suspended', but it need not reflect the actual physical state of the device. | |
354 | Thus, if the device is initially active (i.e. it is able to process I/O), its | |
355 | run-time PM status must be changed to 'active', with the help of | |
356 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. | |
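The ordering requirement above (set the status first, then enable) can be illustrated with a user-space sketch built around a 'disable_depth' counter. It is not kernel code: 'struct enable_model' and the 'em_' functions are hypothetical, the 'power.runtime_error' case and the parent check are left out, and returning -EAGAIN from em_set_active() while run-time PM is enabled is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum em_status { EM_ACTIVE, EM_SUSPENDED };

struct enable_model {
	unsigned int disable_depth;	/* helpers work only when this is 0 */
	enum em_status status;
};

/* Initial state: run-time PM disabled, device regarded as 'suspended'. */
void em_init(struct enable_model *m)
{
	assert(m != NULL);
	m->disable_depth = 1;
	m->status = EM_SUSPENDED;
}

void em_enable(struct enable_model *m)
{
	assert(m != NULL && m->disable_depth > 0);
	m->disable_depth--;
}

/* Only valid while run-time PM is disabled (error return is an assumption). */
int em_set_active(struct enable_model *m)
{
	assert(m != NULL);
	if (m->disable_depth == 0)
		return -EAGAIN;
	m->status = EM_ACTIVE;
	return 0;
}

/* Returns -EAGAIN while disabled, 1 if already 'active', 0 after resuming. */
int em_resume(struct enable_model *m)
{
	assert(m != NULL);
	if (m->disable_depth > 0)
		return -EAGAIN;
	if (m->status == EM_ACTIVE)
		return 1;
	m->status = EM_ACTIVE;
	return 0;
}
```

For a device that is already powered up, calling em_set_active() before em_enable() means that a later em_resume() returns 1 and no resume callback would be needed.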
357 | ||
358 | However, if the device has a parent and the parent's run-time PM is enabled, | |
359 | calling pm_runtime_set_active() for the device will affect the parent, unless | |
360 | the parent's 'power.ignore_children' flag is set. Namely, in that case the | |
361 | parent won't be able to suspend at run time, using the PM core's helper | |
362 | functions, as long as the child's status is 'active', even if the child's | |
363 | run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for | |
364 | the child yet or pm_runtime_disable() has been called for it). For this reason, | |
365 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() | |
366 | should be called for it too as soon as reasonably possible or its run-time PM | |
367 | status should be changed back to 'suspended' with the help of | |
368 | pm_runtime_set_suspended(). | |
369 | ||
370 | If the default initial run-time PM status of the device (i.e. 'suspended') | |
371 | reflects the actual state of the device, its bus type's or its driver's | |
372 | ->probe() callback will likely need to wake it up using one of the PM core's | |
373 | helper functions described in Section 4. In that case, pm_runtime_resume() | |
374 | should be used. Of course, for this purpose the device's run-time PM has to be | |
375 | enabled earlier by calling pm_runtime_enable(). | |
376 | ||
377 | If the device bus type's or driver's ->probe() or ->remove() callback runs | |
378 | pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, | |
379 | they will fail returning -EAGAIN, because the device's usage counter is | |
380 | incremented by the core before executing ->probe() and ->remove(). Still, it | |
381 | may be desirable to suspend the device as soon as ->probe() or ->remove() has | |
a6ab7aa9 RW |
382 | finished, so the PM core uses pm_runtime_put_sync() to invoke the |
383 | subsystem-level idle callback for the device at that time. | |
f1212ae1 AS |
384 | |
385 | 6. Run-time PM and System Sleep | |
386 | ||
387 | Run-time PM and system sleep (i.e., system suspend and hibernation, also known | |
388 | as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of | |
389 | ways. If a device is active when a system sleep starts, everything is | |
390 | straightforward. But what should happen if the device is already suspended? | |
391 | ||
392 | The device may have different wake-up settings for run-time PM and system sleep. | |
393 | For example, remote wake-up may be enabled for run-time suspend but disallowed | |
394 | for system sleep (device_may_wakeup(dev) returns 'false'). When this happens, | |
395 | the subsystem-level system suspend callback is responsible for changing the | |
396 | device's wake-up setting (it may leave that to the device driver's system | |
397 | suspend routine). It may be necessary to resume the device and suspend it again | |
398 | in order to do so. The same is true if the driver uses different power levels | |
399 | or other settings for run-time suspend and system sleep. | |
400 | ||
401 | During system resume, devices generally should be brought back to full power, | |
402 | even if they were suspended before the system sleep began. There are several | |
403 | reasons for this, including: | |
404 | ||
405 | * The device might need to switch power levels, wake-up settings, etc. | |
406 | ||
407 | * Remote wake-up events might have been lost by the firmware. | |
408 | ||
409 | * The device's children may need the device to be at full power in order | |
410 | to resume themselves. | |
411 | ||
412 | * The driver's idea of the device state may not agree with the device's | |
413 | physical state. This can happen during resume from hibernation. | |
414 | ||
415 | * The device might need to be reset. | |
416 | ||
417 | * Even though the device was suspended, if its usage counter was > 0 then most | |
418 | likely it would need a run-time resume in the near future anyway. | |
419 | ||
420 | * Always going back to full power is simplest. | |
421 | ||
422 | If the device was suspended before the sleep began, then its run-time PM status | |
423 | will have to be updated to reflect the actual post-system sleep status. The way | |
424 | to do this is: | |
425 | ||
426 | pm_runtime_disable(dev); | |
427 | pm_runtime_set_active(dev); | |
428 | pm_runtime_enable(dev); | |
429 | ||
430 | The PM core always increments the run-time usage counter before calling the | |
431 | ->prepare() callback and decrements it after calling the ->complete() callback. | |
432 | Hence disabling run-time PM temporarily like this will not cause any run-time | |
433 | suspend callbacks to be lost. |