Commit | Line | Data |
---|---|---|
1da177e4 LT |
1 | |
2 | Device Power Management | |
3 | ||
4 | ||
5 | Device power management encompasses two areas - the ability to save | |
6 | state and transition a device to a low-power state when the system is | |
7 | entering a low-power state; and the ability to transition a device to | |
8 | a low-power state while the system is running (and independently of | |
9 | any other power management activity). | |
10 | ||
11 | ||
12 | Methods | |
13 | ||
14 | The methods to suspend and resume devices reside in struct bus_type: | |
15 | ||
16 | struct bus_type { | |
17 | ... | |
18 | int (*suspend)(struct device * dev, pm_message_t state); | |
19 | int (*resume)(struct device * dev); | |
20 | }; | |
21 | ||
22 | Each bus driver is responsible for implementing these methods, translating | |
23 | the call into a bus-specific request and forwarding the call to the | |
24 | bus-specific drivers. For example, PCI drivers implement suspend() and | |
25 | resume() methods in struct pci_driver. The PCI core is simply | |
26 | responsible for translating the pointers to PCI-specific ones and | |
27 | calling the low-level driver. | |
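The forwarding described above can be sketched in plain userspace C. Everything below is an illustrative assumption rather than kernel code: the "foo" bus, struct foo_dev, struct foo_driver and the simplified struct device and pm_message_t are stand-ins invented for this sketch. It only shows the shape of the idea: the bus-level method recovers the bus-specific device from the generic one and calls the low-level driver.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
typedef struct { int event; } pm_message_t;

struct device {
    const char *name;
};

struct foo_dev;

/* Hypothetical bus-specific driver, analogous in spirit to struct pci_driver. */
struct foo_driver {
    int (*suspend)(struct foo_dev *fdev, pm_message_t state);
    int (*resume)(struct foo_dev *fdev);
};

/* Hypothetical bus-specific device that embeds the generic device. */
struct foo_dev {
    struct device dev;
    struct foo_driver *driver;  /* bound low-level driver, may be NULL */
    int powered;
};

/* Recover the bus-specific device from the embedded generic one. */
#define to_foo_dev(d) \
    ((struct foo_dev *)((char *)(d) - offsetof(struct foo_dev, dev)))

/* Bus-level suspend: translate the pointer, then forward the call. */
int foo_bus_suspend(struct device *dev, pm_message_t state)
{
    struct foo_dev *fdev = to_foo_dev(dev);

    if (fdev->driver && fdev->driver->suspend)
        return fdev->driver->suspend(fdev, state);
    return 0;   /* no driver or no method: nothing to do */
}

/* Bus-level resume: same translation in the other direction. */
int foo_bus_resume(struct device *dev)
{
    struct foo_dev *fdev = to_foo_dev(dev);

    if (fdev->driver && fdev->driver->resume)
        return fdev->driver->resume(fdev);
    return 0;
}

/* Example low-level driver methods for an imaginary chip. */
int foo_chip_suspend(struct foo_dev *fdev, pm_message_t state)
{
    (void)state;
    fdev->powered = 0;
    return 0;
}

int foo_chip_resume(struct foo_dev *fdev)
{
    fdev->powered = 1;
    return 0;
}

struct foo_driver foo_chip_driver = {
    .suspend = foo_chip_suspend,
    .resume  = foo_chip_resume,
};
```

Because the bus owns the translation step, a bus can also supply a default routine for devices whose driver provides no method, which is what the early "return 0" paths stand in for here.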
28 | ||
29 | This is done to a) ease transition to the new power management methods | |
30 | and leverage the existing PM code in various bus drivers; b) allow | |
31 | buses to implement generic and default PM routines for devices, and c) | |
32 | make the flow of execution obvious to the reader. | |
33 | ||
34 | ||
35 | System Power Management | |
36 | ||
37 | When the system enters a low-power state, the device tree is walked in | |
38 | a depth-first fashion to transition each device into a low-power | |
39 | state. The ordering of the device tree is guaranteed by the order in | |
40 | which devices get registered - children are never registered before | |
41 | their ancestors, and devices are placed at the back of the list when | |
42 | registered. By walking the list in reverse order, we are guaranteed to | |
43 | suspend devices in the proper order. | |
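The ordering argument can be illustrated with a small userspace sketch. The fixed-size array, device_register() and suspend_all() below are simplifications assumed for this example and do not mirror the real driver core: devices are appended on registration, so walking the list backwards visits every child before its ancestors.

```c
#include <stddef.h>

#define MAX_DEVICES 16

/* Simplified stand-in for the kernel's struct device. */
struct device {
    const char *name;
    int suspended;
};

/* Registration-ordered list: parents always appear before their children. */
struct device *device_list[MAX_DEVICES];
size_t device_count;

/* Append the device at the back of the list. */
int device_register(struct device *dev)
{
    if (device_count == MAX_DEVICES)
        return -1;
    device_list[device_count++] = dev;
    return 0;
}

/*
 * Suspend every device in reverse registration order. The names of the
 * first 'max' devices visited are written to 'order'; the return value
 * is the number of entries written.
 */
size_t suspend_all(const char **order, size_t max)
{
    size_t n = 0;

    for (size_t i = device_count; i > 0; i--) {
        struct device *dev = device_list[i - 1];

        dev->suspended = 1;
        if (n < max)
            order[n++] = dev->name;
    }
    return n;
}
```

Registering a bus, then a bridge on that bus, then a network card behind the bridge yields the suspend order card, bridge, bus.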
44 | ||
45 | Devices are suspended once with interrupts enabled. Drivers are | |
46 | expected to stop I/O transactions, save device state, and place the | |
47 | device into a low-power state. Drivers may sleep, allocate memory, | |
48 | etc. at will. | |
49 | ||
50 | Some devices are broken and will inevitably have problems powering | |
51 | down or disabling themselves with interrupts enabled. For these | |
52 | special cases, they may return -EAGAIN. This will put the device on a | |
53 | list to be taken care of later. When interrupts are disabled, before | |
54 | we enter the low-power state, their drivers are called again to put | |
55 | their device to sleep. | |
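A minimal sketch of this two-pass scheme follows, written as userspace C. The needs_irqs_off flag, the irqs_enabled variable and the function names are assumptions made for the illustration; the real core disables interrupts on the CPU rather than flipping a variable.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a device; the flags are hypothetical. */
struct device {
    const char *name;
    int needs_irqs_off;   /* models hardware that cannot power down with IRQs on */
    int suspended;
    int deferred;         /* set when the first pass returned -EAGAIN */
};

/* Stand-in for the CPU interrupt state. */
int irqs_enabled = 1;

/* Driver-level suspend: refuse with -EAGAIN while interrupts are enabled. */
int device_suspend(struct device *dev)
{
    if (dev->needs_irqs_off && irqs_enabled)
        return -EAGAIN;
    dev->suspended = 1;
    return 0;
}

/* Suspend all devices, in reverse order, using two passes. */
int suspend_devices(struct device **devs, size_t n)
{
    int err;

    /* Pass 1: interrupts enabled; remember devices that ask to be retried. */
    irqs_enabled = 1;
    for (size_t i = n; i > 0; i--) {
        struct device *dev = devs[i - 1];

        err = device_suspend(dev);
        if (err == -EAGAIN)
            dev->deferred = 1;
        else if (err)
            return err;
    }

    /* Pass 2: interrupts disabled; call the deferred drivers again. */
    irqs_enabled = 0;
    for (size_t i = n; i > 0; i--) {
        struct device *dev = devs[i - 1];

        if (!dev->deferred)
            continue;
        err = device_suspend(dev);
        if (err)
            return err;
    }
    return 0;
}
```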
56 | ||
57 | On resume, the devices that returned -EAGAIN will be called to power | |
58 | themselves back on with interrupts disabled. Once interrupts have been | |
59 | re-enabled, the rest of the drivers will be called to resume their | |
60 | devices. On resume, a driver is responsible for powering back on each | |
61 | device, restoring state, and re-enabling I/O transactions for that | |
62 | device. | |
63 | ||
64 | System devices follow a slightly different API, which can be found in | |
65 | ||
66 | include/linux/sysdev.h | |
67 | drivers/base/sys.c | |
68 | ||
69 | System devices will only be suspended with interrupts disabled, and | |
70 | after all other devices have been suspended. On resume, they will be | |
71 | resumed before any other devices, and also with interrupts disabled. | |
72 | ||
73 | ||
74 | Runtime Power Management | |
75 | ||
76 | Many devices are able to dynamically power down while the system is | |
77 | still running. This feature is useful for devices that are not being | |
78 | used, and can offer significant power savings on a running system. | |
79 | ||
80 | In each device's directory, there is a 'power' directory, which | |
81 | contains at least a 'state' file. Reading from this file displays what | |
82 | power state the device is currently in. Writing to this file initiates | |
83 | a transition to the specified power state, which must be a decimal in | |
84 | the range 1-3, inclusive; or 0 for 'On'. | |
85 | ||
86 | The PM core will call the ->suspend() method in the bus_type object | |
87 | that the device belongs to if the specified state is not 0, or | |
88 | ->resume() if it is. | |
89 | ||
90 | Nothing will happen if the specified state is the same state the | |
91 | device is currently in. | |
92 | ||
93 | If the device is already in a low-power state, and the specified state | |
94 | is another, but different, low-power state, the ->resume() method will | |
95 | first be called to power the device back on, then ->suspend() will be | |
96 | called again with the new state. | |
97 | ||
98 | The driver is responsible for saving the working state of the device | |
99 | and putting it into the low-power state specified. If this was | |
100 | successful, it returns 0, and the device's power_state field is | |
101 | updated. | |
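The rules for a write to the 'state' file can be summarized in a short userspace sketch. The function set_power_state(), the call counters and the use of -EINVAL for out-of-range values are assumptions for illustration; only the decision logic (no-op on the same state, resume before moving between two low-power states, update power_state on success) is taken from the text above.

```c
#include <errno.h>

/* Simplified stand-in for a device; the counters exist only for the demo. */
struct device {
    int power_state;    /* 0 = On, 1-3 = low-power states */
    int suspend_calls;
    int resume_calls;
};

/* Stand-ins for the bus_type ->suspend() and ->resume() methods. */
int bus_suspend(struct device *dev, int state)
{
    (void)state;
    dev->suspend_calls++;
    return 0;
}

int bus_resume(struct device *dev)
{
    dev->resume_calls++;
    return 0;
}

/* Handle a request to move the device to 'state'. */
int set_power_state(struct device *dev, int state)
{
    int err;

    if (state < 0 || state > 3)
        return -EINVAL;             /* assumed error code for bad input */

    if (state == dev->power_state)
        return 0;                   /* already there: nothing happens */

    if (dev->power_state != 0) {
        /* Leaving a low-power state always goes through resume first. */
        err = bus_resume(dev);
        if (err)
            return err;
        dev->power_state = 0;
    }

    if (state != 0) {
        err = bus_suspend(dev, state);
        if (err)
            return err;
        dev->power_state = state;   /* updated only on success */
    }
    return 0;
}
```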
102 | ||
103 | The driver must take care to know whether or not it is able to | |
104 | properly resume the device, including all steps of reinitialization | |
105 | necessary. (This is the hardest part, and the one most protected by | |
106 | NDA'd documents). | |
107 | ||
108 | The driver must also take care not to suspend a device that is | |
109 | currently in use. It is the driver's responsibility to provide its own | |
110 | exclusion mechanisms. | |
111 | ||
112 | The runtime power transition happens with interrupts enabled. If a | |
113 | device cannot support being powered down with interrupts enabled, it may | |
114 | return -EAGAIN (as it would during a system power management | |
115 | transition), but it will _not_ be called again, and the transition | |
116 | will fail. | |
117 | ||
118 | There is currently no way to know what states a device or driver | |
119 | supports a priori. This will change in the future. | |
120 | ||
121 | pm_message_t meaning | |
122 | ||
123 | pm_message_t has two fields: event ("major") and flags. If a driver | |
124 | does not know the event code, it aborts the request and returns an error. Some | |
125 | drivers may need to deal with special cases based on the actual type | |
126 | of suspend operation being done at the system level. This is why | |
127 | there are flags. | |
128 | ||
129 | Event codes are: | |
130 | ||
131 | ON -- no need to do anything except special cases like broken | |
132 | HW. | |
133 | ||
134 | # NOTIFICATION -- pretty much same as ON? | |
135 | ||
136 | FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from | |
137 | scratch. That probably means stop accepting upstream requests, the | |
138 | actual policy of what to do with them being specific to a given | |
139 | driver. It's acceptable for a network driver to just drop packets | |
140 | while a block driver is expected to block the queue so no request is | |
141 | lost. (Use IDE as an example on how to do that). FREEZE requires no | |
142 | power state change, and it's expected for drivers to be able to | |
143 | quickly transition back to operating state. | |
144 | ||
145 | SUSPEND -- like FREEZE, but also put hardware into low-power state. If | |
146 | there's a need to distinguish several levels of sleep, an additional flag | |
147 | is probably the best way to do that. | |
148 | ||
149 | Transitions are only from a resumed state to a suspended state, never | |
150 | between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, | |
151 | FREEZE -> SUSPEND or SUSPEND -> FREEZE can not). | |
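A driver's suspend method that follows these rules might dispatch on the event code roughly as sketched below. The constant names and values, the two-field pm_message_t layout, struct foo_dev and the choice of -EINVAL for an unknown event are all assumptions for this userspace illustration; the text only fixes the three event names and the requirement to return an error for unknown codes.

```c
#include <errno.h>

/* Assumed names and values for the three event codes described above. */
enum {
    PM_EVENT_ON      = 0,
    PM_EVENT_FREEZE  = 1,
    PM_EVENT_SUSPEND = 2
};

/* Simplified stand-in with the two fields the text mentions. */
typedef struct {
    int event;
    int flags;
} pm_message_t;

/* Hypothetical device state tracked by an imaginary driver. */
struct foo_dev {
    int io_stopped;   /* DMA and interrupts quiesced */
    int low_power;    /* hardware placed in a low-power state */
};

int foo_suspend(struct foo_dev *fdev, pm_message_t state)
{
    /* Only the event is examined; flags are deliberately ignored. */
    switch (state.event) {
    case PM_EVENT_ON:
        return 0;                 /* nothing to do */
    case PM_EVENT_FREEZE:
        fdev->io_stopped = 1;     /* quiesce, but keep power state */
        return 0;
    case PM_EVENT_SUSPEND:
        fdev->io_stopped = 1;     /* like FREEZE ... */
        fdev->low_power = 1;      /* ... plus the low-power transition */
        return 0;
    default:
        return -EINVAL;           /* unknown event: abort the request */
    }
}
```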
152 | ||
153 | All events are: | |
154 | ||
155 | [NOTE NOTE NOTE: If you are a driver author, you should not care; you | |
156 | should only look at event, and ignore flags.] | |
157 | ||
158 | #Prepare for suspend -- userland is still running but we are going to | |
159 | #enter suspend state. This gives drivers chance to load firmware from | |
160 | #disk and store it in memory, or do other activities that require | |
161 | #operating userland, ability to kmalloc GFP_KERNEL, etc... All of these | |
162 | #are forbidden once the suspend dance is started. event = ON, flags = | |
163 | #PREPARE_TO_SUSPEND | |
164 | ||
165 | Apm standby -- prepare for APM event. Quiesce devices to make life | |
166 | easier for APM BIOS. event = FREEZE, flags = APM_STANDBY | |
167 | ||
168 | Apm suspend -- same as APM_STANDBY, but we should probably avoid | |
169 | spinning down disks. event = FREEZE, flags = APM_SUSPEND | |
170 | ||
171 | System halt, reboot -- quiesce devices to make life easier for BIOS. event | |
172 | = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT | |
173 | ||
174 | System shutdown -- at least disks need to be spun down, or data may be | |
175 | lost. Quiesce devices, just to make life easier for BIOS. event = | |
176 | FREEZE, flags = SYSTEM_SHUTDOWN | |
177 | ||
178 | Kexec -- turn off DMAs and put hardware into some state where new | |
179 | kernel can take over. event = FREEZE, flags = KEXEC | |
180 | ||
181 | Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake | |
182 | may need to be enabled on some devices. This actually has at least 3 | |
183 | subtypes, system can reboot, enter S4 and enter S5 at the end of | |
184 | swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, | |
185 | SYSTEM_SHUTDOWN, SYSTEM_S4 | |
186 | ||
187 | Suspend to ram -- put devices into low power state. event = SUSPEND, | |
188 | flags = SUSPEND_TO_RAM | |
189 | ||
190 | Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put | |
191 | devices into low power mode, but you must be able to reinitialize | |
192 | device from scratch in resume method. This has two flavors: it's done | |
193 | once on suspending kernel, once on resuming kernel. event = FREEZE, | |
194 | flags = DURING_SUSPEND or DURING_RESUME | |
195 | ||
196 | Device detach requested from /sys -- deinitialize device; probably same as | |
197 | SYSTEM_SHUTDOWN, I do not understand this one too much. Probably event | |
198 | = FREEZE, flags = DEV_DETACH. | |
199 | ||
200 | #These are not really events sent: | |
201 | # | |
202 | #System fully on -- device is working normally; this is probably never | |
203 | #passed to suspend() method... event = ON, flags = 0 | |
204 | # | |
205 | #Ready after resume -- userland is now running, again. Time to free any | |
206 | #memory you ate during prepare to suspend... event = ON, flags = | |
207 | #READY_AFTER_RESUME | |
208 | # | |
209 | ||
210 | Driver Detach Power Management | |
211 | ||
212 | The kernel now supports the ability to place a device in a low-power | |
213 | state when it is detached from its driver, which happens when its | |
214 | module is removed. | |
215 | ||
216 | Each device contains a 'detach_state' file in its sysfs directory | |
217 | which can be used to control this state. Reading from this file | |
218 | displays what the current detach state is set to. This is 0 (On) by | |
219 | default. A user may write a positive integer value to this file in the | |
220 | range of 1-4 inclusive. | |
221 | ||
222 | A value of 1-3 will indicate the device should be placed in that | |
223 | low-power state, which will cause ->suspend() to be called for that | |
224 | device. A value of 4 indicates that the device should be shut down, so | |
225 | ->shutdown() will be called for that device. | |
226 | ||
227 | The driver is responsible for reinitializing the device when the | |
228 | module is re-inserted during its ->probe() (or equivalent) method. | |
229 | The driver core will not call any extra functions when binding the | |
230 | device to the driver. | |
231 | ||