PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel <mochel@transmeta.com>
(and others)

---------------------------------------------------------------------------

1. Overview
2. How the PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It defines a standard interface for controlling various
power management operations.

Implementation of the PCI PM Spec is optional, as are several of its
sub-components. If a device supports the PCI PM Spec, the device will have an
8-byte capability field in its PCI configuration space. This field is used to
describe and control the standard PCI power management features.

The PCI PM Spec defines four operating states for devices (D0 - D3) and for
buses (B0 - B3). The higher the number, the less power the device consumes.
However, the higher the number, the longer the latency for the device to
return to an operational state (D0).

There are actually two D3 states. When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context). But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state is discarded), or both.

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM Spec.

The possible state transitions that a device can undergo are:

	+---------------------------+
	| Current State | New State |
	+---------------------------+
	| D0            | D1, D2, D3|
	+---------------------------+
	| D1            | D2, D3    |
	+---------------------------+
	| D2            | D3        |
	+---------------------------+
	| D1, D2, D3    | D0        |
	+---------------------------+
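
The transition table can be written as a small predicate. This is a minimal user-space sketch, not kernel code: the enum pm_state and the function transition_allowed() are hypothetical names, and D3 stands for both D3hot and D3cold, as in the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the device states; D3 covers D3hot and D3cold. */
enum pm_state { D0, D1, D2, D3 };

/* True if the table above permits moving a device from cur to next. */
static bool transition_allowed(enum pm_state cur, enum pm_state next)
{
	assert(cur <= D3 && next <= D3);
	if (next == D0)
		return cur != D0;	/* D1, D2, D3 -> D0 */
	return next > cur;		/* otherwise only to a deeper state */
}
```

For example, transition_allowed(D2, D1) is false, while transition_allowed(D2, D0) is true.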

Note that when the system is entering a global suspend state, all devices will
be placed into D3, and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.

2. How the PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth-first walk of the device tree. The first walk saves the state of
each device and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
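
The walk orders described above can be sketched on a toy device tree. This is a hypothetical user-space model, not the kernel's implementation. struct node, suspend_tree() and resume_tree() are illustrative names. The first walk only checks for a veto (state saving is omitted), and the order array records the sequence in which devices would be powered down or up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_NODES 16

/* Hypothetical device-tree node; a bridge is any node with children. */
struct node {
	int id;
	bool can_suspend;	/* false: this device vetoes the suspend */
	size_t nr_children;
	struct node *children[MAX_CHILDREN];
};

/* Sequence in which devices were powered down (or up), for inspection. */
static int order[MAX_NODES];
static size_t order_len;

static void record(int id)
{
	assert(order_len < MAX_NODES);
	order[order_len++] = id;
}

/* Walk 1: depth-first, children before their bridge. Nothing has been
 * powered down yet, so a veto simply aborts the suspend. */
static bool check_tree(const struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++)
		if (!check_tree(n->children[i]))
			return false;
	return n->can_suspend;
}

/* Walk 2: the same post-order walk, so each bridge keeps power until
 * all of its children are in a low power state. */
static void power_down(const struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++)
		power_down(n->children[i]);
	record(n->id);
}

static bool suspend_tree(const struct node *root)
{
	order_len = 0;
	if (!check_tree(root))
		return false;	/* graceful failure: no device was touched */
	power_down(root);
	return true;
}

/* Resume: breadth-first, so every bridge is powered on before any
 * device below it. */
static void resume_tree(const struct node *root)
{
	const struct node *queue[MAX_NODES];
	size_t head = 0, tail = 0;

	order_len = 0;
	queue[tail++] = root;
	while (head < tail) {
		const struct node *n = queue[head++];

		record(n->id);
		for (size_t i = 0; i < n->nr_children; i++) {
			assert(tail < MAX_NODES);
			queue[tail++] = n->children[i];
		}
	}
}
```

Consider a root bridge 0 with children 1 and 2, where 1 is itself a bridge above device 3. Suspend powers the devices down in the order 3, 1, 2, 0, and resume powers them up in the order 0, 1, 2, 3.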


3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding whether and when to call them.


pci_save_state
--------------

Usage:
	pci_save_state(struct pci_dev *dev);

Description:
	Save the first 64 bytes of PCI config space, along with any additional
	PCI Express or PCI-X information.


pci_restore_state
-----------------

Usage:
	pci_restore_state(struct pci_dev *dev);

Description:
	Restore previously saved config space.
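
The following sketch shows what saving and restoring the 64-byte header amounts to. It is a hypothetical model in which a device's config space is a plain array. struct fake_dev, save_state(), restore_state() and reset() are illustrative stand-ins, not kernel APIs, and the extra PCI Express/PCI-X state is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE	256	/* conventional PCI config space */
#define CFG_HEADER_SIZE	64	/* the part described above */

/* Hypothetical device model: config registers plus a save area. */
struct fake_dev {
	uint32_t cfg[CFG_SPACE_SIZE / 4];
	uint32_t saved[CFG_HEADER_SIZE / 4];
	int state_saved;
};

static void save_state(struct fake_dev *d)
{
	/* Copy the first 64 bytes, one 32-bit dword at a time. */
	for (size_t i = 0; i < CFG_HEADER_SIZE / 4; i++)
		d->saved[i] = d->cfg[i];
	d->state_saved = 1;
}

static void restore_state(struct fake_dev *d)
{
	assert(d->state_saved);	/* restoring without a prior save is a bug */
	/* Write back from the top down, so the Command register (offset
	 * 0x04) is rewritten only after the BARs. */
	for (size_t i = CFG_HEADER_SIZE / 4; i-- > 0; )
		d->cfg[i] = d->saved[i];
}

/* Simulate a power-on reset, e.g. when resuming from D3cold. */
static void reset(struct fake_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
}
```

Restoring from the highest dword downwards is a design choice made for this sketch. Treat it as an assumption, not a description of pci_restore_state() internals. Registers beyond offset 63 are not saved, so a reset leaves them cleared.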


pci_set_power_state
-------------------

Usage:
	pci_set_power_state(struct pci_dev *dev, pci_power_t state);

Description:
	Transition the device to the requested power state using the PCI PM
	Capabilities registers.

	Will fail under one of the following conditions:
	- The requested state is lower than the current state, but is not D0
	  (illegal transition)
	- The device doesn't support PM Capabilities
	- The device does not support the requested state
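
The three failure conditions can be modelled as a sequence of checks. In this hypothetical sketch, struct pm_dev and set_power_state() are illustrative names, and D1/D2 support corresponds to PMC bits 9 and 10. D3cold is left out because entering it means removing power, which the PM registers cannot do by themselves. Three behaviours are assumptions: returning 0 when the device is already in the requested state, -EINVAL for an illegal transition, and -EIO for the other failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state encoding (D3cold omitted, see above). */
enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3HOT };

/* Hypothetical summary of what the device's PM capability reports. */
struct pm_dev {
	bool has_pm_cap;	/* PCI PM capability present */
	bool d1_support;	/* PMC bit 9 */
	bool d2_support;	/* PMC bit 10 */
	enum pm_state current_state;
};

static int set_power_state(struct pm_dev *dev, enum pm_state state)
{
	assert(state <= PM_D3HOT);
	if (state == dev->current_state)
		return 0;	/* nothing to do (assumed behaviour) */
	/* Illegal transition: shallower than now, but not D0. */
	if (state < dev->current_state && state != PM_D0)
		return -EINVAL;
	/* No PM capability: no standard way to change the state. */
	if (!dev->has_pm_cap)
		return -EIO;
	/* Requested state not supported (D0 and D3hot are mandatory). */
	if ((state == PM_D1 && !dev->d1_support) ||
	    (state == PM_D2 && !dev->d2_support))
		return -EIO;
	dev->current_state = state;	/* the real code writes PMCSR here */
	return 0;
}
```

A device without D2 support that is currently in D1 can therefore go to D3hot but not to D2. Once it is in D3hot, only D0 is accepted.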


pci_enable_wake
---------------

Usage:
	pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);

Description:
	Enable the device to generate PME# during a low power state using the
	PCI PM Capabilities.

	Checks whether the device supports generating PME# from the requested
	state and fails if it does not, unless enable == 0 (the request is to
	disable wake events, which is implicit if the device doesn't support
	them in the first place).

	Note that the PMC Register in the device's PM Capabilities has a bitmask
	of the states it supports generating PME# from. D3hot is bit 3 and
	D3cold is bit 4. So, while a value of 4 as the state may not seem
	semantically correct, it is.
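
The check described above amounts to testing one bit of the PME_Support field (PMC bits 15:11), at the position given by the state index. This is a hypothetical sketch. The function enable_wake() and the pme_en flag, which stands in for the PME_En bit of the PMCSR register, are illustrative. The -EINVAL return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* State indices as used by pci_enable_wake(); D3cold is 4. */
enum { IDX_D0, IDX_D1, IDX_D2, IDX_D3HOT, IDX_D3COLD };

#define PMC_PME_SHIFT 11	/* PME_Support occupies PMC bits 15:11 */

/* Hypothetical model: pmc is the raw PMC register value and *pme_en
 * stands in for the PME_En bit of the PMCSR register. */
static int enable_wake(uint16_t pmc, unsigned int state, int enable,
		       int *pme_en)
{
	assert(state <= IDX_D3COLD);
	if (!enable) {
		*pme_en = 0;	/* disabling never fails */
		return 0;
	}
	/* Bit "state" of PME_Support, i.e. PMC bit 11 + state. */
	if (!(pmc & (1u << (PMC_PME_SHIFT + state))))
		return -EINVAL;	/* no PME# from this state */
	*pme_en = 1;
	return 0;
}
```

Take a PMC value with only bits 14 and 15 set (D3hot and D3cold). A request for index 4 succeeds, while a request for index 1 (D1) fails.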


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These callbacks are implemented by individual drivers and are declared in
struct pci_driver:

	int  (*suspend) (struct pci_dev *dev, pm_message_t state);
	int  (*resume) (struct pci_dev *dev);
	int  (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);


suspend
-------

Usage:

	if (dev->driver && dev->driver->suspend)
		dev->driver->suspend(dev, state);

A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().

Bus mastering may be disabled by doing:

	pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state to match the suspend() parameter (pci_choose_state() converts the
pm_message_t into a pci_power_t):

	pci_set_power_state(dev, pci_choose_state(dev, state));

The driver is also responsible for disabling any other device-specific features
(e.g. blanking the screen, turning off on-card memory, etc.).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state() is used,
since that function updates current_state itself.

resume
------

Usage:

	if (dev->driver && dev->driver->resume)
		dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for re-enabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().

If the device is currently in D3, it may need to be reinitialized in resume().

  * Some types of devices, like bus controllers, will preserve context in D3hot
    (using Vcc power). Their drivers will often want to avoid re-initializing
    them after re-entering D0 (perhaps to avoid resetting downstream devices).

  * Other kinds of devices in D3hot will discard device context as part of a
    soft reset when re-entering the D0 state.

  * Devices resuming from D3cold always go through a power-on reset. Some
    device context can also be preserved using Vaux power.

  * Some systems hide D3cold resume paths from drivers. For example, on PCs
    the resume path for suspend-to-disk often runs BIOS powerup code, which
    will sometimes re-initialize the device.

To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume(). Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.
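
The cases listed above can be condensed into a decision a driver might make in resume(). This is a hypothetical sketch. needs_reinit(), struct dev_info and its keeps_context_in_d3hot flag are illustrative names. Treating D1 and D2 as context-preserving is an assumption. The sketch also ignores context kept on Vaux power and the case where firmware re-initializes the device behind the driver's back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of the state the device is resuming from. */
enum prev_state { PREV_D0, PREV_D1, PREV_D2, PREV_D3HOT, PREV_D3COLD };

/* Hypothetical per-device knowledge kept by the driver. */
struct dev_info {
	bool keeps_context_in_d3hot;	/* e.g. a bus controller on Vcc */
};

/* Should resume() rerun the init code shared with probe()? */
static bool needs_reinit(const struct dev_info *info, enum prev_state from)
{
	assert(from <= PREV_D3COLD);
	switch (from) {
	case PREV_D3COLD:
		return true;	/* power-on reset: context is lost */
	case PREV_D3HOT:
		/* Devices that do not keep context are soft-reset on D0 entry. */
		return !info->keeps_context_in_d3hot;
	default:
		return false;	/* D0-D2: context assumed to be kept */
	}
}
```

With this split, probe() and resume() can both call the same initialization routine, and resume() calls it only when needs_reinit() says the context is gone.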

If the device supports the PCI PM Spec, the driver can use this to physically
transition the device to D0:

	pci_set_power_state(dev, PCI_D0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. The driver cannot tell which of the two events is
taking place, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state() is used.


enable_wake
-----------

Usage:

	if (dev->driver && dev->driver->enable_wake)
		dev->driver->enable_wake(dev, state, enable);

This callback is generally only relevant for devices that support the PCI PM
spec and have the ability to generate a PME# (Power Management Event Signal)
to wake the system up. (However, it is possible that a device may support
some non-standard way of generating a wake event on sleep.)

Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
PM Capabilities describe what power states the device supports generating a
wake event from:

	+------------------+
	| Bit  | State     |
	+------------------+
	| 11   | D0        |
	| 12   | D1        |
	| 13   | D2        |
	| 14   | D3hot     |
	| 15   | D3cold    |
	+------------------+

A driver can use this to enable wake events:

	pci_enable_wake(dev, state, enable);

Note that to enable PME# from D3cold, a value of 4 should be passed to
pci_enable_wake (since it uses an index into a bitmask). If a driver gets
a request to enable wake events from D3, two calls should be made to
pci_enable_wake (one for each of D3hot and D3cold).
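
A single "wake from D3" request spans two PME_Support bits, so it turns into two per-state requests. The hypothetical helper arm_d3_wake() below, with pme_supported() as a local check, only works out which of the two indices (3 for D3hot, 4 for D3cold) the device can signal PME# from. In a driver, each armed index would correspond to one pci_enable_wake() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMC_PME_SHIFT	11	/* PME_Support: PMC bits 15:11 */
#define IDX_D3HOT	3
#define IDX_D3COLD	4

/* Hypothetical check: can the device signal PME# from state index idx? */
static bool pme_supported(uint16_t pmc, unsigned int idx)
{
	assert(idx <= IDX_D3COLD);
	return (pmc >> (PMC_PME_SHIFT + idx)) & 1u;
}

/* Split a "wake from D3" request into one request per D3 variant.
 * Returns a mask of the indices that could be armed
 * (bit 3 = D3hot, bit 4 = D3cold). */
static unsigned int arm_d3_wake(uint16_t pmc)
{
	const unsigned int idx[] = { IDX_D3HOT, IDX_D3COLD };
	unsigned int armed = 0;

	for (size_t i = 0; i < sizeof(idx) / sizeof(idx[0]); i++)
		if (pme_supported(pmc, idx[i]))
			armed |= 1u << idx[i];
	return armed;
}
```

Consider a device that advertises PME# from D3hot only (PMC bit 14). It gets only index 3 armed, so a wake request from D3cold would have to be reported as unsupported.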


A reference implementation
--------------------------

(mydrv_suspend(), mydrv_resume(), mydrv_interrupt() and struct mydrv_priv
are hypothetical placeholders for a driver's own names.)

	static int mydrv_suspend(struct pci_dev *pdev, pm_message_t state)
	{
		struct mydrv_priv *priv = pci_get_drvdata(pdev);

		/* driver specific operations */

		/* Disable IRQ */
		free_irq(pdev->irq, priv);
		/* If using MSI */
		if (priv->using_msi)
			pci_disable_msi(pdev);

		pci_save_state(pdev);
		pci_enable_wake(pdev, pci_choose_state(pdev, state),
				priv->wake_enabled);
		/* Disable IO/bus master/irq router */
		pci_disable_device(pdev);
		pci_set_power_state(pdev, pci_choose_state(pdev, state));
		return 0;
	}

	static int mydrv_resume(struct pci_dev *pdev)
	{
		struct mydrv_priv *priv = pci_get_drvdata(pdev);
		int err;

		pci_set_power_state(pdev, PCI_D0);
		pci_restore_state(pdev);
		/* device's irq possibly is changed, driver should take care */
		err = pci_enable_device(pdev);
		if (err)
			return err;
		pci_set_master(pdev);

		/* if using MSI, device's vector possibly is changed */
		if (priv->using_msi && pci_enable_msi(pdev))
			priv->using_msi = 0;	/* fall back to INTx */

		err = request_irq(pdev->irq, mydrv_interrupt,
				  priv->using_msi ? 0 : IRQF_SHARED, "mydrv", priv);
		if (err)
			return err;
		/* driver specific operations */
		return 0;
	}

This is a typical implementation. Drivers may reorder some of these
operations, omit some of them, or add more driver-specific operations, but on
the whole they should do something like this.

5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification
PCI Bus Power Management Interface Specification

  http://www.pcisig.com
