DMAengine controller documentation
==================================

Hardware Introduction
+++++++++++++++++++++

Most of the Slave DMA controllers have the same general principles of
operation.

They have a given number of channels to use for the DMA transfers, and
a given number of request lines.

Requests and channels are pretty much orthogonal. Channels can be used
to serve any of the requests. To simplify, channels are the entities
that will be doing the copy, and requests define which endpoints are
involved.

The request lines actually correspond to physical lines going from the
DMA-eligible devices to the controller itself. Whenever the device
wants to start a transfer, it asserts a DMA request (DRQ) by asserting
that request line.

A very simple DMA controller would only take into account a single
parameter: the transfer size. At each clock cycle, it would transfer a
byte of data from one buffer to another, until the transfer size has
been reached.

That wouldn't work well in the real world, since slave devices might
require a specific number of bits to be transferred in a single
cycle. For example, we may want to transfer as much data as the
physical bus allows to maximize performance when doing a simple memory
copy operation, but our audio device could have a narrower FIFO that
requires data to be written exactly 16 or 24 bits at a time. This is
why most if not all of the DMA controllers can adjust this, using a
parameter called the transfer width.

Moreover, some DMA controllers, whenever the RAM is used as a source
or destination, can group the reads or writes in memory into a buffer,
so instead of having a lot of small memory accesses, which is not
really efficient, you'll get several bigger transfers. This is done
using a parameter called the burst size, which defines how many single
reads/writes it's allowed to do without the controller splitting the
transfer into smaller sub-transfers. For example, with a transfer
width of 4 bytes and a burst size of 8, each burst would move 32 bytes
of data at once.

Our theoretical DMA controller would then only be able to do transfers
that involve a single contiguous block of data. However, some of the
transfers we usually have are not contiguous, and we want to copy data
from non-contiguous buffers to a contiguous buffer, which is called
scatter-gather.

DMAEngine, at least for mem2dev transfers, requires support for
scatter-gather. So we're left with two cases here: either we have a
quite simple DMA controller that doesn't support it, and we'll have to
implement it in software, or we have a more advanced DMA controller
that implements scatter-gather in hardware.

The latter are usually programmed using a collection of chunks to
transfer, and whenever the transfer is started, the controller will go
over that collection, doing whatever we programmed there.

This collection is usually either a table or a linked list. You will
then push either the address of the table and its number of elements,
or the first item of the list, to one channel of the DMA controller,
and whenever a DRQ is asserted, the controller will go through the
collection to know where to fetch the data from.

Either way, the format of this collection is completely dependent on
your hardware. Each DMA controller will require a different structure,
but all of them will require, for every chunk, at least the source and
destination addresses, whether it should increment these addresses or
not, and the three parameters we saw earlier: the burst size, the
transfer width and the transfer size.
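
As a purely illustrative sketch, such a chunk descriptor could look
like the structure below. The xyz_hw_desc name, the field names and
their widths are all made up for this example; every real controller
defines its own layout.

    #include <linux/types.h>

    /* Hypothetical hardware scatter-gather chunk descriptor */
    struct xyz_hw_desc {
            u32 src_addr;   /* source address of this chunk */
            u32 dst_addr;   /* destination address of this chunk */
            u32 len;        /* transfer size of this chunk, in bytes */
            u32 ctrl;       /* transfer width, burst size and the
                             * source/destination increment flags */
            u32 next;       /* bus address of the next descriptor in the
                             * collection, or 0 to end the chain */
    };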

One last thing: usually, slave devices won't issue a DRQ by default,
and you have to enable this in your slave device driver first whenever
you're willing to use DMA.

These were just the general memory-to-memory (also called mem2mem) or
memory-to-device (mem2dev) kinds of transfers. Most devices also
support other kinds of transfers or memory operations that dmaengine
supports, and those will be detailed later in this document.

DMA Support in Linux
++++++++++++++++++++

Historically, DMA controller drivers have been implemented using the
async TX API, to offload operations such as memory copy, XOR,
cryptography, etc., basically any memory to memory operation.

Over time, the need for memory to device transfers arose, and
dmaengine was extended. Nowadays, the async TX API is written as a
layer on top of dmaengine, and acts as a client. Still, dmaengine
accommodates that API in some cases, and made some design choices to
ensure that it stayed compatible.

For more information on the Async TX API, please refer to the relevant
documentation file in Documentation/crypto/async-tx-api.txt.

DMAEngine Registration
++++++++++++++++++++++

struct dma_device Initialization
--------------------------------

Just like any other kernel framework, the whole DMAEngine registration
relies on the driver filling a structure and registering against the
framework. In our case, that structure is dma_device.

The first thing you need to do in your driver is to allocate this
structure. Any of the usual memory allocators will do, but you'll also
need to initialize a few fields in there (a minimal sketch follows the
list below):

* channels: should be initialized as a list using the
  INIT_LIST_HEAD macro for example

* src_addr_widths:
  - should contain a bitmask of the supported source transfer widths

* dst_addr_widths:
  - should contain a bitmask of the supported destination transfer
    widths

* directions:
  - should contain a bitmask of the supported slave directions
    (i.e. excluding mem2mem transfers)

* residue_granularity:
  - Granularity of the transfer residue reported to dma_set_residue.
  - This can be either:
    + Descriptor
      -> Your device doesn't support any kind of residue
         reporting. The framework will only know that a particular
         transaction descriptor is done.
    + Segment
      -> Your device is able to report which chunks have been
         transferred
    + Burst
      -> Your device is able to report which bursts have been
         transferred

* dev: should hold the pointer to the struct device associated
  with your current driver instance.
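
A minimal, hypothetical sketch of this initialization at probe time
could look as follows. The foo_probe() name and the choice of values
are made up for the example; the dma_device fields, the
DMA_SLAVE_BUSWIDTH_*, DMA_MEM_TO_DEV/DMA_DEV_TO_MEM and
DMA_RESIDUE_GRANULARITY_* constants and dma_async_device_register()
are the real dmaengine API.

    #include <linux/dmaengine.h>
    #include <linux/platform_device.h>

    static int foo_probe(struct platform_device *pdev)
    {
            struct dma_device *dd;

            /* Any of the usual memory allocators will do */
            dd = devm_kzalloc(&pdev->dev, sizeof(*dd), GFP_KERNEL);
            if (!dd)
                    return -ENOMEM;

            INIT_LIST_HEAD(&dd->channels);

            dd->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
                                  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
            dd->dst_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
                                  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
            dd->directions = BIT(DMA_MEM_TO_DEV) | BIT(DMA_DEV_TO_MEM);
            dd->residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
            dd->dev = &pdev->dev;

            /*
             * cap_mask, the device_* callbacks and the channels are
             * set up here, as described in the following sections...
             */

            return dma_async_device_register(dd);
    }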

Supported transaction types
---------------------------

The next thing you need is to set which transaction types your device
(and driver) supports.

Our dma_device structure has a field called cap_mask that holds the
various types of transactions supported, and you need to modify this
mask using the dma_cap_set function, with the flags matching the
transaction types you support as arguments.

All those capabilities are defined in the dma_transaction_type enum,
in include/linux/dmaengine.h
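
For instance, a controller that only handles slave and cyclic
transfers might do something like this (a minimal sketch, dd being the
struct dma_device allocated earlier):

    dma_cap_zero(dd->cap_mask);
    dma_cap_set(DMA_SLAVE, dd->cap_mask);
    dma_cap_set(DMA_CYCLIC, dd->cap_mask);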

Currently, the types available are:
* DMA_MEMCPY
  - The device is able to do memory to memory copies

* DMA_XOR
  - The device is able to perform XOR operations on memory areas
  - Used to accelerate XOR intensive tasks, such as RAID5

* DMA_XOR_VAL
  - The device is able to perform parity check using the XOR
    algorithm against a memory buffer.

* DMA_PQ
  - The device is able to perform RAID6 P+Q computations, P being a
    simple XOR, and Q being a Reed-Solomon algorithm.

* DMA_PQ_VAL
  - The device is able to perform parity check using RAID6 P+Q
    algorithm against a memory buffer.

* DMA_INTERRUPT
  - The device is able to trigger a dummy transfer that will
    generate periodic interrupts
  - Used by the client drivers to register a callback that will be
    called on a regular basis through the DMA controller interrupt

* DMA_SG
  - The device supports memory to memory scatter-gather
    transfers.
  - Even though a plain memcpy can look like a particular case of a
    scatter-gather transfer, with a single chunk to transfer, it's a
    distinct transaction type in the mem2mem transfers case

* DMA_PRIVATE
  - The device only supports slave transfers, and as such isn't
    available for async transfers.

* DMA_ASYNC_TX
  - Must not be set by the device, and will be set by the framework
    if needed
  - /* TODO: What is it about? */

* DMA_SLAVE
  - The device can handle device to memory transfers, including
    scatter-gather transfers.
  - While in the mem2mem case we had two distinct types to deal with
    a single chunk to copy or a collection of them, here, we just
    have a single transaction type that is supposed to handle both.
  - If you want to transfer a single contiguous memory buffer,
    simply build a scatter list with only one item.

* DMA_CYCLIC
  - The device can handle cyclic transfers.
  - A cyclic transfer is a transfer where the chunk collection will
    loop over itself, with the last item pointing to the first.
  - It's usually used for audio transfers, where you want to operate
    on a single ring buffer that you will fill with your audio data.

* DMA_INTERLEAVE
  - The device supports interleaved transfers.
  - These transfers can transfer data from a non-contiguous buffer
    to a non-contiguous buffer, as opposed to DMA_SLAVE that can
    only transfer data from a non-contiguous data set to a
    contiguous destination buffer.
  - It's usually used for 2D content transfers, in which case you
    want to transfer a portion of uncompressed data directly to the
    display to show it.

These various types will also affect how the source and destination
addresses change over time.

Addresses pointing to RAM are typically incremented (or decremented)
after each transfer. In case of a ring buffer, they may loop
(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
are typically fixed.

Device operations
-----------------

Our dma_device structure also requires a few function pointers in
order to implement the actual logic, now that we have described what
operations we are able to perform.

The functions that you have to fill in there, and hence have to
implement, obviously depend on the transaction types you reported as
supported. A hypothetical skeleton tying the most common of these
callbacks together is sketched after the list below.

* device_alloc_chan_resources
* device_free_chan_resources
  - These functions will be called whenever a driver calls
    dma_request_channel or dma_release_channel for the first/last
    time on the channel associated with that driver.
  - They are in charge of allocating/freeing all the needed
    resources in order for that channel to be useful for your
    driver.
  - These functions can sleep.

* device_prep_dma_*
  - These functions match the capabilities you registered
    previously.
  - These functions all take the buffer or the scatterlist relevant
    for the transfer being prepared, and should create a hardware
    descriptor or a list of hardware descriptors from it
  - These functions can be called from an interrupt context
  - Any allocation you might do should be using the GFP_NOWAIT
    flag, in order not to potentially sleep, but without depleting
    the emergency pool either.
  - Drivers should try to pre-allocate any memory they might need
    during the transfer setup at probe time to avoid putting too
    much pressure on the nowait allocator.

  - It should return a unique instance of the
    dma_async_tx_descriptor structure, that further represents this
    particular transfer.

  - This structure can be initialized using the function
    dma_async_tx_descriptor_init.
  - You'll also need to set two fields in this structure:
    + flags:
      TODO: Can it be modified by the driver itself, or should it
      always be the flags passed in the arguments?

    + tx_submit: A pointer to a function you have to implement,
      that is supposed to push the current transaction descriptor
      to a pending queue, waiting for issue_pending to be called.

* device_issue_pending
  - Takes the first transaction descriptor in the pending queue,
    and starts the transfer. Whenever that transfer is done, it
    should move to the next transaction in the list.
  - This function can be called in an interrupt context

* device_tx_status
  - Should report the bytes left to go over on the given channel
  - Should only care about the transaction descriptor passed as
    argument, not the currently active one on a given channel
  - The tx_state argument might be NULL
  - Should use dma_set_residue to report it
  - In the case of a cyclic transfer, it should only take into
    account the current period.
  - This function can be called in an interrupt context.

* device_config
  - Reconfigures the channel with the configuration given as
    argument
  - This command should NOT perform synchronously, or on any
    currently queued transfers, but only on subsequent ones
  - In this case, the function will receive a dma_slave_config
    structure pointer as an argument, that will detail which
    configuration to use.
  - Even though that structure contains a direction field, this
    field is deprecated in favor of the direction argument given to
    the prep_* functions
  - This call is mandatory for slave operations only. This should NOT
    be set or expected to be set for memcpy operations.
    If a driver supports both, it should use this call for slave
    operations only and not for memcpy ones.

* device_pause
  - Pauses a transfer on the channel
  - This command should operate synchronously on the channel,
    pausing right away the work of the given channel

* device_resume
  - Resumes a transfer on the channel
  - This command should operate synchronously on the channel,
    resuming right away the work of the given channel

* device_terminate_all
  - Aborts all the pending and ongoing transfers on the channel
  - This command should operate synchronously on the channel,
    terminating right away all the ongoing and pending transfers of
    the given channel
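
To make the flow between the prep_* callbacks, tx_submit and
device_issue_pending more concrete, here is a minimal, hypothetical
skeleton. The xyz_* names, the descriptor layout and the locking
scheme are invented for this sketch; only the dmaengine callback
prototypes, dma_async_tx_descriptor_init() and the dma_cookie_assign()
helper (from the driver-local drivers/dma/dmaengine.h header) are real.

    #include <linux/dmaengine.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>
    #include "dmaengine.h"  /* drivers/dma/dmaengine.h, for dma_cookie_assign() */

    struct xyz_desc {
            struct dma_async_tx_descriptor  txd;
            struct list_head                node;
            /* the hardware descriptors for each chunk would live here */
    };

    struct xyz_chan {
            struct dma_chan         chan;
            spinlock_t              lock;
            struct list_head        pending;
    };

    static dma_cookie_t xyz_tx_submit(struct dma_async_tx_descriptor *txd)
    {
            struct xyz_desc *desc = container_of(txd, struct xyz_desc, txd);
            struct xyz_chan *xchan = container_of(txd->chan, struct xyz_chan, chan);
            unsigned long flags;
            dma_cookie_t cookie;

            /* Only queue the descriptor: nothing starts until issue_pending */
            spin_lock_irqsave(&xchan->lock, flags);
            cookie = dma_cookie_assign(txd);
            list_add_tail(&desc->node, &xchan->pending);
            spin_unlock_irqrestore(&xchan->lock, flags);

            return cookie;
    }

    static struct dma_async_tx_descriptor *
    xyz_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
                      unsigned int sg_len, enum dma_transfer_direction dir,
                      unsigned long flags, void *context)
    {
            struct xyz_desc *desc;

            /* May run in interrupt context: GFP_NOWAIT, never GFP_KERNEL */
            desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
            if (!desc)
                    return NULL;

            dma_async_tx_descriptor_init(&desc->txd, chan);
            desc->txd.flags = flags;
            desc->txd.tx_submit = xyz_tx_submit;

            /* build the hardware descriptor chain from sgl here */

            return &desc->txd;
    }

    static void xyz_issue_pending(struct dma_chan *chan)
    {
            struct xyz_chan *xchan = container_of(chan, struct xyz_chan, chan);
            unsigned long flags;

            spin_lock_irqsave(&xchan->lock, flags);
            if (!list_empty(&xchan->pending)) {
                    /* program the hardware with the first pending descriptor */
            }
            spin_unlock_irqrestore(&xchan->lock, flags);
    }

These would then be hooked up through the corresponding
device_prep_slave_sg and device_issue_pending fields of the dma_device
structure at probe time.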

Misc notes (stuff that should be documented, but don't really know
where to put them)
------------------------------------------------------------------
* dma_run_dependencies
  - Should be called at the end of an async TX transfer, and can be
    ignored in the slave transfers case.
  - Makes sure that dependent operations are run before marking it
    as complete.

* dma_cookie_t
  - it's a DMA transaction ID that will increment over time.
  - Not really relevant any more since the introduction of virt-dma
    that abstracts it away.

* DMA_CTRL_ACK
  - If clear, the descriptor cannot be reused by the provider until
    the client acknowledges receipt, i.e. has had a chance to
    establish any dependency chains
  - This can be acked by invoking async_tx_ack()
  - If set, it does not mean the descriptor can be reused

* DMA_CTRL_REUSE
  - If set, the descriptor can be reused after being completed. It
    should not be freed by the provider if this flag is set.
  - The descriptor should be prepared for reuse by invoking
    dmaengine_desc_set_reuse() which will set DMA_CTRL_REUSE.
  - dmaengine_desc_set_reuse() will succeed only when the channel
    supports reusable descriptors, as exhibited by its capabilities.
  - As a consequence, if a device driver wants to skip the dma_map_sg()
    and dma_unmap_sg() in between 2 transfers, because the DMA'd data
    wasn't used, it can resubmit the transfer right after its
    completion.
  - A descriptor can be freed in a few ways:
    - Clearing DMA_CTRL_REUSE by invoking dmaengine_desc_clear_reuse()
      and submitting it for the last txn
    - Explicitly invoking dmaengine_desc_free(), this can succeed only
      when DMA_CTRL_REUSE is already set
    - Terminating the channel
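
As a client-side illustration of descriptor reuse, a hypothetical
helper could look like the following sketch. The xyz_submit_reusable()
name is made up; the dmaengine_prep_slave_sg(), dmaengine_submit(),
dma_async_issue_pending(), dmaengine_desc_set_reuse() and
dmaengine_desc_free() calls are the real client-facing API.

    #include <linux/dmaengine.h>

    static int xyz_submit_reusable(struct dma_chan *chan,
                                   struct scatterlist *sgl,
                                   unsigned int sg_len)
    {
            struct dma_async_tx_descriptor *txd;
            int ret;

            txd = dmaengine_prep_slave_sg(chan, sgl, sg_len, DMA_MEM_TO_DEV,
                                          DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
            if (!txd)
                    return -ENOMEM;

            /* Only succeeds if the channel advertises descriptor reuse */
            ret = dmaengine_desc_set_reuse(txd);
            if (ret)
                    return ret;

            dmaengine_submit(txd);
            dma_async_issue_pending(chan);

            /*
             * Once this transfer completes, the same descriptor may be
             * resubmitted without re-preparing it or re-mapping the
             * buffers, and is eventually released with
             * dmaengine_desc_free().
             */
            return 0;
    }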

General Design Notes
--------------------

Most of the DMAEngine drivers you'll see are based on a similar design
that handles the end of transfer interrupts in the handler, but defers
most work to a tasklet, including the start of a new transfer whenever
the previous transfer ended.

This is a rather inefficient design though, because the inter-transfer
latency will be not only the interrupt latency, but also the
scheduling latency of the tasklet, leaving the channel idle in
between, which will slow down the global transfer rate.

You should avoid this kind of practice, and instead of electing a new
transfer in your tasklet, move that part to the interrupt handler in
order to have a shorter idle window (that we can't really avoid
anyway).
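
For example, an interrupt handler following this advice could be
structured as in the hypothetical sketch below. The xyz_* names and
the xyz_chan layout are invented, and only the completion callbacks
are left to the tasklet.

    #include <linux/interrupt.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct xyz_chan {
            spinlock_t              lock;
            struct list_head        pending;
            struct tasklet_struct   task;   /* runs completion callbacks */
    };

    /* hypothetical: program the hardware with the first pending descriptor */
    static void xyz_start_next_transfer(struct xyz_chan *xchan)
    {
    }

    static irqreturn_t xyz_irq(int irq, void *data)
    {
            struct xyz_chan *xchan = data;

            spin_lock(&xchan->lock);
            /* ... mark the just-finished descriptor as complete ... */

            /*
             * Kick the next queued transfer right here, not in the
             * tasklet, so that the channel idle window stays as short
             * as possible.
             */
            if (!list_empty(&xchan->pending))
                    xyz_start_next_transfer(xchan);
            spin_unlock(&xchan->lock);

            /* Completion callbacks can still be deferred to the tasklet */
            tasklet_schedule(&xchan->task);

            return IRQ_HANDLED;
    }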

Glossary
--------

Burst:     A number of consecutive read or write operations that can
           be queued to buffers before being flushed to memory.
Chunk:     A contiguous collection of bursts
Transfer:  A collection of chunks (be it contiguous or not)