Commit | Line | Data |
---|---|---|
e9df12c3 TI |
1 | ========================= |
2 | ALSA Compress-Offload API | |
3 | ========================= | |
4 | ||
5 | Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com> | |
6 | ||
7 | Vinod Koul <vinod.koul@linux.intel.com> | |
57bd9b8d | 8 | |
57bd9b8d | 9 | |
e9df12c3 TI |
10 | Overview |
11 | ======== | |
57bd9b8d PLB |
12 | Since its early days, the ALSA API was defined with PCM support or |
13 | constant-bitrate payloads such as IEC61937 in mind. Arguments and | |
14 | returned values in frames are the norm, making it a challenge to | |
15 | extend the existing API to compressed data streams. | |
16 | ||
17 | In recent years, audio digital signal processors (DSP) were integrated | |
18 | in system-on-chip designs, and DSPs are also integrated in audio | |
19 | codecs. Processing compressed data on such DSPs results in a dramatic | |
20 | reduction of power consumption compared to host-based | |
21 | processing. Support for such hardware has not been very good in Linux, | |
22 | mostly because of a lack of a generic API available in the mainline | |
23 | kernel. | |
24 | ||
c94bed8e | 25 | Rather than requiring a compatibility break with an API change of the |
57bd9b8d PLB |
26 | ALSA PCM interface, a new 'Compressed Data' API is introduced to |
27 | provide a control and data-streaming interface for audio DSPs. | |
28 | ||
29 | The design of this API was inspired by the 2-year experience with the | |
30 | Intel Moorestown SOC, with many corrections required to upstream the | |
31 | API in the mainline kernel instead of the staging tree and make it | |
32 | usable by others. | |
33 | ||
57bd9b8d | 34 | |
e9df12c3 TI |
35 | Requirements |
36 | ============ | |
57bd9b8d PLB |
37 | The main requirements are: |
38 | ||
39 | - Separation between byte counts and time. Compressed formats may have | |
40 | a header per file, per frame, or no header at all. The payload size | |
41 | may vary from frame-to-frame. As a result, it is not possible to | |
42 | estimate reliably the duration of audio buffers when handling | |
43 | compressed data. Dedicated mechanisms are required to allow for | |
44 | reliable audio-video synchronization, which requires precise | |
45 | reporting of the number of samples rendered at any given time. | |
46 | ||
47 | - Handling of multiple formats. PCM data only requires a specification | |
48 | of the sampling rate, number of channels and bits per sample. In | |
49 | contrast, compressed data comes in a variety of formats. Audio DSPs | |
50 | may also provide support for a limited number of audio encoders and | |
51 | decoders embedded in firmware, or may support more choices through | |
52 | dynamic download of libraries. | |
53 | ||
54 | - Focus on main formats. This API provides support for the most | |
55 | popular formats used for audio and video capture and playback. It is | |
56 | likely that as audio compression technology advances, new formats | |
57 | will be added. | |
58 | ||
59 | - Handling of multiple configurations. Even for a given format like | |
60 | AAC, some implementations may support AAC multichannel but HE-AAC | |
61 | stereo. Likewise WMA10 level M3 may require too much memory and CPU | |
62 | cycles. The new API needs to provide a generic way of listing these | |
63 | formats. | |
64 | ||
65 | - Rendering/Grabbing only. This API does not provide any means of | |
66 | hardware acceleration, where PCM samples are provided back to | |
67 | user-space for additional processing. This API focuses instead on | |
68 | streaming compressed data to a DSP, with the assumption that the | |
69 | decoded samples are routed to a physical output or logical back-end. | |
70 | ||
e9df12c3 | 71 | - Complexity hiding. Existing user-space multimedia frameworks all |
57bd9b8d PLB |
72 | have existing enums/structures for each compressed format. This new |
73 | API assumes the existence of a platform-specific compatibility layer | |
74 | to expose, translate and make use of the capabilities of the audio | |
75 | DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular | |
76 | applications are not supposed to make use of this API. | |
77 | ||
78 | ||
79 | Design | |
e9df12c3 | 80 | ====== |
c9f3f2d8 | 81 | The new API shares a number of concepts with the PCM API for flow |
57bd9b8d PLB |
82 | control. Start, pause, resume, drain and stop commands have the same |
83 | semantics no matter what the content is. | |
84 | ||
85 | The concept of memory ring buffer divided in a set of fragments is | |
86 | borrowed from the ALSA PCM API. However, only sizes in bytes can be | |
87 | specified. | |
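The byte-only accounting described above can be sketched as follows. This is a minimal illustration, not the kernel's implementation: `struct ring_sketch` and its helper functions are hypothetical names introduced here, and the cumulative counters are an assumption about how a caller might track progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a byte-based ring buffer; not a kernel structure. */
struct ring_sketch {
	size_t fragment_size;    /* bytes per fragment */
	size_t fragments;        /* number of fragments in the ring */
	uint64_t bytes_written;  /* cumulative bytes copied in by user space */
	uint64_t bytes_consumed; /* cumulative bytes taken by the DSP */
};

/* Total ring size in bytes: the only unit the API deals in. */
static size_t ring_size(const struct ring_sketch *r)
{
	return r->fragment_size * r->fragments;
}

/* Bytes that can still be written before the ring is full. */
static size_t ring_avail(const struct ring_sketch *r)
{
	uint64_t pending = r->bytes_written - r->bytes_consumed;

	assert(pending <= ring_size(r));
	return ring_size(r) - (size_t)pending;
}

/* Whole fragments that can be written right now. */
static size_t ring_free_fragments(const struct ring_sketch *r)
{
	return ring_avail(r) / r->fragment_size;
}
```

No frame or time quantity appears anywhere in the sketch, which is the point of the byte-only rule: the duration of a fragment of compressed data is not known to the host.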
88 | ||
89 | Seeks/trick modes are assumed to be handled by the host. | |
90 | ||
91 | The notion of rewinds/forwards is not supported. Data committed to the | |
92 | ring buffer cannot be invalidated, except when dropping all buffers. | |
93 | ||
94 | The Compressed Data API does not make any assumptions on how the data | |
95 | is transmitted to the audio DSP. DMA transfers from main memory to an | |
96 | embedded audio cluster or to a SPI interface for external DSPs are | |
97 | possible. As in the ALSA PCM case, a core set of routines is exposed; | |
98 | each driver implementer will have to write support for a set of | |
99 | mandatory routines and possibly make use of optional ones. | |
100 | ||
101 | The main additions are | |
102 | ||
e9df12c3 TI |
103 | get_caps |
104 | This routine returns the list of audio formats supported. Querying the | |
105 | codecs on a capture stream will return encoders; decoders will be | |
106 | listed for playback streams. | |
107 | ||
108 | get_codec_caps | |
109 | For each codec, this routine returns a list of | |
110 | capabilities. The intent is to make sure all the capabilities | |
111 | correspond to valid settings, and to minimize the risks of | |
112 | configuration failures. For example, for a complex codec such as AAC, | |
113 | the number of channels supported may depend on a specific profile. If | |
114 | the capabilities were exposed with a single descriptor, it may happen | |
115 | that a specific combination of profiles/channels/formats may not be | |
116 | supported. Likewise, embedded DSPs have limited memory and CPU cycles, so | |
117 | it is likely that some implementations make the list of capabilities | |
118 | dynamic and dependent on existing workloads. In addition to codec | |
119 | settings, this routine returns the minimum buffer size handled by the | |
120 | implementation. This information can be a function of the DMA buffer | |
121 | sizes, the number of bytes required to synchronize, etc., and can be | |
122 | used by userspace to define how much needs to be written in the ring | |
123 | buffer before playback can start. | |
124 | ||
125 | set_params | |
126 | This routine sets the configuration chosen for a specific codec. The | |
127 | most important field in the parameters is the codec type; in most | |
128 | cases decoders will ignore other fields, while encoders will strictly | |
129 | comply with the settings. | |
130 | ||
131 | get_params | |
132 | This routine returns the actual settings used by the DSP. Changes to | |
133 | the settings should remain the exception. | |
134 | ||
135 | get_timestamp | |
136 | The timestamp becomes a multi-field structure. It lists the number | |
137 | of bytes transferred, the number of samples processed and the number | |
138 | of samples rendered/grabbed. All these values can be used to determine | |
139 | the average bitrate, to figure out whether the ring buffer needs to be | |
140 | refilled, or to estimate the delay due to decoding/encoding/IO on the DSP. | |
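As a rough illustration of how the timestamp fields could be combined on the playback side, consider the sketch below. `struct tstamp_sketch` is a hypothetical stand-in for the fields named in the text, not the kernel's structure, and the bitrate figure assumes that the bytes transferred so far correspond to the samples processed so far, which is only approximately true while data is still queued in the DSP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields described in the text; not the kernel structure. */
struct tstamp_sketch {
	uint64_t bytes_transferred; /* bytes handed to the DSP */
	uint64_t samples_processed; /* samples decoded by the DSP */
	uint64_t samples_rendered;  /* samples actually played out */
	uint32_t sampling_rate;     /* in Hz */
};

/* Average bitrate in bits per second over everything processed so far. */
static uint64_t avg_bitrate_bps(const struct tstamp_sketch *ts)
{
	if (ts->samples_processed == 0)
		return 0;
	return ts->bytes_transferred * 8 * ts->sampling_rate /
	       ts->samples_processed;
}

/* Playback delay between decoding and rendering, in milliseconds. */
static uint64_t dsp_delay_ms(const struct tstamp_sketch *ts)
{
	assert(ts->sampling_rate != 0);
	assert(ts->samples_processed >= ts->samples_rendered);
	return (ts->samples_processed - ts->samples_rendered) * 1000 /
	       ts->sampling_rate;
}
```

For example, 160000 bytes transferred for 480000 samples processed at 48000 Hz gives an average of 128000 bits per second; if 470400 of those samples have been rendered, the remaining 9600 samples amount to a 200 ms delay.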
57bd9b8d PLB |
141 | |
142 | Note that the list of codecs/profiles/modes was derived from the | |
143 | OpenMAX AL specification instead of reinventing the wheel. | |
144 | Modifications include: | |
145 | - Addition of FLAC and IEC formats | |
146 | - Merge of encoder/decoder capabilities | |
147 | - Profiles/modes listed as bitmasks to make descriptors more compact | |
148 | - Addition of set_params for decoders (missing in OpenMAX AL) | |
149 | - Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL) | |
150 | - Addition of format information for WMA | |
151 | - Addition of encoding options when required (derived from OpenMAX IL) | |
152 | - Addition of rateControlSupported (missing in OpenMAX AL) | |
153 | ||
e9df12c3 | 154 | |
9727b490 JK |
155 | Gapless Playback |
156 | ================ | |
157 | When playing through an album, the decoders have the ability to skip the encoder | |
158 | delay and padding and directly move from one track content to another. The end | |
8d84c197 | 159 | user can perceive this as gapless playback as we don't have silence while |
9727b490 JK |
160 | switching from one track to another. |
161 | ||
162 | Also, there might be low-intensity noises due to encoding. Perfect gapless is | |
163 | difficult to reach with all types of compressed data, but works fine with most | |
164 | music content. The decoder needs to know the encoder delay and encoder padding, | |
165 | so we need to pass this to the DSP. This metadata is extracted from ID3/MP4 headers | |
166 | and is not present by default in the bitstream, hence the need for a new | |
167 | interface to pass this information to the DSP. Also, the DSP and userspace need to | |
168 | switch from one track to another and start using data for the second track. | |
169 | ||
170 | The main additions are: | |
171 | ||
e9df12c3 TI |
172 | set_metadata |
173 | This routine sets the encoder delay and encoder padding. This can be used by | |
174 | the decoder to strip the silence. This needs to be set before the data in the track | |
175 | is written. | |
9727b490 | 176 | |
e9df12c3 TI |
177 | set_next_track |
178 | This routine tells the DSP that metadata and write operations sent after this would | |
179 | correspond to the subsequent track. | |
9727b490 | 180 | |
e9df12c3 TI |
181 | partial drain |
182 | This is called when the end of file is reached. Userspace can inform the DSP that | |
183 | EOF is reached and that the DSP can now start skipping the padding delay. Also, the next | |
184 | written data would belong to the next track. | |
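To make the role of the encoder delay and padding more concrete, the sketch below shows one way a decoder might trim a track. It assumes the usual convention that the delay is a run of extra samples at the start of the decoded output and the padding a run at the end; `struct gapless_sketch` and `trim_track()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values passed with set_metadata. */
struct gapless_sketch {
	uint32_t encoder_delay;   /* extra samples at the start of the track */
	uint32_t encoder_padding; /* extra samples at the end of the track */
};

/*
 * Return the number of samples left after stripping delay and padding,
 * and store the index of the first sample to keep in *first.
 * If the track is shorter than delay + padding, nothing is kept.
 */
static uint64_t trim_track(uint64_t decoded_samples,
			   const struct gapless_sketch *md, uint64_t *first)
{
	uint64_t skip = (uint64_t)md->encoder_delay + md->encoder_padding;

	if (skip > decoded_samples) {
		*first = decoded_samples;
		return 0;
	}
	*first = md->encoder_delay;
	return decoded_samples - skip;
}
```

With 115200 decoded samples, a delay of 576 and a padding of 1000, the decoder would keep 113624 samples starting at index 576, so the next track can follow without inserted silence.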
9727b490 JK |
185 | |
186 | Sequence flow for gapless would be: | |
187 | - Open | |
188 | - Get caps / codec caps | |
189 | - Set params | |
190 | - Set metadata of the first track | |
191 | - Fill data of the first track | |
192 | - Trigger start | |
193 | - User space finishes sending all data of the first track | |
242658ff | 194 | - Indicate next track data by sending set_next_track |
9727b490 JK |
195 | - Set metadata of the next track |
196 | - Call partial_drain to flush most of the buffer in the DSP | |
197 | - Fill data of the next track | |
198 | - DSP switches to second track | |
e9df12c3 | 199 | |
9727b490 JK |
200 | (note: order for partial_drain and write for next track can be reversed as well) |
201 | ||
57bd9b8d | 202 | |
e9df12c3 TI |
203 | Not supported |
204 | ============= | |
57bd9b8d PLB |
205 | - Support for VoIP/circuit-switched calls is not the target of this |
206 | API. Support for dynamic bit-rate changes would require a tight | |
207 | coupling between the DSP and the host stack, limiting power savings. | |
208 | ||
209 | - Packet-loss concealment is not supported. This would require an | |
210 | additional interface to let the decoder synthesize data when frames | |
211 | are lost during transmission. This may be added in the future. | |
212 | ||
213 | - Volume control/routing is not handled by this API. Devices exposing a | |
214 | compressed data interface will be considered as regular ALSA devices; | |
215 | volume changes and routing information will be provided with regular | |
216 | ALSA kcontrols. | |
217 | ||
218 | - Embedded audio effects. Such effects should be enabled in the same | |
219 | manner, no matter if the input was PCM or compressed. | |
220 | ||
221 | - Multichannel IEC encoding. Unclear if this is required. | |
222 | ||
223 | - Encoding/decoding acceleration is not supported as mentioned | |
224 | above. It is possible to route the output of a decoder to a capture | |
225 | stream, or even implement transcoding capabilities. This routing | |
226 | would be enabled with ALSA kcontrols. | |
227 | ||
228 | - Audio policy/resource management. This API does not provide any | |
b327d25c | 229 | hooks to query the utilization of the audio DSP, nor any preemption |
57bd9b8d PLB |
230 | mechanisms. |
231 | ||
b327d25c | 232 | - No notion of underrun/overrun. Since the bytes written are compressed |
57bd9b8d | 233 | in nature and data written/read doesn't translate directly to |
b327d25c | 234 | rendered output in time, this API does not deal with underrun/overrun; these |
57bd9b8d PLB |
235 | may be dealt with in a user-space library. |
236 | ||
e9df12c3 TI |
237 | |
238 | Credits | |
239 | ======= | |
57bd9b8d PLB |
240 | - Mark Brown and Liam Girdwood for discussions on the need for this API |
241 | - Harsha Priya for her work on intel_sst compressed API | |
242 | - Rakesh Ughreja for valuable feedback | |
243 | - Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for | |
244 | demonstrating and quantifying the benefits of audio offload on a | |
245 | real platform. |