=========================
ALSA Compress-Offload API
=========================

Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API has been defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
return values expressed in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated into system-on-chip designs, and DSPs are also integrated
into audio codecs. Processing compressed data on such DSPs results in a
dramatic reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because no generic API was available in the mainline kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by two years of experience with the
Intel Moorestown SoC, with many corrections required to upstream the
API into the mainline kernel instead of the staging tree and make it
usable by others.


Requirements
============
The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support multichannel AAC but only
  stereo HE-AAC. Likewise, WMA10 level M3 may require too much memory
  and too many CPU cycles. The new API needs to provide a generic way
  of listing these formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have their own enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.
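
To illustrate the first requirement, the following minimal sketch compares
two buffers of identical byte size. The frame sizes are made-up values; the
only assumption taken from a real format is that an MPEG-1 Layer III frame
always decodes to 1152 samples.

```python
# Assumption: MPEG-1 Layer III, where each frame decodes to 1152 samples
# regardless of how many bytes the frame occupies.
SAMPLES_PER_FRAME = 1152
SAMPLE_RATE_HZ = 44100


def duration_seconds(frame_sizes):
    """Duration of the decoded audio for a list of frame sizes in bytes."""
    return len(frame_sizes) * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


# Two buffers holding the same number of bytes (illustrative frame sizes).
low_bitrate_buffer = [208] * 20    # 20 small frames, 4160 bytes
high_bitrate_buffer = [1040] * 4   # 4 large frames, 4160 bytes
```

Both buffers are 4160 bytes long, yet the first one decodes to five times as
much audio as the second, which is why a byte count alone says nothing
reliable about time.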


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.
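
The ring-buffer rules above (sizes in bytes only, no rewind, drop everything
or nothing) can be modelled with a small sketch. The class and method names
are hypothetical and exist only for illustration; they are not part of the
kernel API.

```python
class ByteRing:
    """Hypothetical model of a ring buffer split into equal-size fragments."""

    def __init__(self, fragment_size, fragments):
        self.fragment_size = fragment_size       # bytes per fragment
        self.size = fragment_size * fragments    # total capacity in bytes
        self.written = 0    # cumulative bytes committed by the application
        self.consumed = 0   # cumulative bytes taken by the DSP

    def available(self):
        """Free space in bytes."""
        return self.size - (self.written - self.consumed)

    def write(self, nbytes):
        """Commit up to nbytes; return how many bytes were accepted."""
        accepted = min(nbytes, self.available())
        self.written += accepted
        return accepted

    def consume_fragment(self):
        """The DSP takes at most one fragment; return the bytes taken."""
        taken = min(self.fragment_size, self.written - self.consumed)
        self.consumed += taken
        return taken

    def drop(self):
        """Discard all pending data; there is no partial rewind."""
        self.consumed = self.written
```

Note that the model has no operation that moves ``written`` backwards:
committed data can only be consumed or dropped as a whole.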

The Compressed Data API does not make any assumptions about how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list of
  capabilities. The intent is to make sure all the capabilities
  correspond to valid settings, and to minimize the risk of
  configuration failures. For example, for a complex codec such as AAC,
  the number of channels supported may depend on a specific profile. If
  the capabilities were exposed with a single descriptor, a specific
  combination of profiles/channels/formats might turn out not to be
  supported. Likewise, embedded DSPs have limited memory and CPU cycles,
  so it is likely that some implementations make the list of capabilities
  dynamic and dependent on existing workloads. In addition to codec
  settings, this routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multi-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed. These values can be used to determine
  the average bitrate, to figure out whether the ring buffer needs to be
  refilled, or to estimate the delay due to decoding/encoding/IO on the
  DSP.
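
As a rough sketch of how the timestamp fields described for get_timestamp
might be combined, the helpers below derive the three quantities mentioned
above. The structure layout, the field names and the extra ``sampling_rate``
field are assumptions made for this example and do not mirror the kernel
structure.

```python
from dataclasses import dataclass


@dataclass
class Timestamp:
    """Hypothetical snapshot of the counters reported for a stream."""
    bytes_transferred: int   # compressed bytes handed to the DSP
    samples_processed: int   # samples decoded (or encoded) so far
    samples_rendered: int    # samples actually played (or grabbed)
    sampling_rate: int       # samples per second


def average_bitrate(ts):
    """Estimated average bitrate in bits per second."""
    if ts.samples_processed == 0:
        return 0.0
    seconds = ts.samples_processed / ts.sampling_rate
    return ts.bytes_transferred * 8 / seconds


def dsp_delay_seconds(ts):
    """Audio processed by the DSP but not yet rendered, in seconds."""
    return (ts.samples_processed - ts.samples_rendered) / ts.sampling_rate


def bytes_pending(ts, bytes_written):
    """Bytes written by the application that the DSP has not fetched yet."""
    return bytes_written - ts.bytes_transferred
```

For example, 160000 bytes transferred for 441000 samples processed at
44100 Hz correspond to 10 seconds of audio, i.e. an average of 128000 bits
per second. A small ``bytes_pending`` value suggests that the ring buffer
should be refilled soon.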

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)
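
The idea of listing profiles as bitmasks, with one descriptor per valid
combination, can be sketched as follows. The profile bits, the descriptor
fields and the channel limits are hypothetical values chosen for
illustration; they are not the definitions used by the kernel headers.

```python
from dataclasses import dataclass
from enum import IntFlag


class AacProfile(IntFlag):
    """Hypothetical profile bits; the values are illustrative only."""
    LC = 1 << 0
    HE = 1 << 1
    HE_V2 = 1 << 2


@dataclass(frozen=True)
class Descriptor:
    profiles: AacProfile   # bitmask of the profiles this descriptor covers
    max_channels: int      # channel limit valid for those profiles


# One descriptor per valid combination, instead of a single flat one that
# would wrongly suggest multichannel HE-AAC is available.
AAC_DESCRIPTORS = (
    Descriptor(AacProfile.LC, 6),
    Descriptor(AacProfile.HE | AacProfile.HE_V2, 2),
)


def is_supported(descriptors, profile, channels):
    """True if some descriptor covers the profile at this channel count."""
    return any(
        (d.profiles & profile) and channels <= d.max_channels
        for d in descriptors
    )
```

A single bitmask field covers several profiles at once, which keeps each
descriptor compact while still letting userspace reject an invalid
combination before calling set_params.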


Gapless Playback
================
When playing through an album, the decoders have the ability to skip the encoder
delay and padding and directly move from one track content to another. The end
user perceives this as gapless playback, as there is no silence while
switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect gapless
playback is difficult to reach with all types of compressed data, but works fine
with most music content. The decoder needs to know the encoder delay and encoder
padding, so this information must be passed to the DSP. This metadata is
extracted from ID3/MP4 headers and is not present by default in the bitstream,
hence the need for a new interface to pass it to the DSP. The DSP and
userspace also need to switch from one track to another and start using data
for the second track.

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be used by
  the decoder to strip the silence. It needs to be set before the data of the
  track is written.

set_next_track
  This routine tells the DSP that the metadata and write operations sent after
  this call correspond to the subsequent track.

partial drain
  This is called when the end of file is reached. Userspace can inform the DSP
  that EOF is reached so that the DSP can start skipping the padding. The next
  data written then belongs to the next track.
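
What a decoder might do with the values passed through set_metadata can be
sketched in a few lines. This is only an illustration of the trimming idea,
operating on a plain list of decoded samples; the function name is
hypothetical and real DSP firmware will differ.

```python
def strip_delay_and_padding(decoded, encoder_delay, encoder_padding):
    """Drop the leading delay samples and the trailing padding samples."""
    # Compute the end index explicitly so that a padding of 0 keeps the tail.
    end = len(decoded) - encoder_padding
    return decoded[encoder_delay:end]
```

With the silence removed at both ends of each track, the last real sample of
one track can be followed directly by the first real sample of the next.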

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all the data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to the second track

(note: the order of partial_drain and the write for the next track can be reversed as well)
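
The sequence flow above can be written down as a small helper that lists the
steps for an album of any length. The step names are plain labels for the
operations described in this section, not actual function or ioctl names,
and the helper itself is hypothetical.

```python
def gapless_sequence(track_count):
    """Return the ordered list of steps for playing track_count tracks."""
    steps = ["open", "get_caps", "get_codec_caps", "set_params"]
    for index in range(track_count):
        if index == 0:
            # First track: metadata must precede the data, then start.
            steps += ["set_metadata", "write", "start"]
        else:
            # Later tracks: announce the switch, send the new metadata,
            # drain the tail of the previous track, then write new data.
            steps += ["set_next_track", "set_metadata",
                      "partial_drain", "write"]
    return steps
```

As the note above says, the last two steps of each later track may be
swapped; the sketch uses the order given in the list.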


Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, regardless of whether the input is PCM or compressed.

- Multichannel IEC encoding. It is unclear whether this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  they may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.