Description
Advanced Audio Coding – Low Delay (AAC-LD) is a member of the MPEG-4 AAC family of audio codecs, optimized for applications that require very low end-to-end latency while maintaining high audio quality. AAC-LD is defined in ISO/IEC 14496-3 (MPEG-4 Audio) and is referenced by 3GPP in TS 26.923. It is a transform-based perceptual audio codec. It divides the input signal into overlapping blocks, maps each block to the frequency domain with a Modified Discrete Cosine Transform (MDCT), and uses a psychoacoustic model to decide how coarsely each spectral region can be quantized without audible degradation. Compression comes from spending bits only on perceptually relevant components and discarding information the listener cannot hear. The codec supports a range of sampling rates (e.g., 8 kHz to 48 kHz) and bitrates, typically 32 kbps to 64 kbps per channel for high-quality communication.
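To illustrate the overlapping-block MDCT described above, here is a minimal Python sketch of an MDCT analysis/synthesis loop with 50% overlap and a sine window. It uses a naive O(N²) transform for clarity and is not the fast, standardized AAC-LD filterbank; the block size `n_half` (N) is a free parameter, and the function names are illustrative.

```python
import math


def sine_window(two_n: int) -> list[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(block: list[float], window: list[float]) -> list[float]:
    """Naive MDCT: 2N windowed input samples -> N spectral coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]


def imdct(coeffs: list[float], window: list[float]) -> list[float]:
    """Inverse MDCT with 2/N scaling, followed by the synthesis window."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def analysis_synthesis(signal: list[float], n_half: int) -> list[float]:
    """Transform 50%-overlapping blocks of 2N samples, invert, and overlap-add."""
    window = sine_window(2 * n_half)
    # Pad so every input sample is covered by exactly two blocks.
    pad_back = n_half + (-len(signal)) % n_half
    padded = [0.0] * n_half + list(signal) + [0.0] * pad_back
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = padded[start:start + 2 * n_half]
        for i, y in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += y  # overlap-add cancels time-domain aliasing
    return out[n_half:n_half + len(signal)]
```

The overlap-add step cancels the time-domain aliasing that each block introduces (TDAC), which is why adjacent blocks overlap by half a window; that overlap is also one of the components of the codec's algorithmic delay.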
Architecturally, AAC-LD is a low-delay derivative of AAC. Its key changes are a shorter frame and reduced look-ahead, since it does not use block switching. Whereas AAC-LC codes frames of 1024 samples with a 2048-sample transform window, AAC-LD uses frames of 512 or 480 samples (windows of 1024 or 960 samples), which drastically cuts algorithmic delay. With 480-sample frames at 48 kHz the algorithmic delay is 20 ms, which is critical for maintaining conversational quality in two-way communication; lower sampling rates give proportionally longer delays. The codec structure includes modules for time/frequency mapping, psychoacoustically controlled noise shaping via scalefactors, quantization, and noiseless coding (Huffman coding). Because AAC-LD cannot switch to short windows when a transient occurs, it relies on Temporal Noise Shaping (TNS), together with an optional low-overlap window shape, to control pre-echo artifacts.
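The 20 ms figure follows from simple arithmetic: one frame of buffering plus one frame of MDCT overlap. A minimal sketch, assuming the algorithmic delay is exactly two frame lengths with no additional look-ahead or bit-reservoir buffering (the function name is illustrative):

```python
def algorithmic_delay_ms(frame_length: int, sample_rate_hz: int) -> float:
    """One frame of buffering plus one frame of MDCT overlap, in milliseconds.

    Assumption: no extra look-ahead or bit-reservoir buffering is added.
    """
    return 2 * frame_length * 1000.0 / sample_rate_hz


if __name__ == "__main__":
    for frame_length in (480, 512):
        for rate in (48_000, 32_000, 16_000):
            delay = algorithmic_delay_ms(frame_length, rate)
            print(f"N={frame_length} at {rate // 1000} kHz: {delay:.2f} ms")
```

For example, 480-sample frames give 20 ms at 48 kHz but 60 ms at 16 kHz, and 512-sample frames give about 21.3 ms at 48 kHz, which is why the low-delay figure is usually quoted for the highest sampling rate.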
Within the 3GPP ecosystem, AAC-LD is used for certain real-time communication services in the IP Multimedia Subsystem (IMS), where it can be handled by the media processing functions for packet-switched voice and video. For example, in Voice over LTE (VoLTE) and Video over LTE (ViLTE), AAC-LD can be negotiated during the Session Description Protocol (SDP) offer/answer exchange carried in Session Initiation Protocol (SIP) signaling. The encoded audio frames are packetized into Real-time Transport Protocol (RTP) packets for transmission over the IP-based bearer. Its role is to provide high-fidelity, low-latency audio that approaches the naturalness of a face-to-face conversation, a key factor in user acceptance of telephony and conferencing services over mobile broadband networks.
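The RTP packetization step can be sketched as follows. 3GPP service specifications may prescribe a particular payload format for AAC-LD; this sketch instead assumes the generic MPEG-4 payload format of RFC 3640 in AAC-hbr mode (13-bit AU-size, 3-bit AU-Index), one frame per packet, dynamic payload type 96, and an RTP clock equal to the audio sampling rate. The function names and default values are hypothetical.

```python
import struct

RTP_VERSION = 2
AU_SIZE_BITS = 13   # sizeLength=13 in AAC-hbr mode
AU_INDEX_BITS = 3   # indexLength=3 in AAC-hbr mode


def rtp_aac_hbr_packet(frame: bytes, seq: int, timestamp: int, ssrc: int,
                       payload_type: int = 96) -> bytes:
    """Wrap one encoded frame: RFC 3550 fixed header, then an RFC 3640
    AU-header section holding a single 16-bit AU-header."""
    if len(frame) >= 1 << AU_SIZE_BITS:
        raise ValueError("frame too large for a 13-bit AU-size field")
    byte0 = RTP_VERSION << 6                 # P=0, X=0, CSRC count=0
    byte1 = 0x80 | (payload_type & 0x7F)     # M=1: packet ends a complete AU
    rtp_header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    au_headers_length = AU_SIZE_BITS + AU_INDEX_BITS  # in bits: one AU-header
    au_header = len(frame) << AU_INDEX_BITS           # AU-Index = 0
    return rtp_header + struct.pack("!HH", au_headers_length, au_header) + frame


def packetize(frames: list[bytes], frame_length: int = 480,
              ssrc: int = 0x1234ABCD, first_seq: int = 0,
              first_ts: int = 0) -> list[bytes]:
    """One frame per packet; the timestamp advances by one frame length
    per packet because the RTP clock is assumed to equal the sampling rate."""
    return [
        rtp_aac_hbr_packet(f, first_seq + i, first_ts + i * frame_length, ssrc)
        for i, f in enumerate(frames)
    ]
```

In SDP, a stream packetized this way would be announced with an `mpeg4-generic` rtpmap entry and fmtp parameters such as `mode=AAC-hbr`, `sizeLength=13`, `indexLength=3`, `indexDeltaLength=3`, and a `config` value carrying the codec configuration.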
AAC-LD delivers audio quality at moderate bitrates that approaches that of higher-delay codecs such as AAC-LC, with a delay profile suitable for interactive use; the price of its shorter frames is some loss of coding efficiency, so it typically needs a somewhat higher bitrate for comparable quality. It supports both mono and stereo configurations, making it suitable for music sharing and high-quality conference calls. Robustness to packet loss is usually handled at the receiver, where lost RTP packets are detected and concealed. Implementations in devices and network equipment must keep computational complexity low enough for efficient operation on mobile processors while meeting the strict delay budget of the end-to-end media path.
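Receiver-side concealment can be as simple as detecting gaps in RTP sequence numbers and filling them with attenuated copies of the last decoded frame. The sketch below is a hypothetical, deliberately simple strategy, not a concealment method specified for AAC-LD; it assumes packets arrive in order (for example, from an upstream jitter buffer) and that `decode` is some caller-supplied function returning one frame of PCM samples.

```python
def missing_between(prev_seq: int, seq: int) -> int:
    """Packets lost between two consecutive arrivals, with 16-bit wraparound."""
    return (seq - prev_seq - 1) & 0xFFFF


def conceal(packets, decode, frame_length: int = 480, fade: float = 0.5):
    """Decode (seq, payload) pairs in arrival order and fill each sequence
    gap with progressively faded repeats of the last good PCM frame.

    Assumption: no reordering; duplicates are simply skipped.
    """
    out = []
    last_pcm = [0.0] * frame_length
    prev_seq = None
    for seq, payload in packets:
        if seq == prev_seq:
            continue                          # duplicate packet
        if prev_seq is not None:
            gain = 1.0
            for _ in range(missing_between(prev_seq, seq)):
                gain *= fade                  # attenuate each repeated frame
                out.append([gain * s for s in last_pcm])
        last_pcm = decode(payload)
        out.append(last_pcm)
        prev_seq = seq
    return out
```

Fading the repeated frames avoids a buzzy loop when several consecutive packets are lost; real receivers use more elaborate spectral or time-domain concealment.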
Purpose & Motivation
AAC-LD addresses the need for high-quality, low-latency audio in bidirectional real-time communication over packet-switched networks. Before its adoption, mobile telephony relied primarily on narrowband speech codecs such as AMR-NB, which offer low delay but limited audio bandwidth and quality, unsuitable for music or high-fidelity voice. General-purpose audio codecs such as AAC-LC or MP3 provide high quality, but because they use long analysis windows for compression efficiency, their algorithmic delay can exceed 100 ms at lower sampling rates. Such delay harms conversational interactivity: speakers talk over each other and the natural flow of dialogue is disrupted.
The motivation for using AAC-LD in mobile networks came from the evolution toward all-IP architectures such as LTE and the rise of enriched communication services over IMS. Services such as VoLTE, ViLTE, and real-time video conferencing called for an audio component that could deliver 'CD-like' stereo quality without sacrificing conversational latency, something the traditional circuit-switched voice codecs could not provide. AAC-LD fills this gap: it handles both speech and music with a delay comparable to traditional telephony codecs but with far greater audio fidelity.
By easing the latency-quality trade-off, AAC-LD made new services practical. It allowed network operators and service providers to offer premium voice and video calls, interactive music sharing, and low-latency audio for gaming and augmented reality applications over cellular networks. Its introduction supported the 3GPP vision of convergent IP-based services, helping the audio experience keep pace with improvements in video quality and network bandwidth.
Key Features
- Algorithmic delay of approximately 20 ms (480-sample frames at 48 kHz), enabling natural conversation
- High audio quality supporting full stereo bandwidth up to 20 kHz
- Typical bitrates of 32 kbps to 64 kbps per channel
- Derived from MPEG-4 AAC, with shortened frames (512 or 480 samples) and no block switching
- Supports various sampling rates (8, 16, 24, 32, 48 kHz) for flexibility
- Incorporates Temporal Noise Shaping (TNS) to control pre-echo artifacts
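The bitrate, frame-length, and sampling-rate figures above translate directly into per-frame payload sizes and packet rates. A small sketch, assuming a constant bitrate and one frame per RTP packet over IPv4 without header compression (the function names and the 40-byte header default are illustrative assumptions):

```python
def payload_bytes_per_frame(bitrate_bps: int, frame_length: int,
                            sample_rate_hz: int) -> float:
    """Average coded bytes per frame at a constant bitrate."""
    return bitrate_bps * frame_length / (8 * sample_rate_hz)


def header_overhead_bps(frame_length: int, sample_rate_hz: int,
                        header_bytes: int = 40) -> float:
    """Header bitrate with one frame per packet.

    Default 40 bytes = IPv4 (20) + UDP (8) + RTP (12); payload-format
    headers, RTP extensions and header compression are ignored.
    """
    packets_per_second = sample_rate_hz / frame_length
    return header_bytes * 8 * packets_per_second
```

With 480-sample frames at 48 kHz (10 ms, 100 frames per second), 64 kbps corresponds to 80 bytes per frame and 32 kbps to 40 bytes, while uncompressed IPv4/UDP/RTP headers alone add 32 kbps; this is one reason header compression matters on cellular links.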
Evolution Across Releases
AAC-LD is referenced in 3GPP TS 26.923 for IMS-based real-time communication services; its bitstream format and decoding process are inherited from ISO/IEC 14496-3, and it is distinct from the Enhanced Voice Services (EVS) codec. The 3GPP specification addresses its decoder requirements and performance characteristics for integration into VoLTE and other real-time multimedia applications, establishing it as a high-quality, low-delay audio option for LTE networks.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.923 | 3GPP TS 26.923 |