MDCT (Modified Discrete Cosine Transform) — 3GPP Glossary

MDCT is a lapped transform used for efficient audio signal compression in 3GPP codecs like Enhanced Voice Services (EVS) and AMR-WB+. It converts time-domain audio samples into frequency-domain coefficients, enabling high-quality, low-bitrate audio coding essential for mobile voice and media services.

Description

The Modified Discrete Cosine Transform (MDCT) is a core signal processing algorithm within 3GPP audio codecs, such as the Enhanced Voice Services (EVS) codec, whose detailed algorithmic description is TS 26.445. It belongs to the family of lapped transforms, which process audio in blocks that overlap their neighbours by 50%. This overlap is key to its performance: it mitigates blocking artifacts (audible discontinuities that can occur at frame boundaries in non-overlapping block transforms) and so gives smoother, higher-quality reconstruction, especially at lower bitrates.
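
The 50% overlap described above can be illustrated with a minimal framing sketch. The function name `overlapped_frames` and the parameter `hop` are hypothetical names chosen for this example, and the sketch assumes the signal length is a multiple of `hop`; it only shows how blocks are laid out, not how any 3GPP codec buffers its input.

```python
def overlapped_frames(samples, hop):
    """Split `samples` into frames of length 2*hop that advance by `hop`.

    Assumption for this sketch: len(samples) is a multiple of `hop`.
    """
    frame_len = 2 * hop
    # Pad one hop of zeros at each end so the first and last real samples
    # are also covered by two frames.
    padded = [0.0] * hop + list(samples) + [0.0] * hop
    return [padded[start:start + frame_len]
            for start in range(0, len(padded) - frame_len + 1, hop)]


signal = [float(i) for i in range(8)]
frames = overlapped_frames(signal, hop=4)

# The second half of each frame is the first half of the next one,
# which is exactly the 50% overlap of a lapped transform.
for current, following in zip(frames, frames[1:]):
    assert current[4:] == following[:4]

print(len(frames), "frames of length", len(frames[0]))
```

Each frame carries `hop` new samples and `hop` samples shared with its predecessor, so the transform that follows sees every sample twice while the frame rate stays at one frame per `hop` samples.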

Technically, the MDCT takes a windowed segment of 2N time-domain audio samples and transforms it into N frequency-domain coefficients. The window function, typically a sine or Kaiser-Bessel derived (KBD) window, is applied to the input signal to reduce spectral leakage. Because of the 50% overlap, each sample contributes to two consecutive transforms, yet each transform advances by only N samples and outputs only N coefficients, so the overlap adds no redundancy to the coded data. A single block cannot be inverted on its own: the inverse transform (IMDCT) returns the block with time-domain aliasing. When the window satisfies the Princen-Bradley condition, overlap-adding consecutive IMDCT outputs cancels that aliasing and, in the absence of quantization, reconstructs the original signal perfectly, a property known as time-domain aliasing cancellation (TDAC). The MDCT is therefore a critically sampled filter bank, which is what makes it efficient for coding.
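
The analysis, synthesis, and overlap-add steps described above can be sketched directly from the textbook MDCT definition. This is a deliberately slow, direct-form illustration and not the implementation used in EVS or any other 3GPP codec; the function names, the sine window, the block size of 8 coefficients, and the 2/N scaling on the inverse are assumptions of this sketch.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; w[n]^2 + w[n+N]^2 == 1 for every n."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, window):
    """Direct-form MDCT: 2N windowed samples in, N coefficients out."""
    n_coeffs = len(frame) // 2
    shift = 0.5 + n_coeffs / 2
    return [
        sum(frame[n] * window[n]
            * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Direct-form IMDCT: N coefficients in, 2N aliased, windowed samples out."""
    n_coeffs = len(coeffs)
    shift = 0.5 + n_coeffs / 2
    return [
        2.0 / n_coeffs * window[n]
        * sum(coeffs[k] * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
              for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def roundtrip(samples, n_coeffs):
    """MDCT -> IMDCT -> overlap-add; len(samples) must be a multiple of N."""
    window = sine_window(n_coeffs)
    padded = [0.0] * n_coeffs + list(samples) + [0.0] * n_coeffs
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        frame = padded[start:start + 2 * n_coeffs]
        block = imdct(mdct(frame, window), window)
        for n, value in enumerate(block):
            output[start + n] += value  # aliasing cancels in this sum
    return output[n_coeffs:-n_coeffs]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
restored = roundtrip(signal, n_coeffs=8)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print("max reconstruction error:", max_error)
```

Without quantization the reconstruction error is at the level of floating-point rounding, which is the TDAC property in practice. Production codecs compute the same transform with FFT-based fast algorithms rather than the quadratic-cost sums used here.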

Within the 3GPP codec architecture, the MDCT is often used in conjunction with other tools like Linear Predictive Coding (LPC) for speech or modified discrete sine transforms (MDST) for certain bandwidths. In EVS, for instance, the MDCT is employed for coding the high-frequency band or the entire signal in generic audio mode. The transform coefficients are then quantized and entropy coded, with a perceptual model allocating bits according to auditory masking thresholds. This process lets the codec deliver high audio quality across a wide range of bitrates, from 9.6 kbps for speech up to 128 kbps for music, and makes the MDCT a central building block of high-definition voice services in LTE and 5G networks.
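
The quantization step mentioned above can be illustrated with a minimal sketch of band-wise uniform scalar quantization. It is not the quantizer or bit allocation used in EVS: the band edges, step sizes, and coefficient values below are hypothetical stand-ins for what a perceptual model would choose, with larger steps where more quantization noise is assumed to be masked.

```python
def quantize(coeffs, band_edges, step_sizes):
    """Map each coefficient to an integer index using its band's step size."""
    indices = []
    for band, step in enumerate(step_sizes):
        for k in range(band_edges[band], band_edges[band + 1]):
            indices.append(round(coeffs[k] / step))
    return indices


def dequantize(indices, band_edges, step_sizes):
    """Rebuild approximate coefficients from the integer indices."""
    coeffs = []
    for band, step in enumerate(step_sizes):
        for k in range(band_edges[band], band_edges[band + 1]):
            coeffs.append(indices[k] * step)
    return coeffs


# Hypothetical MDCT coefficients for one block, low to high frequency.
coeffs = [10.2, -3.7, 1.4, 0.6, -0.3, 0.2, 0.05, -0.1]
band_edges = [0, 2, 4, 8]      # three bands: [0, 2), [2, 4), [4, 8)
step_sizes = [0.5, 1.0, 2.0]   # coarser steps where more noise is tolerated

indices = quantize(coeffs, band_edges, step_sizes)
decoded = dequantize(indices, band_edges, step_sizes)
print("indices:", indices)
print("decoded:", decoded)
```

The error in each coefficient is bounded by half the step size of its band, so the step sizes control where quantization noise lands in frequency. Small coefficients in coarsely quantized bands collapse to zero, and those runs of zeros are what an entropy coder can then represent cheaply.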

Purpose & Motivation

The MDCT was introduced to address the need for high-efficiency, high-quality audio compression in mobile networks. Prior transform coding techniques, like the standard Discrete Cosine Transform (DCT) used in earlier codecs, suffered from blocking artifacts at low bitrates, which degraded perceptual audio quality. The lapped structure of the MDCT, with its inherent time-domain aliasing cancellation, was developed to eliminate these artifacts without sacrificing coding efficiency, enabling robust audio delivery over bandwidth-constrained wireless channels.

Its adoption in 3GPP standards, particularly from Release 8 onwards with AMR-WB+ and later in EVS, was driven by the evolution of mobile services from basic telephony to rich media communication. Consumers demanded studio-quality voice and music streaming, which required codecs that could deliver superior audio fidelity at variable bitrates while maintaining resilience to packet loss. The MDCT's mathematical properties made it ideal for this, as it provides a near-optimal time-frequency representation for perceptual coding, allowing codec designers to exploit psychoacoustic principles to discard inaudible signal components aggressively.

Furthermore, the MDCT enabled the unification of speech and audio coding within a single codec framework. Traditional speech codecs relied heavily on source models (like LPC), which performed poorly for non-speech signals like music. By incorporating the MDCT, 3GPP codecs like EVS can switch seamlessly between speech-specific and generic audio coding modes, ensuring optimal performance across a wide range of audio content. This versatility was crucial for supporting advanced services like Voice over LTE (VoLTE), high-definition voice calls, and multimedia streaming, forming a foundational technology for the audio experience in modern cellular networks.

Key Features

Lapped transform with 50% overlap between consecutive blocks to prevent blocking artifacts
Provides perfect reconstruction via time-domain aliasing cancellation (TDAC) in the absence of quantization
Used in frequency-domain coding for both speech and generic audio signals in 3GPP codecs
Enables high compression efficiency through perceptual quantization of frequency coefficients
Supports variable frame sizes and window shapes (e.g., sine, KBD) for signal-adaptive processing
Integral to the Enhanced Voice Services (EVS) codec for high-definition voice in LTE and 5G
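
The window shapes named in the list above can be checked against the condition that perfect reconstruction relies on, w[n]^2 + w[n+N]^2 = 1 for a window of length 2N (commonly called the Princen-Bradley condition). The following sketch builds a sine window and a KBD window from their textbook definitions; the function names, the `alpha` value of 4.0, and the series-based Bessel approximation are assumptions of this example rather than parameters taken from any 3GPP specification.

```python
import math


def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (x / (2 * m)) ** 2
        total += term
    return total


def sine_window(n_coeffs):
    """Sine window of length 2N."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def kbd_window(n_coeffs, alpha=4.0):
    """Kaiser-Bessel derived window of length 2N.

    The rising half is the square root of the normalized running sum of a
    Kaiser kernel of length N + 1; the falling half is its mirror image.
    """
    kaiser = [
        bessel_i0(math.pi * alpha
                  * math.sqrt(max(0.0, 1.0 - (2.0 * j / n_coeffs - 1.0) ** 2)))
        for j in range(n_coeffs + 1)
    ]
    total = sum(kaiser)
    rising, running = [], 0.0
    for j in range(n_coeffs):
        running += kaiser[j]
        rising.append(math.sqrt(running / total))
    return rising + rising[::-1]


def princen_bradley_error(window):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over the window."""
    n_coeffs = len(window) // 2
    return max(abs(window[n] ** 2 + window[n + n_coeffs] ** 2 - 1.0)
               for n in range(n_coeffs))


sine_error = princen_bradley_error(sine_window(16))
kbd_error = princen_bradley_error(kbd_window(16))
print("sine window deviation:", sine_error)
print("KBD window deviation: ", kbd_error)
```

Both deviations are at the level of floating-point rounding, so either shape can be used for analysis and synthesis without breaking aliasing cancellation. In the KBD window a larger `alpha` trades a wider main lobe for lower side lobes, which is the kind of signal-adaptive choice the feature list refers to.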

Evolution Across Releases

Rel-8 Initial

Introduced as part of the AMR-WB+ codec for extended audio bandwidth and stereo support. The MDCT was used for transform coding of the high-frequency components and non-speech signals, providing improved audio quality for music and multimedia services compared to earlier narrowband codecs.

Specifications: TS 26.253, TS 26.255, TS 26.410, TS 26.411

Defining Specifications

Specification	Title
TS 26.253	3GPP TS 26.253
TS 26.255	3GPP TS 26.255
TS 26.410	3GPP TS 26.410
TS 26.411	3GPP TS 26.411