LP

Linear Prediction

Services
Introduced in Rel-8
A fundamental digital signal processing technique used extensively in speech and audio codecs to model the spectral envelope of a signal. It predicts a sample's value as a linear combination of its past samples, allowing for efficient compression by transmitting only prediction coefficients and an error signal. It is the core of many voice codecs, including AMR and EVS.

Description

Linear Prediction (LP) is a mathematical operation in which the current sample of a discrete-time signal is estimated as a weighted sum (linear combination) of its past samples. In the context of speech coding, the speech signal s(n) is modeled as the output of an all-pole filter (the synthesis filter) excited by an input signal e(n), which is either a quasi-periodic pulse train (for voiced sounds) or white noise (for unvoiced sounds). The relationship is expressed as s(n) = Σ a_k * s(n-k) + G * e(n), summed over k = 1 to p, where a_k are the Linear Prediction Coefficients (LPCs), p is the prediction order, and G is a gain factor. The coefficients a_k define the spectral envelope, and hence the formants, of the speech.
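The prediction and residual computation described above can be sketched as follows. This is an illustrative pure-Python sketch, not codec code; the sample values and coefficients are made up for illustration.

```python
# LP model sketch: predict the current sample as a weighted sum of the
# p previous samples, and form the prediction error (residual).
# Coefficients and samples below are illustrative, not from any codec.

def lp_predict(past, coeffs):
    """Predict s(n) from past samples; past[0] is s(n-1), past[1] is s(n-2), ..."""
    return sum(a * s for a, s in zip(coeffs, past))

def lp_residual(signal, coeffs):
    """Compute e(n) = s(n) - sum_{k=1..p} a_k * s(n-k), with zeros before the frame."""
    p = len(coeffs)
    residual = []
    for n in range(len(signal)):
        past = [signal[n - k] if n - k >= 0 else 0.0 for k in range(1, p + 1)]
        residual.append(signal[n] - lp_predict(past, coeffs))
    return residual

signal = [0.0, 1.0, 1.8, 2.4, 2.8]   # toy "speech" samples
coeffs = [1.2, -0.4]                 # order-2 LPCs (illustrative)
print(lp_residual(signal, coeffs))
```

For a well-chosen predictor the residual has much lower energy than the signal itself, which is exactly what makes it cheap to encode.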

The encoding process analyzes a short frame of speech (e.g., 20 ms) to compute the set of LPCs that best predicts the signal within that frame. This is typically done by solving the Yule-Walker equations with methods like the Levinson-Durbin recursion applied to the frame's autocorrelations. The difference between the original speech signal and the predicted signal is the LP residual, which represents the excitation. High-quality codecs then further encode this residual. At low bitrates, the residual is modeled parametrically, as in Algebraic Code-Excited Linear Prediction (ACELP), where the excitation is represented by an adaptive-codebook (pitch) contribution plus a sparse algebraic fixed-codebook contribution, so that only pitch lags, codebook indices, and gains need to be transmitted.
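The autocorrelation analysis and the Levinson-Durbin recursion mentioned above can be sketched as follows. This is a floating-point illustration only; a real codec operates on a windowed frame and uses fixed-point arithmetic and bandwidth expansion.

```python
def autocorrelation(signal, p):
    """Autocorrelations r(0)..r(p) of one analysis frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n))
            for lag in range(p + 1)]

def levinson_durbin(r):
    """Solve the Yule-Walker equations for LPCs a_1..a_p given r[0..p].

    Uses the convention s(n) ~ sum_k a_k * s(n-k).
    Returns (coefficients, final prediction-error energy)."""
    p = len(r) - 1
    a = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient k_i for this order
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Order update: a_j <- a_j - k * a_{i-j}, and a_i <- k
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For example, a signal with geometric autocorrelation r(k) = 0.5^k is exactly first-order predictable: the recursion returns a_1 = 0.5 with all higher coefficients zero.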

At the decoder, the received LPCs are used to construct the synthesis filter. The received excitation parameters are used to generate the excitation signal e(n), which is then passed through this synthesis filter to reconstruct the speech waveform. The stability of the synthesis filter is ensured by converting the LPCs to a more robust representation, such as Line Spectral Pairs (LSPs) or Immittance Spectral Frequencies (ISFs), for quantization and transmission. LP is computationally intensive but provides very high compression efficiency, making it the backbone of all modern narrowband and wideband speech codecs standardized by 3GPP, such as the Adaptive Multi-Rate (AMR) and Enhanced Voice Services (EVS) codecs.
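One way to see why stability matters, sketched under the same illustrative conventions as above: the step-down (inverse Levinson) recursion recovers the reflection coefficients from a set of LPCs, and the all-pole synthesis filter is stable exactly when every reflection coefficient has magnitude below one. Standardized codecs instead quantize LSPs/ISFs, whose ordering property guarantees a stable filter by construction; this pure-Python sketch only illustrates the stability criterion.

```python
def reflection_coefficients(a):
    """Step-down recursion: recover reflection coefficients k_1..k_p from
    LPCs a_1..a_p (predictor convention s(n) ~ sum_k a_k * s(n-k))."""
    a = list(a)
    ks = []
    for i in range(len(a), 0, -1):
        k = a[i - 1]          # highest-order coefficient is k_i
        ks.append(k)
        if abs(k) >= 1.0:
            break             # filter already known to be unstable
        if i > 1:
            # Undo one Levinson order update to get the order-(i-1) LPCs
            denom = 1.0 - k * k
            a = [(a[j] + k * a[i - 2 - j]) / denom for j in range(i - 1)]
    return list(reversed(ks))

def is_stable(a):
    """Synthesis filter 1 / (1 - sum_k a_k z^-k) is stable iff all |k_i| < 1."""
    return all(abs(k) < 1.0 for k in reflection_coefficients(a))
```

Quantization error applied directly to LPCs can push a reflection coefficient past unity and make the decoder's filter blow up, which is precisely why the LSP/ISF domain is used for transmission.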

Purpose & Motivation

Linear Prediction exists to solve the problem of efficiently digitizing and compressing speech signals for transmission over bandwidth-constrained wireless channels. Its primary purpose is to exploit the strong short-term correlation (redundancy) present in speech signals, where each sample is highly predictable from preceding samples due to the physical properties of the human vocal tract. By modeling this correlation, LP allows the codec to transmit only the model parameters (LPCs) and a simplified representation of the excitation, achieving high compression ratios (e.g., from 64 kbps PCM down to 5.9 kbps AMR) while maintaining intelligible speech quality.

Historically, LP-based coding replaced simpler waveform codecs (like PCM and ADPCM) for mobile voice because it offered far better compression, which was critical for early digital cellular systems (2G GSM) with limited spectral efficiency. The introduction of the Full-Rate speech codec in GSM, based on Regular Pulse Excitation-Long Term Prediction (RPE-LTP), marked the adoption of LP principles. Subsequent evolution through the AMR codec in 3G UMTS and the EVS codec in 4G/5G has continuously refined LP techniques, increasing prediction order, improving quantization of LPCs (using LSPs), and enhancing excitation modeling (e.g., with ACELP and TCX). This evolution has addressed limitations like poor music handling and robotic voice artifacts, extending LP's utility from narrowband telephony to high-quality fullband audio and voice-over-LTE (VoLTE) services.

Key Features

  • Models the speech signal as an all-pole filter excited by a periodic or noise-like source
  • Extracts Linear Prediction Coefficients (LPCs) representing the spectral envelope (formants) of speech
  • Achieves high compression by transmitting only LPCs and a compact representation of the excitation signal
  • Uses robust parameterizations like Line Spectral Pairs (LSPs) for stable quantization and transmission
  • Forms the core analysis-synthesis structure of codecs like AMR, AMR-WB, and EVS
  • Enables multi-rate operation by varying bit allocation for LPC and excitation quantization

Evolution Across Releases

Rel-8 Initial

Introduced with the standardization of the AMR Wideband (AMR-WB) codec for GSM and UMTS, extending Linear Prediction techniques to wideband speech (50-7000 Hz). The initial architecture for wideband used a higher LP order (typically 16 vs. 10 for narrowband) to model the broader spectrum and introduced more advanced quantization methods for the LP parameters, significantly improving naturalness and voice quality compared to narrowband telephony.

Defining Specifications

Specification
3GPP TS 26.090
3GPP TS 26.092
3GPP TS 26.190
3GPP TS 26.192
3GPP TS 26.226
3GPP TS 26.253
3GPP TS 26.267
3GPP TS 26.290
3GPP TS 26.818
3GPP TS 46.060
3GPP TS 46.062