LTP

Long Term Predictor

Other
Introduced in Rel-8
Long Term Predictor (LTP) is a speech coding technique used in 3GPP codecs like AMR and AMR-WB. It models the long-term periodicity (pitch) of the speech signal to efficiently remove redundancy, improving compression and quality. This is crucial for optimizing bandwidth usage in mobile voice services.

Description

The Long Term Predictor (LTP) is a core component of the Algebraic Code-Excited Linear Prediction (ACELP) algorithm used in 3GPP speech codecs such as Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB). Its primary function is to model and exploit the long-term correlation, or pitch periodicity, inherent in voiced speech segments. Voiced sounds, such as vowels, have a quasi-periodic structure in which the waveform repeats approximately every pitch period (typically 2.5 ms to 20 ms). The LTP identifies this periodicity and uses it to predict the excitation of the current subframe from a delayed version of the past excitation signal.
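The idea of finding the delay at which a voiced signal best matches its own past can be illustrated with a small autocorrelation search. This is only a sketch under stated assumptions: the function name `estimate_pitch_lag`, the synthetic test signal, and the lag range of 20 to 100 samples are hypothetical choices for illustration, and the plain normalized correlation used here is not the search procedure specified for AMR or AMR-WB.

```python
import math

def estimate_pitch_lag(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized correlation.

    Hypothetical helper: compares the signal with a delayed copy of itself.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = 0.0  # correlation between the signal and its delayed copy
        den = 0.0  # energy of the delayed copy
        for n in range(max_lag, len(signal)):
            num += signal[n] * signal[n - lag]
            den += signal[n - lag] * signal[n - lag]
        score = num / math.sqrt(den) if den > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SAMPLE_RATE_HZ = 8000  # narrowband sampling rate
TRUE_LAG = 57          # assumed pitch period of the synthetic signal, in samples

# Synthetic "voiced" signal: a fundamental plus one harmonic, exactly periodic.
voiced = [
    math.sin(2 * math.pi * (n % TRUE_LAG) / TRUE_LAG)
    + 0.5 * math.sin(4 * math.pi * (n % TRUE_LAG) / TRUE_LAG)
    for n in range(320)
]

lag = estimate_pitch_lag(voiced, min_lag=20, max_lag=100)
pitch_hz = SAMPLE_RATE_HZ / lag
print(f"estimated lag: {lag} samples, pitch: {pitch_hz:.1f} Hz")
```

The search range is kept below twice the true period so that the pitch-doubled lag does not tie with the true lag. Real codecs need extra logic to avoid such multiples.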

Architecturally, the LTP operates within the synthesis filter loop of the codec. It consists of a long-term synthesis filter, often represented as 1/(1 - gp*z^-T), where 'T' is the pitch lag (delay) and 'gp' is the pitch gain. The codec's analysis-by-synthesis search procedure determines the optimal integer or fractional pitch lag (T) and the corresponding gain (gp) that minimize the perceptual error between the original and synthesized speech. The output of the LTP, called the long-term prediction signal or adaptive codebook excitation, represents the periodic component of the excitation.
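As a rough illustration, the filter 1/(1 - gp*z^-T) corresponds to the difference equation out[n] = in[n] + gp * out[n - T], and for a fixed candidate lag the gain that minimizes a squared error has a closed form. The sketch below assumes an unweighted squared error for simplicity, whereas the codecs minimize a perceptually weighted error. The function names `long_term_synthesis` and `least_squares_gain` are hypothetical.

```python
def long_term_synthesis(innovation, lag, gain):
    """Apply 1 / (1 - gain * z^-lag): out[n] = innovation[n] + gain * out[n - lag]."""
    out = []
    for n, u in enumerate(innovation):
        past = out[n - lag] if n >= lag else 0.0  # zero initial filter state
        out.append(u + gain * past)
    return out

def least_squares_gain(target, candidate):
    """Gain g minimizing sum((target - g * candidate)^2) for one candidate lag."""
    energy = sum(c * c for c in candidate)
    if energy == 0.0:
        return 0.0
    return sum(t * c for t, c in zip(target, candidate)) / energy

# A single impulse through the filter yields pulses every `lag` samples,
# each scaled by one more factor of `gain`.
impulse = [1.0] + [0.0] * 11
response = long_term_synthesis(impulse, lag=4, gain=0.5)
print(response)

# The gain that best scales a candidate vector onto a target vector.
print(least_squares_gain([2.0, 4.0], [1.0, 2.0]))
```

A full search would evaluate this gain and the resulting error for every candidate lag and keep the best pair.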

This predicted component is then combined with a stochastic component from a fixed codebook (representing innovation or unpredictable parts of the signal) to form the total excitation. By accurately modeling the pitch, the LTP allows the fixed codebook to focus on representing the remaining innovation, leading to a more efficient allocation of bits. The parameters (lag and gain) are quantized and transmitted to the decoder, which reconstructs the long-term prediction using its own filter state. This process is fundamental to achieving high-quality speech at low bit rates, as it captures a significant source of signal redundancy.
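The way a decoder might rebuild the excitation from its own state can be sketched as follows. This is a simplified illustration, not the procedure from the specifications: it assumes integer lags only and unquantized gains, and it repeats the adaptive vector periodically when the lag is shorter than the subframe. The name `decode_subframe` and all numeric values are hypothetical.

```python
def decode_subframe(history, lag, pitch_gain, fixed_vector, fixed_gain):
    """Rebuild one subframe of excitation and append it to `history`.

    `history` holds past excitation samples (the adaptive codebook state)
    and must contain at least `lag` samples.
    """
    adaptive = []
    for n in range(len(fixed_vector)):
        idx = n - lag
        # Negative idx reads the past excitation; otherwise repeat the
        # adaptive vector built so far (lag shorter than the subframe).
        adaptive.append(history[idx] if idx < 0 else adaptive[idx])
    excitation = [
        pitch_gain * a + fixed_gain * c
        for a, c in zip(adaptive, fixed_vector)
    ]
    history.extend(excitation)  # update state for the next subframe
    return excitation

# Decoder state after some earlier frames (illustrative values).
state = [0.0, 0.0, 0.0, 1.0]

first = decode_subframe(state, lag=4, pitch_gain=0.5,
                        fixed_vector=[1.0, 0.0, 0.0, 0.0], fixed_gain=2.0)
second = decode_subframe(state, lag=4, pitch_gain=0.5,
                         fixed_vector=[0.0, 0.0, 0.0, 0.0], fixed_gain=0.0)
print(first, second)
```

Because each subframe is built from the stored excitation, an encoder and decoder that start from the same state and receive the same parameters stay synchronized.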

In standards such as TS 26.090 (AMR) and TS 26.190 (AMR-WB), the LTP implementation details, including search ranges, lag resolution (integer or fractional), and gain quantization, are specified precisely to ensure interoperability. LTP performance matters across all of the codec's bit-rate modes, which are selected according to channel conditions. Enhancements to LTP precision, such as fractional lag resolution, improve speech quality, particularly for female and child voices, whose higher pitch corresponds to short lags where a one-sample step is a large relative change in pitch.
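A short worked calculation shows why lag resolution matters more at high pitch. It assumes an 8 kHz sampling rate and uses a one-third-sample step purely as an illustrative fractional resolution. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # narrowband sampling rate

def pitch_hz(lag_samples):
    """Pitch frequency corresponding to a pitch lag given in samples."""
    return SAMPLE_RATE_HZ / lag_samples

def relative_step(lag, step):
    """Relative pitch change when the lag grows by `step` samples."""
    return (pitch_hz(lag) - pitch_hz(lag + step)) / pitch_hz(lag)

# Lag 20 samples = 400 Hz (high-pitched voice), lag 100 samples = 80 Hz (low).
for lag in (20, 100):
    print(
        f"lag {lag:3d} ({pitch_hz(lag):5.1f} Hz): "
        f"integer step {100 * relative_step(lag, 1):.2f}%, "
        f"1/3-sample step {100 * relative_step(lag, 1 / 3):.2f}%"
    )
```

The relative change equals step / (lag + step), so one integer step is a much coarser pitch adjustment at a lag of 20 samples than at 100, and a fractional step narrows that gap.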

Purpose & Motivation

The Long Term Predictor was created to address the fundamental challenge of compressing speech signals for digital transmission over bandwidth-constrained channels, such as those in early 2G and 3G mobile networks. Prior speech codecs, while using linear prediction for short-term spectral envelope modeling, were less efficient at exploiting the long-term periodic correlations in voiced speech. This resulted in either higher required bit rates for equivalent quality or reduced speech quality at lower bit rates, limiting network capacity and service quality.

The motivation for LTP stems from the source-filter model of speech production, in which the excitation source for voiced sounds is a quasi-periodic glottal pulse train. By explicitly modeling this periodicity, the LTP removes a major source of redundancy, allowing the codec to represent the speech signal with fewer bits. This was a key building block of the highly successful AMR codec, which became the mandatory voice codec for 3GPP GSM and UMTS networks. AMR maintains robust voice quality under varying radio channel conditions by supporting multiple bit rates, and the LTP's efficient modeling is essential at the lower rates.

Historically, the incorporation of LTP into code-excited linear prediction (CELP) architectures marked a significant evolution from simpler multi-pulse or regular-pulse excitation designs. Those earlier designs either lacked a long-term predictor or, as in the GSM full-rate RPE-LTP codec, estimated its parameters open-loop, which made excitation coding less efficient. The LTP's adoption was driven by the need for a standardized, high-performance speech coding algorithm that could maximize the number of voice channels per radio carrier while maintaining toll-quality speech, a critical economic and technical driver for cellular network operators.

Key Features

  • Models long-term pitch periodicity in voiced speech using a feedback filter
  • Uses adaptive codebook to generate the periodic component of excitation
  • Parameters include pitch lag (integer or fractional) and pitch gain
  • Integral part of ACELP algorithm in AMR and AMR-WB codecs
  • Reduces bit rate by exploiting signal redundancy
  • Improves speech quality, especially for voices with high pitch

Evolution Across Releases

Rel-8 Initial

Introduced as a core, standardized component within the AMR and AMR-WB codecs specified in TS 26.090 and TS 26.190. The initial architecture featured an adaptive codebook (LTP) working in tandem with an algebraic codebook, using analysis-by-synthesis to find optimal pitch lag and gain. It supported multiple bit rates with defined LTP search procedures for each mode.

Defining Specifications

Specification   Title
TS 26.090       3GPP TS 26.090
TS 26.190       3GPP TS 26.190
TS 26.290       3GPP TS 26.290
TS 46.020       3GPP TR 46.020
TS 46.042       3GPP TR 46.042
TS 46.060       3GPP TR 46.060
TS 46.082       3GPP TR 46.082