FPS (Frame Pattern Substitution) — 3GPP Glossary

Frame Pattern Substitution (FPS) is a technique used in 3GPP codecs, particularly for Voice over LTE (VoLTE) and later voice services, to handle lost or corrupted speech frames. It replaces missing frames with a synthesized pattern to maintain audio continuity and intelligibility, improving perceived voice quality in poor radio conditions.

Description

Frame Pattern Substitution (FPS) is a core component of the error concealment mechanisms within 3GPP speech and audio codecs, such as the Adaptive Multi-Rate (AMR) and Enhanced Voice Services (EVS) codecs. It operates at the receiver's decoder when a speech frame is lost during transmission over the radio interface or is received with irrecoverable errors. The primary goal is to mitigate the audible impact of such losses, which would otherwise manifest as disruptive gaps, clicks, or distortion in the decoded speech signal. FPS works by generating a substitute frame based on previously received, correctly decoded speech frames and the inherent properties of the speech signal. This synthesized frame is designed to seamlessly continue the speech waveform, preserving pitch and spectral characteristics to the extent possible, thereby maintaining naturalness and intelligibility.

The implementation of FPS is tightly coupled to the specific codec algorithm. In the AMR codec, for instance, when a frame is flagged as bad (for example, after a Cyclic Redundancy Check failure), the decoder invokes the FPS algorithm. It takes parameters from the last good frame, such as the Linear Prediction (LP) coefficients that model the vocal tract filter and the pitch period (for voiced sounds), and extrapolates them to generate a new excitation signal. This excitation is then passed through the LP synthesis filter to produce a time-domain speech signal for the missing frame. When consecutive frames are lost, the energy of the substituted frames is typically attenuated gradually, which avoids artificial, sustained noise and smooths the transition back to normal decoding once good frames resume.
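The synthesis step described above can be illustrated with a minimal Python sketch. It is not the algorithm from any 3GPP specification: the function names (repeat_pitch_cycle, lpc_synthesis, conceal_frame), the gain parameter, and the sign convention A(z) = 1 + sum of a_k z^-k are assumptions made for illustration. The default frame length of 160 samples corresponds to a 20 ms frame at 8 kHz sampling.

```python
from typing import List, Sequence


def repeat_pitch_cycle(past_excitation: Sequence[float],
                       pitch_lag: int,
                       frame_len: int) -> List[float]:
    """Extrapolate an excitation by repeating the last pitch cycle."""
    if not 0 < pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must lie within the stored excitation")
    last_cycle = list(past_excitation[-pitch_lag:])
    return [last_cycle[n % pitch_lag] for n in range(frame_len)]


def lpc_synthesis(excitation: Sequence[float],
                  lpc: Sequence[float],
                  memory: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z), assuming A(z) = 1 + sum a_k z^-k.

    Computes s[n] = e[n] - sum_{k=1..p} a_k * s[n-k].
    `memory` holds at least the last p output samples, most recent last.
    """
    if len(memory) < len(lpc):
        raise ValueError("memory must hold at least len(lpc) samples")
    history = list(memory)
    out: List[float] = []
    for e in excitation:
        s = e - sum(a * history[-k] for k, a in enumerate(lpc, start=1))
        out.append(s)
        history.append(s)
    return out


def conceal_frame(past_excitation: Sequence[float],
                  pitch_lag: int,
                  lpc: Sequence[float],
                  memory: Sequence[float],
                  frame_len: int = 160,
                  gain: float = 1.0) -> List[float]:
    """Build a substitute frame from last-good-frame parameters.

    `gain` (hypothetical) scales the excitation so that a caller can
    attenuate the output when several frames in a row are lost.
    """
    excitation = [gain * x
                  for x in repeat_pitch_cycle(past_excitation, pitch_lag, frame_len)]
    return lpc_synthesis(excitation, lpc, memory)
```

A real decoder would also handle unvoiced frames (where there is no meaningful pitch period), update the filter memory after each frame, and work in the codec's own parameter domain; the sketch omits all of that to keep the excitation-plus-filter structure visible.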

FPS complements the Radio Access Bearer (RAB) and later QoS frameworks for conversational voice: those mechanisms bound loss and delay in the network, while FPS conceals the losses that still occur. Its effectiveness is a key factor in achieving high Mean Opinion Score (MOS) ratings for voice services, especially in challenging radio environments such as cell edges or handovers. The algorithms are specified in detail in the 3GPP TS 26-series specifications (e.g., TS 26.091 for AMR) to ensure interoperability. While FPS handles frame-level losses, it is part of a broader suite of resilience features, including redundancy (e.g., redundant frame transmission in the RTP payload), jitter buffer management, and codec mode adaptation, which work together to deliver robust Voice over IP (VoIP) services in mobile networks.

Purpose & Motivation

FPS was created to address the fundamental challenge of delivering toll-quality voice over packet-switched networks, which, unlike the circuit-switched networks of 2G/3G, are inherently prone to packet loss and delay jitter. In circuit-switched connections, dedicated channels provided a consistent bitstream, whereas packet networks (such as the IP-based LTE and 5G cores) carry voice as data packets that can be lost or delayed. Without FPS, lost voice frames would cause audible glitches and severely degrade the user experience. FPS addresses this by estimating the missing content from the signal received so far, allowing the conversation to continue with minimal perceptual disruption.

The motivation stemmed from the transition to all-IP networks in 3GPP Release 8 (LTE), where VoLTE was standardized. To make VoIP viable over wireless links with fluctuating quality, robust error concealment was non-negotiable. Previous approaches in circuit-switched systems had different, often hardware-based, error correction. FPS represents a shift to sophisticated signal processing within the codec itself, optimizing for the statistical nature of packet loss. It addresses the limitation of simple packet loss concealment (PLC) methods, which might insert silence or simple noise, by generating a signal that is acoustically consistent with the ongoing speech, thereby preserving call naturalness and speaker identity where possible.

Key Features

Generates substitute speech frames based on parameters from previous good frames
Utilizes Linear Prediction (LP) and pitch period extrapolation for signal synthesis
Includes energy attenuation for consecutive lost frames to avoid artificial sustained noise
Integrates seamlessly with codec state machines (e.g., AMR, EVS decoder)
Specified algorithmically in 3GPP TS 26-series for interoperability
Critical for maintaining high Mean Opinion Score (MOS) in poor radio conditions
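
The attenuation and state-tracking features listed above can be sketched as a small loss counter that maps each frame's status to an output gain. This is an illustrative assumption, not the state machine of any TS 26-series codec: the class name ConcealmentState, the cap max_losses, and the per-frame factor attenuation are hypothetical, and the default values are not taken from a specification.

```python
from dataclasses import dataclass


@dataclass
class ConcealmentState:
    """Tracks consecutive lost frames and derives an output gain."""
    max_losses: int = 6        # hypothetical: mute once this many frames are lost in a row
    attenuation: float = 0.7   # hypothetical per-frame energy scaling factor
    lost_in_a_row: int = 0

    def update(self, frame_ok: bool) -> float:
        """Return the gain to apply to the current frame's output."""
        if frame_ok:
            # A good frame resets the counter; normal decoding resumes.
            self.lost_in_a_row = 0
            return 1.0
        # Count the loss, saturating at the cap.
        self.lost_in_a_row = min(self.lost_in_a_row + 1, self.max_losses)
        if self.lost_in_a_row >= self.max_losses:
            return 0.0  # long burst: mute rather than sustain synthetic sound
        # Each further consecutive loss lowers the gain geometrically.
        return self.attenuation ** self.lost_in_a_row
```

In use, the decoder would call update once per frame and scale the substituted (or decoded) frame by the returned gain. A production decoder would typically also ramp the gain back up over the first good frames after a burst rather than jumping straight to 1.0.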

Evolution Across Releases

Rel-8 (Initial)

Introduced as a core error concealment component for the AMR and AMR-WB codecs in the context of VoLTE and CS fallback. Specified the fundamental algorithm for generating substitution frames using LP synthesis and pitch period replication from the last good frame to maintain speech continuity during packet loss.

TS 22.864, TS 26.922, TS 26.926, TS 26.928, TS 26.955, TS 26.956, TS 26.998, TS 38.835, TS 48.020

Defining Specifications

Specification	Title
TS 22.864	3GPP TS 22.864
TS 26.922	3GPP TS 26.922
TS 26.926	3GPP TS 26.926
TS 26.928	3GPP TS 26.928
TS 26.955	3GPP TS 26.955
TS 26.956	3GPP TS 26.956
TS 26.998	3GPP TS 26.998
TR 38.835	3GPP TR 38.835
TS 48.020	3GPP TS 48.020