Description
Frame Pattern Substitution (FPS) is a core component of the error concealment mechanisms within 3GPP speech and audio codecs, such as the Adaptive Multi-Rate (AMR) and Enhanced Voice Services (EVS) codecs. It operates at the receiver's decoder when a speech frame is lost during transmission over the radio interface or is received with irrecoverable errors. The primary goal is to mitigate the audible impact of such losses, which would otherwise manifest as disruptive gaps, clicks, or distortion in the decoded speech signal. FPS works by generating a substitute frame based on previously received, correctly decoded speech frames and the inherent properties of the speech signal. This synthesized frame is designed to seamlessly continue the speech waveform, preserving pitch and spectral characteristics to the extent possible, thereby maintaining naturalness and intelligibility.
The technical implementation of FPS is tightly integrated with the specific codec algorithm. For instance, in the AMR codec, when a frame is flagged as bad (for example, after a Cyclic Redundancy Check failure), the decoder invokes the FPS algorithm. It uses parameters from the last good frame, such as the Linear Prediction Coefficients (LPC) representing the vocal tract filter and the pitch period (for voiced sounds), to extrapolate a new excitation signal. This synthesized excitation is then passed through the LPC synthesis filter to produce a time-domain speech signal for the missing frame. When consecutive frames are lost, the energy of the substituted frames is usually attenuated gradually, both to avoid generating artificial, sustained noise and to smooth the transition back to normal decoding when good frames resume.
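The substitution step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the normative algorithm of any 3GPP codec: the function name `conceal_frame`, its parameters, the 160-sample default frame length, and the plain repetition of the last pitch cycle are all hypothetical simplifications chosen for readability.

```python
from typing import List, Sequence, Tuple


def conceal_frame(
    past_excitation: Sequence[float],
    lpc: Sequence[float],
    pitch_lag: int,
    synth_memory: Sequence[float],
    frame_len: int = 160,  # assumed frame length in samples (illustrative)
    gain: float = 1.0,     # attenuation factor chosen by the caller
) -> Tuple[List[float], List[float]]:
    """Synthesize a substitute frame from last-good-frame parameters.

    lpc holds a_1..a_p of A(z) = 1 + sum_k a_k z^-k, so the synthesis
    filter 1/A(z) computes s[n] = e[n] - sum_k a_k * s[n-k].
    synth_memory holds the last p output samples, oldest first.
    Returns (speech, excitation) for the substituted frame.
    """
    if not 0 < pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must fit inside the excitation history")
    if len(synth_memory) != len(lpc):
        raise ValueError("synth_memory must hold one sample per LPC coefficient")

    # 1. Extrapolate the excitation by repeating the last pitch cycle,
    #    scaled by the attenuation gain.
    last_cycle = past_excitation[-pitch_lag:]
    excitation = [gain * last_cycle[n % pitch_lag] for n in range(frame_len)]

    # 2. Run the excitation through the all-pole LPC synthesis filter.
    order = len(lpc)
    history = list(synth_memory)
    for e in excitation:
        s = e - sum(lpc[k] * history[-1 - k] for k in range(order))
        history.append(s)

    # Drop the filter memory so only the new frame is returned.
    speech = history[order:]
    return speech, excitation
```

In this sketch the caller would append the returned excitation to its excitation history and keep the last `len(lpc)` speech samples as filter memory, so that the next frame, good or substituted, continues from a consistent state.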
FPS complements the Radio Access Bearer (RAB) and, later, the QoS framework for conversational voice: the bearer limits how often frames are lost, and concealment limits how audible the remaining losses are. Its effectiveness is a key factor in achieving high Mean Opinion Score (MOS) ratings for voice services, especially in challenging radio environments at cell edges or during handovers. The algorithms are specified in detail in 3GPP TS 26-series specifications (e.g., TS 26.091 for AMR) to ensure interoperability. While FPS handles frame-level losses, it is part of a broader suite of resilience features including redundancy (e.g., redundant frames carried in the RTP payload), jitter buffer management, and codec mode adaptation, all working together to deliver robust Voice over IP (VoIP) services in mobile networks.
Purpose & Motivation
FPS was created to address the fundamental challenge of delivering toll-quality voice over packet-switched networks, which are inherently prone to packet loss and delay jitter, unlike the circuit-switched networks of 2G/3G. In circuit-switched connections, dedicated channels provided a consistent bitstream, while packet networks (like the IP-based LTE and 5G cores) treat voice as data packets susceptible to loss. Without FPS, lost voice frames would cause audible and disruptive glitches, severely degrading user experience. The technology solves this by providing a software-based, intelligent 'guess' for missing content, allowing the conversation to continue with minimal perceptual disruption.
The motivation stemmed from the transition to all-IP networks that began with LTE in 3GPP Release 8 and led to the standardization of VoLTE. To make VoIP viable over wireless links with fluctuating quality, robust error concealment was non-negotiable. Previous approaches in circuit-switched systems had different, often hardware-based, error correction. FPS represents a shift to sophisticated signal processing within the codec itself, optimizing for the statistical nature of packet loss. It addresses the limitation of simple packet loss concealment (PLC) methods, which might insert silence or simple noise, by generating a signal that is acoustically consistent with the ongoing speech, thereby preserving call naturalness and speaker identity where possible.
Key Features
- Generates substitute speech frames based on parameters from previous good frames
- Utilizes Linear Prediction (LP) and pitch period extrapolation for signal synthesis
- Includes energy attenuation for consecutive lost frames to avoid artificial sustained noise
- Integrates seamlessly with codec state machines (e.g., AMR, EVS decoder)
- Specified algorithmically in 3GPP TS 26-series for interoperability
- Critical for maintaining high Mean Opinion Score (MOS) in poor radio conditions
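The attenuation and state-tracking features listed above can be illustrated with a small counter that follows consecutive bad frames. This is a simplified, hypothetical sketch: the class name `ConcealmentState`, the decay factor of 0.7, and the mute threshold of 6 frames are assumptions made for illustration, and the state machines in the actual codec specifications differ in detail.

```python
from dataclasses import dataclass


@dataclass
class ConcealmentState:
    """Tracks consecutive bad frames and the gain applied to substitutes."""

    attenuation_per_frame: float = 0.7  # hypothetical value, not from any spec
    mute_after: int = 6                 # hypothetical: mute at this many losses
    bad_count: int = 0

    def on_frame(self, bad_frame: bool) -> float:
        """Update the loss counter and return the gain for this frame."""
        if not bad_frame:
            # A good frame resets the counter; decoding resumes at full level.
            self.bad_count = 0
            return 1.0
        self.bad_count += 1
        if self.bad_count >= self.mute_after:
            # Long loss bursts are muted rather than sustained artificially.
            return 0.0
        # The first lost frame is substituted at full level; later ones decay.
        return self.attenuation_per_frame ** (self.bad_count - 1)
```

A decoder loop would call `on_frame` once per received or missing frame and scale the substitute frame by the returned gain. This sketch returns to full level immediately on the first good frame; a practical decoder would typically also smooth that recovery.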
Evolution Across Releases
Introduced as a core error concealment component for the AMR and AMR-WB codecs in the context of VoLTE and CS fallback. Specified the fundamental algorithm for generating substitution frames using LP synthesis and pitch period replication from the last good frame to maintain speech continuity during packet loss.
Defining Specifications
| Specification | Title |
|---|---|
| TS 22.864 | 3GPP TS 22.864 |
| TS 26.922 | 3GPP TS 26.922 |
| TS 26.926 | 3GPP TS 26.926 |
| TS 26.928 | 3GPP TS 26.928 |
| TS 26.955 | 3GPP TS 26.955 |
| TS 26.956 | 3GPP TS 26.956 |
| TS 26.998 | 3GPP TS 26.998 |
| TR 38.835 | 3GPP TR 38.835 |
| TS 48.020 | 3GPP TS 48.020 |