What is SID? Silence Insertion Descriptor

Description

The Silence Insertion Descriptor (SID) is a fundamental component of the Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) speech codecs standardized by 3GPP. During a voice call, human speech contains natural pauses and silence periods, which can constitute up to 60% of the conversation. Transmitting these silent segments as regular speech frames would be highly inefficient. Instead, the codec employs a Voice Activity Detection (VAD) algorithm at the transmitting end to identify these non-speech intervals. Upon detecting silence, the encoder stops generating conventional speech frames and produces a special SID frame. This SID frame is a compact data structure that contains essential parameters to characterize the background noise, such as spectral envelope information and energy levels, allowing the receiver to generate Comfort Noise (CN) that matches the acoustic environment of the caller.
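The chain described above (detect silence, describe the noise, regenerate it at the receiver) can be illustrated with a deliberately simplified sketch. It is not the AMR algorithm: the fixed energy threshold, the energy-only descriptor, and the function names `is_speech`, `describe_noise` and `comfort_noise` are all assumptions made for illustration. The standardized VAD and comfort noise procedures are considerably more elaborate and also convey spectral envelope information.

```python
import math
import random

FRAME_SAMPLES = 160  # one 20 ms frame at 8 kHz narrowband sampling


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB, floored to avoid log(0)."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def is_speech(frame, threshold_db=-40.0):
    """Toy VAD: a frame counts as speech if its energy exceeds a fixed threshold.

    The threshold value is an arbitrary assumption, not taken from any spec.
    """
    return frame_energy_db(frame) > threshold_db


def describe_noise(frame):
    """Hypothetical descriptor: keeps only the noise level (no spectral envelope)."""
    return {"energy_db": frame_energy_db(frame)}


def comfort_noise(descriptor, n=FRAME_SAMPLES, rng=None):
    """Receiver side: white Gaussian noise scaled to the described energy level."""
    rng = rng or random.Random(0)
    sigma = math.sqrt(10.0 ** (descriptor["energy_db"] / 10.0))
    return [rng.gauss(0.0, sigma) for _ in range(n)]
```

In use, the encoder would call `is_speech` on each 20 ms frame and, for a non-speech frame, send the result of `describe_noise` instead of the samples; the decoder would pass that descriptor to `comfort_noise` to fill the gap.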

The architecture for SID frame generation and processing is integrated within the speech codec's operational modes. The encoder, upon VAD-triggered transition from active speech to silence, transmits an initial SID frame (often called a 'first SID') to establish the noise parameters. Subsequently, during the prolonged silence period, the encoder may send periodic update SID frames at a much lower rate (e.g., every 160 ms or 320 ms) compared to the regular 20 ms speech frame rate, to track any changes in the background noise. This discontinuous transmission (DTX) mechanism, where SID frames are sent sporadically, is the core of bandwidth savings. The receiver's decoder uses the information in the received SID frames to synthesize comfort noise through a noise generation function, preventing the eerie 'dead silence' that would otherwise be perceived by the listener and maintaining a natural call experience.
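The transmit-side behaviour described above can be sketched as a small scheduler that turns per-frame VAD decisions into a sequence of frame labels. This is a simplified model under stated assumptions: the labels are illustrative, the update interval of 8 frames (160 ms) is one of the example values mentioned above, and details such as hangover periods after speech are ignored.

```python
def dtx_schedule(vad_flags, update_interval=8):
    """Map per-frame VAD decisions (True = speech) to what the encoder sends.

    Simplified model: a first SID opens each silence period, followed by one
    update every `update_interval` frames; all other silent frames carry nothing.
    """
    out = []
    in_silence = False
    frames_since_sid = 0
    for active in vad_flags:
        if active:
            out.append("SPEECH")
            in_silence = False
        elif not in_silence:
            # Transition from speech to silence: establish noise parameters.
            out.append("SID_FIRST")
            in_silence = True
            frames_since_sid = 0
        else:
            frames_since_sid += 1
            if frames_since_sid == update_interval:
                # Periodic refresh so the receiver can track noise changes.
                out.append("SID_UPDATE")
                frames_since_sid = 0
            else:
                out.append("NO_DATA")
    return out
```

For two speech frames followed by ten silent frames, this model sends only two SID frames during the 200 ms of silence; the remaining eight frame slots are left empty, which is where the bandwidth saving comes from.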

The technical implementation of SID frames involves specific bit patterns and frame types defined in the codec specifications. For AMR, the frame format distinguishes SID frames from speech frames by frame type, and a SID type indicator separates the first SID of a silence period from subsequent updates. The SID frame is much smaller than a full speech frame: an AMR 12.2 kbps speech frame is 244 bits, while an AMR SID frame carries only 35 bits of comfort noise parameters (39 bits including the SID type indicator and mode indication). This reduction in payload size, combined with the much lower transmission rate of SID frames, is what conserves radio resources and battery life. The SID mechanism works in tandem with the Radio Access Network's transport protocols, which must correctly identify and handle these special frames so that they are not mistaken for corrupted data. SID is thus critical in the end-to-end voice service chain: it enables efficient use of the channel while preserving the natural-sounding user experience that mobile telephony requires.
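A rough estimate of the saving follows from the figures quoted in this article: 244 bits per 20 ms speech frame, 35 bits of comfort noise parameters per SID frame, and silence making up as much as 60% of a conversation. The sketch below is a back-of-the-envelope model only. The update interval of 8 frames is an assumption taken from the 160 ms example, and the first SID of each silence period, hangover, and all transport overhead are ignored.

```python
FRAME_S = 0.020  # one frame slot lasts 20 ms


def average_bitrate(speech_bits=244, sid_bits=35,
                    silence_fraction=0.6, sid_interval_frames=8):
    """Average source bitrate in bit/s for a call with DTX enabled.

    Speech frames are sent in every active frame slot; during silence,
    one SID frame is sent per `sid_interval_frames` slots.
    """
    speech_share = (1.0 - silence_fraction) * speech_bits
    sid_share = silence_fraction * sid_bits / sid_interval_frames
    return (speech_share + sid_share) / FRAME_S


without_dtx = average_bitrate(silence_fraction=0.0)  # every slot carries speech
with_dtx = average_bitrate()                         # 60% of slots are silent
saving = 1.0 - with_dtx / without_dtx                # fraction of bits saved
```

Under these assumptions the SID frames contribute only a few bits per frame slot on average, so the average rate is dominated by the share of time the talker is actually speaking.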

Purpose & Motivation

The primary purpose of the Silence Insertion Descriptor is to enable efficient bandwidth utilization during voice calls by eliminating the wasteful transmission of silence. In early digital voice systems, even during pauses in speech, the channel would remain occupied with data representing background noise or mere digital silence, consuming valuable radio spectrum and network capacity. This was particularly problematic for cellular networks where spectrum is a scarce and expensive resource. The SID mechanism, as part of the DTX feature, was created to solve this problem, directly increasing the number of simultaneous calls a cell can handle and reducing interference in the system.

Historically, before sophisticated codecs like AMR, some systems used simple on-off DTX which could lead to an unpleasant switching effect where background noise would abruptly disappear and reappear, creating a 'choppy' auditory experience. The innovation of SID was to provide a descriptor that allows the receiving end to reconstruct a plausible approximation of the sender's background noise. This addresses the key limitation of previous DTX approaches: the need to maintain acoustic continuity and call naturalness. By sending a compact mathematical description of the noise rather than the noise itself, the system achieves the dual goals of efficiency and quality.

Within 3GPP, SID was developed as an integral part of the AMR codec for GSM and later UMTS. As networks evolved to support more users and data services, optimizing every aspect of voice traffic became paramount. SID is a classic example of a perceptual optimization: it exploits the characteristics of human hearing and conversation patterns to make the system more efficient without the user perceiving any negative impact, thereby enhancing overall network performance and economic viability.

Classification

Part of AMR

Detected Changes Across Releases

from 3GPP Change Requests

Specific changes extracted from the "Change history" tables of 3GPP specifications (1 CR across 1 release). Complements the general historical overview above with the evidence-based evolution of this function.

Studied in Rel-5, normative work from Rel-15.

Rel-15 1 change

In Release 15, the primary change for the SID function was a correction related to its use with the EVS codec. The update specifically addressed an issue in the procedure for updating the EVS SID (Silence Insertion Descriptor) during periods of silence.

Correction of EVS SID update (TS 26.449, CR 0006)

Explore further

Broader topics and technologies where SID plays a role.

Topics

Spectrum & Coexistence, MEC (Edge Computing), Lawful Intercept, UMTS / WCDMA, Services & Applications, Protocols & Interfaces

Defining Specifications

3GPP specifications that define or reference SID, with the latest known release. Sourced from the 3GPP document catalog.

Specification	Title	Release
TR 21.905 vj00	3GPP Technical Terms and Definitions	Rel-19
TS 25.415 vj00	Iu Interface User Plane Protocol	Rel-19
TS 26.091 vj00	AMR Error Concealment Procedure	Rel-19
TS 26.092 vj00	AMR Comfort Noise for SCR Operation	Rel-19
TS 26.093 vj00	SCR operation of AMR codec for UMTS	Rel-19
TS 26.101 vj00	Generic frame format for AMR and GSM-EFR speech codecs	Rel-19
TS 26.102 vj00	Mapping of AMR and other codecs to interfaces	Rel-19
TS 26.103 vj00	3GPP Codec Lists for OoBTC and TrFO	Rel-19
TS 26.114 vj10	IMS Multimedia Telephony Media Handling	Rel-19
TS 26.191 vj00	AMR-WB Error Concealment Procedure	Rel-19
TS 26.192 vj00	AMR-WB Comfort Noise Requirements	Rel-19
TS 26.193 vj00	AMR-WB Source Controlled Rate (SCR) Operation	Rel-19
TS 26.201 vj00	AMR-WB Speech Codec Frame Format	Rel-19
TS 26.202 vj00	AMR-WB Speech Codec Mapping Specification	Rel-19
TS 26.250 vj00	IVAS Codec Introduction	Rel-19
TS 26.258 vj10	IVAS Codec Floating-Point C Code Specification	Rel-19
TS 26.441 vj00	EVS Audio Processing Introduction	Rel-19
TS 26.442 vj00	EVS Codec Fixed Point ANSI-C Code	Rel-19
TS 26.443 vj00	EVS Codec Floating-Point C Code	Rel-19
TS 26.444 vj00	EVS Codec Conformance Test Sequences	Rel-19
TS 26.446 vj00	EVS Codec AMR-WB Backward Compatibility Spec	Rel-19
TS 26.448 vj00	EVS Jitter Buffer Management Specification	Rel-19
TS 26.449 vj00	EVS Codec Comfort Noise Generation for DTX	Rel-19
TS 26.450 vj00	EVS Codec DTX System Level Aspects	Rel-19
TS 26.451 vj00	EVS Codec Voice Activity Detector (VAD) Specification	Rel-19
TS 26.452 vj00	EVS Codec Fixed-Point C Code Implementation	Rel-19
TS 26.453 vj00	EVS Codec Generic Frame Format for 3G CS Networks	Rel-19
TR 26.916 ve20	eSRVCC Transcoding Minimization Study	Rel-14
TR 26.952 vj00	EVS Codec Selection, Verification & Characterization	Rel-19
TR 26.975 vj00	AMR Speech Codec Performance Background	Rel-19
TR 26.978 vj00	AMR Noise Suppression Selection Phase Technical Report	Rel-19
TR 26.998 vj00	5G AR/MR Glasses Integration Study	Rel-19
TS 28.620 vj20	FMC Federated Network Information Model (FNIM) UIM	Rel-19
TS 29.414 vj00	Nb Interface Bearer Transport & Control Protocols	Rel-19
TR 29.892 vg00	Study on User Plane Protocol in 5GC	Rel-16
TR 32.808 v1800	Common User Profile Storage Framework	Rel-8
TR 38.805 ve00	Study on New Radio Access Technology; 60 GHz unlicensed spectrum	Rel-14
TR 38.807 vg10	NR beyond 52.6 GHz Study	Rel-16
TR 38.808 vh00	Study on NR above 52.6 GHz to 71 GHz	Rel-17
TR 38.859 vi10	Technical Report	Rel-18
TR 43.901 vj00	Generic Access to A/Gb Interface Feasibility Study	Rel-19
TR 45.913 vj00	Optimized Transmit Pulse Shape for EGPRS2-B	Rel-19
TS 46.002 vj00	Introduction to GSM Half-Rate Speech Processing	Rel-19
TS 46.008 vj00	GSM Half Rate Speech Codec Performance	Rel-19
TS 46.021 vj00	GSM Half Rate DTX Frame Substitution & Muting	Rel-19
TS 46.022 vj00	GSM Half Rate DTX Comfort Noise Specification	Rel-19
TS 46.041 vj00	GSM Half Rate Speech DTX Operation	Rel-19
TS 46.051 vj00	GSM Enhanced Full Rate Speech Processing Intro	Rel-19
TS 46.055 vj00	GSM Enhanced Full Rate Speech Codec Performance	Rel-19
TS 46.061 vj00	GSM EFR Frame Substitution and Muting Procedure	Rel-19
TS 46.062 vj00	GSM EFR DTX Comfort Noise Specification	Rel-19
TS 46.081 vj00	GSM Enhanced Full Rate DTX Operation	Rel-19
