SID (Silence Insertion Descriptor) — 3GPP Glossary

A frame type used in Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) codecs to efficiently represent silence periods during voice calls. It replaces actual speech frames with compact descriptors, significantly reducing bandwidth consumption and improving network capacity without degrading perceived voice quality.

Description

The Silence Insertion Descriptor (SID) is a fundamental component of the Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) speech codecs standardized by 3GPP. During a voice call, human speech contains natural pauses and silence periods, which can constitute up to 60% of the conversation. Transmitting these silent segments as regular speech frames would be highly inefficient. Instead, the codec employs a Voice Activity Detection (VAD) algorithm at the transmitting end to identify these non-speech intervals. Upon detecting silence, the encoder stops generating conventional speech frames and produces a special SID frame. This SID frame is a compact data structure that contains essential parameters to characterize the background noise, such as spectral envelope information and energy levels, allowing the receiver to generate Comfort Noise (CN) that matches the acoustic environment of the caller.
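The speech/silence decision described above can be illustrated with a deliberately simple energy-threshold detector. This is only a toy sketch: the VAD algorithms actually specified for AMR and AMR-WB are considerably more elaborate, and the function names, the 8 kHz sampling assumption, and the -40 dB threshold are all hypothetical choices made for illustration.

```python
import math

FRAME_SAMPLES = 160  # one 20 ms frame at an assumed 8 kHz sampling rate


def frame_energy_db(samples):
    """Mean energy of one frame in dB relative to a full scale of 1.0."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small offset avoids log(0)


def is_speech(samples, threshold_db=-40.0):
    """Toy VAD: a frame counts as speech if its energy exceeds a fixed threshold."""
    return frame_energy_db(samples) > threshold_db


def classify(frames, threshold_db=-40.0):
    """Label each frame SPEECH or SILENCE; SILENCE frames are SID candidates."""
    return ["SPEECH" if is_speech(f, threshold_db) else "SILENCE" for f in frames]


if __name__ == "__main__":
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(FRAME_SAMPLES)]
    hiss = [0.001] * FRAME_SAMPLES
    print(classify([tone, hiss]))
```

A fixed threshold like this would misclassify quiet talkers and loud background noise, which is one reason practical detectors adapt to the estimated noise level.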

SID generation and processing are built into the codec's discontinuous transmission (DTX) operation. When the VAD signals a transition from active speech to silence, the encoder marks the end of the speech burst with an initial SID frame (SID_FIRST in AMR terminology). In AMR this first frame carries no comfort noise parameters of its own; the decoder estimates them from the last speech frames it received, and explicit parameters follow in SID_UPDATE frames. During a prolonged silence period the encoder sends SID_UPDATE frames far less often than the regular 20 ms speech frames (every eighth frame, i.e., every 160 ms, in AMR and AMR-WB) so that the receiver can track changes in the background noise. Between SID frames nothing is transmitted, and this sparse transmission is the source of DTX's bandwidth savings. The decoder uses the received SID information to synthesize comfort noise, which prevents the 'dead silence' a listener would otherwise hear and keeps the call sounding natural.
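The transmit-side behaviour described above can be sketched as a small scheduler that turns per-frame VAD decisions into frame types. This is a simplified illustration, not the procedure from the specifications: the labels SID_FIRST, SID_UPDATE and NO_DATA follow AMR's DTX terminology, but the hangover period and the exact timing of the first update are omitted, and the function name and the fixed eight-frame interval are assumptions of this sketch.

```python
FRAME_MS = 20
SID_UPDATE_INTERVAL = 8  # frames between updates (160 ms); assumed for this sketch


def dtx_schedule(vad_flags, update_interval=SID_UPDATE_INTERVAL):
    """Map per-frame VAD decisions (True = speech) to transmitted frame types."""
    out = []
    frames_since_first = None  # None while the encoder is in active speech
    for active in vad_flags:
        if active:
            out.append("SPEECH")
            frames_since_first = None
        elif frames_since_first is None:
            # First silent frame after speech: announce the silence period.
            out.append("SID_FIRST")
            frames_since_first = 0
        else:
            frames_since_first += 1
            if frames_since_first % update_interval == 0:
                out.append("SID_UPDATE")  # periodic refresh of noise parameters
            else:
                out.append("NO_DATA")  # nothing is sent on the air
    return out


if __name__ == "__main__":
    flags = [True] * 3 + [False] * 17 + [True]
    for i, frame_type in enumerate(dtx_schedule(flags)):
        print(f"{i * FRAME_MS:4d} ms  {frame_type}")
```

In this example only 3 of the 17 silent frames result in a transmission; the remaining 14 are NO_DATA slots in which the transmitter can stay idle.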

SID frames are identified by specific frame types and bit patterns defined in the codec specifications. In AMR, SID frames are distinguished by function (SID_FIRST and SID_UPDATE) rather than by codec mode, so the same SID format is used whichever speech mode is active. A SID frame is much smaller than a speech frame: an AMR 12.2 kbit/s speech frame carries 244 bits, while an AMR SID frame carries only 35 bits of comfort noise parameters (39 bits in total once the SID type indicator and mode indication are included). Combined with the lower sending rate, this reduction in payload conserves radio resources and battery life. The transport protocols of the Radio Access Network must identify these frames and handle them correctly so that they are not mistaken for corrupted data. SID is therefore a necessary link in the end-to-end voice service chain: it allows efficient use of the channel while preserving natural-sounding audio.
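The effect of these frame sizes on the average codec bit rate can be worked out with a few lines of arithmetic. The sketch below uses the 244-bit and 35-bit figures quoted above; the 160 ms SID interval and the 40% speech activity are illustrative assumptions, and header and transport overhead are ignored.

```python
def average_bitrate(speech_bits=244, sid_bits=35, frame_ms=20,
                    sid_interval_ms=160, speech_fraction=0.4):
    """Average payload bit rate in bit/s for a call using DTX.

    speech_fraction is the share of time classified as active speech;
    during the rest, one SID frame is sent every sid_interval_ms.
    """
    speech_rate = speech_bits * 1000 / frame_ms          # bit/s while talking
    silence_rate = sid_bits * 1000 / sid_interval_ms     # bit/s while silent
    return speech_fraction * speech_rate + (1 - speech_fraction) * silence_rate


if __name__ == "__main__":
    with_dtx = average_bitrate()
    without_dtx = average_bitrate(speech_fraction=1.0)
    print(f"without DTX: {without_dtx:.0f} bit/s")
    print(f"with DTX:    {with_dtx:.0f} bit/s")
    print(f"saving:      {100 * (1 - with_dtx / without_dtx):.0f} %")
```

Under these assumptions the payload rate during silence is roughly 219 bit/s instead of 12.2 kbit/s, so the average over the call falls to about 5 kbit/s.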

Purpose & Motivation

The primary purpose of the Silence Insertion Descriptor is to enable efficient bandwidth utilization during voice calls by eliminating the wasteful transmission of silence. In early digital voice systems, even during pauses in speech, the channel would remain occupied with data representing background noise or mere digital silence, consuming valuable radio spectrum and network capacity. This was particularly problematic for cellular networks where spectrum is a scarce and expensive resource. The SID mechanism, as part of the DTX feature, was created to solve this problem, directly increasing the number of simultaneous calls a cell can handle and reducing interference in the system.

Historically, before sophisticated codecs like AMR, some systems used simple on-off DTX which could lead to an unpleasant switching effect where background noise would abruptly disappear and reappear, creating a 'choppy' auditory experience. The innovation of SID was to provide a descriptor that allows the receiving end to reconstruct a plausible approximation of the sender's background noise. This addresses the key limitation of previous DTX approaches: the need to maintain acoustic continuity and call naturalness. By sending a compact mathematical description of the noise rather than the noise itself, the system achieves the dual goals of efficiency and quality.

Within 3GPP, SID was developed as part of the AMR codec work for GSM and, later, UMTS. As networks evolved to support more users and data services, reducing the cost of voice traffic became increasingly important. SID is a classic perceptual optimization: it exploits the characteristics of human hearing and conversation patterns to make the system more efficient without any impact the user can perceive, which improves overall network performance and economic viability.

Key Features

Enables Discontinuous Transmission (DTX) by replacing silence with compact frames
Carries parameters for background noise characterization (spectral envelope, energy)
Triggers generation of Comfort Noise (CN) at the receiver to maintain call naturalness
Significantly reduces payload size compared to active speech frames (e.g., ~35 bits vs. 244 bits)
Operates with Voice Activity Detection (VAD) for automatic silence/speech classification
Supports periodic update during long silence to track noise changes
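
As a rough illustration of how a receiver might turn the two kinds of SID parameters listed above (spectral envelope and energy) into comfort noise, the sketch below passes white noise through an all-pole filter and scales the result to a target level. It is not the comfort noise procedure of any 3GPP codec: the function name, the use of raw LPC coefficients for the envelope, and the dB-relative-to-full-scale energy convention are assumptions made for this example.

```python
import math
import random


def comfort_noise(energy_db, lpc, n_samples=160, seed=None):
    """Synthesize one frame: white excitation -> all-pole filter -> gain.

    energy_db: target RMS level in dB relative to a full scale of 1.0.
    lpc: coefficients a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    shaped = []
    for n, e in enumerate(excitation):
        # Synthesis filter 1/A(z): y[n] = e[n] - sum_k a_k * y[n-k]
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * shaped[n - k]
        shaped.append(acc)

    # Scale the shaped noise so that its RMS matches the signalled energy.
    target_rms = 10.0 ** (energy_db / 20.0)
    rms = math.sqrt(sum(s * s for s in shaped) / n_samples)
    gain = target_rms / rms if rms > 0 else 0.0
    return [gain * s for s in shaped]


if __name__ == "__main__":
    frame = comfort_noise(energy_db=-50.0, lpc=[-0.9], seed=1)
    level = math.sqrt(sum(s * s for s in frame) / len(frame))
    print(f"{len(frame)} samples, RMS {20 * math.log10(level):.1f} dB")
```

The single coefficient -0.9 gives the noise a low-pass tilt; a practical decoder would also smooth the parameters between SID updates so that the noise character does not change abruptly.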

Evolution Across Releases

Rel-5 Initial

Introduced as part of the Adaptive Multi-Rate (AMR) codec specification for GSM/EDGE and UMTS. Defined the fundamental SID frame structure and its role in the DTX operation. Established the mechanism for generating comfort noise from SID parameters to ensure natural sound during speech pauses.

Defining Specifications

Specification	Title
TS 21.905	3GPP TS 21.905
TS 25.415	3GPP TS 25.415
TS 26.091	3GPP TS 26.091
TS 26.092	3GPP TS 26.092
TS 26.093	3GPP TS 26.093
TS 26.101	3GPP TS 26.101
TS 26.102	3GPP TS 26.102
TS 26.103	3GPP TS 26.103
TS 26.114	3GPP TS 26.114
TS 26.191	3GPP TS 26.191
TS 26.192	3GPP TS 26.192
TS 26.193	3GPP TS 26.193
TS 26.201	3GPP TS 26.201
TS 26.202	3GPP TS 26.202
TS 26.250	3GPP TS 26.250
TS 26.258	3GPP TS 26.258
TS 26.441	3GPP TS 26.441
TS 26.442	3GPP TS 26.442
TS 26.443	3GPP TS 26.443
TS 26.444	3GPP TS 26.444
TS 26.446	3GPP TS 26.446
TS 26.448	3GPP TS 26.448
TS 26.449	3GPP TS 26.449
TS 26.450	3GPP TS 26.450
TS 26.451	3GPP TS 26.451
TS 26.452	3GPP TS 26.452
TS 26.453	3GPP TS 26.453
TS 26.916	3GPP TS 26.916
TS 26.952	3GPP TS 26.952
TS 26.975	3GPP TS 26.975
TS 26.978	3GPP TS 26.978
TS 26.998	3GPP TS 26.998
TS 28.620	3GPP TS 28.620
TS 29.414	3GPP TS 29.414
TS 29.892	3GPP TS 29.892
TS 32.808	3GPP TR 32.808
TS 38.805	3GPP TR 38.805
TS 38.807	3GPP TR 38.807
TS 38.808	3GPP TR 38.808
TS 38.859	3GPP TR 38.859
TS 43.901	3GPP TR 43.901
TS 45.913	3GPP TR 45.913
TS 46.002	3GPP TR 46.002
TS 46.008	3GPP TR 46.008
TS 46.021	3GPP TR 46.021
TS 46.022	3GPP TR 46.022
TS 46.041	3GPP TR 46.041
TS 46.051	3GPP TR 46.051
TS 46.055	3GPP TR 46.055
TS 46.061	3GPP TR 46.061
TS 46.062	3GPP TR 46.062
TS 46.081	3GPP TR 46.081