SID

Silence Insertion Descriptor

Services
Introduced in Rel-5
A frame type used in Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) codecs to efficiently represent silence periods during voice calls. It replaces actual speech frames with compact descriptors, significantly reducing bandwidth consumption and improving network capacity without degrading perceived voice quality.

Description

The Silence Insertion Descriptor (SID) is a fundamental component of the Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) speech codecs standardized by 3GPP. During a voice call, human speech contains natural pauses and silence periods, which can constitute up to 60% of the conversation. Transmitting these silent segments as regular speech frames would be highly inefficient. Instead, the codec employs a Voice Activity Detection (VAD) algorithm at the transmitting end to identify these non-speech intervals. Upon detecting silence, the encoder stops generating conventional speech frames and produces a special SID frame. This SID frame is a compact data structure that contains essential parameters to characterize the background noise, such as spectral envelope information and energy levels, allowing the receiver to generate Comfort Noise (CN) that matches the acoustic environment of the caller.
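The descriptor idea above can be illustrated with a deliberately reduced sketch. It assumes the background noise is summarized by a single RMS energy value and regenerated as white Gaussian noise at the receiver. Real AMR SID frames also carry spectral envelope information, which this toy omits. The function names `describe_noise` and `comfort_noise` are hypothetical and do not come from any 3GPP specification.

```python
import math
import random
from typing import List


def describe_noise(frame: List[float]) -> float:
    """Reduce a background-noise frame to one RMS energy value (toy 'descriptor')."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def comfort_noise(rms: float, n_samples: int = 160, seed: int = 0) -> List[float]:
    """Generate white noise whose RMS matches the received descriptor.

    160 samples correspond to one 20 ms frame at 8 kHz sampling.
    Assumption: energy only; no spectral shaping is applied.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    raw_rms = math.sqrt(sum(x * x for x in raw) / n_samples)
    # Rescale so the synthetic frame has exactly the requested energy.
    return [x * rms / raw_rms for x in raw]


# Sender side: measure the noise; receiver side: regenerate something similar.
background = [0.05 * math.sin(0.3 * i) for i in range(160)]
level = describe_noise(background)
synthetic = comfort_noise(level)
```

The receiver never sees the original samples; it only needs the descriptor to produce noise of a matching level, which is why the descriptor can be so much smaller than a speech frame.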

SID frame generation and processing are built into the speech codec itself. When the VAD signals a transition from active speech to silence, the encoder transmits an initial SID frame (often called a 'first SID') to establish the noise parameters. During a prolonged silence period, the encoder may then send periodic update SID frames at a much lower rate (e.g., every 160 ms or 320 ms) than the regular 20 ms speech frame rate, to track any changes in the background noise. This discontinuous transmission (DTX) mechanism, in which only occasional SID frames are sent during silence, is the source of the bandwidth savings. The receiver's decoder uses the information in the received SID frames to synthesize comfort noise, preventing the eerie 'dead silence' that the listener would otherwise perceive and keeping the call sounding natural.
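The transmit-side behavior described above can be sketched as a small scheduler. This is a simplified illustration, not the state machine from the codec specifications: it assumes no hangover period, assumes updates are counted from the first SID frame, and uses the 160 ms interval mentioned in the text as its default. The labels `SID_FIRST`, `SID_UPDATE` and `NO_DATA` are used here only as illustrative frame-type names, and `schedule_tx_frames` is a hypothetical helper.

```python
from typing import List

FRAME_MS = 20  # regular speech frame duration stated in the text


def schedule_tx_frames(vad_flags: List[bool], update_interval_ms: int = 160) -> List[str]:
    """Map per-frame VAD decisions to transmitted frame types (simplified DTX)."""
    update_every = update_interval_ms // FRAME_MS  # e.g. every 8th frame
    out: List[str] = []
    silent_count = 0  # frames elapsed since the current silence period began
    for active in vad_flags:
        if active:
            out.append("SPEECH")
            silent_count = 0
            continue
        if silent_count == 0:
            out.append("SID_FIRST")   # establishes the noise parameters
        elif silent_count % update_every == 0:
            out.append("SID_UPDATE")  # periodic refresh of the noise parameters
        else:
            out.append("NO_DATA")     # nothing is transmitted in this frame
        silent_count += 1
    return out


# Two speech frames, 17 silent frames, then speech resumes.
frames = schedule_tx_frames([True, True] + [False] * 17 + [True])
```

In this example only 3 of the 17 silent frames result in a transmission; the other 14 are skipped entirely.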

The technical implementation of SID frames relies on specific bit patterns and frame types defined in the codec specifications. For AMR, the DTX procedures distinguish SID frame types such as SID_FIRST and SID_UPDATE. A SID frame is much smaller than a full speech frame: an AMR 12.2 kbps speech frame is 244 bits, while the comfort-noise parameters of a SID frame fit in about 35 bits. This drastic reduction in payload size, combined with the lower frame rate during silence, is what conserves radio resources and battery life. The SID mechanism works in tandem with the Radio Access Network's transport protocols, which must correctly identify and handle these special frames so that they are not mistaken for corrupted data. SID is therefore critical in the end-to-end voice service chain: it enables efficient use of the channel while preserving the high-quality, natural-sounding experience that mobile telephony requires.
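A rough back-of-the-envelope calculation shows how these numbers translate into savings. The sketch below assumes 244-bit speech frames every 20 ms, one 35-bit SID payload every 160 ms during silence, and no transmission otherwise. It ignores header and transport overhead, the first SID frame, and any hangover, so the result is an illustration rather than a measured figure. The function name `average_bitrate_bps` is hypothetical.

```python
def average_bitrate_bps(activity: float,
                        speech_bits: int = 244,
                        sid_bits: int = 35,
                        frame_ms: int = 20,
                        sid_interval_ms: int = 160) -> float:
    """Average payload bitrate for a given voice activity factor (0..1)."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    speech_bps = speech_bits * 1000 / frame_ms       # 12200 bit/s for AMR 12.2
    sid_bps = sid_bits * 1000 / sid_interval_ms      # 218.75 bit/s during silence
    return activity * speech_bps + (1.0 - activity) * sid_bps


# 60% silence, as in the upper bound quoted in the text.
with_dtx = average_bitrate_bps(0.4)      # about 5011 bit/s
without_dtx = average_bitrate_bps(1.0)   # 12200 bit/s
saving = 1.0 - with_dtx / without_dtx    # roughly 59% under these assumptions
```

Under these assumptions the saving is almost equal to the silence fraction itself, because the SID traffic during silence is close to negligible next to the speech rate.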

Purpose & Motivation

The primary purpose of the Silence Insertion Descriptor is to enable efficient bandwidth utilization during voice calls by eliminating the wasteful transmission of silence. In early digital voice systems, even during pauses in speech, the channel would remain occupied with data representing background noise or mere digital silence, consuming valuable radio spectrum and network capacity. This was particularly problematic for cellular networks where spectrum is a scarce and expensive resource. The SID mechanism, as part of the DTX feature, was created to solve this problem, directly increasing the number of simultaneous calls a cell can handle and reducing interference in the system.

Historically, before sophisticated codecs like AMR, some systems used simple on-off DTX which could lead to an unpleasant switching effect where background noise would abruptly disappear and reappear, creating a 'choppy' auditory experience. The innovation of SID was to provide a descriptor that allows the receiving end to reconstruct a plausible approximation of the sender's background noise. This addresses the key limitation of previous DTX approaches: the need to maintain acoustic continuity and call naturalness. By sending a compact mathematical description of the noise rather than the noise itself, the system achieves the dual goals of efficiency and quality.

Within 3GPP, SID was developed as an integral part of the AMR codec for GSM and later UMTS. As networks evolved to support more users and data services, optimizing every aspect of voice traffic became paramount. SID is a classic example of a perceptual optimization: it exploits the characteristics of human hearing and conversation patterns to make the system more efficient without the user perceiving any negative impact, which improves overall network performance and economic viability.

Key Features

  • Enables Discontinuous Transmission (DTX) by replacing silence with compact frames
  • Carries parameters for background noise characterization (spectral envelope, energy)
  • Triggers generation of Comfort Noise (CN) at the receiver to maintain call naturalness
  • Significantly reduces payload size compared to active speech frames (e.g., ~35 bits vs. 244 bits)
  • Operates with Voice Activity Detection (VAD) for automatic silence/speech classification
  • Supports periodic update during long silence to track noise changes

Defining Specifications

Specification  Title
TS 21.905      3GPP TS 21.905
TS 25.415      3GPP TS 25.415
TS 26.091      3GPP TS 26.091
TS 26.092      3GPP TS 26.092
TS 26.093      3GPP TS 26.093
TS 26.101      3GPP TS 26.101
TS 26.102      3GPP TS 26.102
TS 26.103      3GPP TS 26.103
TS 26.114      3GPP TS 26.114
TS 26.191      3GPP TS 26.191
TS 26.192      3GPP TS 26.192
TS 26.193      3GPP TS 26.193
TS 26.201      3GPP TS 26.201
TS 26.202      3GPP TS 26.202
TS 26.250      3GPP TS 26.250
TS 26.258      3GPP TS 26.258
TS 26.441      3GPP TS 26.441
TS 26.442      3GPP TS 26.442
TS 26.443      3GPP TS 26.443
TS 26.444      3GPP TS 26.444
TS 26.446      3GPP TS 26.446
TS 26.448      3GPP TS 26.448
TS 26.449      3GPP TS 26.449
TS 26.450      3GPP TS 26.450
TS 26.451      3GPP TS 26.451
TS 26.452      3GPP TS 26.452
TS 26.453      3GPP TS 26.453
TS 26.916      3GPP TS 26.916
TS 26.952      3GPP TS 26.952
TS 26.975      3GPP TS 26.975
TS 26.978      3GPP TS 26.978
TS 26.998      3GPP TS 26.998
TS 28.620      3GPP TS 28.620
TS 29.414      3GPP TS 29.414
TS 29.892      3GPP TS 29.892
TS 32.808      3GPP TR 32.808
TS 38.805      3GPP TR 38.805
TS 38.807      3GPP TR 38.807
TS 38.808      3GPP TR 38.808
TS 38.859      3GPP TR 38.859
TS 43.901      3GPP TR 43.901
TS 45.913      3GPP TR 45.913
TS 46.002      3GPP TR 46.002
TS 46.008      3GPP TR 46.008
TS 46.021      3GPP TR 46.021
TS 46.022      3GPP TR 46.022
TS 46.041      3GPP TR 46.041
TS 46.051      3GPP TR 46.051
TS 46.055      3GPP TR 46.055
TS 46.061      3GPP TR 46.061
TS 46.062      3GPP TR 46.062
TS 46.081      3GPP TR 46.081