Description
The Silence Insertion Descriptor (SID) is a fundamental component of the Adaptive Multi-Rate (AMR) and AMR-Wideband (AMR-WB) speech codecs standardized by 3GPP. During a voice call, human speech contains natural pauses and silence periods, which can constitute up to 60% of the conversation. Transmitting these silent segments as regular speech frames would be highly inefficient. Instead, the codec employs a Voice Activity Detection (VAD) algorithm at the transmitting end to identify these non-speech intervals. Upon detecting silence, the encoder stops generating conventional speech frames and produces a special SID frame. This SID frame is a compact data structure that contains essential parameters to characterize the background noise, such as spectral envelope information and energy levels, allowing the receiver to generate Comfort Noise (CN) that matches the acoustic environment of the caller.
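The silence/speech decision described above can be illustrated with a deliberately simple energy-threshold detector. This is only a sketch: the VAD algorithms specified for AMR and AMR-WB are considerably more elaborate, and the function names, the -40 dB threshold, and the test signals below are all assumptions chosen for illustration.

```python
import math

FRAME_SAMPLES = 160  # 20 ms of audio at 8 kHz sampling (assumed narrowband input)


def frame_energy_db(samples):
    """Mean-square energy of one frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square + 1e-12)  # epsilon avoids log10(0)


def is_speech(samples, threshold_db=-40.0):
    """Toy VAD decision: True when the frame energy exceeds the threshold."""
    return frame_energy_db(samples) > threshold_db


# Usage: a loud 440 Hz tone stands in for speech, a tiny alternating
# signal stands in for quiet background noise.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(FRAME_SAMPLES)]
hiss = [0.001 * (-1) ** n for n in range(FRAME_SAMPLES)]
print(is_speech(tone), is_speech(hiss))
```

A fixed threshold like this would misclassify quiet talkers or loud environments, which is one reason practical detectors adapt to the estimated noise level.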
SID generation and processing are built into the codec's discontinuous transmission (DTX) operation rather than handled as a separate function. When the VAD decision changes from active speech to silence, the encoder transmits an initial SID frame (often called a 'first SID') to establish the noise parameters. During the rest of the silence period it sends periodic SID update frames at a much lower rate than the regular 20 ms speech frame rate (e.g., every 160 ms or 320 ms) to track changes in the background noise. Sending only these occasional SID frames during silence is the source of the bandwidth savings. The decoder uses the parameters in the received SID frames to synthesize comfort noise, which avoids the 'dead silence' the listener would otherwise hear and keeps the call sounding natural.
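The transmit pattern described above (one first SID, then sparse updates) can be sketched as a small scheduler that maps per-frame VAD flags to frame types. The type names echo the terms used in the text, but the logic is a simplified assumption: it sends the first SID on the first silent frame and an update every `update_interval` frames (8 frames of 20 ms, i.e. 160 ms), and it ignores details such as hangover periods that real encoders apply.

```python
from enum import Enum


class TxType(Enum):
    SPEECH = "SPEECH"
    SID_FIRST = "SID_FIRST"
    SID_UPDATE = "SID_UPDATE"
    NO_DATA = "NO_DATA"  # nothing is transmitted for this frame


def dtx_schedule(vad_flags, update_interval=8):
    """Map per-frame VAD flags (True = speech) to transmit frame types.

    Simplified rule: the first silent frame becomes SID_FIRST, and every
    `update_interval`-th silent frame after it becomes SID_UPDATE.
    """
    schedule = []
    silent_run = 0  # consecutive silent frames seen so far in this pause
    for speech in vad_flags:
        if speech:
            silent_run = 0
            schedule.append(TxType.SPEECH)
            continue
        if silent_run == 0:
            schedule.append(TxType.SID_FIRST)
        elif silent_run % update_interval == 0:
            schedule.append(TxType.SID_UPDATE)
        else:
            schedule.append(TxType.NO_DATA)
        silent_run += 1
    return schedule


# Usage: 2 speech frames, a 17-frame pause (340 ms), then speech resumes.
flags = [True, True] + [False] * 17 + [True]
schedule = dtx_schedule(flags)
sent = [t for t in schedule if t is not TxType.NO_DATA]
print(f"{len(sent)} of {len(schedule)} frames transmitted")
```

In this example the 17-frame pause costs three small SID frames instead of 17 full speech frames.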
SID frames are identified by specific frame types and bit patterns defined in the codec specifications. In AMR the SID frame has its own frame type and is signalled as either a first SID or a SID update, rather than varying with the active codec mode. The SID frame is much smaller than a full speech frame: an AMR 12.2 kbps speech frame carries 244 bits, while an AMR SID frame carries 35 bits of comfort noise parameters (39 bits in total once the SID type indicator and mode indication are included). This reduction in payload size, combined with the lower frame rate during silence, is what conserves radio resources and battery life. The SID mechanism works in tandem with the Radio Access Network's transport protocols, which must correctly identify and handle these special frames so that they are not mistaken for corrupted data. SID is therefore a critical part of the end-to-end voice service chain, enabling efficient use of the channel while preserving a natural-sounding call, which is a key requirement for mobile telephony.
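To put the frame sizes in perspective, the short calculation below estimates the average payload bit rate with DTX, using the 244-bit and 35-bit figures from the text. The function name, the 160 ms update interval, and the 40% speech activity (i.e. 60% silence) are illustrative assumptions, and the estimate ignores the first SID, hangover frames, and any transport headers.

```python
def average_bitrate_bps(speech_activity, speech_bits=244, sid_bits=35,
                        frame_ms=20, sid_interval_ms=160):
    """Average payload bit rate for a given fraction of active speech.

    speech_activity: fraction of time classified as speech (0.0 to 1.0).
    During silence, one SID payload is assumed every `sid_interval_ms`.
    """
    speech_bps = speech_bits * 1000 / frame_ms          # 12200 bit/s by default
    silence_bps = sid_bits * 1000 / sid_interval_ms     # 218.75 bit/s by default
    return speech_activity * speech_bps + (1 - speech_activity) * silence_bps


# Usage: compare continuous transmission with DTX at 40% speech activity.
always_on = average_bitrate_bps(1.0)
with_dtx = average_bitrate_bps(0.4)
saving = 1 - with_dtx / always_on
print(f"{with_dtx:.0f} bit/s vs {always_on:.0f} bit/s, saving {saving:.0%}")
```

Under these assumptions the average payload rate falls from 12.2 kbit/s to about 5 kbit/s, a reduction of roughly 59%.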
Purpose & Motivation
The primary purpose of the Silence Insertion Descriptor is to enable efficient bandwidth utilization during voice calls by eliminating the wasteful transmission of silence. In early digital voice systems, even during pauses in speech, the channel would remain occupied with data representing background noise or mere digital silence, consuming valuable radio spectrum and network capacity. This was particularly problematic for cellular networks where spectrum is a scarce and expensive resource. The SID mechanism, as part of the DTX feature, was created to solve this problem, directly increasing the number of simultaneous calls a cell can handle and reducing interference in the system.
Historically, before sophisticated codecs like AMR, some systems used simple on-off DTX, which could produce an unpleasant switching effect: background noise would abruptly disappear and reappear, creating a 'choppy' auditory experience. The innovation of SID was to provide a descriptor that allows the receiving end to reconstruct a plausible approximation of the sender's background noise. This addresses the key limitation of previous DTX approaches, namely the loss of acoustic continuity and call naturalness. By sending a compact mathematical description of the noise rather than the noise itself, the system achieves the dual goals of efficiency and quality.
Within 3GPP, SID was developed as an integral part of the AMR codec for GSM and later UMTS. As networks evolved to support more users and data services, optimizing every aspect of voice traffic became paramount. SID is a classic example of a perceptual optimization: it exploits the characteristics of human hearing and conversation patterns to make the system more efficient without the user perceiving any negative impact, which improves overall network performance and economic viability.
Key Features
- Enables Discontinuous Transmission (DTX) by replacing silence with compact frames
- Carries parameters for background noise characterization (spectral envelope, energy)
- Triggers generation of Comfort Noise (CN) at the receiver to maintain call naturalness
- Significantly reduces payload size compared to active speech frames (e.g., 35 bits of comfort noise parameters vs. 244 bits for a 12.2 kbps speech frame)
- Operates with Voice Activity Detection (VAD) for automatic silence/speech classification
- Supports periodic update during long silence to track noise changes
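The comfort noise feature listed above can be illustrated on the receiver side with a minimal generator that matches only the energy level carried by a SID frame. This is a sketch under stated assumptions: real decoders also shape the noise spectrum using the spectral envelope parameters, and the function name, the RMS-based energy representation, and the frame length of 160 samples are hypothetical choices for illustration.

```python
import math
import random


def comfort_noise(target_rms, num_samples=160, seed=None):
    """Generate one frame of white noise scaled to the requested RMS level.

    target_rms stands in for the energy parameter decoded from a SID frame.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    raw_rms = math.sqrt(sum(x * x for x in raw) / num_samples)
    gain = target_rms / raw_rms if raw_rms > 0 else 0.0
    return [gain * x for x in raw]


# Usage: synthesize a 20 ms frame at a low noise level and verify its RMS.
frame = comfort_noise(target_rms=0.01, seed=1)
measured = math.sqrt(sum(x * x for x in frame) / len(frame))
print(len(frame), round(measured, 6))
```

Between SID updates a decoder would keep generating frames from the most recently received parameters, which is why occasional updates are enough to follow slowly changing background noise.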
Evolution Across Releases
Introduced as part of the Adaptive Multi-Rate (AMR) codec specification for GSM/EDGE and UMTS. Defined the fundamental SID frame structure and its role in the DTX operation. Established the mechanism for generating comfort noise from SID parameters to ensure natural sound during speech pauses.
Defining Specifications
| Specification | Title |
|---|---|
| TS 21.905 | 3GPP TS 21.905 |
| TS 25.415 | 3GPP TS 25.415 |
| TS 26.091 | 3GPP TS 26.091 |
| TS 26.092 | 3GPP TS 26.092 |
| TS 26.093 | 3GPP TS 26.093 |
| TS 26.101 | 3GPP TS 26.101 |
| TS 26.102 | 3GPP TS 26.102 |
| TS 26.103 | 3GPP TS 26.103 |
| TS 26.114 | 3GPP TS 26.114 |
| TS 26.191 | 3GPP TS 26.191 |
| TS 26.192 | 3GPP TS 26.192 |
| TS 26.193 | 3GPP TS 26.193 |
| TS 26.201 | 3GPP TS 26.201 |
| TS 26.202 | 3GPP TS 26.202 |
| TS 26.250 | 3GPP TS 26.250 |
| TS 26.258 | 3GPP TS 26.258 |
| TS 26.441 | 3GPP TS 26.441 |
| TS 26.442 | 3GPP TS 26.442 |
| TS 26.443 | 3GPP TS 26.443 |
| TS 26.444 | 3GPP TS 26.444 |
| TS 26.446 | 3GPP TS 26.446 |
| TS 26.448 | 3GPP TS 26.448 |
| TS 26.449 | 3GPP TS 26.449 |
| TS 26.450 | 3GPP TS 26.450 |
| TS 26.451 | 3GPP TS 26.451 |
| TS 26.452 | 3GPP TS 26.452 |
| TS 26.453 | 3GPP TS 26.453 |
| TS 26.916 | 3GPP TS 26.916 |
| TS 26.952 | 3GPP TS 26.952 |
| TS 26.975 | 3GPP TS 26.975 |
| TS 26.978 | 3GPP TS 26.978 |
| TS 26.998 | 3GPP TS 26.998 |
| TS 28.620 | 3GPP TS 28.620 |
| TS 29.414 | 3GPP TS 29.414 |
| TS 29.892 | 3GPP TS 29.892 |
| TR 32.808 | 3GPP TR 32.808 |
| TR 38.805 | 3GPP TR 38.805 |
| TR 38.807 | 3GPP TR 38.807 |
| TR 38.808 | 3GPP TR 38.808 |
| TR 38.859 | 3GPP TR 38.859 |
| TR 43.901 | 3GPP TR 43.901 |
| TR 45.913 | 3GPP TR 45.913 |
| TS 46.002 | 3GPP TR 46.002 |
| TS 46.008 | 3GPP TR 46.008 |
| TS 46.021 | 3GPP TR 46.021 |
| TS 46.022 | 3GPP TR 46.022 |
| TS 46.041 | 3GPP TR 46.041 |
| TS 46.051 | 3GPP TR 46.051 |
| TS 46.055 | 3GPP TR 46.055 |
| TS 46.061 | 3GPP TR 46.061 |
| TS 46.062 | 3GPP TR 46.062 |
| TS 46.081 | 3GPP TR 46.081 |