WB-SID (Wideband Silence Insertion Descriptor) — 3GPP Glossary

WB-SID is a frame type used in the AMR-WB and EVS codecs to efficiently represent silence or background noise during speech pauses. It enables discontinuous transmission (DTX), which saves radio resources and battery life, while comfort noise generated at the receiver maintains acceptable voice quality.

Description

The Wideband Silence Insertion Descriptor (WB-SID) is a specialized data frame defined in 3GPP specifications for the Adaptive Multi-Rate Wideband (AMR-WB) and Enhanced Voice Services (EVS) codecs. It is used during silence periods or background noise intervals in a voice call, as identified by voice activity detection (VAD). Instead of transmitting regular speech frames containing encoded audio, the transmitter sends a WB-SID frame that describes the characteristics of the background noise, allowing the receiver to generate synthetic comfort noise. This mechanism is part of Discontinuous Transmission (DTX) operation, in which the transmitter stops sending full speech frames during pauses.

Architecturally, WB-SID frames are generated by the speech codec's VAD and comfort noise generation (CNG) subsystems in the UE or in a network node (e.g., an MGW). In AMR-WB, the WB-SID frame contains parameters such as the logarithmic frame energy and a set of Immittance Spectral Frequencies (ISFs) that model the spectral shape of the background noise. These parameters are derived from the last few frames of actual background noise before the speech pause begins. The frame is much smaller than a full speech frame (40 bits versus 132 to 477 bits per 20 ms frame, depending on the AMR-WB mode), leading to significant bandwidth savings. In EVS, the WB-SID concept applies to the EVS AMR-WB interoperable mode, where similar parameters are transmitted to ensure interoperability with legacy AMR-WB equipment.
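
To put the size difference in concrete terms, the following minimal sketch compares the 40-bit WB-SID frame with the speech frame of each AMR-WB mode. The per-mode bit counts are the mode bitrate multiplied by the 20 ms frame duration; the names AMR_WB_SPEECH_BITS, SID_BITS and sid_size_ratio are illustrative and do not come from any specification.

```python
# Bits per 20 ms speech frame for the nine AMR-WB modes,
# keyed by mode bitrate in kbit/s (bits = bitrate x 20 ms).
AMR_WB_SPEECH_BITS = {
    6.60: 132, 8.85: 177, 12.65: 253, 14.25: 285, 15.85: 317,
    18.25: 365, 19.85: 397, 23.05: 461, 23.85: 477,
}
SID_BITS = 40  # size of one WB-SID frame, as stated in the text


def sid_size_ratio(mode_kbps: float) -> float:
    """Return how many times larger a speech frame is than a SID frame."""
    return AMR_WB_SPEECH_BITS[mode_kbps] / SID_BITS


if __name__ == "__main__":
    for mode, bits in AMR_WB_SPEECH_BITS.items():
        ratio = sid_size_ratio(mode)
        print(f"{mode:5.2f} kbit/s: {bits} bits, {ratio:.1f}x a SID frame")
```

Because a SID frame is also sent far less often than a speech frame, the saving in average bitrate is considerably larger than this per-frame ratio suggests.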

WB-SID operation is a coordinated process between transmitter and receiver. During a speech pause, the transmitter analyzes the background noise, creates a WB-SID frame, and sends it at a reduced rate (e.g., once every 160 ms, that is every eighth 20 ms frame, or as configured). The receiver decodes the WB-SID frame to extract the noise parameters and feeds them to a noise generator that synthesizes comfort noise. This synthetic noise approximates the original background noise (such as office hum or street sounds), preventing an unnatural "dead silence" that could be disconcerting to the listener. The process repeats periodically to adapt to changing noise conditions. When speech resumes, the transmitter sends a speech frame with a marker to indicate the end of the silence period.
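
The transmitter side of this process can be sketched as a small scheduler that maps per-frame VAD decisions to what is sent. This is a deliberately simplified illustration, not the codec's actual state machine: it assumes 20 ms frames and a SID frame every eighth frame (160 ms), and it omits details a real codec would handle, such as a hangover period after speech ends. The function name dtx_schedule and the labels "SPEECH", "SID" and "NO_DATA" are hypothetical.

```python
from typing import Iterable, List

SID_INTERVAL_FRAMES = 8  # 8 frames x 20 ms = 160 ms between SID frames


def dtx_schedule(vad_flags: Iterable[bool]) -> List[str]:
    """Map per-frame VAD decisions to what the transmitter sends.

    Simplified sketch: a SID frame is sent on the first frame of a pause
    and then every SID_INTERVAL_FRAMES frames; nothing is sent in between.
    """
    out: List[str] = []
    frames_since_sid = None  # None while speech is active
    for active in vad_flags:
        if active:
            out.append("SPEECH")
            frames_since_sid = None
        elif frames_since_sid is None or frames_since_sid + 1 >= SID_INTERVAL_FRAMES:
            out.append("SID")
            frames_since_sid = 0
        else:
            out.append("NO_DATA")
            frames_since_sid += 1
    return out


if __name__ == "__main__":
    # Two speech frames, a 10-frame pause, then speech resumes.
    flags = [True, True] + [False] * 10 + [True]
    print(dtx_schedule(flags))
```

In this sketch a 10-frame pause produces only two SID frames; the remaining eight frame slots carry nothing at all, which is where the radio and battery savings come from.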

The role of WB-SID in the network is critical for efficient voice service delivery, especially over radio interfaces. By reducing the number of transmitted frames during silence, it decreases the uplink and downlink data rate, conserving valuable radio resources in LTE and NR. This allows more users to be multiplexed on the same carrier, improving system capacity. Additionally, for the UE, transmitting fewer frames reduces power consumption, extending battery life. WB-SID is managed by the codec mode adaptation and radio resource control mechanisms, ensuring that silence frames are handled appropriately across different network conditions and handovers.

Purpose & Motivation

WB-SID appears in 3GPP specifications from Release 13, as part of the evolution of wideband voice codecs, and addresses the inefficiency of transmitting silence during voice calls. Without DTX and SID frames, a voice codec transmits frames continuously even during pauses, wasting bandwidth and power. The narrowband AMR codec already had SID frames, and AMR-WB (ITU-T G.722.2), standardized in Release 5 for high-quality voice, carried the same DTX approach over to the wider audio bandwidth (50-7000 Hz) with its own wideband SID frame.

The primary problem WB-SID solves is the inefficient use of resources during speech pauses, which constitute about 50-60% of a typical conversation. Without DTX, these pauses would consume full bitrate transmission, straining radio capacity and UE battery. WB-SID enables DTX for AMR-WB, allowing the system to operate in a low-bitrate mode during silence. This is particularly important for VoLTE (Voice over LTE) and VoNR (Voice over NR), where voice is packet-switched and competes with data for resources. By reducing the average bitrate, WB-SID helps maintain voice quality while freeing up resources for other services.
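
The effect on average bitrate can be estimated with simple arithmetic. The sketch below assumes 20 ms frames, a 40-bit SID frame every 160 ms during pauses, and codec payload bits only (no RTP/UDP/IP headers, no hangover frames). The 253-bit frame of the 12.65 kbit/s AMR-WB mode is used purely as an example, and the function name average_bitrate_bps is illustrative.

```python
FRAME_MS = 20          # speech frame duration
SID_BITS = 40          # WB-SID frame size
SID_INTERVAL_MS = 160  # one SID frame per 160 ms of silence


def average_bitrate_bps(speech_bits_per_frame: int, voice_activity: float) -> float:
    """Average codec payload bitrate with DTX, in bit/s.

    voice_activity is the fraction of time (0..1) in which speech
    frames are sent; the remainder is covered by periodic SID frames.
    """
    if not 0.0 <= voice_activity <= 1.0:
        raise ValueError("voice_activity must be between 0 and 1")
    speech_bps = speech_bits_per_frame * 1000 / FRAME_MS
    sid_bps = SID_BITS * 1000 / SID_INTERVAL_MS
    return voice_activity * speech_bps + (1.0 - voice_activity) * sid_bps


if __name__ == "__main__":
    for activity in (1.0, 0.5, 0.4):
        rate = average_bitrate_bps(253, activity)
        print(f"activity {activity:.0%}: {rate:.0f} bit/s")
```

Under these assumptions, SID traffic during silence amounts to 250 bit/s, so with 50% voice activity the average payload rate of the 12.65 kbit/s mode drops from 12650 bit/s to about 6450 bit/s.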

Moreover, WB-SID ensures interoperability and quality in mixed deployments. With EVS (introduced in Rel-12) supporting AMR-WB interoperability modes, WB-SID frames allow EVS-equipped devices to communicate with legacy AMR-WB devices while still using DTX. The evolution to EVS also brought improved comfort noise generation, but WB-SID provides a backward-compatible mechanism. Thus, WB-SID was motivated by the need for efficient, high-quality voice services in modern mobile networks, balancing capacity, battery life, and user experience.

Key Features

Represents silence/background noise in AMR-WB and EVS AMR-WB IO modes
Contains noise parameters (energy, spectral shape) for comfort noise generation
Enables Discontinuous Transmission (DTX), reducing average bitrate
Small frame size (e.g., 40 bits) compared to speech frames
Transmitted periodically during speech pauses (e.g., every 160 ms)
Supports interoperability between EVS and legacy AMR-WB codecs
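
As an illustration of how energy and spectral-shape parameters can drive comfort noise generation, the following sketch filters white Gaussian noise through an all-pole filter and scales it to a target power. It is a generic textbook approach under stated assumptions, not the AMR-WB or EVS decoder algorithm: the filter coefficients are passed in directly rather than decoded from a SID frame, the energy is assumed to be a base-2 logarithm of the mean-square power, and 320 samples corresponds to 20 ms at an assumed 16 kHz sampling rate. The name comfort_noise and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def comfort_noise(log2_energy: float, lpc: Sequence[float],
                  n: int = 320, seed: int = 0) -> List[float]:
    """Generate n samples of spectrally shaped noise.

    White noise is filtered by 1/A(z), with A(z) = 1 + sum_k lpc[k] z^-(k+1),
    then scaled so that the mean-square power equals 2**log2_energy.
    """
    rng = random.Random(seed)
    history = [0.0] * len(lpc)  # most recent output sample first
    shaped: List[float] = []
    for _ in range(n):
        excitation = rng.gauss(0.0, 1.0)
        # y[n] = x[n] - sum_k a_k * y[n-k]
        y = excitation - sum(a * h for a, h in zip(lpc, history))
        shaped.append(y)
        if history:
            history = [y] + history[:-1]
    power = sum(s * s for s in shaped) / n
    gain = math.sqrt(2.0 ** log2_energy / power) if power > 0 else 0.0
    return [gain * s for s in shaped]


if __name__ == "__main__":
    # One-pole low-pass shaping, target mean-square power 2**10 = 1024.
    frame = comfort_noise(10.0, [-0.9])
    print(len(frame), sum(s * s for s in frame) / len(frame))
```

A receiver following this pattern would regenerate noise frame by frame and update the energy and filter parameters whenever a new SID frame arrives.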

Evolution Across Releases

Rel-13 Initial

Introduced WB-SID for AMR-WB codec in 3GPP specifications. Defined frame structure and parameters for wideband comfort noise generation. Enabled DTX operation for AMR-WB, reducing bandwidth consumption during silence periods in VoLTE and other voice services.

Specifications: TS 26.453, TS 26.454

Defining Specifications

Specification	Title
TS 26.453	3GPP TS 26.453
TS 26.454	3GPP TS 26.454