EVS-SID

Enhanced Voice Services Silence Insertion Descriptor

Services
Introduced in Rel-13
A specific Silence Insertion Descriptor (SID) used within the Enhanced Voice Services (EVS) codec for discontinuous transmission (DTX). It provides a compact, efficient representation of silence and background noise during speech pauses, crucial for maintaining high audio quality while conserving bandwidth in voice calls.

Description

The EVS-SID (Enhanced Voice Services Silence Insertion Descriptor) is a core component of the EVS codec's discontinuous transmission (DTX) and comfort noise generation (CNG) system. During active speech, the EVS codec encodes every audio frame at the full bitrate of the selected operating mode. During pauses or silence, transmitting full frames is inefficient, so the DTX mechanism detects these inactive periods and switches to a low-bitrate mode. In this mode the encoder transmits occasional SID (Silence Insertion Descriptor) frames instead of regular speech frames. The EVS-SID is the SID frame type tailored to the EVS codec's parametric model of background noise.
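
The frame-type switching described above can be sketched as a small scheduler. This is an illustrative model only, not the EVS algorithm: the function name `dtx_schedule`, the `FrameType` enum, and the fixed `sid_interval` are assumptions made for the example, and real encoders add hangover periods and adaptive SID timing.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = "speech"    # regular encoded speech frame
    SID = "sid"          # silence descriptor carrying noise parameters
    NO_DATA = "no_data"  # nothing transmitted for this frame period


def dtx_schedule(vad_flags, sid_interval=8):
    """Map per-frame voice-activity flags to transmitted frame types.

    vad_flags: iterable of booleans, True when the frame contains speech.
    sid_interval: hypothetical number of frame periods between SID updates.
    """
    out = []
    frames_since_sid = None  # None means no silence period is in progress
    for active in vad_flags:
        if active:
            out.append(FrameType.SPEECH)
            frames_since_sid = None
        elif frames_since_sid is None or frames_since_sid + 1 >= sid_interval:
            # First inactive frame, or the update interval has elapsed.
            out.append(FrameType.SID)
            frames_since_sid = 0
        else:
            out.append(FrameType.NO_DATA)
            frames_since_sid += 1
    return out
```

For example, `dtx_schedule([True, False, False, False, False, True], sid_interval=3)` yields one speech frame, a SID frame, two untransmitted frames, a second SID frame, and a final speech frame.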

An EVS-SID frame contains a highly compressed parametric description of the current acoustic background noise. This description typically includes spectral envelope parameters (e.g., Linear Predictive Coding coefficients or spectral band energies) and may include information about the noise's stationarity and level. During a silence period the frame is transmitted at extended intervals (e.g., every 160-640 ms) rather than in every 20 ms frame period, as is done for active speech. This drastically reduces the average bitrate of calls with significant pauses.
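
A rough calculation shows the size of the saving. The sketch below is a back-of-the-envelope model under stated assumptions: the SID payload size (48 bits) and update interval (8 frames, i.e. 160 ms) are illustrative defaults, the function name is hypothetical, and RTP/IP header overhead is ignored.

```python
FRAME_MS = 20  # frame period used for active speech


def average_bitrate_kbps(speech_kbps, activity, sid_bits=48, sid_interval_frames=8):
    """Estimate the average payload bitrate with DTX enabled.

    speech_kbps: bitrate used while speech is active.
    activity: fraction of frames containing speech (0.0 to 1.0).
    sid_bits, sid_interval_frames: assumed SID payload size and spacing.
    """
    speech_bits_per_frame = speech_kbps * FRAME_MS        # kbit/s * ms = bits
    sid_bits_per_frame = sid_bits / sid_interval_frames   # SID cost spread over the interval
    avg_bits_per_frame = (activity * speech_bits_per_frame
                          + (1.0 - activity) * sid_bits_per_frame)
    return avg_bits_per_frame / FRAME_MS


if __name__ == "__main__":
    print(average_bitrate_kbps(13.2, 0.5))
```

With a 13.2 kbit/s speech mode and 50% voice activity, these assumptions give an average of about 6.75 kbit/s, roughly half the active-speech rate.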

At the receiving end, the decoder uses the parameters from the most recently received EVS-SID frame to synthesize comfort noise. This synthesized noise matches the spectral characteristics of the background noise at the sender's location, preventing the unnatural 'dead silence' that would otherwise occur when transmission stops. The process involves generating a noise signal, shaping its spectrum according to the received parameters, and adjusting its gain. This maintains a consistent and natural acoustic background for the listener, which is psychologically important for call quality.
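
The three decoder steps (generate noise, shape its spectrum, adjust its gain) can be illustrated with a minimal sketch. It assumes the spectral envelope arrives as LPC coefficients and uses a plain all-pole filter; the function name and parameters are hypothetical, and the comfort noise generator in the actual EVS codec is considerably more elaborate.

```python
import math
import random


def synthesize_comfort_noise(lpc, target_rms, n_samples=320, seed=0):
    """Generate one frame of comfort noise from SID-like parameters.

    lpc: assumed predictor coefficients a[0..p-1] of an all-pole filter.
    target_rms: assumed noise level carried by the SID frame.
    n_samples: 320 samples corresponds to 20 ms at 16 kHz.
    """
    rng = random.Random(seed)

    # Step 1: white Gaussian excitation.
    excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Step 2: spectral shaping, y[n] = x[n] - sum_k a[k] * y[n-1-k].
    shaped = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * shaped[n - 1 - k]
        shaped.append(acc)

    # Step 3: scale the frame so its RMS matches the signalled level.
    rms = math.sqrt(sum(s * s for s in shaped) / n_samples)
    gain = target_rms / rms if rms > 0 else 0.0
    return [gain * s for s in shaped]
```

Calling `synthesize_comfort_noise([-0.9], 0.01)` produces a low-pass-coloured noise frame at the requested level. Between SID updates a decoder would keep reusing, or smoothly interpolating, the most recent parameters.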

The EVS-SID operates within the broader EVS codec framework defined in 3GPP TS 26.445, which also specifies the RTP payload format in an annex. The frame structure that carries SID frames is detailed in TS 26.453 (speech codec frame structure), and TS 26.454 covers the codec's interface to Iu, Uu, Nb and Mb. The design of EVS-SID contributes to EVS's performance in noisy environments and to its ability to provide high-quality voice services at lower average bitrates than legacy codecs such as AMR-WB.

Purpose & Motivation

The EVS-SID was created to address the specific needs of the Enhanced Voice Services (EVS) codec, standardized in 3GPP Release 12, for high-efficiency discontinuous transmission (DTX). Previous codecs such as AMR and AMR-WB also used SID frames, but EVS's advanced audio model and its support for super-wideband and full-band audio required a more sophisticated noise modeling approach. The purpose of EVS-SID is to enable significant radio resource savings during voice calls without degrading the perceived conversational experience.

Historically, early digital voice systems without DTX simply transmitted encoded silence, wasting bandwidth. Later codecs introduced basic DTX with simple noise models, but the synthesized comfort noise could sound artificial or fail to track changing background noise. The EVS codec was designed for next-generation voice services (VoLTE, ViLTE, VoNR) demanding both high quality and network efficiency. The EVS-SID solves the problem of efficiently representing complex, time-varying background noises (like office babble or street noise) with a very low bitrate parametric description, allowing the network to carry more simultaneous calls or free up capacity for data services.

Its creation was motivated by the need to extend the benefits of efficient DTX to super-wideband and full-band speech. Without a tailored SID mechanism, the bandwidth savings of DTX would be lost when using the high-quality modes of EVS. The EVS-SID ensures that the codec's advanced capabilities are not undermined during speech pauses, making high-quality voice services economically viable for mobile operators.

Key Features

  • Parametric encoding of background noise spectrum for ultra-low bitrate transmission
  • Enables Discontinuous Transmission (DTX) in EVS codec, drastically reducing average call bitrate
  • Supports Comfort Noise Generation (CNG) at the receiver to avoid 'dead silence'
  • Optimized for EVS's super-wideband and full-band audio models
  • Defined frame structure (TS 26.453) and Iu, Uu, Nb and Mb interface handling (TS 26.454)
  • Integral part of EVS's robust performance in variable acoustic environments

Evolution Across Releases

Rel-13 Initial

Introduced as part of the initial Enhanced Voice Services (EVS) codec specification. The EVS-SID was defined to provide DTX and CNG functionality specifically tailored to the new codec's advanced audio capabilities, including support for super-wideband and full-band operation. Its initial architecture included parametric noise modeling for efficient SID frame generation.

Defining Specifications

Specification    Title
TS 26.453        3GPP TS 26.453
TS 26.454        3GPP TS 26.454