Description
Comfort Noise Generation (CNG) is a signal processing function used in 3GPP telephony and Voice over IP (VoIP) services, particularly those based on the Adaptive Multi-Rate (AMR) and Enhanced Voice Services (EVS) codecs. Its primary role is to manage what the listener hears during discontinuous transmission (DTX). In DTX, the transmitter stops sending voice frames during periods of silence (e.g., when a user is listening) to conserve network bandwidth and terminal battery power. If the receiver simply played absolute silence during these gaps, the abrupt contrast between active speech and dead silence would sound unnatural and could be mistaken for a dropped call.
Technically, CNG works by having the transmitting side (e.g., the User Equipment or a network node) estimate the background acoustic noise from frames that its voice activity detector classifies as non-speech. When a silence period begins, the transmitter sends Silence Insertion Descriptor (SID) frames instead of regular speech frames and refreshes them periodically. These SID frames are low-bitrate packets that carry parameters characterizing the spectral shape and energy level of the background noise (e.g., linear predictive coding coefficients and a gain). The receiving decoder uses these parameters to synthesize a matching low-level noise signal, the comfort noise, which is played to the listener during the silent intervals.
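The receiver-side synthesis described above can be sketched as white noise shaped by an all-pole filter. This is a minimal illustration under stated assumptions, not the algorithm of any 3GPP codec: the `SidFrame` class, its fields, and `synthesize_comfort_noise` are hypothetical names, and real SID frames carry quantized parameters in codec-specific formats.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SidFrame:
    """Hypothetical SID payload (assumption, not a 3GPP bit layout)."""
    lpc: List[float]  # a[1..p] of A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    gain: float       # standard deviation of the white excitation


def synthesize_comfort_noise(sid: SidFrame, n_samples: int,
                             seed: int = 0) -> List[float]:
    """Shape Gaussian white noise with the all-pole filter 1/A(z)."""
    rng = random.Random(seed)
    history = [0.0] * len(sid.lpc)  # previous outputs, most recent first
    out: List[float] = []
    for _ in range(n_samples):
        excitation = sid.gain * rng.gauss(0.0, 1.0)
        # Difference equation: y[n] = e[n] - sum_k a_k * y[n-k]
        y = excitation - sum(a * h for a, h in zip(sid.lpc, history))
        out.append(y)
        if history:
            history = [y] + history[:-1]
    return out


# One 20 ms frame at 8 kHz (160 samples) from a first-order noise model.
frame = synthesize_comfort_noise(SidFrame(lpc=[-0.9], gain=0.01), 160)
```

The single coefficient `-0.9` gives the noise a low-pass tilt, and `gain` sets its level. Both values are arbitrary choices for the example.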
The architecture for CNG is integrated into the voice codec's operation and the associated Radio Access Bearer (RAB) or Packet Data Convergence Protocol (PDCP) service for VoIP. Key components include the noise estimation algorithm on the sender, the SID frame generation logic, the SID frame transmission scheduler, and the comfort noise synthesis module on the receiver. The process is governed by specific timers and thresholds defined in 3GPP specifications to determine when to send the first SID frame after speech stops, how often to update it, and when to stop CNG and return to full speech transmission.
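The timer-driven behaviour of the SID transmission scheduler can be sketched as a small per-frame state machine. The function name, the frame-type labels, and the default `hangover` and `update_interval` values are illustrative assumptions, not values quoted from the specifications, which define the actual rules per codec.

```python
from typing import Iterable, List


def schedule_dtx(vad_flags: Iterable[bool], hangover: int = 7,
                 update_interval: int = 8) -> List[str]:
    """Map per-frame voice-activity flags to transmitted frame types.

    hangover:        non-speech frames still sent as speech before the first SID
    update_interval: frames between SID refreshes during a long silence
    (both defaults are placeholders for illustration)
    """
    out: List[str] = []
    silent_run = 0  # consecutive non-speech frames seen so far
    since_sid = 0   # frames elapsed since the last SID of any kind
    for active in vad_flags:
        if active:
            silent_run = 0
            out.append("SPEECH")       # speech resumes, CNG stops
            continue
        silent_run += 1
        if silent_run <= hangover:
            out.append("SPEECH")       # hangover: keep sending speech frames
        elif silent_run == hangover + 1:
            out.append("SID_FIRST")    # first descriptor of this silence period
            since_sid = 0
        else:
            since_sid += 1
            if since_sid == update_interval:
                out.append("SID_UPDATE")  # periodic noise-parameter refresh
                since_sid = 0
            else:
                out.append("NO_DATA")     # nothing transmitted for this frame
    return out


# Two speech frames, twenty silent frames, then speech again.
frames = schedule_dtx([True] * 2 + [False] * 20 + [True],
                      hangover=2, update_interval=4)
```

In this run the first SID goes out once the two hangover frames have elapsed, a refresh follows every fourth frame, and the final frame returns to speech transmission.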
CNG's role is critical for Quality of Experience (QoE). It ensures a continuous, natural-sounding audio background, masking the on/off switching effect of DTX. This psychoacoustic smoothing contributes directly to perceived voice quality. The technique applies end-to-end across the 3GPP system, from the UE through the Radio Access Network (RAN) and Core Network, wherever voice packets are processed and DTX may be applied to optimize resource usage without degrading the subjective listening experience.
Purpose & Motivation
CNG was created to solve a fundamental user experience problem introduced by Discontinuous Transmission (DTX) in digital cellular systems. Early digital voice codecs, when implementing DTX to save power and bandwidth, would create periods of absolute digital silence. This 'dead air' was perceptually jarring; listeners could not distinguish between intentional silence and a call failure, leading to confusion and a perception of poor call quality. The complete absence of sound also made it difficult for users to gauge if the line was still connected, often causing them to speak louder or repeatedly ask 'Hello?'. This degraded the natural flow of conversation.
The motivation for CNG was therefore psychoacoustic: to replicate the constant, low-level ambient noise present in all real-world acoustic environments (like room tone or gentle street noise). By generating this 'comfort' noise, the system provides a consistent auditory backdrop that signals an active connection. Analog telephony provided this cue naturally, because the circuit itself produced a faint hiss, but early pure-digital implementations lost it. CNG restores this natural cue digitally.
CNG addresses the limitations of simple DTX by turning a resource-saving technique from a potential quality liability into a transparent feature. Without CNG, the benefits of DTX (extended battery life, reduced network congestion, and lower interference) would come at an unacceptable cost to user satisfaction. CNG is thus an enabling technology that allows network operators to deploy efficient VoIP and VoLTE/VoNR services without compromising the familiar, comfortable experience of a traditional phone call.
Key Features
- Generates artificial background noise during speech silence periods to maintain a natural auditory experience
- Utilizes low-bitrate Silence Insertion Descriptor (SID) frames to transmit noise parameters instead of full speech frames
- Integrated with voice codecs like AMR and EVS to operate seamlessly during Discontinuous Transmission (DTX) modes
- Reduces perceptual discontinuity and prevents user confusion that can arise from absolute digital silence
- Conserves UE battery life and network bandwidth by enabling efficient DTX operation without quality degradation
- Parameters are updated periodically to track changes in the acoustic background noise environment
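
The periodic parameter tracking in the last feature can be illustrated on the sender side with a sliding average of the noise level over recent non-speech frames. This is a hedged sketch only: `NoiseLevelTracker`, its window length, and the dB averaging are assumptions made for the example, not the estimation procedure of AMR or EVS.

```python
import math
from collections import deque
from typing import Deque, Sequence


class NoiseLevelTracker:
    """Hypothetical sender-side tracker of the background noise level."""

    def __init__(self, window: int = 8) -> None:
        # Keep only the most recent `window` per-frame levels (in dB).
        self._levels_db: Deque[float] = deque(maxlen=window)

    def update(self, frame: Sequence[float]) -> None:
        """Record the mean energy of one non-speech frame, in dB."""
        energy = sum(s * s for s in frame) / len(frame)
        # Small offset avoids log10(0) on an all-zero frame.
        self._levels_db.append(10.0 * math.log10(energy + 1e-12))

    def level_db(self) -> float:
        """Average level over the window, e.g. for the next SID update."""
        if not self._levels_db:
            raise ValueError("no noise frames observed yet")
        return sum(self._levels_db) / len(self._levels_db)


tracker = NoiseLevelTracker(window=2)
tracker.update([0.1] * 160)   # a quiet frame, about -20 dB
tracker.update([1.0] * 160)   # a louder frame, about 0 dB
level_after_two = tracker.level_db()
tracker.update([1.0] * 160)   # the quiet frame drops out of the window
level_after_three = tracker.level_db()
```

Because the window slides, the reported level follows a change in the acoustic environment after a few frames instead of jumping on a single outlier.
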
Evolution Across Releases
Introduced as part of the AMR codec for circuit-switched voice and early VoIP definitions in the IP Multimedia Subsystem (IMS). The initial architecture defined the basic SID frame structure for noise parameter transmission and the receiver-side noise synthesis algorithms. It established the fundamental timers for sending the first SID and for SID updates during long silence periods.
Defining Specifications
| Specification | Title |
|---|---|
| TS 21.905 | 3GPP TS 21.905 |
| TS 24.523 | 3GPP TS 24.523 |
| TS 24.525 | 3GPP TS 24.525 |
| TS 26.094 | 3GPP TS 26.094 |
| TS 26.194 | 3GPP TS 26.194 |
| TS 26.253 | 3GPP TS 26.253 |
| TS 26.441 | 3GPP TS 26.441 |
| TS 26.442 | 3GPP TS 26.442 |
| TS 26.443 | 3GPP TS 26.443 |
| TS 26.444 | 3GPP TS 26.444 |
| TS 26.446 | 3GPP TS 26.446 |
| TS 26.448 | 3GPP TS 26.448 |
| TS 26.449 | 3GPP TS 26.449 |
| TS 26.450 | 3GPP TS 26.450 |
| TS 26.451 | 3GPP TS 26.451 |
| TS 26.452 | 3GPP TS 26.452 |
| TS 26.952 | 3GPP TS 26.952 |