ASL (Active Speech Level) — 3GPP Glossary

ASL is a standardized metric for measuring the average power level of active speech segments in audio signals. It is crucial for ensuring consistent speech quality and loudness in telecommunications, particularly for voice over NR (VoNR) and voice over LTE (VoLTE) services.

Description

Active Speech Level (ASL) is a technical parameter defined by 3GPP to quantify the average power of speech during active talk spurts, excluding silent periods and the background noise within them. It is expressed in dBov (decibels relative to the overload point of the digital system) or in dBm0 (power in dBm referred to a point of zero relative level). The calculation segments the audio signal, identifies active speech intervals using a voice activity detection (VAD) algorithm, and computes the mean power, equivalently the root mean square (RMS) level, over those intervals only. Because pauses are left out of the average, the result reflects how loud the speech itself is, regardless of how much silence the recording contains.
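
The segmentation, activity detection, and level computation described above can be sketched roughly as follows. This is an illustration only, not the normative measurement method: it assumes 16-bit PCM input, 160-sample frames, and a fixed energy threshold standing in for a real VAD. The function name and all parameter values are hypothetical.

```python
import math


def active_speech_level_dbov(samples, frame_len=160,
                             threshold_db=-50.0, full_scale=32768.0):
    """Estimate the active speech level in dBov.

    Assumptions (not taken from any specification): 16-bit PCM samples,
    fixed-length frames, and a plain energy threshold used as the VAD.
    Returns None when no frame is classified as active.
    """
    active_energy = 0.0   # sum of squared normalized samples in active frames
    active_count = 0      # number of samples in active frames

    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum((s / full_scale) ** 2 for s in frame)
        mean_power = energy / frame_len
        level_db = 10.0 * math.log10(mean_power) if mean_power > 0 else float("-inf")

        # Toy VAD: a frame counts as active if its level exceeds the threshold.
        if level_db > threshold_db:
            active_energy += energy
            active_count += frame_len

    if active_count == 0:
        return None
    # Mean power over active samples only, relative to full scale.
    return 10.0 * math.log10(active_energy / active_count)


if __name__ == "__main__":
    silence = [0] * 160
    speech = [16384, -16384] * 80   # half-scale square wave as stand-in "speech"
    print(active_speech_level_dbov(silence + speech))  # roughly -6.02
```

Averaging over the whole signal instead would report about 3 dB less for this input, since half of it is silence; excluding inactive frames is what removes that dependence on pause length.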

In the 3GPP architecture, ASL is integrated into the media processing functions of the IP Multimedia Subsystem (IMS) and the 5G Core Network (5GC). For voice services like VoNR and VoLTE, the Media Resource Function (MRF) and the Application Function (AF) utilize ASL measurements to monitor and control speech levels. The metric is applied at various points in the transmission path, including the user equipment (UE), the radio access network (RAN), and the core network nodes involved in media handling. This end-to-end application ensures that speech levels remain consistent across different network segments and devices.

Key components involved in ASL implementation include the codec processing units, which encode and decode speech signals, and the VAD modules, which distinguish between speech and non-speech segments. The ASL value is often used in conjunction with other parameters like the Noise Level (NL) and the Speech Level Variation (SLV) to comprehensively assess speech quality. In 3GPP specifications such as TS 26.801 and TS 26.921, ASL is defined as part of the performance requirements for speech codecs and voice service quality metrics. It plays a critical role in maintaining interoperability between different vendors' equipment and ensuring a uniform user experience.

The role of ASL in the network extends to quality of service (QoS) management and network optimization. By monitoring ASL values, network operators can detect issues such as excessive attenuation or amplification in the speech path, which could degrade call quality. ASL data can trigger adaptive level control mechanisms, such as automatic gain control (AGC), to adjust speech levels dynamically. This is particularly important in 5G networks, where voice services must meet stringent quality standards to compete with over-the-top (OTT) applications. Furthermore, ASL is used in testing and certification processes to verify that devices and network elements comply with 3GPP specifications.
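
As a hedged illustration of how an ASL measurement could drive adaptive level control, the sketch below moves a gain setting toward a target level in bounded steps. The target of -26 dBov, the step limit, and the gain limit are assumed values chosen for the example, and the function name is hypothetical; none of them come from the specifications cited here.

```python
def agc_gain_db(measured_asl_dbov, current_gain_db=0.0,
                target_dbov=-26.0, max_step_db=3.0, max_gain_db=12.0):
    """Return an updated gain (dB) that moves the speech level toward the target.

    measured_asl_dbov is the ASL of the signal before any gain is applied.
    All default values are illustrative assumptions.
    """
    desired_gain = target_dbov - measured_asl_dbov

    # Limit how fast the gain may change per update to avoid audible pumping.
    step = desired_gain - current_gain_db
    step = max(-max_step_db, min(max_step_db, step))

    # Limit the total gain so that noise is not amplified without bound.
    new_gain = current_gain_db + step
    return max(-max_gain_db, min(max_gain_db, new_gain))


if __name__ == "__main__":
    print(agc_gain_db(-32.0))                        # 3.0: speech 6 dB too quiet, one 3 dB step
    print(agc_gain_db(-26.0))                        # 0.0: already at target
    print(agc_gain_db(-50.0, current_gain_db=11.0))  # 12.0: clamped at the gain limit
```

Because the input is the active speech level rather than the overall signal level, long pauses do not push the gain upward, which is the failure mode the measurement is meant to avoid.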

Overall, ASL is a fundamental metric for speech processing in modern telecommunications. Its standardized definition allows for consistent measurement and control of speech loudness, which is essential for delivering high-quality voice services. By providing a clear reference for active speech power, ASL helps mitigate issues related to level mismatches and ensures that users experience clear and comfortable voice communications across diverse network conditions and device types.

Purpose & Motivation

The creation of Active Speech Level (ASL) was motivated by the need for a standardized method to measure speech loudness in digital voice communications. Prior to its definition, varying implementations of speech level measurement led to inconsistencies in voice quality across networks and devices. This resulted in user discomfort, such as calls being too loud or too soft, and complicated interoperability between equipment from different manufacturers. ASL addresses these problems by providing a uniform metric that focuses exclusively on active speech segments, excluding silence and noise, thereby enabling accurate and comparable assessments of speech levels.

Historically, analog telephone systems relied on simple power measurements that included all audio components, which could be misleading due to background noise. With the transition to digital and packet-switched networks like LTE and 5G, precise control over speech parameters became critical for maintaining quality in voice over IP (VoIP) services. 3GPP introduced ASL in Release 16 as part of the enhanced voice service requirements for 5G New Radio (NR), specifically for VoNR. This standardization ensures that speech levels are managed consistently from end to end, supporting seamless handovers between different radio access technologies and core networks.

ASL solves the limitation of previous approaches that lacked a clear distinction between speech and non-speech elements, which could skew level measurements and lead to inappropriate gain adjustments. By defining ASL, 3GPP enables network operators and device vendors to implement reliable level control mechanisms, such as automatic gain control (AGC) and loudness normalization. This is essential for meeting user expectations in an era where voice services must compete with high-quality OTT applications. Furthermore, ASL facilitates regulatory compliance for loudness standards in telecommunications, ensuring that services adhere to safety and comfort guidelines.

Key Features

Standardized measurement of active speech power in dBov or dBm0
Exclusion of silent periods and background noise via VAD algorithms
Integration with IMS and 5GC for end-to-end voice service management
Support for QoS monitoring and adaptive level control in networks
Use in testing and certification for 3GPP-compliant devices
Application in VoNR and VoLTE to ensure consistent speech loudness

Evolution Across Releases

Rel-16 Initial

Introduced ASL as a standardized metric for active speech level measurement in 3GPP specifications TS 26.801 and TS 26.921. It defined the methodology for calculating speech power during active talk spurts, excluding silence and noise, to support voice services in 5G NR, particularly VoNR. This initial architecture established ASL as a key parameter for speech quality assessment and level control in next-generation networks.

Related specifications: TS 26.801, TS 26.921

Defining Specifications

Specification	Title
TS 26.801	3GPP TS 26.801
TS 26.921	3GPP TS 26.921