ITD

Inter-Channel Time Difference


ITD is a parameter in 3GPP audio codecs that represents the time delay between audio channels, enabling spatial audio rendering and sound localization.

Category: Physical Layer
Introduced: Rel-18
Where: Services › Codecs
Specifications: 3 specs

Description

Inter-Channel Time Difference (ITD) is a fundamental cue used in spatial audio processing and is standardized within 3GPP for next-generation audio codecs. It quantifies the time delay between two audio channels, for example the signals captured by a left and a right microphone. It is the channel-domain counterpart of the interaural time difference, which is the difference in the time of arrival of a sound wave at a listener's left and right ears. This time difference, along with the Inter-Channel Level Difference (ILD), is crucial for the human auditory system to localize sound sources horizontally. In the context of 3GPP, ITD parameters are generated, encoded, transmitted, and then decoded to faithfully reconstruct a spatial sound scene for immersive services like Voice over New Radio (VoNR) with immersive voice, teleconferencing, and Extended Reality (XR) applications.
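To get a feel for the magnitudes involved at a listener's ears, the sketch below uses the classic Woodworth spherical-head approximation, ITD ≈ (r / c) · (θ + sin θ), for a far-field source at azimuth θ. This model is not taken from the 3GPP specifications; the head radius, the speed of sound, and the function name are assumptions chosen for illustration only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875      # assumed: a commonly quoted average head radius


def woodworth_itd_seconds(azimuth_deg,
                          head_radius_m=HEAD_RADIUS_M,
                          speed_of_sound_m_s=SPEED_OF_SOUND_M_S):
    """Approximate time-of-arrival difference between the two ears.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    The approximation is intended for 0 <= azimuth_deg <= 90.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_m_s) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_ms = 1000.0 * woodworth_itd_seconds(azimuth)
        print(f"azimuth {azimuth:>2} deg -> ITD {itd_ms:.3f} ms")
```

Under these assumptions the largest value, for a source directly to one side, comes out at roughly 0.66 ms, which is why ITD values for a human listener stay below one millisecond.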

On the capture side, microphone arrays or artificial intelligence-based audio scene analysis are used to estimate the direction of arrival of different sound sources. For a given audio object or channel pair, the ITD is calculated, typically in the sub-millisecond range. In codecs like the Immersive Voice and Audio Services (IVAS) codec, whose algorithmic description is given in 3GPP TS 26.253, these spatial parameters (ITD, ILD, and others) are estimated, quantized, and efficiently packetized alongside the core audio signal. This parametric representation is highly bandwidth-efficient compared to transmitting full multi-channel audio streams. At the receiver, the decoder uses the transmitted ITD values, along with a head-related transfer function (HRTF) model, to synthesize binaural audio signals for headphones, creating the perception of sound coming from specific directions.
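The estimate, quantize, and synthesize chain described above can be sketched in a few lines. This is a minimal illustration and not the IVAS algorithm: it estimates the ITD of a channel pair with a brute-force time-domain cross-correlation, maps it to an integer index with a hypothetical uniform quantizer, and re-applies it as a plain sample delay. All function names, the lag range, and the 48 kHz sample rate are assumptions.

```python
import random

SAMPLE_RATE_HZ = 48_000  # assumed sample rate for the demo
MAX_LAG = 48             # assumed search range: +/- 1 ms at 48 kHz


def estimate_itd_samples(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `right` is a delayed copy of `left`.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def quantize_itd(itd_samples, max_lag):
    """Hypothetical quantizer: clamp, then shift to a non-negative index."""
    clamped = max(-max_lag, min(max_lag, itd_samples))
    return clamped + max_lag


def dequantize_itd(index, max_lag):
    """Inverse of quantize_itd."""
    return index - max_lag


def apply_itd(mono, itd_samples):
    """Build a (left, right) pair from a mono signal by delaying one channel."""
    d = min(abs(itd_samples), len(mono))
    delayed = [0.0] * d + list(mono[:len(mono) - d])
    if itd_samples >= 0:
        return list(mono), delayed  # right lags left
    return delayed, list(mono)      # left lags right


# Demo: white noise with a known 12-sample (0.25 ms) inter-channel delay.
random.seed(0)
source = [random.uniform(-1.0, 1.0) for _ in range(2000)]
true_itd = 12
left, right = apply_itd(source, true_itd)

estimated = estimate_itd_samples(left, right, MAX_LAG)
index = quantize_itd(estimated, MAX_LAG)   # the value that would be transmitted
decoded = dequantize_itd(index, MAX_LAG)   # the value recovered at the receiver
itd_ms = 1000.0 * decoded / SAMPLE_RATE_HZ

if __name__ == "__main__":
    print(f"estimated {estimated} samples, index {index}, decoded {itd_ms:.3f} ms")
```

A real codec would typically work per frame and often in a transform domain, and binaural rendering would combine the decoded delay with level differences and HRTF filtering rather than a bare sample shift.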

The specifications governing ITD include TS 26.253 (IVAS codec algorithmic description), TS 26.260 (objective test methods for immersive audio), and TS 26.261 (electro-acoustic specifications for immersive terminals). The architecture involves components in both the UE (for capture and playback) and potentially in the network (for media processing). The accurate preservation and rendering of ITD is critical for the quality of experience (QoE) in immersive communications, as errors can lead to blurred or incorrectly positioned audio images, breaking the sense of immersion.

Purpose & Motivation

ITD was introduced into 3GPP standards to address the limitations of traditional monophonic or stereophonic voice services, which lack spatial realism. As telecommunications evolve to support immersive experiences like virtual meetings, gaming, and XR, there is a growing need to convey not just the audio content but also the spatial arrangement of sound sources. This creates a more natural, engaging, and effective communication environment, allowing users to distinguish between multiple speakers in a conference call as if they were in the same room.

The problem ITD-based parametric coding solves is the bandwidth inefficiency of transmitting discrete multi-channel audio (e.g., 5.1 or Ambisonics) over mobile networks. Transmitting a separate waveform for every channel makes the bitrate grow with the channel count. By extracting and transmitting compact spatial parameters like ITD and ILD, 3GPP codecs can recreate a convincing spatial sound scene at a fraction of the bitrate. This makes immersive audio feasible for mass-market mobile services.

Historically, spatial audio parameters were used in professional audio and gaming. Their introduction in 3GPP Rel-18, particularly with the IVAS codec, was motivated by the industry's push towards 5G-Advanced and 6G use cases that demand ultra-realistic communication. It addresses the limitation of previous voice codecs (like AMR-WB or EVS) which, while high quality, were primarily designed for mono or stereo playback without dedicated spatial cues. Standardizing ITD ensures interoperability between different devices and networks, enabling a consistent immersive audio experience across the ecosystem.

Evolution Across Releases

Rel-18 Initial

Introduced Inter-Channel Time Difference (ITD) as a standardized spatial audio parameter within the 3GPP media codec framework, specifically for the Immersive Voice and Audio Services (IVAS) codec. Defined its role in capturing, encoding, and rendering spatial audio for immersive conversational services, enabling realistic sound localization over 5G networks.

Defining Specifications

3GPP specifications that define or reference ITD, with the latest known release, as listed in the 3GPP document catalog.

Specification | Title | Release
TS 26.253 vj00 | IVAS Codec Algorithmic Description | Rel-19
TS 26.260 vj00 | Immersive Audio Objective Test Methods | Rel-19
TS 26.261 vj00 | Electro-acoustic specs for immersive terminals | Rel-19