What is ISAR? Immersive Audio for Split Rendering Scenarios

Description

Immersive Audio for Split Rendering Scenarios (ISAR) is a 3GPP media codec and system specification designed to deliver high-quality, object-based spatial audio for Extended Reality (XR) applications, particularly in network architectures where rendering is split between a user device (e.g., an XR headset) and a network edge server. ISAR addresses the particular challenges of streaming immersive audio: six-degrees-of-freedom (6DoF) rendering, low end-to-end latency, and high compression efficiency to conserve bandwidth.

The architecture typically involves an XR application server in the network (e.g., at the edge) that generates or processes the raw audio scene, which contains multiple audio objects with metadata describing their positions, orientations, and acoustic properties. The ISAR encoder compresses this audio scene. The key innovation is the split of the rendering pipeline: part of the rendering (e.g., early reflections, basic binauralization) can be performed on the server, while the final stage (e.g., late reverberation, personalized head-related transfer function (HRTF) application, and compensation for last-moment head movements) is performed on the user equipment (UE). Compared with sending fully rendered binaural audio, this split reduces the data rate that must be transmitted, and it offloads complex processing from a potentially resource-constrained UE.

The ISAR stream, containing encoded audio objects and rendering metadata, is delivered over the 5G network. The UE's ISAR decoder and renderer then complete the rendering using the latest sensor data (head pose) to create a precise, personalized spatial audio experience. The specifications (TS 26.249, 26.251, etc.) define the codec formats, metadata schemas, APIs, and system interfaces that enable this interoperable, low-latency immersive audio service.
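The split between server-side rendering and device-side correction described above can be illustrated with a toy Python sketch. It is not the ISAR algorithm: constant-power stereo panning stands in for binauralization, only head yaw is considered, and every name (`AudioObject`, `server_pre_render`, `device_post_render`) is hypothetical. It only shows the division of labour: the server renders the scene for the pose it last received, and the device applies a cheap correction for the head movement that happened in the meantime.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    azimuth_deg: float  # world-frame direction: 0 = front, +90 = left (assumed convention)
    samples: list       # mono samples for one frame


def wrap_deg(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def pan_gains(rel_azimuth_deg):
    """Constant-power stereo pan; toy stand-in for binauralization.

    Only meaningful for frontal sources (-90 to +90 degrees relative azimuth).
    """
    theta = (rel_azimuth_deg + 90.0) * math.pi / 360.0  # 0 (right) .. pi/2 (left)
    return math.sin(theta), math.cos(theta)             # (left gain, right gain)


def server_pre_render(objects, head_yaw_deg):
    """Heavy step (server): mix all objects for the last known head yaw."""
    n = len(objects[0].samples)
    left, right = [0.0] * n, [0.0] * n
    for obj in objects:
        gl, gr = pan_gains(wrap_deg(obj.azimuth_deg - head_yaw_deg))
        for i, s in enumerate(obj.samples):
            left[i] += gl * s
            right[i] += gr * s
    return {"render_yaw_deg": head_yaw_deg, "left": left, "right": right}


def device_post_render(frame, latest_yaw_deg):
    """Light step (device): correct for head rotation since the server render.

    For this panning model, a yaw change rotates the (left, right) vector by
    half the yaw delta, so the correction is a 2x2 rotation per sample.
    """
    delta = wrap_deg(latest_yaw_deg - frame["render_yaw_deg"])
    half = math.radians(delta) / 2.0
    c, s = math.cos(half), math.sin(half)
    pairs = list(zip(frame["left"], frame["right"]))
    left = [c * l - s * r for l, r in pairs]
    right = [s * l + c * r for l, r in pairs]
    return left, right


if __name__ == "__main__":
    scene = [AudioObject(30.0, [1.0, 0.5]), AudioObject(-20.0, [0.2, -0.4])]
    frame = server_pre_render(scene, head_yaw_deg=0.0)        # rendered with a stale pose
    print(device_post_render(frame, latest_yaw_deg=10.0))      # corrected on the device
```

In this toy model the device-side correction reproduces what the server would have rendered for the newer yaw, as long as all sources stay in the frontal range. A real post-renderer works on binaural signals and transmitted metadata rather than on a closed-form rotation.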

Purpose & Motivation

ISAR was created to solve the audio delivery challenges of truly immersive and interactive XR experiences over mobile networks. Traditional immersive audio codecs and formats (like MPEG-H 3D Audio or Dolby Atmos) are designed for cinematic or broadcast scenarios with fixed playback environments and higher latency tolerance. For interactive XR, where users move their head and body in real time, audio must be rendered dynamically with ultra-low latency (below 20 ms) to match the visual scene and prevent motion sickness. Transmitting fully rendered binaural audio for every possible head position is prohibitively bandwidth-intensive.

ISAR's purpose is to enable efficient streaming by adopting a split-rendering model, which aligns with the overall XR split rendering paradigm studied in 3GPP. This model uses the compute resources of the 5G network edge for heavy audio processing while keeping the final, user-specific rendering on the device. It addresses the limitations of previous approaches: either high bandwidth consumption (sending pre-rendered audio) or high device compute load (rendering everything locally from raw objects, which may not be feasible on lightweight XR glasses). By standardizing ISAR, 3GPP aims to ensure interoperability between XR application providers, network operators, and device manufacturers, fostering an ecosystem for high-quality cloud/edge-rendered XR services over 5G and beyond.

Classification

Part of XR

Detected Changes Across Releases

from 3GPP Change Requests

Specific changes extracted from the "Change history" tables of 3GPP specifications (3 CRs across 3 releases). Complements the general historical overview above with the evidence-based evolution of this function.

Rel-15 (1 change)

In Release 15, the ISAR function was newly introduced, providing a detailed algorithmic description for split rendering of immersive audio to enable operation on power-constrained End Devices. It established a baseline system, defining mandatory post-renderer procedures for compliant UEs and specifying interfaces for both pre-rendering and post-rendering APIs. The release also specified operational parameters, including support for 48 kHz sampling and bitrates from 256 kbps for 0-DOF up to 768 kbps for 3-DOF pose correction.

Correction of sensitivity calculation for immersive audio playback (TS 26.260, CR002)
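To put the quoted operating points in perspective, the short Python calculation below converts the 256 kbps and 768 kbps figures into payload bytes per frame and compares them with uncompressed two-channel PCM at 48 kHz. The 16-bit PCM reference and the 20 ms frame duration are assumptions chosen for illustration; the text above only states the sampling rate and the bitrates.

```python
SAMPLE_RATE_HZ = 48_000    # from the text above
PCM_BITS_PER_SAMPLE = 16   # assumption: 16-bit PCM as the uncompressed reference
PCM_CHANNELS = 2           # assumption: binaural output has two channels
FRAME_MS = 20              # assumption: illustrative frame duration


def pcm_bitrate_kbps():
    """Bitrate of the uncompressed two-channel PCM reference, in kbps."""
    return SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE * PCM_CHANNELS // 1000


def bytes_per_frame(bitrate_kbps, frame_ms=FRAME_MS):
    """Payload size of one frame: kbps times ms gives bits, divided by 8 for bytes."""
    return bitrate_kbps * frame_ms // 8


if __name__ == "__main__":
    for label, kbps in (("0-DOF", 256), ("3-DOF", 768)):
        share = kbps / pcm_bitrate_kbps()
        print(f"{label}: {kbps} kbps, {bytes_per_frame(kbps)} bytes per "
              f"{FRAME_MS} ms frame, {share:.0%} of uncompressed PCM")
```

Under these assumptions the reference PCM stream runs at 1536 kbps, so the 0-DOF operating point is one sixth of the uncompressed rate and the 3-DOF operating point with pose correction is one half.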

Rel-18 (1 change)

In Release 18, the ISAR (Immersive Audio for Split Rendering Scenarios) "track-a split rendering feature" was added to TS 26.258. It enables pose correction on a lightweight end device using metadata calculated from additional binaural renditions. This release also included corrections to the IVAS C-Code and its corresponding specification text, solidifying the implementation of the ISAR pre-renderer and post-renderer interfaces described in the standard.

Adding ISAR track-a split rendering feature to TS 26.258 and Corrections to the IVAS C-Code and corresponding specification text (TS 26.258, CR0002)
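The idea of pose correction driven by metadata from additional binaural renditions can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure specified in TS 26.258: the server is assumed to render the scene at the reference pose and at a few yaw offsets, the metadata is reduced to one broadband gain per channel and offset, and the device interpolates those gains linearly. All function names are hypothetical.

```python
import math


def energy(channel):
    """Sum of squared samples of one channel."""
    return sum(s * s for s in channel)


def gain_metadata(reference, probes):
    """Server side: derive per-channel gains from additional renditions.

    reference: (left, right) rendered at the reference pose.
    probes: {yaw_offset_deg: (left, right)} rendered at offset poses.
    Returns {yaw_offset_deg: (gain_left, gain_right)}, including offset 0.
    """
    ref_energy = [energy(ch) for ch in reference]
    meta = {0.0: (1.0, 1.0)}
    for offset, rendition in probes.items():
        meta[offset] = tuple(
            math.sqrt(energy(ch) / e) if e > 0 else 1.0
            for ch, e in zip(rendition, ref_energy)
        )
    return meta


def interpolate_gains(meta, yaw_offset_deg):
    """Device side: linearly interpolate gains for the actual head offset."""
    offsets = sorted(meta)
    x = min(max(yaw_offset_deg, offsets[0]), offsets[-1])  # clamp to covered range
    for lo, hi in zip(offsets, offsets[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return tuple((1 - t) * a + t * b for a, b in zip(meta[lo], meta[hi]))
    return meta[offsets[0]]  # only one entry available


def apply_correction(reference, gains):
    """Device side: scale the decoded reference rendition channel by channel."""
    return tuple([g * s for s in ch] for ch, g in zip(reference, gains))


if __name__ == "__main__":
    reference = ([1.0, 0.0], [0.5, 0.5])
    probes = {15.0: ([2.0, 0.0], [0.25, 0.25]), -15.0: ([0.5, 0.0], [1.0, 1.0])}
    meta = gain_metadata(reference, probes)
    gains = interpolate_gains(meta, yaw_offset_deg=7.5)
    print(gains, apply_correction(reference, gains))
```

The design point this illustrates is that the device never renders the scene itself: it only decodes one rendition and applies small, transmitted correction parameters, which keeps the compute load on a lightweight end device low.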

Rel-19 (1 change)

In Release 19, the ISAR function introduced new test methods for immersive user equipment to verify compliance. This addition builds upon the existing detailed algorithmic description for split rendering, which includes mandatory post-renderer procedures for UEs and defined interfaces for both pre-rendering and post-rendering APIs. The new testing specifically validates UE capabilities against the ISAR baseline system, which operates with pose correction metadata and supports various degrees of freedom.

New test methods for immersive UEs (TS 26.260, CR0008)

Explore further

Broader topics and technologies where ISAR plays a role.

Topics

SON (Self-Organizing Networks); IMS & Voice (VoLTE, VoNR); Lawful Intercept; Services & Applications; Radio Access Network

Defining Specifications

3GPP specifications that define or reference ISAR, with the latest known release. Sourced from the 3GPP document catalog.

Specification	Title	Release
TS 26.249 vj00	Immersive Audio Split Rendering (ISAR)	Rel-19
TS 26.251 vj00	IVAS Codec Fixed-Point C Code Specification	Rel-19
TS 26.252 vj00	IVAS Codec Test Sequences Specification	Rel-19
TS 26.258 vj10	IVAS Codec Floating-Point C Code Specification	Rel-19
TS 26.260 vj00	Immersive Audio Objective Test Methods	Rel-19
TR 26.996 vj00	ISAR Split Rendering Audio Characterization	Rel-19
TR 26.997 vj00	IVAS Codec Specification	Rel-19
