OSBA

Objects (ISM) with Scene-Based Audio

Introduced in Rel-18

OSBA is a 3GPP standard for immersive audio that defines the representation and streaming of audio scenes composed of individual sound objects and metadata, enabling personalized, interactive spatial audio.

Category: Services
Introduced: Rel-18
Where: Services › Codecs
Specifications: 4 specs

Description

Objects (ISM) with Scene-Based Audio (OSBA) is a combined immersive audio format of the 3GPP IVAS codec, designed for representing and rendering complex audio scenes. It pairs audio objects, carried as independent streams with metadata (ISM), with a scene-based audio (Ambisonics) component. An audio scene in OSBA is therefore not a single monolithic audio track but a composition of multiple individual audio objects, each with its own audio essence (the sound data) and spatial metadata. This metadata defines each object's position, movement, size, and other acoustic properties within a three-dimensional coordinate system. The core of OSBA's operation involves the authoring, encapsulation, delivery, and client-side rendering of these scenes. Content creators author scenes using tools that output audio objects and metadata, which are then packaged according to 3GPP specifications, typically within ISOBMFF (MP4) containers for streaming.
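The scene composition described above can be pictured with a small data model. This is only an illustrative sketch: the class and field names (AudioScene, AudioObject, MetadataFrame, position, gain, size) are hypothetical and are not taken from any 3GPP specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MetadataFrame:
    """Spatial metadata for one object at one instant (hypothetical fields)."""
    time_s: float                          # timestamp within the scene
    position: Tuple[float, float, float]   # x, y, z, listener assumed at origin
    gain: float = 1.0                      # linear gain
    size: float = 0.0                      # 0.0 means a point source


@dataclass
class AudioObject:
    """One sound object: mono audio essence plus its time-varying metadata."""
    name: str
    samples: List[float]
    sample_rate: int
    metadata: List[MetadataFrame] = field(default_factory=list)

    def duration_s(self) -> float:
        return len(self.samples) / self.sample_rate


@dataclass
class AudioScene:
    """A scene is a collection of objects, not a single pre-mixed track."""
    objects: List[AudioObject] = field(default_factory=list)

    def object_names(self) -> List[str]:
        return [obj.name for obj in self.objects]


# Example scene: one second of silence per object at 48 kHz.
scene = AudioScene([
    AudioObject("commentary", [0.0] * 48000, 48000,
                [MetadataFrame(0.0, (0.0, 1.0, 0.0))]),
    AudioObject("crowd", [0.0] * 48000, 48000,
                [MetadataFrame(0.0, (-2.0, 3.0, 0.0), size=0.5)]),
])
```

Keeping the essence and the metadata as separate fields mirrors the separation the description relies on: either can be replaced without touching the other.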

For delivery, OSBA leverages existing adaptive streaming protocols like DASH or HLS. The audio objects and their dynamic metadata are packaged as separate media components or tracks within a media presentation. This allows the streaming client to request and receive only the components necessary for the current scene and user perspective. A key technical aspect is the synchronization of object audio essence with its time-varying spatial metadata, ensuring that sounds are rendered at the correct location at the correct time. The client-side renderer, which could be on a smartphone, XR headset, or home theater system, receives these components, decodes the audio objects, and uses the metadata to spatially render the audio scene in real-time, often using binaural rendering for headphones or channel-based rendering for speaker arrays.
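To illustrate the synchronization point, a renderer needs an object's position at the exact playback time, even when metadata arrives only at discrete instants. The sketch below assumes metadata is delivered as time-stamped keyframes and that linear interpolation between them is acceptable. Both are assumptions for illustration, and position_at is a hypothetical helper, not an API of any codec or player.

```python
import bisect
from typing import List, Tuple

Position = Tuple[float, float, float]
Keyframe = Tuple[float, Position]  # (time in seconds, (x, y, z))


def position_at(keyframes: List[Keyframe], t: float) -> Position:
    """Return the object position at time t.

    Keyframes must be sorted by time. Times before the first keyframe or
    after the last one are clamped to the nearest keyframe.
    """
    times = [k[0] for k in keyframes]
    i = bisect.bisect_right(times, t)
    if i == 0:
        return keyframes[0][1]
    if i == len(keyframes):
        return keyframes[-1][1]
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    alpha = (t - t0) / (t1 - t0)  # 0.0 at t0, 1.0 at t1
    return tuple(c0 + alpha * (c1 - c0) for c0, c1 in zip(p0, p1))


# An object moving from the origin to (4, 0, 2) over two seconds.
path = [(0.0, (0.0, 0.0, 0.0)), (2.0, (4.0, 0.0, 2.0))]
midpoint = position_at(path, 1.0)
```

A renderer would call such a function once per audio block, using the block's presentation time, so that the sound is placed where the metadata says it should be at that moment.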

The role of OSBA in the network is as an application-layer media format standard. It sits atop the core network's data transport capabilities, enabling service providers to offer next-generation audio experiences. It is integral to media services like extended reality (XR), interactive live events, and personalized audio for video. By separating the audio scene description (metadata) from the audio essence, OSBA enables advanced features like selective object enhancement, accessibility features (e.g., boosting commentary audio), and bandwidth efficiency, as objects can be added, removed, or substituted based on network conditions or user preferences without re-encoding the entire scene.
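Selective object enhancement can be sketched as a per-object gain applied at mix time on the client. The example below is a simplified mono mix under stated assumptions: objects are already decoded to lists of samples, spatial rendering is ignored, and mix_objects with its gains_db parameter is a hypothetical helper.

```python
from typing import Dict, List, Optional


def mix_objects(objects: Dict[str, List[float]],
                gains_db: Optional[Dict[str, float]] = None) -> List[float]:
    """Sum decoded objects into one mono signal with per-object gain in dB.

    Objects without an entry in gains_db are mixed at 0 dB (unchanged).
    Raises ValueError if objects is empty.
    """
    gains_db = gains_db or {}
    length = max(len(samples) for samples in objects.values())
    out = [0.0] * length
    for name, samples in objects.items():
        linear = 10 ** (gains_db.get(name, 0.0) / 20)  # dB to linear amplitude
        for i, sample in enumerate(samples):
            out[i] += linear * sample
    return out


decoded = {"commentary": [0.1, 0.1], "crowd": [0.2, 0.2]}
default_mix = mix_objects(decoded)
boosted_mix = mix_objects(decoded, {"commentary": 20.0})  # +20 dB commentary
```

Because the boost is applied to one object only, the rest of the scene is left untouched and nothing has to be re-encoded on the server side.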

Purpose & Motivation

OSBA was created to address the limitations that traditional channel-based (e.g., 5.1, 7.1) and purely scene-based (e.g., Ambisonics) audio formats have when used on their own to deliver immersive and interactive audio for emerging media. Channel-based audio is tied to a specific speaker layout and offers no interactivity, while first-order Ambisonics has limited spatial resolution. The rise of applications like virtual reality (VR), augmented reality (AR), and interactive 360-degree video demanded an audio format that could provide precise, dynamic spatial audio that reacts to user head movements and interactions.

The primary problem OSBA solves is how to efficiently stream complex, multi-object audio scenes over potentially constrained mobile networks while allowing for client-side personalization and adaptation. Previous approaches either required pre-mixing audio for a specific output (losing flexibility) or transmitted high-order Ambisonics fields (which can be bandwidth-inefficient and lack object-level control). OSBA's object-based approach allows the network to transmit a scene description and discrete audio elements, enabling the end-user's device to perform the final, personalized rendering. This is crucial for XR, where the audio must update in real-time based on the user's head orientation.

Therefore, the motivation for OSBA was to standardize an interoperable format for object-based immersive audio, ensuring content created by one provider can be rendered correctly on devices from different manufacturers. This standardization, part of the broader 3GPP media codec and delivery work, aims to catalyze the ecosystem for immersive media services over 5G and beyond, making personalized, cinematic-quality audio a viable service for mobile users.

Classification

Part of DASH

Detected Changes Across Releases

from 3GPP Change Requests

Specific changes extracted from the "Change history" tables of 3GPP specifications (2 CRs across 2 releases). Complements the general historical overview above with the evidence-based evolution of this function.

Rel-15 1 change

The Release 15 entry is a correction to the immersive audio test specification TS 26.260 and predates OSBA itself, which was introduced with the IVAS codec in Rel-18. Within the IVAS codec framework, OSBA (Objects (ISM) with Scene-Based Audio) is a combined immersive audio format: it supports encoding and decoding of object-based and scene-based (Ambisonics) audio together, across a bitrate range from 13.2 kbps to 512 kbps. This expands the codec's capabilities for immersive communication by allowing both component types to be processed simultaneously.

  • Correction of sensitivity calculation for immersive audio playback, TS 26.260 CR002

Rel-18 1 change

In Release 18, the OSBA (Objects (ISM) with Scene-Based Audio) function was enhanced by the introduction of objective test methodologies for IVAS-based user equipment. This provided standardized procedures for evaluating the performance of devices capable of decoding and rendering the combined object-based and scene-based audio format. These methodologies ensure consistent quality assessment for OSBA, which supports bitrates from 13.2 to 512 kbps.

  • Objective Test Methodologies for IVAS-based UEs, TS 26.260 CR0006
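To give a feel for the quoted bitrate range, the following sketch converts a bitrate into the coded payload size of one frame. It assumes a 20 ms frame duration, which is labeled as an assumption here, and the function names are hypothetical.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds


def bits_per_frame(bitrate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Coded bits in one frame: kbit/s multiplied by ms gives bits."""
    return round(bitrate_kbps * frame_ms)


def bytes_per_frame(bitrate_kbps: float, frame_ms: int = FRAME_MS) -> float:
    """Same payload expressed in bytes."""
    return bits_per_frame(bitrate_kbps, frame_ms) / 8


lowest = bits_per_frame(13.2)    # lower end of the range quoted above
highest = bits_per_frame(512)    # upper end of the range quoted above
```

Under that assumption the two ends of the range differ by a factor of roughly 39 in payload per frame, which is the headroom a combined object and scene-based format has to scale quality against network conditions.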

Defining Specifications

3GPP specifications that define or reference OSBA, with the latest known release, sourced from the 3GPP document catalog.

Specification    Title                                          Release
TS 26.253 vj00   IVAS Codec Algorithmic Description             Rel-19
TS 26.255 vj00   IVAS Frame Loss Concealment Procedure          Rel-19
TS 26.260 vj00   Immersive Audio Objective Test Methods         Rel-19
TS 26.261 vj00   Electro-acoustic specs for immersive terminals Rel-19