Description
Object-Based Audio (OBA) is a paradigm shift in audio representation and delivery, standardized by 3GPP for media services. Unlike traditional channel-based audio (e.g., stereo or 5.1 surround), which encodes sound for fixed speaker positions, OBA decomposes an audio scene into discrete 'objects'. Each object is an audio signal (e.g., a dialogue track, a sound effect, or ambient music) accompanied by rich, time-variant metadata describing the object's spatial position (coordinates in 3D space), gain, and other perceptual attributes, which allows for dynamic rendering.

The architecture comprises a content creation stage, where audio objects and metadata are authored; a delivery stage, where they are efficiently encoded and transported (often using codecs such as MPEG-H 3D Audio); and a client-side rendering stage. Based on the metadata and the capabilities of the playback device (from headphones to complex speaker arrays), the renderer synthesizes the final audio output in real time. This decoupling of content from presentation format is fundamental.

In the network context, 3GPP specifications define how OBA services are delivered over mobile networks, including signaling, media formats, and quality-of-service considerations that ensure synchronized delivery of audio objects and their metadata for a seamless experience. OBA's role is to provide a future-proof audio foundation for immersive media, enabling features that are impossible with fixed-channel audio.
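To make the object model concrete, the following is a minimal illustrative sketch, not code from any 3GPP or MPEG-H specification: an `AudioObject` carries a mono signal plus positional metadata, and a hypothetical `render_stereo` function plays the role of the client-side renderer, here adapting the scene to a two-channel output via constant-power panning. All names and the panning law are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    """One audio object: a mono signal plus rendering metadata (illustrative)."""
    name: str
    samples: list          # mono PCM samples (floats)
    azimuth_deg: float     # horizontal position: -90 (hard left) .. +90 (hard right)
    gain: float = 1.0      # linear gain applied at render time

def render_stereo(objects):
    """Mix objects into a stereo pair using a constant-power panning law."""
    length = max(len(o.samples) for o in objects)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        # Map azimuth to a pan angle in [0, pi/2]; cos/sin keeps total power constant.
        theta = (obj.azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
        l_gain = obj.gain * math.cos(theta)
        r_gain = obj.gain * math.sin(theta)
        for i, s in enumerate(obj.samples):
            left[i] += s * l_gain
            right[i] += s * r_gain
    return left, right
```

Because rendering happens at the client, the same object list could instead be handed to a binaural or multi-speaker renderer without changing the delivered content, which is the decoupling the description above refers to.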
Purpose & Motivation
OBA was created to address the limitations of channel-based audio in the face of evolving media consumption. Traditional audio mixes are 'baked' for a specific speaker configuration, offering no flexibility for different listening environments (e.g., headphones vs. a soundbar), user preferences, or accessibility needs. The rise of virtual reality (VR), augmented reality (AR), and interactive media demanded audio that could adapt dynamically to user head movements and interactivity. OBA solves this with a flexible, scene-description-based approach: it allows personalized audio, such as adjusting dialogue volume independently of background music, and lets audio description tracks be integrated seamlessly.

From a network and service-provider perspective, OBA also offers efficiency: a single OBA stream can be adapted to many output devices, reducing the need to store and transmit multiple channel-based versions. Its introduction in 3GPP Rel-14 was motivated by the industry's move towards immersive media standards and the need for telecom networks to support next-generation audio services as part of evolved Multimedia Broadcast/Multicast Service (eMBMS) and streaming offerings.
Key Features
- Decomposes audio into discrete objects with associated metadata
- Enables dynamic, real-time rendering adapted to the playback device
- Supports 3D spatial audio positioning for immersive experiences
- Facilitates user interactivity and personalization (e.g., object gain control)
- Provides a foundation for accessibility features like audio description
- Efficiently delivers a single stream adaptable to multiple output formats
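The personalization and gain-control features above can be sketched as a pure metadata operation: because gain lives in metadata rather than in a baked mix, a user preference can be applied to one class of objects without touching the others. This is an illustrative sketch only; the `ObjectMetadata` fields, the `role` labels, and the `personalize` helper are assumptions, not fields defined by the 3GPP specifications.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ObjectMetadata:
    """Illustrative per-object metadata; field names are assumptions."""
    object_id: int
    role: str        # e.g. "dialogue", "music", "effects"
    gain_db: float   # authored gain in dB

def personalize(scene, role, boost_db):
    """Return a copy of the scene with one role's gain offset by boost_db.

    Objects with other roles are passed through unchanged, so e.g.
    dialogue can be boosted independently of background music.
    """
    return [replace(m, gain_db=m.gain_db + boost_db) if m.role == role else m
            for m in scene]
```

For example, `personalize(scene, "dialogue", 6.0)` would produce a new scene in which only the dialogue objects are 6 dB louder; the renderer then applies the adjusted metadata at playback time.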
Evolution Across Releases
Rel-14: Introduced Object-Based Audio as a service within the Enhanced Television (EnTV) and media streaming framework. The initial specifications defined the core concepts, media formats (leveraging MPEG-H 3D Audio), and delivery procedures over LTE broadcast (eMBMS) and unicast, establishing the baseline architecture for OBA service provision.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.258 | 3GPP TS 26.258 |
| TS 26.918 | 3GPP TS 26.918 |
| TS 26.997 | 3GPP TS 26.997 |