Description
Metadata-Assisted Spatial Audio (MASA) is a standardized framework within 3GPP for representing, encoding, and rendering immersive audio. Unlike traditional channel-based or object-based audio, MASA takes a hybrid approach: the core audio is encoded with a conventional codec (such as EVS or the IVAS codec), while a separate, compact metadata stream describes the spatial properties of the audio scene. This metadata includes parameters such as the direction of arrival of sound and the ratio of directional to diffuse energy in the scene. The architecture is designed to be codec-agnostic, allowing the spatial metadata to be associated with various underlying audio bitstreams and giving service providers flexibility. The key components are a metadata encoder, which analyzes the spatial audio scene, and a metadata decoder/renderer on the receiving device, which uses the metadata to reconstruct an immersive sound field appropriate to the listener's playback environment, be it headphones, stereo speakers, or a multi-channel home theater system.

In the network, MASA acts as a service enabler: the audio and metadata streams are packetized, transmitted over 5G networks (leveraging their high bandwidth and low latency), and synchronized at the receiver to deliver a cohesive immersive experience. The specification details the syntax and semantics of the metadata, ensuring interoperability between content creation tools and consumer devices.
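To make the split between the core audio and the spatial description concrete, here is a minimal C sketch that models a metadata frame as a grid of time-frequency tiles, each carrying a direction and energy ratios. The type names, field names, and tile counts are illustrative assumptions, not the normative metadata syntax from the specifications.

```c
/* Illustrative sketch of a MASA-style metadata frame; names and sizes are
 * assumptions for exposition, not the normative syntax. */
#include <stdint.h>
#include <stdio.h>

#define N_SUBFRAMES 4   /* assumed time subframes per coded audio frame */
#define N_BANDS     24  /* assumed frequency bands per subframe         */

/* Spatial parameters for one time-frequency tile of the scene. */
typedef struct {
    float azimuth_deg;      /* direction of arrival, -180..180 degrees    */
    float elevation_deg;    /* direction of arrival, -90..90 degrees      */
    float direct_to_total;  /* directional share of the tile energy, 0..1 */
    float spread_coherence; /* coherence of the directional spread, 0..1  */
} spatial_tile;

/* One compact metadata frame accompanying one frame of core-coded audio. */
typedef struct {
    uint8_t num_transport_channels;            /* 1 = mono, 2 = stereo core */
    spatial_tile tiles[N_SUBFRAMES][N_BANDS];  /* spatial scene description */
} metadata_frame;

int main(void) {
    metadata_frame frame = { .num_transport_channels = 2 };
    /* Describe a single frontal, mostly direct source in the first tile. */
    frame.tiles[0][0] = (spatial_tile){ 0.0f, 0.0f, 0.9f, 0.1f };
    printf("tiles per frame: %d\n", N_SUBFRAMES * N_BANDS);
    printf("tile[0][0]: azimuth %.1f deg, direct-to-total %.2f\n",
           frame.tiles[0][0].azimuth_deg, frame.tiles[0][0].direct_to_total);
    return 0;
}
```

In a real encoder these parameters would be quantized and entropy-coded before transmission; the point of the sketch is only that the spatial description is small, structured, and separate from the audio payload.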
Purpose & Motivation
MASA was created to address the growing demand for immersive audio experiences, driven in particular by extended reality (XR), 360-degree video, and next-generation broadcasting. Traditional audio codecs were designed for fixed channel configurations (e.g., stereo or 5.1) and struggle with the flexibility required for personalized, device-adaptive rendering. Object-based audio formats existed but could be inefficient to transmit over bandwidth-constrained mobile networks because of the high bitrate needed for numerous discrete audio objects. MASA solves this by decoupling the descriptive spatial metadata, which requires very little bitrate, from the core audio payload. This allows efficient network transmission while enabling the receiver to render a sound field tailored to its output capabilities and, where head-tracking is available, to the listener's orientation. Its creation was motivated by the need for a standardized, network-friendly immersive audio solution within the 3GPP ecosystem to complement advances in video and XR services over 5G, delivering a high quality of experience without prohibitive bandwidth costs.
Key Features
- Hybrid audio representation combining core-coded audio with separate spatial metadata
- Codec-agnostic design allowing the metadata to be paired with EVS, IVAS, or other core audio codecs
- Compact metadata syntax for efficient transmission over mobile networks
- Device-adaptive rendering for headphones, stereo speakers, and multi-speaker setups (see the sketch after this list)
- Support for dynamic scene updates and listener head-tracking (rotational 3DoF)
- Standardized bitstream format ensuring interoperability between content creation and playback devices
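The device-adaptive rendering feature can be pictured as a simple dispatch at the receiver: one transmitted scene, several possible renderers. The following hedged sketch uses hypothetical enum and function names to show that shape; a real decoder would also account for head-tracking data and the exact loudspeaker layout.

```c
#include <stdio.h>

/* Hypothetical output configurations a receiving device might report. */
typedef enum { OUT_HEADPHONES, OUT_STEREO_SPEAKERS, OUT_MULTICHANNEL } output_device;
typedef enum { RENDER_BINAURAL, RENDER_STEREO, RENDER_CHANNEL_MAP } render_mode;

/* Choose a rendering strategy from the playback device; names are
 * illustrative, not taken from the specifications. */
static render_mode select_renderer(output_device out) {
    switch (out) {
    case OUT_HEADPHONES:      return RENDER_BINAURAL;    /* HRTF-based       */
    case OUT_STEREO_SPEAKERS: return RENDER_STEREO;      /* stereo rendering */
    default:                  return RENDER_CHANNEL_MAP; /* e.g., 5.1 layout */
    }
}

int main(void) {
    const char *names[] = { "binaural", "stereo", "channel-map" };
    printf("headphones -> %s\n", names[select_renderer(OUT_HEADPHONES)]);
    printf("5.1 system -> %s\n", names[select_renderer(OUT_MULTICHANNEL)]);
    return 0;
}
```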
Evolution Across Releases
Release 18 introduced the initial MASA framework, defining the core architecture, metadata syntax, and encapsulation methods. It specified the base profiles for mono (MASA1) and stereo (MASA2) core audio, establishing the foundational capability for transmitting spatial audio scenes with associated descriptive metadata over 3GPP networks.
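As a rough illustration of the MASA1/MASA2 distinction and the encapsulation idea, the sketch below pairs a mono or stereo core-audio payload with its coded metadata in a single frame. The header layout, field names, and byte counts are assumptions for illustration; the actual encapsulation is defined in the specifications listed below.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed profile identifiers: mono (MASA1) or stereo (MASA2) core audio. */
typedef enum { MASA1 = 1, MASA2 = 2 } masa_profile;

/* A non-normative view of one transmitted frame: core-codec payload
 * followed by the coded spatial metadata. */
typedef struct {
    masa_profile profile;    /* number of core transport channels */
    uint16_t audio_bytes;    /* length of the core-codec payload  */
    uint16_t metadata_bytes; /* length of the coded metadata      */
    /* payload bytes follow: audio first, then metadata */
} frame_header;

int main(void) {
    frame_header hdr = { MASA2, 160, 60 };  /* example sizes only */
    printf("MASA%d frame: %u audio bytes + %u metadata bytes\n",
           hdr.profile, hdr.audio_bytes, hdr.metadata_bytes);
    return 0;
}
```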
Defining Specifications
| Specification |
|---|
| 3GPP TS 26.250 |
| 3GPP TS 26.251 |
| 3GPP TS 26.253 |
| 3GPP TS 26.254 |
| 3GPP TS 26.255 |
| 3GPP TS 26.258 |
| 3GPP TS 26.260 |
| 3GPP TS 26.261 |
| 3GPP TR 26.865 |
| 3GPP TR 26.933 |
| 3GPP TR 26.996 |
| 3GPP TR 26.997 |