MAE

MPEG-H Audio Metadata information

Category: Services
Introduced in: Rel-15
MAE refers to the metadata information associated with MPEG-H Audio, an advanced audio codec system standardized for immersive and interactive audio experiences. This metadata enables features like object-based audio, dynamic adaptation to different playback systems, and personalized audio rendering. It is a key enabler for next-generation audio services in 5G broadcast and media delivery.

Description

MPEG-H Audio Metadata information (MAE) encompasses the structured data that describes and controls the rendering of MPEG-H Audio bitstreams. MPEG-H Audio (ISO/IEC 23008-3) is a state-of-the-art codec system supporting channel-based, object-based, and Higher Order Ambisonics (HOA) audio content. The MAE is not the audio signal itself but the accompanying information that defines how the audio should be interpreted and reproduced.

Architecturally, MAE is embedded within or associated with the MPEG-H Audio elementary stream. It includes several key components: Presentation Information, which defines how audio elements (objects, channels, HOA) are mixed for different output configurations (e.g., stereo, 5.1.4, headphones); Interaction Metadata, which allows for user control over audio objects (e.g., boosting commentary in a sports broadcast); and Dynamic Range Control (DRC) metadata for adapting loudness to different listening environments. The metadata is typically structured in a binary format defined by the MPEG-H Audio standard.
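The grouping described above can be pictured with a minimal data model. This is an illustrative sketch only: the class and field names (`AudioElement`, `default_gain_db`, `InteractionLimits`, and so on) are hypothetical simplifications for exposition, not the binary syntax defined in ISO/IEC 23008-3.

```python
from dataclasses import dataclass, field

@dataclass
class AudioElement:
    """One renderable element: a channel bed, an audio object, or an HOA group."""
    element_id: int
    kind: str                 # "channels" | "object" | "hoa"
    default_gain_db: float = 0.0

@dataclass
class InteractionLimits:
    """Producer-defined bounds on user gain changes (Interaction Metadata)."""
    min_gain_db: float
    max_gain_db: float

@dataclass
class Presentation:
    """Presentation Information: which elements make up one selectable mix."""
    label: str
    element_ids: list[int]

@dataclass
class MAE:
    """Simplified container for the metadata carried alongside the audio."""
    elements: dict[int, AudioElement]
    presentations: list[Presentation]
    interaction: dict[int, InteractionLimits] = field(default_factory=dict)

# Example scene: a 5.1 channel bed plus a dialogue object the user may adjust.
scene = MAE(
    elements={
        0: AudioElement(0, "channels"),
        1: AudioElement(1, "object"),
    },
    presentations=[Presentation("default", [0, 1])],
    interaction={1: InteractionLimits(-6.0, 9.0)},
)
print(scene.presentations[0].label)
```

The point of the model is the separation it enforces: the signal elements, the selectable mixes built from them, and the limits on user interaction are all distinct metadata, so the same elements can serve many presentations.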

Functionally, MAE operates in a two-step process: delivery and rendering. The MAE is delivered alongside the compressed audio data to the playback device, such as a smartphone, TV, or set-top box. The device's MPEG-H Audio decoder and renderer then parse this metadata. Based on the Presentation Information, the renderer mixes the audio elements to suit the specific speaker setup of the device. Interaction Metadata, if present, allows the user interface to let the listener adjust mix parameters (e.g., dialogue enhancement), and the renderer adapts the audio output in real time. This enables a single audio stream to provide an optimal experience on everything from a basic stereo system to an advanced home theater, and allows for user personalization.
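The decode-side flow can be sketched as: select the presentation matching the device's speaker layout, then honor user adjustments only within the producer's interaction limits. All names and the dictionary layout here are hypothetical; a real MPEG-H renderer operates on the standardized binary metadata.

```python
def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

def select_presentation(presentations: list[dict], device_layout: str) -> dict:
    """Pick the presentation targeting the device's speaker layout,
    falling back to the first (default) presentation."""
    for p in presentations:
        if p["layout"] == device_layout:
            return p
    return presentations[0]

def apply_user_gain(requested_db: float, limits: tuple[float, float]) -> float:
    """Interaction Metadata: the user's request is honored only within
    the bounds the content producer signaled."""
    return clamp(requested_db, *limits)

presentations = [
    {"label": "immersive", "layout": "5.1.4"},
    {"label": "stereo", "layout": "2.0"},
]

chosen = select_presentation(presentations, "2.0")
dialog_gain = apply_user_gain(12.0, (-6.0, 9.0))  # user asks for +12 dB
print(chosen["label"], dialog_gain)
```

Note that the +12 dB request is clamped to +9 dB: personalization is bounded by the metadata, so the listener cannot break the mix beyond what the producer allowed.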

Purpose & Motivation

MAE was developed to overcome the limitations of traditional, fixed-format audio codecs (like AC-3 or AAC) in the face of evolving consumer audio systems and demand for personalized experiences. Traditional codecs deliver a fixed mix optimized for one specific speaker layout, which often degrades when played on a different setup. The primary problem MPEG-H Audio with MAE solves is the delivery of future-proof, adaptable audio.

Its creation was motivated by the rise of object-based audio production in cinema and broadcast, and the need to efficiently transport this format to the home. MAE allows a single broadcast stream or media file to contain a complete audio scene description. This enables broadcasters to deliver immersive UHDTV services with audio that automatically adapts to the viewer's specific receiver capabilities, whether a soundbar or a full Dolby Atmos system. Furthermore, it empowers new interactive services, such as letting users select their preferred language track or adjust the mix balance between dialogue, effects, and music. This aligns with 5G's goals for enhanced mobile broadband and media services, providing a rich, customizable audio layer to complement ultra-high-definition video.

Key Features

  • Enables rendering of audio for arbitrary playback systems from a single bitstream.
  • Supports interactive user control over audio objects and mix parameters.
  • Contains Presentation Information for automatic adaptation to speaker layouts.
  • Includes Dynamic Range Control metadata for consistent loudness across environments.
  • Integral part of the MPEG-H Audio system for immersive (3D) audio experiences.
  • Standardized format ensuring interoperability between content creation and consumer devices.
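As a rough illustration of the DRC/loudness feature above: metadata can signal the program's measured loudness, and the decoder computes the gain needed to reach the device's loudness target. The calculation below is a deliberate simplification (a single static loudness-normalization gain), not the full MPEG-H DRC toolchain.

```python
def loudness_gain_db(program_loudness_lkfs: float, target_lkfs: float) -> float:
    """Static loudness normalization: the gain (in dB) that moves the
    program loudness signaled in metadata to the device's target."""
    return target_lkfs - program_loudness_lkfs

def db_to_linear(db: float) -> float:
    """Convert a dB gain to a linear amplitude scale factor."""
    return 10.0 ** (db / 20.0)

# Program mastered at -23 LKFS; a mobile device targets -16 LKFS.
gain_db = loudness_gain_db(-23.0, -16.0)
scale = db_to_linear(gain_db)
samples = [0.1, -0.05, 0.2]
normalized = [s * scale for s in samples]
print(gain_db, round(scale, 3))
```

Because the loudness is carried as metadata rather than baked into the signal, the same bitstream can play quietly on headphones at night and at full dynamic range on a home theater.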

Evolution Across Releases

Rel-15 Initial

Introduced MPEG-H Audio support for 5G Media Streaming and Enhanced Television services. Defined the initial framework for carrying and utilizing MPEG-H Audio Metadata (MAE) within 3GPP ecosystems, enabling adaptive and immersive audio experiences over mobile and broadcast networks.

Defining Specifications

Specification | Title
TS 26.118 | Virtual Reality (VR) profiles for streaming applications
TS 28.908 | 3GPP TS 28.908