MHAS

MPEG-H Audio Stream

Services
Introduced in Rel-15
A standardized bitstream format for immersive and interactive audio, based on the MPEG-H 3D Audio standard. It enables advanced audio experiences like object-based audio, personalized mixes, and dynamic adaptation to different playback systems, enhancing multimedia broadcasting and streaming services.

Description

MPEG-H Audio Stream (MHAS) is a normative bitstream format and transport mechanism for MPEG-H 3D Audio content, defined in ISO/IEC 23008-3 and adopted by 3GPP for its media services. MPEG-H Audio is an advanced audio coding system that supports channel-based, object-based, and scene-based (Higher Order Ambisonics) audio representations in a single bitstream; the MHAS format is the container used to packetize and transport these audio components.

Technically, an MHAS stream is a sequence of MHAS packets. Each packet contains a header carrying synchronization, packet-type, label, and length information, followed by a payload. The payload can carry different kinds of audio data units, such as MPEG-H Audio configuration data, MPEG-H Audio frames, or other auxiliary data.

A key operational aspect is its layering and multiplexing capability. Multiple audio substreams (e.g., a main audio program, descriptive audio, or individual audio objects) can be multiplexed into a single MHAS stream, with the packet label associating each packet with its substream. A receiver can therefore decode only the components it needs, enabling features such as personalized audio, where a user can boost commentary volume or select a preferred language track.

The MHAS stream is typically carried within a higher-level transport container: an MPEG-2 Transport Stream (TS) for broadcast, or the ISO Base Media File Format (ISOBMFF) for streaming. In a 5G Media Streaming (5GMS) context, the MHAS stream is packaged into DASH segments or HLS chunks. Decoding involves demultiplexing the MHAS stream, extracting the relevant MPEG-H Audio frames, and passing them to an MPEG-H Audio decoder, which then renders the audio according to the received metadata and the capabilities of the playback system, from stereo headphones up to a full 22.2-channel home theater. The result is a flexible, future-proof audio format that can adapt the audio presentation in real time to the listener's environment and preferences.
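The packet-header fields described above use MPEG's variable-length escapedValue() coding: small values occupy only a few bits, while larger values escape to progressively wider fields. The Python sketch below illustrates that read mechanism; the field widths used for the packet type, label, and length (escapedValue(3,8,8), escapedValue(2,8,32), and escapedValue(11,24,24)) are the commonly cited values from ISO/IEC 23008-3 and should be verified against the specification. This is an illustrative reader, not a complete MHAS parser.

```python
class BitReader:
    """Minimal MSB-first bit reader over a byte buffer."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def escaped_value(r: BitReader, n1: int, n2: int, n3: int) -> int:
    """MPEG-style escapedValue(n1, n2, n3): read n1 bits; if all ones,
    add an n2-bit extension; if that is also all ones, add n3 more bits."""
    value = r.read(n1)
    if value == (1 << n1) - 1:
        extra = r.read(n2)
        value += extra
        if extra == (1 << n2) - 1:
            value += r.read(n3)
    return value


# Parse the three header fields of one hypothetical packet header:
# bits 010 (type=2), 01 (label=1), 00000000101 (length=5) -> 0x48 0x05.
r = BitReader(bytes([0x48, 0x05]))
header = (escaped_value(r, 3, 8, 8),     # MHASPacketType
          escaped_value(r, 2, 8, 32),    # MHASPacketLabel
          escaped_value(r, 11, 24, 24))  # MHASPacketLength
# header -> (2, 1, 5)
```

The escape scheme keeps common small headers compact while still allowing, for example, packet lengths far beyond the 11-bit base field.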

Purpose & Motivation

MHAS was introduced to overcome the limitations of traditional audio codecs in the face of evolving consumer expectations and new multimedia formats. Earlier audio standards such as Advanced Audio Coding (AAC) were designed primarily for channel-based stereo or surround sound, delivering a fixed mix. The rise of Ultra High Definition (UHD) video, Virtual Reality (VR), and interactive media demanded audio that was equally immersive, adaptable, and interactive. MPEG-H Audio, and by extension MHAS, was created to provide a unified solution for next-generation audio services: a single audio bitstream that can be rendered optimally on a vast array of playback devices, from smartphones to sophisticated home theaters, without requiring multiple parallel audio tracks. It also enables broadcaster and service-provider innovation through features such as personalized dialogue enhancement, accessible audio descriptions, and interactive audio objects that users can control.

The integration of MHAS into 3GPP standards, starting in Release 15, was motivated by the industry's move towards 5G-enabled enhanced Mobile Broadband (eMBB) and media services. 5G's high bandwidth and low latency are well suited to rich, immersive media experiences, and MHAS provides the standardized audio component that completes the next-generation media stack alongside video codecs such as HEVC and VVC.

Key Features

  • Supports multiplexing of channel, object, and Higher Order Ambisonics (HOA) audio components in a single stream
  • Enables object-based audio for interactive user experiences (e.g., boosting commentator volume)
  • Provides dynamic rendering metadata to adapt audio output to specific playback systems (from mono to 22.2 channels)
  • Defines a packetized stream format (MHAS) for robust transport over broadcast or packet-switched networks
  • Facilitates personalized audio services and accessibility features like audio description
  • Standardized integration with media delivery formats like MPEG-2 TS and DASH/ISOBMFF for streaming
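As a concrete illustration of the multiplexing model in the features above, the Python sketch below filters a packet list down to a single substream by its packet label. The MhasPacket class is a hypothetical simplification (not the real bit-level syntax), and the packet-type constants are commonly cited values that should be checked against ISO/IEC 23008-3.

```python
from dataclasses import dataclass

# Commonly cited MHAS packet types (subset; verify against ISO/IEC 23008-3).
PACTYP_MPEGH3DACFG = 1    # decoder configuration data
PACTYP_MPEGH3DAFRAME = 2  # coded audio frame


@dataclass
class MhasPacket:
    """Hypothetical, simplified view of an already-parsed MHAS packet."""
    packet_type: int
    packet_label: int  # ties the packet to one multiplexed substream
    payload: bytes


def select_substream(packets, label):
    """Keep only the packets belonging to one substream, e.g. a
    descriptive-audio track the listener has enabled."""
    return [p for p in packets if p.packet_label == label]


# A toy multiplex: main program (label 1) plus audio description (label 2).
mux = [
    MhasPacket(PACTYP_MPEGH3DACFG, 1, b"cfg-main"),
    MhasPacket(PACTYP_MPEGH3DAFRAME, 1, b"frame-main"),
    MhasPacket(PACTYP_MPEGH3DACFG, 2, b"cfg-ad"),
    MhasPacket(PACTYP_MPEGH3DAFRAME, 2, b"frame-ad"),
]
main_only = select_substream(mux, 1)  # config + frame of the main program
```

A receiver with limited capabilities, or a listener who has not enabled audio description, would simply skip the label-2 packets rather than decode and discard them.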

Evolution Across Releases

Rel-15 Initial

Introduced MHAS into the 3GPP ecosystem within TS 26.118, referencing the MHAS packet structure defined in ISO/IEC 23008-3 and its carriage in MPEG-2 Transport Streams. This enabled the use of MPEG-H 3D Audio for media delivery over 5G networks and next-generation broadcast systems like ATSC 3.0.

Enhanced support for MHAS in streaming applications, specifying its carriage within the ISO Base Media File Format (ISOBMFF) for use with Dynamic Adaptive Streaming over HTTP (DASH). This solidified MHAS as a key audio format for 5G Media Streaming (5GMS) services.

Further refinements and profiling for immersive media experiences, aligning with work on Extended Reality (XR) in 3GPP. Ensured MHAS could efficiently support low-latency audio for interactive and VR applications delivered over 5G networks.

Defining Specifications

Specification  Title
TS 26.118      Virtual Reality (VR) profiles for streaming applications