Description
MPEG-H Audio Stream (MHAS) is the packetized bitstream format used to transport MPEG-H 3D Audio content. It is defined in ISO/IEC 23008-3 and referenced by 3GPP for media delivery. MPEG-H Audio is an audio coding system that supports channel-based, object-based, and scene-based (Higher Order Ambisonics) representations in a single bitstream; MHAS is the format that packetizes and carries these components.

An MHAS stream is a sequence of MHAS packets. Each packet starts with a header carrying the packet type, a label, and the payload length, followed by the payload. Synchronization is provided by dedicated sync packets inserted into the stream. The payload can carry different types of data, such as MPEG-H Audio configuration data, MPEG-H Audio frames, or other auxiliary data.

A key operational aspect is layering and multiplexing. Multiple audio substreams (e.g., a main audio program, descriptive audio, or individual audio objects) can be multiplexed into a single MHAS stream. A receiver can then decode only the components it needs, which enables personalized audio: a user can boost commentary volume or select a preferred language track.

The MHAS stream is typically carried within a higher-level transport container, such as an MPEG-2 Transport Stream (TS) for broadcast or the ISO Base Media File Format (ISOBMFF) for streaming. In a 5G Media Streaming (5GMS) context, the MHAS stream is packaged into DASH or HLS segments.

Decoding involves demultiplexing the MHAS stream, extracting the relevant MPEG-H Audio frames, and passing them to an MPEG-H Audio decoder. The decoder renders the audio according to the received metadata and the capabilities of the playback system, which can range from stereo headphones to a 22.2-channel home theater. Because rendering happens at the receiver, the presentation can adapt in real time to the listener's environment and preferences.
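The packet structure and the demultiplexing step described above can be sketched as a small parser. This is a minimal illustration, not a conformant demultiplexer. It assumes that the three header fields (type, label, length) are coded as escaped values with bit widths (3, 8, 8), (2, 8, 32), and (11, 24, 24), and that type values 1, 2, and 6 denote configuration, frame, and sync packets; these figures should be checked against ISO/IEC 23008-3 before being relied on. The names `BitReader`, `read_escaped`, `MhasPacket`, `parse_packets`, and `frames_for_label` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Assumed packet type values; verify against ISO/IEC 23008-3.
PACTYP_CONFIG = 1
PACTYP_FRAME = 2
PACTYP_SYNC = 6


class BitReader:
    """Reads big-endian (MSB-first) bit fields from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, in bits

    def bits_left(self) -> int:
        return len(self.data) * 8 - self.pos

    def read(self, n: int) -> int:
        if n > self.bits_left():
            raise EOFError("not enough bits left in stream")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_escaped(r: BitReader, n1: int, n2: int, n3: int) -> int:
    """Escaped-value coding: an all-ones field means 'more bits follow'."""
    value = r.read(n1)
    if value == (1 << n1) - 1:
        add = r.read(n2)
        value += add
        if add == (1 << n2) - 1:
            value += r.read(n3)
    return value


@dataclass
class MhasPacket:
    ptype: int
    label: int
    payload: bytes


def parse_packets(stream: bytes) -> Iterator[MhasPacket]:
    """Yield packets from a byte-aligned stream (assumed field widths)."""
    r = BitReader(stream)
    while r.bits_left() >= 16:  # shortest possible header: 3 + 2 + 11 bits
        ptype = read_escaped(r, 3, 8, 8)
        label = read_escaped(r, 2, 8, 32)
        length = read_escaped(r, 11, 24, 24)  # payload length in bytes
        # With the assumed widths, every escape adds a multiple of 8 bits,
        # so the header always ends on a byte boundary.
        start = r.pos // 8
        if start + length > len(stream):
            raise ValueError("truncated packet payload")
        yield MhasPacket(ptype, label, stream[start:start + length])
        r.pos = (start + length) * 8


def frames_for_label(packets: Iterable[MhasPacket], label: int) -> List[bytes]:
    """Demultiplex: keep only audio frame payloads carrying the given label."""
    return [p.payload for p in packets
            if p.ptype == PACTYP_FRAME and p.label == label]
```

Under these assumptions, the byte sequence `C0 01 A5` parses as a sync packet (type 6, label 0) with a one-byte payload `0xA5`. A receiver would call `frames_for_label` with the label of the substream it wants and pass the resulting frames to the decoder.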
Purpose & Motivation
MPEG-H Audio, with MHAS as its transport format, was introduced to address the limitations of traditional audio codecs in the face of evolving consumer expectations and new multimedia formats. Earlier audio standards such as Advanced Audio Coding (AAC) were designed primarily for channel-based stereo or surround sound and deliver a fixed mix. The rise of Ultra High Definition (UHD) video, Virtual Reality (VR), and interactive media demanded audio that was equally immersive, adaptable, and interactive. MPEG-H Audio provides a unified solution for these next-generation audio services: a single audio bitstream that can be rendered appropriately on a wide range of playback devices, from smartphones to home theaters, without requiring multiple parallel audio tracks. It also lets broadcasters and service providers offer features such as personalized dialogue enhancement, accessible audio descriptions, and interactive audio objects that a user can control.

The integration of MHAS into 3GPP standards, starting in Release 15, was motivated by the industry's move towards 5G enhanced Mobile Broadband (eMBB) and media services. 5G's high bandwidth and low latency are well suited to delivering immersive media, and MHAS provides the standardized audio component of the next-generation media stack alongside video standards like HEVC and VVC.
Key Features
- Supports multiplexing of channel, object, and Higher Order Ambisonics (HOA) audio components in a single stream
- Enables object-based audio for interactive user experiences (e.g., boosting commentator volume)
- Provides dynamic rendering metadata to adapt audio output to specific playback systems (from mono to 22.2 channels)
- Defines a packetized stream format (MHAS) for robust transport over broadcast or packet-switched networks
- Facilitates personalized audio services and accessibility features like audio description
- Standardized integration with media delivery formats like MPEG-2 TS and DASH/ISOBMFF for streaming
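The object-based interactivity listed above (for example, boosting commentator volume) works because objects arrive as separate signals that are mixed at the receiver. The following toy sketch illustrates only that idea. It is not the MPEG-H renderer, it ignores spatial positioning and loudness control, and the function names and object names are hypothetical.

```python
from typing import Dict, List


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_objects(objects: Dict[str, List[float]],
                user_gain_db: Dict[str, float]) -> List[float]:
    """Sum decoded object signals into one mono signal.

    Each object is scaled by the gain the user chose for it;
    objects without a user setting are left at 0 dB.
    """
    if not objects:
        return []
    length = max(len(samples) for samples in objects.values())
    out = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_linear(user_gain_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out


if __name__ == "__main__":
    decoded = {
        "bed": [0.1, 0.2],           # hypothetical channel bed (mono here)
        "commentary": [0.05, 0.05],  # hypothetical commentary object
    }
    # The user raises the commentary by 6 dB; the bed is left untouched.
    print(mix_objects(decoded, {"commentary": 6.0}))
```

With a fixed channel mix this adjustment would not be possible, because the commentary would already be summed into the channels before transmission.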
Evolution Across Releases
Release 15 introduced MHAS into the 3GPP ecosystem through TS 26.118, which adopts the MHAS packet structure defined in ISO/IEC 23008-3 for carrying MPEG-H 3D Audio. This enabled the use of MPEG-H 3D Audio for media delivery over 5G networks, in line with next-generation broadcast systems like ATSC 3.0.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.118 | Virtual Reality (VR) profiles for streaming applications |