Description
Higher Order Ambisonics (HOA) is a scene-based representation of a sound field, designed to capture and reproduce immersive, three-dimensional audio. Unlike channel-based formats (e.g., 5.1 or 7.1 surround), which assign audio to specific loudspeaker positions, Ambisonics represents the sound field as a set of spherical harmonic coefficients. These coefficients describe how sound pressure varies with direction around a single point in space. The order (e.g., 1st, 2nd, 3rd) determines the spatial resolution: higher orders capture finer directional detail, so sound sources are localized more sharply in both azimuth and elevation and their apparent width is reproduced more accurately.
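To make the link between order and the number of coefficients concrete, the sketch below counts the spherical harmonic coefficients per order and maps each (degree n, index m) pair to a channel index. It assumes the ACN (Ambisonic Channel Number) ordering, ACN = n² + n + m, a common convention (used, for example, by the AmbiX format) that is an assumption here rather than something stated above; the function names are hypothetical.

```python
import math


def num_channels(order: int) -> int:
    """Number of spherical harmonic coefficients (channels) for full-sphere order N."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def acn_index(n: int, m: int) -> int:
    """ACN channel index for degree n and index m, where -n <= m <= n (assumed convention)."""
    if not -n <= m <= n:
        raise ValueError("m must satisfy -n <= m <= n")
    return n * n + n + m


def degree_and_index(acn: int) -> tuple[int, int]:
    """Inverse mapping: ACN channel index -> (n, m)."""
    n = math.isqrt(acn)
    return n, acn - n * n - n


for order in range(4):
    print(order, num_channels(order))
# 0 1
# 1 4
# 2 9
# 3 16
```

Each step up in order adds 2N + 1 new coefficients, which is where the extra directional detail comes from.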
Technically, HOA audio is created by encoding signals from a microphone array (or by synthesizing them from audio objects) into Ambisonics channels, a higher-order generalization of the first-order B-format. Each channel corresponds to one spherical harmonic component (e.g., W for the omnidirectional component and X, Y, Z for the three first-order figure-of-eight patterns). An Nth-order representation has (N+1)² channels. Because the encoded signal describes the scene rather than loudspeaker feeds, it is independent of any specific playback system. For rendering, the HOA stream is decoded for the target loudspeaker layout, or rendered binaurally for headphones, using a decoding matrix that maps the spherical harmonic components onto the available output transducers.
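A minimal sketch of first-order encoding and decoding, assuming the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization); the traditional FuMa B-format instead orders channels W, X, Y, Z and scales W by 1/√2. A mono sample is encoded as a plane wave from a given direction, then decoded to a square loudspeaker layout with a basic sampling (projection) decoder. The helper names are hypothetical, and practical decoders and binaural renderers are considerably more elaborate.

```python
import math


def foa_sn3d(azimuth: float, elevation: float) -> list[float]:
    """Real first-order spherical harmonics in ACN order (W, Y, Z, X), SN3D normalization.

    Angles in radians: azimuth counter-clockwise from the front, elevation up from the horizon.
    """
    ce = math.cos(elevation)
    return [1.0, math.sin(azimuth) * ce, math.sin(elevation), math.cos(azimuth) * ce]


def encode(sample: float, azimuth: float, elevation: float) -> list[float]:
    """Encode one mono sample as a plane wave from the given direction into 4 FOA channels."""
    return [sample * y for y in foa_sn3d(azimuth, elevation)]


def sampling_decode(foa: list[float], speaker_dirs: list[tuple[float, float]]) -> list[float]:
    """Basic sampling decoder: each speaker gain is the dot product of the FOA frame with
    the spherical harmonics of that speaker's direction, divided by the speaker count."""
    n_spk = len(speaker_dirs)
    return [sum(b * y for b, y in zip(foa, foa_sn3d(az, el))) / n_spk
            for az, el in speaker_dirs]


# Hypothetical square horizontal layout: front, left, back, right.
square = [(math.radians(a), 0.0) for a in (0, 90, 180, 270)]
gains = sampling_decode(encode(1.0, 0.0, 0.0), square)
print([round(g, 3) for g in gains])  # [0.5, 0.25, 0.0, 0.25]
```

For a source straight ahead, the front loudspeaker receives the largest gain and the rear one none; for this regular layout the gains sum to 1.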
Within the 3GPP ecosystem, HOA is integrated into media delivery specifications, particularly for Virtual Reality (VR), Augmented Reality (AR), and immersive teleconferencing. Key specifications define the transport and storage of HOA content, such as its encapsulation in the ISO Base Media File Format (ISOBMFF) for Dynamic Adaptive Streaming over HTTP (DASH). Codecs such as MPEG-H 3D Audio accept HOA as an input format for compression and transmission. The network must deliver these potentially high-bitrate, multi-channel audio streams efficiently, often in synchronization with 360-degree video, which calls for adequate QoS and can benefit from media-aware network functions.
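To illustrate why these streams are potentially high-bitrate, the arithmetic below computes the uncompressed PCM bitrate of an HOA signal from its (N+1)² channel count. The 48 kHz sampling rate and 24-bit sample depth are assumptions chosen for illustration, not values taken from the specifications, and the function name is hypothetical.

```python
def pcm_bitrate_kbps(order: int, sample_rate_hz: int = 48_000, bits_per_sample: int = 24) -> float:
    """Uncompressed PCM bitrate (kbit/s) of an order-N HOA signal with (N+1)^2 channels."""
    channels = (order + 1) ** 2
    return channels * sample_rate_hz * bits_per_sample / 1000


for order in (1, 2, 3):
    print(f"order {order}: {pcm_bitrate_kbps(order):.0f} kbit/s")
# order 1: 4608 kbit/s
# order 2: 10368 kbit/s
# order 3: 18432 kbit/s
```

At third order this is roughly 18 Mbit/s before compression, which motivates coding with a codec such as MPEG-H 3D Audio before delivery over a mobile network.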
Purpose & Motivation
3GPP adopted HOA to address the limitations of traditional audio formats for emerging immersive media applications. Channel-based surround sound is tied to a fixed, predefined loudspeaker configuration and does not readily support the 360-degree listener rotation that VR/AR requires. First-order Ambisonics (FOA) provides basic 3D audio, but its limited spatial resolution often leads to blurred or imprecise sound source localization. HOA addresses both problems, providing the high-fidelity, full-sphere audio needed for convincing presence and realism in virtual environments.
The driving motivation was the commercial rise of VR and 360-degree video services, which demanded an audio format that could match the visual immersion. HOA enables sound to remain stable and accurately positioned relative to the visual scene as the user rotates their head, which is critical for maintaining the illusion of being 'inside' the content. From a network and service perspective, standardizing HOA ensures interoperability between content creation tools, compression codecs, streaming servers, and playback devices, preventing vendor lock-in and fostering a healthy ecosystem for immersive media.
Furthermore, HOA's scene-based nature suits interactive and adaptive streaming. Its channel count depends on the order rather than on the number of sound sources, so a single HOA stream can represent a complex auditory scene that would otherwise require many discrete object tracks. Rendering can then be adapted client-side to user interaction (head movement) or device capabilities (headphones or a particular loudspeaker setup), without the server re-encoding the content or sending multiple audio streams. This reduces server complexity and network load, making HOA a scalable way to deliver personalized immersive audio over mobile networks.
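The client-side adaptation to head movement can be sketched for the first-order case: a yaw rotation leaves W and Z unchanged and rotates the X/Y pair like a 2D vector. The sketch assumes ACN/SN3D channel order (W, Y, Z, X) and counter-clockwise positive yaw; the function names are hypothetical, and higher orders need full spherical harmonic rotation matrices rather than this 2×2 special case.

```python
import math


def rotate_foa_yaw(foa: list[float], angle: float) -> list[float]:
    """Rotate a first-order ACN/SN3D frame (W, Y, Z, X) about the vertical axis.

    A source at azimuth theta moves to theta + angle (radians, counter-clockwise).
    """
    w, y, z, x = foa
    c, s = math.cos(angle), math.sin(angle)
    return [w, x * s + y * c, z, x * c - y * s]


def compensate_head_yaw(foa: list[float], head_yaw: float) -> list[float]:
    """Counter-rotate the scene so sources stay fixed in the world when the listener turns."""
    return rotate_foa_yaw(foa, -head_yaw)


frontal_source = [1.0, 0.0, 0.0, 1.0]          # plane wave from straight ahead
turned_left = compensate_head_yaw(frontal_source, math.pi / 2)
print([round(v, 3) for v in turned_left])
```

Turning the head 90° to the left moves the frontal source to the listener's right (Y = -1, X ≈ 0), so it stays fixed relative to the visual scene; only a small per-frame matrix operation is needed on the client, with no change to the transmitted stream.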
Key Features
- Scene-based audio representation using spherical harmonic coefficients, independent of playback setup.
- Scalable spatial resolution defined by the Ambisonics order (e.g., 1st, 2nd, 3rd order).
- Supports full 360-degree sound field capture and reproduction, including elevation.
- Enables stable, head-tracked audio rendering for VR/AR applications.
- Standardized transport in ISOBMFF for adaptive streaming (DASH).
- Compression support via codecs such as MPEG-H 3D Audio and the 3GPP Immersive Voice and Audio Services (IVAS) codec.
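The order scalability listed above follows from how channels are arranged: with ACN ordering (an assumed convention here), the channels of orders 0 to N' form a prefix of any higher-order frame, so a lower-resolution version is obtained by discarding trailing channels. The sketch below is hypothetical and says nothing about whether a particular codec or streaming profile exposes this operation.

```python
import math


def truncate_order(frame: list[float], target_order: int) -> list[float]:
    """Return the channels of an ACN-ordered HOA frame up to target_order."""
    source_order = math.isqrt(len(frame)) - 1
    if (source_order + 1) ** 2 != len(frame):
        raise ValueError("frame length must be (N+1)^2 for some order N")
    if not 0 <= target_order <= source_order:
        raise ValueError("target order must lie between 0 and the frame's order")
    # ACN places all channels of order n before those of order n+1.
    return frame[: (target_order + 1) ** 2]


third_order = [float(i) for i in range(16)]   # dummy 3rd-order frame (16 channels)
print(truncate_order(third_order, 1))          # [0.0, 1.0, 2.0, 3.0]
```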
Evolution Across Releases
HOA was first specified for immersive media services. The specifications defined how HOA content is carried, including its encapsulation for streaming and support for first-order and higher-order content in VR and audio-on-demand applications, establishing the foundation for 3D audio delivery.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.118 | 3GPP TS 26.118 |
| TS 26.253 | 3GPP TS 26.253 |
| TR 26.805 | 3GPP TR 26.805 |
| TR 26.818 | 3GPP TR 26.818 |
| TR 26.865 | 3GPP TR 26.865 |
| TR 26.918 | 3GPP TR 26.918 |
| TR 26.933 | 3GPP TR 26.933 |
| TR 26.998 | 3GPP TR 26.998 |