Description
Spherical Harmonics (SH) are a set of orthogonal basis functions defined on the surface of a sphere. In 3GPP, specifically in the context of immersive media services (e.g., 360-degree video, virtual reality, and augmented reality), SH is used as a core mathematical tool for representing spatial audio. Spatial audio aims to recreate a three-dimensional sound field so that a listener perceives sounds as coming from specific directions and distances, which is crucial for an immersive experience. The directional dependence of the sound pressure around a listening point can be expanded into an infinite series of spherical harmonic functions. In practice, this series is truncated at a finite order (e.g., 1st, 3rd, or 5th), which approximates the sound field with a manageable amount of data.
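For reference, the truncated expansion described above can be written as follows. The notation is introduced here for illustration and is not taken from a 3GPP specification: f is the sound-field quantity as a function of direction, Y_n^m are the SH basis functions (assumed orthonormal), a_nm are the coefficients, and N is the truncation order.

```latex
\[
f(\theta,\phi) \;\approx\; \sum_{n=0}^{N} \sum_{m=-n}^{n} a_{nm}\, Y_n^m(\theta,\phi),
\qquad
a_{nm} = \int_{S^2} f(\theta,\phi)\, \overline{Y_n^m(\theta,\phi)}\, \mathrm{d}\Omega
\]
\[
\text{number of coefficients} \;=\; \sum_{n=0}^{N} (2n+1) \;=\; (N+1)^2
\]
```

Each order n contributes 2n+1 basis functions, which is why the coefficient count grows quadratically with N.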
The process begins with capturing a sound field using a microphone array or synthesizing one from audio object definitions. The sound field is then encoded into SH coefficients, which are the weights assigned to each basis function; each coefficient is a time-varying signal carried as one audio channel. A higher SH order captures more detailed spatial information (higher angular resolution) but requires more coefficients and thus more bandwidth: order N needs (N+1)^2 channels. For example, 1st-order Ambisonics (a format based on SH) uses 4 channels (W, X, Y, Z), while 3rd-order uses 16. 3GPP specifications such as TS 26.253 for immersive audio define how these SH-based audio streams are multiplexed, transported, and synchronized with video within MPEG container formats such as ISOBMFF.
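The channel counts and the encoding step above can be illustrated with a minimal Python sketch. It is not the encoder defined in TS 26.253: the function names are hypothetical, the gains assume an SN3D-style normalization (W carries the unattenuated signal), and the channel order follows the W, X, Y, Z naming used in the text. Azimuth is assumed to be measured counterclockwise from the front, elevation upward from the horizontal plane.

```python
import math


def ambisonic_channel_count(order: int) -> int:
    """Return the number of SH coefficients (channels) for a full 3D field of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_first_order(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample arriving from a single direction into (W, X, Y, Z).

    Assumption: SN3D-style gains, x pointing forward, y to the left, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)    # front-back component
    y = sample * math.sin(az) * math.cos(el)    # left-right component
    z = sample * math.sin(el)                   # up-down component
    return (w, x, y, z)


# Channel counts for the orders mentioned in the text.
counts = {order: ambisonic_channel_count(order) for order in (1, 3, 5)}

# A unit-amplitude sample arriving from directly to the left of the listener.
left_source = encode_first_order(1.0, azimuth_deg=90.0, elevation_deg=0.0)
```

A real encoder would apply these gains to every sample of a source signal and sum the contributions of all sources per channel.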
At the receiver (e.g., a VR headset or advanced media player), the SH coefficients are decoded and used to reconstruct an approximation of the original sound field. This reconstructed field is then rendered for the listener's specific audio output setup, whether that is headphones (using binaural rendering), a stereo speaker system, or a full surround sound array. For headphone playback, the renderer applies head-related transfer functions (HRTFs) to simulate the directionality of sounds. SH thus provides a format-agnostic intermediate representation: the audio is stored and transmitted in the SH domain, and only the final rendering is tailored to the playback environment. A single transmitted stream can therefore serve many different playback setups, which suits immersive media delivery over mobile networks.
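To make the rendering step concrete, the sketch below derives loudspeaker feeds from first-order coefficients by pointing a virtual cardioid microphone at each speaker. This is one simple textbook-style decode chosen for illustration, not the renderer specified by 3GPP. The function names and the square speaker layout are hypothetical, and the coefficients are assumed to use SN3D-style gains in W, X, Y, Z order.

```python
import math


def unit_vector(azimuth_deg: float, elevation_deg: float):
    """Convert a direction in degrees to a unit vector (x forward, y left, z up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def decode_virtual_cardioid(wxyz, speaker_directions):
    """Return one feed per speaker from first-order coefficients (W, X, Y, Z).

    Each feed is 0.5 * (W + dot(direction, (X, Y, Z))), i.e. a cardioid pickup
    pattern aimed at that speaker.
    """
    w, x, y, z = wxyz
    feeds = []
    for azimuth_deg, elevation_deg in speaker_directions:
        ux, uy, uz = unit_vector(azimuth_deg, elevation_deg)
        feeds.append(0.5 * (w + x * ux + y * uy + z * uz))
    return feeds


# Hypothetical horizontal square layout: front-left, rear-left, rear-right, front-right.
square_layout = [(45.0, 0.0), (135.0, 0.0), (-135.0, 0.0), (-45.0, 0.0)]

# First-order coefficients of a unit source directly in front of the listener.
front_source = (1.0, 1.0, 0.0, 0.0)
feeds = decode_virtual_cardioid(front_source, square_layout)
```

With a frontal source, the two front speakers receive equal and larger feeds than the two rear speakers. The same coefficients could instead be passed to a binaural renderer for headphones, which is the decoupling the paragraph above describes.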
Purpose & Motivation
The SH representation was adopted in 3GPP to solve the challenge of efficiently delivering immersive spatial audio for services like 360-degree video and virtual reality over bandwidth-constrained mobile networks. Traditional channel-based audio (e.g., 5.1 or 7.1 surround) is tied to a specific speaker layout and does not adapt well to different playback environments, especially headphone-based VR. Object-based audio, where each sound is an object with metadata, is flexible but becomes computationally intensive and bandwidth-heavy for complex scenes with many sounds.
SH-based audio, often realized in the form of Ambisonics, provides a middle ground. It represents the entire sound field as a fixed set of coefficients, which is more efficient than transmitting dozens of individual audio objects. The encoded representation does not depend on the listener's orientation; head rotation is handled at rendering time by rotating the sound field, which makes it well suited to 360-degree content where the user can look around. The primary motivation for its inclusion in 3GPP standards (starting in Release 18) was to create a standardized, interoperable, and network-efficient method for spatial audio, a core component of a high-quality immersive media experience. The goal is a consistent audio-visual experience for users regardless of their device.
Key Features
- Efficient mathematical representation of 3D sound fields
- Format-agnostic intermediate representation for spatial audio
- Variable order to trade off spatial quality against bitrate
- Decoupling of capture/transmission from playback rendering
- Native support for 360-degree content and listener rotation
- Enables binaural rendering for headphone-based VR/AR
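The listener-rotation feature in the list above can be sketched for the first-order case. Rotating the sound field about the vertical axis only mixes the X and Y channels, so head tracking can be applied at the receiver without re-encoding. The function names are hypothetical, and the sketch assumes W, X, Y, Z channel order with x forward, y left, and yaw measured counterclockwise (positive when the head turns left).

```python
import math


def rotate_yaw(wxyz, angle_deg: float):
    """Rotate a first-order sound field counterclockwise about the vertical axis."""
    w, x, y, z = wxyz
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return (w, x_rot, y_rot, z)  # W and Z are unaffected by a yaw rotation


def compensate_head_yaw(wxyz, head_yaw_deg: float):
    """Counter-rotate the field so sources stay fixed in the world when the head turns."""
    return rotate_yaw(wxyz, -head_yaw_deg)


# A unit source directly in front; the listener then turns the head 90 degrees to the left.
front_source = (1.0, 1.0, 0.0, 0.0)
after_turn = compensate_head_yaw(front_source, head_yaw_deg=90.0)
```

After the head turn, the energy moves from X to negative Y, meaning the source is now heard on the listener's right, as expected. Higher orders need larger rotation matrices per order, and pitch and roll are omitted here.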
Evolution Across Releases
SH-based audio was formally introduced in 3GPP Release 18 as part of the enhanced Immersive Media standards. The initial architecture defined the use of SH for encoding spatial audio components within immersive media services, specifying the bitstream format, multiplexing with video in ISOBMFF, and synchronization mechanisms. It established the baseline for interoperable delivery of 360-degree audio with 360-degree video.
Release 19 builds upon the SH foundation with enhancements for more complex immersive experiences. These include support for higher-order Ambisonics for improved spatial audio quality, advanced compression techniques to reduce bandwidth further, and tighter integration with volumetric video and six-degrees-of-freedom (6DoF) media formats for augmented and virtual reality applications.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.253 | 3GPP TS 26.253 |