FOA (First Order Ambisonics) — 3GPP Glossary

A spatial audio format standardized by 3GPP for immersive media services such as VR and 360° video. It represents a sound field using four audio channels (W, X, Y, Z), giving a full-sphere representation that can be rotated to follow the listener's head orientation. This is key for realistic audio in augmented and virtual reality applications.

Description

First Order Ambisonics (FOA) is a method for capturing, processing, and reproducing three-dimensional spatial sound. Within 3GPP, it has been standardized as a core audio format for immersive media services, and it appears across specifications such as TS 26.118 (VR profiles for streaming applications) and TS 26.260 (objective test methodologies for immersive audio systems). FOA represents a sound field at a point in space using the spherical harmonic components up to the first order: one zeroth-order component and three first-order components. Practically, this results in a four-channel audio signal: one omnidirectional (W) channel and three figure-of-eight directional channels (X, Y, Z) aligned with the Cartesian axes.
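As a rough illustration of the four-channel representation, the sketch below pans a single mono sample into W, X, Y, Z. It is not taken from any 3GPP specification. The function name `encode_foa` is hypothetical, and the conventions are assumptions: X points forward, Y left, Z up, azimuth is counter-clockwise, and W is left unscaled (SN3D-style gains). Other Ambisonics conventions scale W differently or order the channels differently.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg):
    """Pan one mono sample to (W, X, Y, Z).

    Assumed conventions (not from the source): X forward, Y left, Z up,
    azimuth counter-clockwise from the front, W unscaled.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)  # left-right figure-of-eight
    z = sample * math.sin(el)                 # up-down figure-of-eight
    return (w, x, y, z)

front = encode_foa(1.0, 0.0, 0.0)   # source straight ahead
left = encode_foa(1.0, 90.0, 0.0)   # source directly to the left
```

Under these assumptions, a source straight ahead contributes only to W and X, while a source directly to the left contributes only to W and Y.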

Technically, the W channel captures the overall pressure (mono audio), while the X, Y, and Z channels capture the pressure gradients along the respective axes, encoding the directionality of sound sources. During playback, the sound field is rotated according to the listener's head orientation (provided by head-tracking data), then decoded and rendered for headphones or a speaker array, creating the illusion of sounds coming from specific directions in 3D space. A key property of Ambisonics is that rotation is cheap: rotating the sound field is a linear operation that applies a 3×3 rotation matrix to the X, Y, and Z channels and leaves W unchanged. This is essential for VR, where the user's head constantly moves.
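The rotation step can be sketched for the simplest case, a pure yaw (left-right) head turn. This is an illustrative sketch rather than a 3GPP-defined procedure. The function name `rotate_foa_yaw` is hypothetical, and it assumes X forward, Y left, and a positive yaw meaning the head turns to the left. The sound field is counter-rotated so that sources stay fixed in the virtual world.

```python
import math

def rotate_foa_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate one FOA sample for a head turned by head_yaw_deg.

    Assumed conventions (not from the source): X forward, Y left,
    positive yaw = head turning left. W and Z are unaffected by yaw.
    """
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = -x * math.sin(t) + y * math.cos(t)
    return (w, x_rot, y_rot, z)

# A source straight ahead, heard after the listener turns 90 degrees left,
# should now appear on the listener's right (negative Y).
rotated = rotate_foa_yaw(1.0, 1.0, 0.0, 0.0, 90.0)
```

A full head-tracking implementation would also handle pitch and roll by applying a general 3×3 rotation matrix to (X, Y, Z).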

In the 3GPP architecture for immersive media, FOA audio streams are typically packaged within dynamic adaptive streaming over HTTP (DASH) segments, synchronized with 360° video. The Media Presentation Description (MPD) includes metadata describing the audio as FOA. The client's media player, often part of an Extended Reality (XR) application, receives head orientation data from sensors, decodes the FOA B-format signals, and performs binaural rendering for headphone output, creating a convincing 3D audio scene that matches the visual viewpoint. This integration is crucial for maintaining audio-visual coherence and enhancing the sense of presence in virtual environments.
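The client-side steps described above (read head orientation, rotate, render for headphones) can be sketched per sample as follows. This is a deliberately crude stand-in: real binaural rendering filters the signals with head-related transfer functions, whereas this sketch decodes to two virtual cardioid microphones pointing left and right. The function name `render_frame` is hypothetical, and the conventions (X forward, Y left, W unscaled, positive yaw = head turning left) are assumptions, not taken from the 3GPP specifications.

```python
import math

def render_frame(w, x, y, z, head_yaw_deg):
    """Return (left, right) for one FOA sample and a head yaw angle.

    Not HRTF-based: each ear is approximated by a virtual cardioid
    pointing at +90 degrees (left) or -90 degrees (right) azimuth.
    """
    t = math.radians(head_yaw_deg)
    # Counter-rotate the sound field; only Y is needed for a left/right decode.
    y_rot = -x * math.sin(t) + y * math.cos(t)
    left = 0.5 * (w + y_rot)    # cardioid aimed along +Y
    right = 0.5 * (w - y_rot)   # cardioid aimed along -Y
    return (left, right)

# Source straight ahead (W=1, X=1): centered when facing it,
# fully in the right ear after the head turns 90 degrees left.
centered = render_frame(1.0, 1.0, 0.0, 0.0, 0.0)
turned = render_frame(1.0, 1.0, 0.0, 0.0, 90.0)
```

The elevation channel Z drops out here because both virtual microphones lie in the horizontal plane; an HRTF-based renderer would use it.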

Purpose & Motivation

FOA was standardized in 3GPP to address the critical need for immersive, spatial audio in emerging media services like virtual reality (VR), augmented reality (AR), and 360° video. Traditional stereo or surround sound formats (e.g., 5.1) are viewpoint-locked to the content creator's perspective and do not adapt to user head movement, breaking immersion in interactive VR experiences. The primary problem was the lack of a standardized, efficient, and adaptable format for 3D audio in telecommunications.

The motivation for its inclusion, starting in Release 14, was driven by the industry's push towards immersive media. 3GPP's work on VR profiles and media codecs identified spatial audio as a fundamental component. FOA was chosen because it provides a good balance between audio quality, computational complexity, and bitrate efficiency compared to higher-order Ambisonics (HOA) or object-based audio. It solves the limitation of channel-based audio by providing a full-sphere representation that is independent of the playback system's speaker configuration and can be dynamically rotated.

Historically, proprietary or research-oriented formats existed, but a universal standard was needed for interoperability across content creation tools, streaming services, and playback devices. 3GPP's standardization of FOA enabled mass-market deployment of VR services with compelling audio, ensuring that sound sources remain fixed in the virtual world as the user looks around, which is essential for realism and user comfort in XR applications.

Key Features

Encodes a full 3D sound field using four audio channels (W, X, Y, Z in B-format)
Supports simple rotation of the sound field, allowing real-time rotation based on head tracking
Enables efficient binaural rendering for headphone playback, creating head-tracked 3D audio
Standardized for interoperability in 3GPP immersive media streaming (e.g., DASH-IF IOP)
Supports scalable bitrates and can be compressed with codecs like EVS or MPEG-H 3D Audio
Forms the foundation for more complex spatial audio formats like Higher Order Ambisonics (HOA)
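
The relationship between FOA and Higher Order Ambisonics can be made concrete through the channel count: a full-sphere Ambisonics signal of order N carries (N + 1)² channels, so first order gives the four channels listed above. The helper name `ambisonic_channels` below is hypothetical and serves only to illustrate this formula.

```python
def ambisonic_channels(order):
    """Number of channels in a full-sphere Ambisonics signal of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

foa_channels = ambisonic_channels(1)   # first order: W, X, Y, Z
hoa3_channels = ambisonic_channels(3)  # third order, for comparison
```

Since each added order increases the channel count quadratically, the number of signals to code and transport grows quickly, which is one reason first order is attractive where bitrate and complexity are constrained.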

Evolution Across Releases

Rel-14 Initial

First introduced First Order Ambisonics (FOA) within 3GPP standards, primarily in TS 26.118 for VR media profiles. It defined FOA as a supported audio format for immersive services, establishing the basic four-channel B-format representation and its application in 360° video and VR streaming scenarios.

TS 26.118, TS 26.253, TS 26.260, TS 26.818, TS 26.918, TS 26.933, TS 26.997

Defining Specifications

Specification	Title
TS 26.118	3GPP TS 26.118
TS 26.253	3GPP TS 26.253
TS 26.260	3GPP TS 26.260
TS 26.818	3GPP TS 26.818
TS 26.918	3GPP TS 26.918
TS 26.933	3GPP TS 26.933
TS 26.997	3GPP TS 26.997