Description
Virtual Studio Technology (VST) is a framework defined by 3GPP to standardize the production, distribution, and consumption of immersive media services over mobile networks, covering the media chain from content creation to user presentation. At its core, VST deals with volumetric media (representations of scenes or objects as 3D models or point clouds) and the associated metadata, enabling six degrees of freedom (6DoF) experiences in which users can change their viewpoint interactively. The architecture has four main components: capture systems (such as multi-camera rigs), media processing functions for encoding and packaging, a media delivery network, and client-side rendering engines on devices such as XR headsets or smartphones.
Technically, VST specifies how volumetric data is formatted, compressed, and transported. It leverages existing and new media codecs, such as those for point cloud compression (PCC), and defines delivery protocols optimized for low latency and high reliability, which are critical for interactive and live experiences. The media is often packaged using standards like MPEG Media Transport (MMT) or Dynamic Adaptive Streaming over HTTP (DASH), with specific adaptations for timed metadata describing camera poses and scene geometry. A VST server orchestrates the session, managing the synchronization of multiple media streams (e.g., video, audio, haptics) and adapting the delivery based on network conditions and client capabilities.
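The adaptation step described above can be illustrated with a minimal sketch. It assumes a DASH-like model in which each encoded variant advertises a bitrate and a per-frame point count. The `Representation` type, the `select_representation` function, the example ladder, and the 0.8 safety factor are all hypothetical, chosen for illustration rather than taken from any 3GPP specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Representation:
    """One encoded variant of a volumetric stream (hypothetical model)."""
    rep_id: str
    bitrate_kbps: int      # average bitrate needed to stream this variant
    points_per_frame: int  # rendering load the client must sustain


def select_representation(
    reps: List[Representation],
    throughput_kbps: float,
    max_points_per_frame: int,
    safety_factor: float = 0.8,  # assumed headroom against throughput dips
) -> Representation:
    """Pick the highest-bitrate variant that fits network and client limits.

    Falls back to the lowest-bitrate variant when nothing fits, so playback
    can continue at reduced quality instead of stalling.
    """
    if not reps:
        raise ValueError("at least one representation is required")
    budget = throughput_kbps * safety_factor
    candidates = [
        r for r in reps
        if r.bitrate_kbps <= budget and r.points_per_frame <= max_points_per_frame
    ]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    return min(reps, key=lambda r: r.bitrate_kbps)


# Illustrative ladder; the numbers are placeholders, not measured values.
LADDER = [
    Representation("low", 8_000, 200_000),
    Representation("mid", 25_000, 500_000),
    Representation("high", 60_000, 1_000_000),
]
```

Calling `select_representation(LADDER, throughput_kbps=40_000, max_points_per_frame=1_000_000)` leaves a budget of 32,000 kbps after the safety factor, so the "mid" variant is chosen. A real client would typically smooth the throughput estimate and account for buffer level as well.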
Within the 5G system, VST uses network capabilities such as Multi-access Edge Computing (MEC) to offload intensive processing, for example real-time rendering or scene composition, to servers closer to the user, which reduces end-to-end latency. Quality of Service (QoS) mechanisms provide the high bandwidth and low delay that immersive media requires. VST also defines APIs for service discovery, session management, and user interaction, enabling integration with 5G core network functions. Its role is to provide an interoperable ecosystem in which content creators, network operators, and device manufacturers can deploy immersive services at scale.
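The trade-off behind edge offloading can be sketched as a simple latency comparison. The example below is an assumption-laden illustration, not a mechanism defined by the specifications: the `LatencyEstimate` fields, the `choose_render_site` function, and the budget passed in by the caller are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyEstimate:
    """Per-frame latency contributions in milliseconds (hypothetical model)."""
    local_render_ms: float  # time to render the frame on the device
    edge_render_ms: float   # time to render the frame on an edge server
    network_rtt_ms: float   # round trip between device and edge server
    codec_ms: float         # encode at the edge plus decode on the device


def edge_path_ms(est: LatencyEstimate) -> float:
    """Total latency when the frame is rendered at the edge and streamed back."""
    return est.edge_render_ms + est.network_rtt_ms + est.codec_ms


def choose_render_site(est: LatencyEstimate, budget_ms: float) -> str:
    """Return "local", "edge", or "degrade" for a given latency budget.

    Prefers whichever path is faster. Reports "degrade" when neither path
    fits the budget, so the caller can lower quality instead.
    """
    local = est.local_render_ms
    edge = edge_path_ms(est)
    best_site, best_ms = ("local", local) if local <= edge else ("edge", edge)
    return best_site if best_ms <= budget_ms else "degrade"
```

With a slow device and a nearby edge server, for instance `LatencyEstimate(45.0, 4.0, 10.0, 6.0)` against a 25 ms budget, the edge path totals 20 ms and is selected. If the round trip grows to 60 ms, neither path fits and the function reports "degrade".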
Purpose & Motivation
VST was created to address the lack of standardization for delivering next-generation immersive media, such as volumetric video and extended reality (XR), over telecommunications networks. Prior to its specification, proprietary solutions for capture, streaming, and rendering of such content led to fragmentation, hindering widespread adoption and interoperability. The rise of 5G, with its enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communications (URLLC) capabilities, provided a catalyst, as it offered the necessary network performance but required standardized media handling to unlock new consumer and enterprise services.
The technology solves key problems in producing and distributing immersive content at scale. It enables efficient compression and streaming of large volumetric datasets, manages the complexity of multi-stream synchronization for 6DoF experiences, and defines how networks can optimize delivery through edge computing and network slicing. VST was motivated by industry demand from broadcasters, content creators, and XR application developers for a common framework to build upon, ensuring that immersive experiences can be reliably delivered to a mass market over 5G, transforming fields like live entertainment, remote collaboration, and interactive training.
Key Features
- Standardized capture, encoding, and streaming of volumetric media and point clouds
- Support for six degrees of freedom (6DoF) interactive viewing experiences
- Integration with 5G network capabilities including edge computing and network slicing
- Use of adaptive streaming (e.g., DASH) with timed metadata for synchronization
- APIs for service orchestration, session management, and user interaction
- Low-latency delivery protocols optimized for real-time immersive applications
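
The use of timed metadata for synchronization, listed above, can be illustrated with a small sketch that matches a media frame to the closest camera-pose sample by timestamp. The `PoseSample` layout and the `nearest_pose` function are hypothetical; actual timed-metadata tracks carry richer structures than an `(x, y, z)` tuple.

```python
from bisect import bisect_left
from typing import List, Tuple

# (timestamp in ms, xyz position); a simplified, assumed sample layout.
PoseSample = Tuple[int, Tuple[float, float, float]]


def nearest_pose(samples: List[PoseSample], frame_ts_ms: int) -> PoseSample:
    """Return the sample whose timestamp is closest to frame_ts_ms.

    `samples` must be sorted by timestamp. Ties go to the earlier sample.
    """
    if not samples:
        raise ValueError("no pose samples available")
    timestamps = [ts for ts, _ in samples]
    i = bisect_left(timestamps, frame_ts_ms)
    if i == 0:
        return samples[0]   # frame is at or before the first sample
    if i == len(samples):
        return samples[-1]  # frame is after the last sample
    before, after = samples[i - 1], samples[i]
    if frame_ts_ms - before[0] <= after[0] - frame_ts_ms:
        return before
    return after
```

A renderer could call `nearest_pose(samples, frame_ts_ms)` once per decoded frame. Interpolating between the two neighbouring samples would be a natural refinement when pose data is sparse.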
Evolution Across Releases
The initial release introduced Virtual Studio Technology as part of the 5G media enhancements. It defined the initial framework and requirements for immersive media services in TS 26.118 and the architectural aspects in TS 26.818, and it established foundational concepts for volumetric media delivery over 5G, focusing on use cases and system requirements.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.118 | Virtual Reality (VR) profiles for streaming applications |
| TS 26.818 | Virtual Reality (VR) streaming audio; Characterization test results |