Description
In the context of 3GPP specifications (primarily TS 26.114 for IMS media handling), a Video Object Plane (VOP) is a concept borrowed from the MPEG-4 Part 2 Visual standard, where it is the basic unit of coded video. Unlike purely frame-based video coding (e.g., MPEG-2, or the I/P/B frames of H.264/AVC), MPEG-4 Visual introduced object-based coding, in which a scene is composed of separate Video Objects (VOs). A VOP is a temporal instance, a 'snapshot', of one such Video Object. It contains the shape, motion, and texture information for that object at a specific moment in time.
There are several types of VOPs, analogous to frame types in other codecs. An Intra-VOP (I-VOP) is coded independently using only its own spatial information, serving as an access point. A Predictive-VOP (P-VOP) is coded using motion-compensated prediction from a past I-VOP or P-VOP. A Bidirectionally predictive-VOP (B-VOP) uses prediction from both past and future reference VOPs for higher compression. A Sprite-VOP (S-VOP) is a special type used for static background objects. The sequence of VOPs for a single Video Object is carried in a Video Object Layer (VOL); a Video Object may have more than one VOL when scalable coding is used.
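As an illustration of how these VOP types appear in a bitstream, the following is a minimal Python sketch that scans an MPEG-4 Visual elementary stream for VOP headers and reports each VOP's type. It assumes the ISO/IEC 14496-2 layout in which a VOP begins with the start code 0x000001B6 and the next two bits carry vop_coding_type (0 = I, 1 = P, 2 = B, 3 = S); these values should be checked against the standard before being relied on. The function name `vop_types` and the sample bytes are hypothetical, chosen only for this example.

```python
# Sketch only: assumes the VOP start code and the 2-bit vop_coding_type
# field as laid out in ISO/IEC 14496-2. Names here are illustrative.
VOP_START_CODE = b"\x00\x00\x01\xb6"
VOP_TYPES = {0: "I", 1: "P", 2: "B", 3: "S"}


def vop_types(bitstream: bytes) -> list[str]:
    """Return the coding type of every VOP header found in the buffer."""
    types = []
    pos = bitstream.find(VOP_START_CODE)
    while pos != -1:
        header = pos + len(VOP_START_CODE)
        if header >= len(bitstream):
            break  # start code at the very end, no header byte to read
        # vop_coding_type is the top two bits of the byte after the start code
        coding_type = bitstream[header] >> 6
        types.append(VOP_TYPES[coding_type])
        pos = bitstream.find(VOP_START_CODE, header)
    return types


# Hand-made sample: three VOP headers whose first byte encodes I, P, B.
sample = (
    b"\x00\x00\x01\xb6\x10"  # 0b00...... -> I-VOP
    b"\x00\x00\x01\xb6\x50"  # 0b01...... -> P-VOP
    b"\x00\x00\x01\xb6\x90"  # 0b10...... -> B-VOP
)
print(vop_types(sample))
```

The sketch reads only the type field; the rest of the VOP header (time stamp, quantiser, and so on) is variable-length and would need a proper bit reader.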
The encoding process for a VOP involves multiple steps. First, shape coding defines the object's arbitrary-shaped boundary (using a binary alpha plane or grayscale alpha for transparency). For rectangular VOPs, this step is skipped. Motion estimation and compensation are then performed on a macroblock basis within the object's shape to exploit temporal redundancy. Finally, texture coding (DCT-based, similar to other codecs) is applied to the motion-compensated residual error. The bitstream syntax organizes VOP data with headers indicating its type, time stamp, and coding parameters. Decoding reconstructs the VOP by interpreting this syntax, performing motion compensation, and adding the decoded texture residual.
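The final decoding step described above (motion compensation plus the decoded texture residual, restricted to the object's shape) can be sketched as follows. This is a simplified illustration, not the normative MPEG-4 Visual process: it assumes full-pixel motion vectors, a binary alpha mask, 8-bit samples, and a motion vector that stays inside the reference picture. The function `reconstruct_block` and its parameters are hypothetical names introduced for this example.

```python
# Simplified sketch of VOP block reconstruction. Assumptions: integer-pel
# motion vectors, binary alpha, 8-bit samples, no reference padding.
def reconstruct_block(reference, residual, alpha, top, left, mv):
    """Rebuild one square block of a predicted VOP.

    reference: 2-D list of samples from the previously decoded VOP
    residual:  2-D list with the decoded texture residual for the block
    alpha:     2-D list of 0/1 flags, 1 where the sample is inside the object
    top, left: position of the block in the current VOP
    mv:        (dy, dx) motion vector pointing into the reference
    """
    dy, dx = mv
    size = len(residual)
    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if not alpha[y][x]:
                continue  # sample lies outside the object's shape
            pred = reference[top + y + dy][left + x + dx]
            # add the residual to the prediction and clip to the 8-bit range
            out[y][x] = max(0, min(255, pred + residual[y][x]))
    return out


# Toy 4x4 reference whose sample value is 10 * row + column.
reference = [[10 * r + c for c in range(4)] for r in range(4)]
block = reconstruct_block(
    reference,
    residual=[[5, -30], [0, 250]],
    alpha=[[1, 1], [0, 1]],
    top=1, left=1, mv=(1, -1),
)
print(block)  # [[25, 0], [0, 255]]
```

In the toy run, the two clipped samples show why clamping is needed, and the zero at the lower left is a sample the alpha mask marks as outside the object.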
Within 3GPP, VOPs are relevant because MPEG-4 Visual was one of the video codecs specified for early 3GPP packet-switched streaming (PSS) and multimedia messaging (MMS) services. TS 26.114 specifies media codec requirements for the IMS Multimedia Telephony service, and while later releases emphasize H.264/AVC and HEVC, VOPs remain relevant to interoperability with legacy MPEG-4 Visual implementations. The object-based nature of VOPs was a key differentiator, theoretically enabling advanced functionalities like content-based manipulation, separate coding of foreground and background, and interactive scene composition, although these features saw limited commercial deployment in mobile networks compared to simpler frame-based codecs.
Purpose & Motivation
The Video Object Plane concept was created as part of the MPEG-4 standard's object-based approach to multimedia coding. Prior standards like MPEG-1 and MPEG-2 were frame-based, treating the entire rectangular video frame as the unit of compression. This was efficient for storage and broadcast but offered limited flexibility for interaction and manipulation of content. The primary purpose of object-based coding with VOPs was to move towards content-centric multimedia, where individual audiovisual objects (like a person, a car, a logo) could be encoded, transmitted, and manipulated independently.
This addressed several limitations. It enabled highly efficient coding for specific applications: for example, a static background could be sent once as a Sprite VOP, saving bandwidth. It allowed for scalability and reuse, since objects could be stored in libraries and composed into different scenes. Most importantly, it opened the door for interactive applications where users could select, move, or otherwise interact with individual objects within a video scene, a feature relevant for augmented reality, interactive TV, and advanced gaming.
In the 3GPP context, MPEG-4 Visual (and thus VOPs) was standardized to provide a rich, flexible video codec for early multimedia services over 2.5G and 3G networks (GPRS, UMTS). It was part of the toolkit to enable compelling video applications like streaming, video telephony, and multimedia messaging. However, the computational complexity of shape-adaptive DCT and the object segmentation process, coupled with the rapid rise of more efficient and simpler frame-based codecs like H.264/AVC, meant that the advanced object-based features of VOPs were rarely used in practice. 3GPP eventually shifted its primary video codec mandates to H.264 and later HEVC, but support for MPEG-4 Simple Profile (which uses rectangular VOPs) remained for backward compatibility.
Detected Changes Across Releases
From 3GPP Change Requests. Specific changes extracted from the "Change history" tables of 3GPP specifications (2 CRs across 2 releases). Complements the general historical overview above with the evidence-based evolution of this function.
Studied in Rel-8, normative work from Rel-17.
In Release 17, the VOP (Video Object Plane) function was introduced to support the Immersive Teleconferencing and Telepresence for Remote Terminals (ITT4RT) feature. This specifically enables the transmission and reception of immersive video formats, such as 360-degree video and fisheye video, within multimedia telephony sessions. The release defines distinct ITT4RT client capabilities for sending (ITT4RT-Tx) and receiving (ITT4RT-Rx) this immersive video content.
- Video Support for ITT4RT (TS 26.114, CR 0514)
In Release 18, the VOP (Video Object Plane) function added support for High Definition (HD) video calls. This enhancement was part of the broader objective to define procedures for inter-working between different clients and networks, ensuring the user experience of multimedia telephony is equivalent to or better than circuit-switched services. The update maintained backward compatibility while allowing the addition of this new media component and functionality.
- Supporting HD video calls (TS 26.114, CR 0557)
Defining Specifications
3GPP specifications that define or reference VOP, with the latest known release. Sourced from the 3GPP document catalog.
| Specification | Title | Release |
|---|---|---|
| TS 26.114 vj10 | IMS Multimedia Telephony Media Handling | Rel-19 |