VOP

Video Object Plane

Other
Introduced in Rel-8
A Video Object Plane (VOP) is the basic coded picture unit in MPEG-4 Visual (MPEG-4 Part 2) coding, representing a snapshot of a video object at a given time. It is the unit in which each video object in a scene is encoded, decoded, and manipulated, which is what enables content-based functionalities.

Description

In 3GPP specifications (primarily TS 26.114 for IMS media handling), the Video Object Plane (VOP) is a concept borrowed from the MPEG-4 Part 2 Visual standard (ISO/IEC 14496-2), where it is the fundamental coding structure. Unlike conventional frame-based video coding (e.g., MPEG-2, or the later H.264/AVC with its I/P/B frames), MPEG-4 Visual supports object-based coding, in which a scene is composed of separate Video Objects (VOs). A VOP is a temporal instance, a 'snapshot', of one such Video Object, and carries the shape, motion, and texture information for that object at a specific moment in time.

There are several types of VOPs, analogous to frame types in other codecs. An Intra-VOP (I-VOP) is coded independently using only its own spatial information, serving as a random access point. A Predictive-VOP (P-VOP) is coded using motion-compensated prediction from a past I-VOP or P-VOP. A Bidirectionally predictive VOP (B-VOP) uses prediction from both past and future reference VOPs for higher compression. A Sprite-VOP (S-VOP) is a special type used for static background objects. The sequence of VOPs for a single Video Object forms a Video Object Layer (VOL).
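To make the VOP types concrete, the following is a minimal sketch that scans a raw MPEG-4 Visual elementary stream for VOP headers and reads each header's type. It assumes the ISO/IEC 14496-2 layout in which every VOP begins with the 32-bit vop_start_code 0x000001B6, immediately followed by the 2-bit vop_coding_type field (00 = I, 01 = P, 10 = B, 11 = S). The function name `vop_types` is hypothetical, and the sketch ignores container framing (RTP, 3GP/MP4) and every header field after the coding type.

```python
from typing import List

# vop_start_code assumed per ISO/IEC 14496-2: 0x000001B6
VOP_START_CODE = b"\x00\x00\x01\xb6"

# vop_coding_type: the 2-bit field right after the start code.
VOP_TYPES = {0b00: "I", 0b01: "P", 0b10: "B", 0b11: "S"}


def vop_types(bitstream: bytes) -> List[str]:
    """Return the coding type of every VOP header found in an elementary stream."""
    types = []
    pos = bitstream.find(VOP_START_CODE)
    while pos != -1:
        header_pos = pos + len(VOP_START_CODE)
        if header_pos >= len(bitstream):
            break  # truncated stream: start code with no header byte after it
        coding_type = bitstream[header_pos] >> 6  # top two bits of the next byte
        types.append(VOP_TYPES[coding_type])
        pos = bitstream.find(VOP_START_CODE, header_pos)
    return types
```

In a Simple Profile stream, which uses only rectangular I- and P-VOPs, a scan like this would be expected to report only "I" and "P".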

The encoding process for a VOP involves multiple steps. First, shape coding describes the object's arbitrarily shaped boundary (using a binary alpha plane, or a grayscale alpha plane for transparency). For rectangular VOPs, this step is skipped. For P- and B-VOPs, motion estimation and compensation are then performed on a macroblock basis within the object's shape to exploit temporal redundancy. Finally, DCT-based texture coding, similar to that of other codecs, is applied to intra-coded macroblocks and to the motion-compensated residual. The bitstream syntax organizes VOP data with headers indicating its type, time stamp, and coding parameters. The decoder reconstructs the VOP by parsing this syntax, performing motion compensation, and adding the decoded texture residual.
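Shape-aware motion estimation can be illustrated with a toy full-search block matcher that scores candidate motion vectors by a sum of absolute differences (SAD) counted only over pixels inside the object's binary alpha plane. This is a simplified sketch, not the MPEG-4 reference encoder: the helper names `masked_sad` and `motion_search`, the 4x4 block size, and the +/-2 search range are assumptions chosen for readability, whereas real encoders work on 16x16 macroblocks, pad reference VOPs beyond the shape boundary, and refine to half-pixel accuracy.

```python
from typing import List, Tuple

Plane = List[List[int]]  # one sample plane, row-major: plane[y][x]


def masked_sad(cur: Plane, ref: Plane, alpha: Plane,
               bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between an n x n block of `cur` and `ref` displaced by (dx, dy),
    counting only pixels inside the object's shape (alpha != 0)."""
    total = 0
    for y in range(by, by + n):
        for x in range(bx, bx + n):
            if alpha[y][x]:  # transparent pixels do not contribute
                total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def motion_search(cur: Plane, ref: Plane, alpha: Plane, bx: int, by: int,
                  n: int = 4, search: int = 2) -> Tuple[int, int, int]:
    """Exhaustive search in a +/-search window; returns (dx, dy, sad).

    Assumes the zero-displacement block lies inside the reference plane.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0, masked_sad(cur, ref, alpha, bx, by, 0, 0, n))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block would leave the reference VOP.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            sad = masked_sad(cur, ref, alpha, bx, by, dx, dy, n)
            if sad < best[2]:
                best = (dx, dy, sad)
    return best
```

Excluding transparent pixels from the cost means that whatever lies behind the object cannot pull the match away from the object's own motion.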

Within 3GPP, VOPs are relevant because MPEG-4 Visual was one of the video codecs specified for the early 3GPP Packet-switched Streaming Service (PSS) and Multimedia Messaging Service (MMS), alongside the mandatory H.263. TS 26.114 specifies media codec requirements for the IMS Multimedia Telephony service, and while later releases emphasize H.264/AVC and HEVC, VOPs remain relevant for interoperability with legacy terminals that use MPEG-4 Visual. The object-based nature of VOPs was a key differentiator, theoretically enabling advanced functionalities like content-based manipulation, separate coding of foreground and background, and interactive scene composition, although these features saw limited commercial deployment in mobile networks compared to simpler frame-based codecs.

Purpose & Motivation

The Video Object Plane concept was created as part of the MPEG-4 standard's object-based approach to multimedia coding. Prior standards like MPEG-1 and MPEG-2 were frame-based, treating the entire rectangular video frame as the unit of compression. This was efficient for storage and broadcast but offered limited flexibility for interaction and manipulation of content. The primary purpose of object-based coding with VOPs was to move towards content-centric multimedia, where individual audiovisual objects (like a person, a car, a logo) could be encoded, transmitted, and manipulated independently.

This addressed several limitations. It enabled highly efficient coding for specific applications: for example, a static background could be sent once as a Sprite VOP, saving bandwidth. It allowed for scalability and reuse—objects could be stored in libraries and composed into different scenes. Most importantly, it opened the door for interactive applications where users could select, move, or otherwise interact with individual objects within a video scene, a feature relevant for augmented reality, interactive TV, and advanced gaming.
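The bandwidth saving of a static sprite can be sketched as follows: the background panorama is transmitted once, and each later frame only needs the position of its visible window within that sprite. This minimal sketch assumes pure translation (roughly the single-warping-point case of MPEG-4 sprite coding; more warping points allow affine or perspective warps), and the function name `sprite_window` and the sample pan are hypothetical.

```python
from typing import List

Plane = List[List[int]]  # one sample plane, row-major: plane[y][x]


def sprite_window(sprite: Plane, off_x: int, off_y: int,
                  width: int, height: int) -> Plane:
    """Return the width x height background visible at (off_x, off_y).

    Translation-only: each frame is described by a single offset into the sprite.
    """
    if (off_x < 0 or off_y < 0
            or off_y + height > len(sprite) or off_x + width > len(sprite[0])):
        raise ValueError("window falls outside the sprite")
    return [row[off_x:off_x + width] for row in sprite[off_y:off_y + height]]


# Hypothetical camera pan: one pixel to the right per frame across a 16x8 sprite.
sprite = [[y * 16 + x for x in range(16)] for y in range(8)]
frames = [sprite_window(sprite, t, 0, 8, 8) for t in range(9)]
```

In this toy pan, coding each of the 9 backgrounds directly would carry 9 x 64 = 576 samples, versus 128 sprite samples plus nine offset pairs, before any entropy coding.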

In the 3GPP context, MPEG-4 Visual (and thus VOPs) was adopted to provide a flexible video codec for early multimedia services over 2.5G and 3G networks (GPRS, UMTS), as part of the toolkit for video applications such as streaming, video telephony, and multimedia messaging. However, the computational complexity of shape-adaptive DCT and the object segmentation process, coupled with the rapid rise of more efficient frame-based codecs like H.264/AVC, meant that the advanced object-based features of VOPs were rarely used in practice. 3GPP eventually shifted its primary video codec mandates to H.264/AVC and later HEVC, but support for MPEG-4 Visual Simple Profile (which uses only rectangular VOPs) remained for backward compatibility.

Key Features

  • Fundamental coding unit for a Video Object in MPEG-4 Visual
  • Types include I-VOP (intra), P-VOP (predictive), B-VOP (bidirectional)
  • Supports arbitrary shape coding via alpha planes, enabling non-rectangular objects
  • Enables object-based temporal prediction (motion compensation) and texture coding
  • Forms the basis for advanced functionalities like sprite coding for static backgrounds
  • Defined bitstream syntax for flexible composition and interaction of video objects
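The alpha-plane feature can be illustrated by how a decoded, arbitrarily shaped VOP might be composited onto a background. This sketch assumes 8-bit alpha where 0 is transparent and 255 is opaque, so a binary alpha plane is the special case with only those two values; the function name `composite` is hypothetical, and in MPEG-4 the actual scene composition is handled by the systems layer rather than by the Visual decoder itself.

```python
from typing import List

Plane = List[List[int]]  # one sample plane, row-major: plane[y][x]


def composite(background: Plane, vop: Plane, alpha: Plane,
              x0: int, y0: int) -> Plane:
    """Blend a VOP's texture onto a copy of the background at offset (x0, y0).

    Assumes the VOP lies entirely inside the background and that alpha is
    8-bit: 0 = transparent, 255 = opaque, values in between blend linearly.
    """
    out = [row[:] for row in background]  # do not modify the caller's background
    for y, (tex_row, a_row) in enumerate(zip(vop, alpha)):
        for x, (tex, a) in enumerate(zip(tex_row, a_row)):
            bg = out[y0 + y][x0 + x]
            # Integer linear blend with rounding: a == 255 keeps tex, a == 0 keeps bg.
            out[y0 + y][x0 + x] = (a * tex + (255 - a) * bg + 127) // 255
    return out
```

Copying the background first keeps it reusable across frames, which matches the idea of transmitting a background object once and placing foreground VOPs over it.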

Evolution Across Releases

Rel-8 Initial

MPEG-4 Visual, including the VOP concept, was included as a supported video codec for IMS Multimedia Telephony and other services, primarily for backward compatibility with earlier 3GPP releases. Specifications referenced the MPEG-4 standard for the detailed definition of VOP coding syntax and decoding processes.

Continued support for MPEG-4 Visual as a mandatory or recommended codec for specific service profiles, ensuring interoperability with legacy devices. No major changes to the core VOP specification within 3GPP.

As H.264/AVC became the dominant codec for high-quality services, the role of MPEG-4 Visual (and VOPs) diminished to that of a baseline or fallback option. Specifications maintained the codec capability definition.

Further emphasis on advanced codecs like H.264 High Profile and the introduction of HEVC study items. MPEG-4 Visual support remained but was not a focus for new feature development.

Clarifications and maintenance updates for legacy codec support. The VOP structure remained unchanged as defined by the external MPEG-4 standard.

With the rise of VoLTE and RCS, primary video codec mandates for IMS video telephony solidified around H.264/AVC. MPEG-4 Visual and its VOPs were part of the broad codec toolbox but not actively enhanced.

Support for next-generation codecs such as HEVC (video) and EVS (audio). The technical description of VOPs in relevant specs (e.g., TS 26.114) was maintained for completeness and interoperability testing.

In the 5G era, codec support for new media services was reviewed. While MPEG-4 Visual was not a primary 5G codec, its definition persisted in specifications covering media codec requirements for IMS-based services to ensure backward compatibility across generations.

Focus on immersive media and XR codecs. The VOP concept, as part of an older object-based coding paradigm, was not extended into these new areas, which use different coding tools and container formats.

Maintenance of existing codec specifications. No evolution specific to the MPEG-4 VOP technology.

Ongoing work on advanced media coding for AI-powered compression and neural network-based codecs. The traditional MPEG-4 VOP architecture is not part of this new direction.

Expected to continue the trend of maintaining legacy codec definitions for interoperability while focusing evolution efforts on state-of-the-art codecs like VVC, LCEVC, and AI-based coding methods.

Defining Specifications

Specification | Title
TS 26.114 | IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction