CVS

Coded Video Sequence

Services
Introduced in Rel-12
Coded Video Sequence (CVS) is a fundamental video coding structure used in 3GPP specifications for the carriage of video content over mobile networks. It is a contiguous sequence of coded video pictures that can be decoded independently of any earlier pictures, forming a complete video presentation or a segment of one. Its standardized definition supports reliable video delivery and interoperability across devices and networks, which is critical for mobile video streaming and broadcast services.

Description

In the context of 3GPP specifications, a Coded Video Sequence (CVS) is a self-contained, independently decodable portion of a video bitstream. Technically, it is a sequence of access units, each made up of Network Abstraction Layer (NAL) units, that starts with an Instantaneous Decoding Refresh (IDR) access unit or, in HEVC, a Broken Link Access (BLA) access unit, and ends before the next IDR or BLA access unit or at the end of the bitstream. In HEVC, a Clean Random Access (CRA) access unit can also start a CVS, for example when it is the first access unit in the bitstream. This structure provides random access points and enables clean switching between different video streams or segments, which is essential for adaptive bitrate streaming and channel switching in broadcast services like eMBMS (evolved Multimedia Broadcast Multicast Service).
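The boundary rule described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation taken from any 3GPP specification: it assumes an HEVC stream in which each access unit is represented only by the nal_unit_type of its coded picture, and it ignores CRA pictures, which HEVC also allows to start a CVS in some cases. The function names are hypothetical; the nal_unit_type values are those of HEVC (H.265).

```python
from typing import Iterable, List

# HEVC (H.265) nal_unit_type values for pictures that start a CVS in this sketch.
BLA_TYPES = {16, 17, 18}  # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IDR_TYPES = {19, 20}      # IDR_W_RADL, IDR_N_LP
CVS_START_TYPES = BLA_TYPES | IDR_TYPES


def hevc_nal_unit_type(header_byte0: int) -> int:
    """Extract nal_unit_type from the first byte of the 2-byte HEVC NAL unit header.

    The first byte holds forbidden_zero_bit (1 bit), nal_unit_type (6 bits)
    and the top bit of nuh_layer_id.
    """
    return (header_byte0 >> 1) & 0x3F


def split_into_cvs(au_types: Iterable[int]) -> List[List[int]]:
    """Group access units, given in decoding order, into coded video sequences.

    A new group opens at every IDR or BLA access unit. Access units that
    appear before the first IDR/BLA are kept in a leading group, which is
    not a valid CVS (assumption made for illustration only).
    """
    sequences: List[List[int]] = []
    for nal_type in au_types:
        if nal_type in CVS_START_TYPES or not sequences:
            sequences.append([])
        sequences[-1].append(nal_type)
    return sequences


if __name__ == "__main__":
    # IDR, two trailing pictures, IDR, one trailing picture, BLA, one trailing picture
    stream = [19, 1, 1, 20, 1, 16, 0]
    for index, cvs in enumerate(split_into_cvs(stream)):
        print(index, cvs)
```

For the sample stream, the sketch yields three groups, one per IDR or BLA access unit. An AVC (H.264) variant would be simpler, since only IDR pictures (nal_unit_type 5) start a CVS there.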

The CVS concept is integral to the video codecs adopted by 3GPP, such as Advanced Video Coding (AVC/H.264) and High Efficiency Video Coding (HEVC/H.265). A CVS encapsulates a sequence of coded pictures, including I-frames (intra-coded), P-frames (predicted), and B-frames (bi-predicted), organized according to a Group of Pictures (GOP) structure. The sequence begins with a key frame (IDR, or BLA in HEVC) that resets the decoder's state, allowing independent decoding without reference to prior frames. This is crucial for error resilience, as corruption in one CVS does not propagate to subsequent sequences.
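The error-containment property can be illustrated with a small sketch. It assumes a worst case in which every later picture of the same CVS may depend, directly or indirectly, on the lost one, so the damage extends from the lost access unit up to (but not including) the next CVS start. The function name and the boolean-flag representation are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def affected_by_loss(is_cvs_start: Sequence[bool], lost_index: int) -> List[int]:
    """Return indices of access units that a single loss may corrupt.

    is_cvs_start[i] is True when access unit i (in decoding order) starts
    a new CVS. Worst-case assumption: every later access unit in the same
    CVS references the lost one, so the error spreads until the next CVS
    start resets the decoder.
    """
    if not 0 <= lost_index < len(is_cvs_start):
        raise IndexError("lost_index is outside the stream")
    affected = [lost_index]
    for i in range(lost_index + 1, len(is_cvs_start)):
        if is_cvs_start[i]:
            break  # decoder state is reset here; the error cannot cross
        affected.append(i)
    return affected


if __name__ == "__main__":
    # Two CVSs: access units 0-3 and 4-6.
    starts = [True, False, False, False, True, False, False]
    print(affected_by_loss(starts, 1))
```

With the sample flags, losing access unit 1 affects units 1 to 3 only; the second CVS, which starts at unit 4, decodes cleanly. Real decoders often do better than this worst case, because not every picture is used as a reference.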

Within the 3GPP system, CVS plays a key role in video service delivery protocols. For example, in Multimedia Broadcast Multicast Service (MBMS) and its evolved variant (eMBMS), video content is packetized and transmitted as a series of CVS units over the broadcast/multicast bearer. The 3GPP Packet-switched Streaming (PSS) service and Multimedia Telephony Service for IMS (MTSI) also utilize CVS for unicast video streaming and conversational services, respectively. The specifications (e.g., TS 26.346 for MBMS, TS 26.265 for codec conformance) define how CVS is mapped onto transport protocols like RTP/RTCP and file formats like 3GP, ensuring end-to-end compatibility from content encoding to playback on user equipment.

The role of CVS extends to quality of service (QoS) and network efficiency. By providing clear boundaries for video segments, it enables efficient bandwidth adaptation, error concealment, and seamless handovers during user mobility. Network elements, such as the Broadcast Multicast Service Center (BM-SC) in MBMS, can manipulate CVS units for service announcements, synchronization, and delivery. Furthermore, CVS is a fundamental unit for Digital Rights Management (DRM) and content encryption in 3GPP, allowing selective protection of video sequences without impacting the entire stream.

Purpose & Motivation

The Coded Video Sequence (CVS) was introduced to standardize the carriage and processing of video content within the 3GPP ecosystem, addressing the growing demand for mobile video services. Prior to its formal definition in Release 12, video delivery over cellular networks relied on various proprietary or loosely defined packetization methods, leading to interoperability issues, inefficient bandwidth usage, and poor user experience during network impairments or channel switching. The CVS provides a clear, codec-agnostic structure that ensures video streams can be reliably decoded, manipulated, and transmitted across heterogeneous networks and devices.

A primary motivation for standardizing CVS was to support efficient broadcast and multicast services, specifically MBMS/eMBMS, where a single video stream is delivered to multiple users simultaneously. Without a standardized sequence structure, managing random access, service continuity, and adaptive streaming in broadcast scenarios was challenging. CVS enables the network to insert synchronization markers, announce service boundaries, and apply forward error correction (FEC) at the sequence level, enhancing reliability and reducing latency for live streaming and file delivery.

Furthermore, CVS addresses the limitations of earlier video delivery mechanisms by providing a foundation for advanced features like Dynamic Adaptive Streaming over HTTP (DASH) in mobile environments. By aligning video sequences with DASH segments, CVS facilitates smooth bitrate switching and trick-play operations (e.g., fast-forward, rewind). Its definition also supports the evolution towards higher efficiency video codecs (HEVC) and immersive media (e.g., 360-degree video), ensuring backward compatibility and future-proofing the 3GPP video service architecture. In essence, CVS exists to unify video handling, solve fragmentation issues, and enable scalable, high-quality video delivery across all 3GPP services.

Key Features

  • Self-contained decodable unit starting with an IDR or BLA access unit
  • Enables random access and clean switching between video streams
  • Fundamental structure for adaptive bitrate streaming (e.g., DASH) in mobile networks
  • Essential for error resilience and prevention of error propagation across sequences
  • Supports broadcast/multicast services (MBMS/eMBMS) for efficient content delivery
  • Facilitates synchronization, encryption, and Digital Rights Management (DRM) at the sequence level
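
The random access and stream switching features listed above depend on CVS boundaries lining up across the alternative encodings of the same content. As a rough sketch, assuming each encoding is described only by the frame indices at which a new CVS starts, the positions where a player could switch cleanly in either direction are the indices common to all encodings. The function name and the representation labels are hypothetical.

```python
from typing import Dict, Iterable, List


def common_switch_points(cvs_starts_by_rep: Dict[str, Iterable[int]]) -> List[int]:
    """Return frame indices at which every representation starts a new CVS.

    Switching to a representation is clean only where that representation
    begins a CVS, so switching between any pair is possible at the
    intersection of all CVS start positions.
    """
    start_sets = [set(starts) for starts in cvs_starts_by_rep.values()]
    if not start_sets:
        return []
    return sorted(set.intersection(*start_sets))


if __name__ == "__main__":
    # Hypothetical encodings: the low bitrate starts a CVS every 48 frames,
    # the high bitrate every 96 frames.
    reps = {"low": [0, 48, 96, 144], "high": [0, 96, 192]}
    print(common_switch_points(reps))
```

In this example the two encodings share CVS starts at frames 0 and 96, so those are the only positions where a switch in either direction needs no pictures from the previous stream.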

Evolution Across Releases

Rel-12 Initial

Introduced the standardized definition of Coded Video Sequence (CVS) within 3GPP specifications to support enhanced video services. It established CVS as a fundamental unit for video carriage, particularly for MBMS/eMBMS broadcast and PSS streaming, ensuring interoperability between encoders, networks, and devices. The initial architecture defined CVS boundaries using IDR/BLA access units, enabling independent decoding and forming the basis for adaptive streaming and error resilience mechanisms.

Defining Specifications

Specification    Title
TS 26.116        3GPP TS 26.116
TS 26.265        3GPP TS 26.265
TS 26.346        3GPP TS 26.346