Description
A Group of Pictures (GOP) is a key high-level structure in motion video compression standards such as MPEG-2, MPEG-4, H.264/AVC, and H.265/HEVC. MPEG-2 signals it with an explicit GOP header; H.264/AVC and H.265/HEVC define no such header, so there the GOP is an encoder convention rather than a bitstream syntax element. A GOP is a sequence of consecutive pictures (frames) that begins with an independently decodable intra-coded picture (I-frame), followed by an arrangement of predictively coded pictures (P-frames) and bidirectionally predictively coded pictures (B-frames). The GOP structure defines the pattern of these frame types and the distances between them, which largely determines the codec's compression, latency, and error-resilience characteristics.
An I-frame is encoded using only the spatial redundancy within that frame, much like a JPEG image, which makes it large but essential for random access and error recovery. P-frames are predicted with motion compensation from a preceding I- or P-frame and store only motion vectors and the prediction residual, which gives good compression. B-frames are predicted from both past and future reference frames; they achieve the highest compression, but because a future reference must be coded before the B-frames that depend on it, they add encoding/decoding delay and complexity. A GOP is commonly described by two parameters: N (the GOP length, i.e. the distance between I-frames) and M (the distance between consecutive reference frames, i.e. between I/P frames). A common structure is IBBPBBPBBPBB (N=12, M=3).
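The relationship between N, M, and the resulting frame pattern can be sketched in a few lines of Python. This is a minimal illustration, not taken from any codec or 3GPP specification: the function names `gop_pattern` and `coding_order` are hypothetical, and the model assumes a fixed, regular GOP in which every M-th frame after the I-frame is a P-frame.

```python
def gop_pattern(n: int, m: int) -> str:
    """Return the display-order frame types of one GOP of length n.

    Position 0 is the I-frame, every m-th position after it is a P-frame,
    and all remaining positions are B-frames (simplified, regular GOP).
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def coding_order(pattern: str) -> list[int]:
    """Reorder display indices so each B-frame follows its future reference."""
    order, pending_b = [], []
    for idx, frame_type in enumerate(pattern):
        if frame_type == "B":
            pending_b.append(idx)      # must wait for the next I/P frame
        else:
            order.append(idx)          # reference frame is coded first
            order.extend(pending_b)    # then the B-frames that preceded it
            pending_b = []
    # Trailing B-frames would really wait for the next GOP's I-frame;
    # this sketch simply appends them.
    order.extend(pending_b)
    return order


print(gop_pattern(12, 3))        # IBBPBBPBBPBB
print(gop_pattern(12, 1))        # IPPPPPPPPPPP
print(coding_order("IBBPBBP"))   # [0, 3, 1, 2, 6, 4, 5]
```

The reordering in `coding_order` is the source of the B-frame delay: frame 3 has to be coded and transmitted before frames 1 and 2 can be handled.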
Within 3GPP specifications, the GOP concept is central to defining video codec profiles and levels for streaming (e.g., 3GPP PSS), multimedia broadcast/multicast (MBMS), and packet-switched conversational services. Specifications like 26.904 (Transparent end-to-end packet-switched streaming service codec specification) detail the allowed GOP structures for MPEG-4 Visual and H.264 codecs to ensure interoperability. The choice of GOP structure directly impacts the bitrate, latency, error resilience, and channel switching time for video services. For example, a long GOP (large N) increases compression efficiency but makes the stream more vulnerable to errors and increases the time to switch channels (as the decoder must wait for an I-frame). 3GPP specs often constrain these parameters to suit the limitations and use cases of mobile networks.
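The effect of GOP length on channel switching time can be estimated with simple arithmetic. The sketch below assumes a fixed GOP length, a constant frame rate, and a receiver that joins the stream at a uniformly random instant; the function name `tune_in_delay` is hypothetical, and the figures ignore buffering and transport delay.

```python
def tune_in_delay(n: int, fps: float) -> tuple[float, float]:
    """Return (average, worst-case) wait in seconds for the next I-frame.

    Assumes one I-frame every n frames at a constant frame rate fps.
    """
    if n < 1 or fps <= 0:
        raise ValueError("n must be >= 1 and fps must be > 0")
    gop_duration = n / fps
    return gop_duration / 2, gop_duration


print(tune_in_delay(12, 25))    # (0.24, 0.48)
print(tune_in_delay(250, 25))   # (5.0, 10.0)
```

Under these assumptions, a 12-frame GOP at 25 frames per second keeps the worst-case wait below half a second, whereas a 250-frame GOP can leave the viewer waiting up to ten seconds.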
Purpose & Motivation
The purpose of the GOP structure in video coding is to achieve an optimal trade-off among three competing goals: high compression efficiency, support for random access (e.g., channel switching, seeking), and resilience to data loss. Without inter-frame prediction (using only I-frames), compression would be very poor, making video impractical for bandwidth-constrained mobile networks. However, using only inter-frame prediction (e.g., all P-frames) would create a long prediction chain, making the stream extremely fragile to errors and impossible to join mid-stream.
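A toy calculation makes the compression side of this trade-off concrete. The relative frame sizes below (I = 10, P = 3, B = 1) are hypothetical values chosen only for illustration; real ratios depend heavily on content, codec, and encoder settings.

```python
# Hypothetical relative frame sizes, chosen only for illustration.
RELATIVE_SIZE = {"I": 10.0, "P": 3.0, "B": 1.0}


def average_frame_cost(pattern: str) -> float:
    """Average relative size per frame for one GOP pattern such as 'IBBP'."""
    return sum(RELATIVE_SIZE[t] for t in pattern) / len(pattern)


for pattern in ("I" * 12, "I" + "P" * 11, "IBBPBBPBBPBB"):
    print(pattern, round(average_frame_cost(pattern), 2))
# IIIIIIIIIIII 10.0
# IPPPPPPPPPPP 3.58
# IBBPBBPBBPBB 2.25
```

With these assumed sizes, the I-only stream costs more than four times as much per frame as the IBBP pattern, which is the gap that inter-frame prediction closes.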
The GOP solves this by periodically inserting a large, self-contained I-frame that resets the prediction chain. This allows decoders to start decoding at any GOP boundary, enabling features like fast-forward, rewind, and channel switching in broadcast services. The P- and B-frames between I-frames provide the high compression. In the context of 3GPP's standardization of mobile video services, defining allowed GOP structures was essential to ensure that video content encoded by one entity could be reliably decoded by any compliant UE.
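Random access and error containment at GOP boundaries can be sketched as index arithmetic. The example assumes closed GOPs of fixed length `n` with frames numbered from 0 in coding order; the function names are hypothetical, and both results are upper bounds (a decoder may skip B-frames it does not need, and a lost non-reference B-frame affects only itself).

```python
def seek_start(target: int, n: int) -> int:
    """Index of the I-frame that opens the GOP containing frame `target`."""
    return (target // n) * n


def frames_to_decode(target: int, n: int) -> int:
    """Upper bound on frames decoded to reach `target` after a seek."""
    return target - seek_start(target, n) + 1


def frames_affected_by_loss(lost: int, n: int) -> int:
    """Upper bound on frames corrupted if reference frame `lost` is lost.

    Damage can propagate through the prediction chain until the next
    I-frame resets it, i.e. through the rest of the current GOP.
    """
    return n - (lost % n)


print(seek_start(100, 12))               # 96
print(frames_to_decode(100, 12))         # 5
print(frames_affected_by_loss(13, 12))   # 11
```

Both bounds grow with `n`, which is why a shorter GOP improves seeking and error recovery at the cost of more frequent I-frames.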
Historically, early mobile video services like 3GPP Packet-Switched Streaming (PSS) faced severe bandwidth limitations. The selection of codecs like MPEG-4 Visual and H.264 Baseline Profile with specific, constrained GOP parameters (e.g., short GOP lengths) was motivated by the need to minimize decoding complexity and memory usage on early mobile handsets while providing acceptable quality and error resilience over lossy radio bearers. As device capabilities and network bandwidth improved, later 3GPP releases supported more advanced codecs with flexible GOP structures, enabling High Definition streaming and adaptive bitrate streaming (e.g., DASH), where multiple representations of the same content with different GOP structures might be offered.
Key Features
- Defines the periodic pattern of I, P, and B frames within a video sequence
- Key parameters: N (GOP length/distance between I-frames) and M (distance between reference frames)
- I-frames provide random access points and error resilience by resetting prediction dependencies
- P- and B-frames enable high compression efficiency through temporal prediction
- GOP structure directly impacts bitrate, latency, error propagation, and channel switching delay
- Constrained by 3GPP codec specifications (e.g., for H.264 Baseline Profile) to ensure UE interoperability and performance
Evolution Across Releases
Introduced detailed specifications for advanced video codecs like H.264/AVC within 3GPP services such as MBMS and PSS. Defined constraints on GOP structures (e.g., use of closed GOPs, maximum N size) for the H.264 Baseline and Main profiles to ensure reliable decoding on mobile devices and efficient use of broadcast channels.
Enhanced support for Evolved MBMS (eMBMS) and streaming, including specifications for HEVC (H.265) codec trials. GOP structure considerations were extended to these more efficient codecs, balancing the higher compression of HEVC with the need for timely random access in broadcast scenarios.
Further evolution with 5G broadcast and streaming, including support for Very High Bitrate media. Specifications continued to reference GOP as a fundamental video parameter, ensuring compatibility with industry-standard encoding practices while leveraging 5G's high throughput and low latency for improved video service quality.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.904 | 3GPP TS 26.904 |
| TS 26.906 | 3GPP TS 26.906 |
| TS 26.948 | 3GPP TS 26.948 |
| TS 26.949 | 3GPP TS 26.949 |
| TS 26.962 | 3GPP TS 26.962 |
| TS 26.999 | 3GPP TS 26.999 |