Description
A Group Of Pictures (GOP) is a key high-level syntactic structure in motion video compression standards such as MPEG-2, MPEG-4, H.264/AVC, and H.265/HEVC. It represents a sequence of consecutive pictures (frames) that begins with an independently decodable Intra-coded picture (I-frame) and is followed by an arrangement of Predictively coded pictures (P-frames) and Bidirectionally predictively coded pictures (B-frames). The GOP structure defines the pattern and distance between these different frame types, which is critical for the codec's performance characteristics.
An I-frame is encoded using only spatial redundancy within the single frame, similar to a JPEG image, making it large in size but essential for random access and error recovery. P-frames are encoded by predicting motion from the previous I- or P-frame, storing only the differences (residual) and motion vectors, offering good compression. B-frames use both past and future reference frames for prediction, achieving the highest compression but introducing encoding/decoding delay and complexity. A GOP is defined by two parameters: N (the GOP length, or the distance between I-frames) and M (the distance between reference frames, e.g., the interval between I/P frames). A common structure is IBBPBBP... (N=12, M=3).
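The relationship between N, M, and the frame pattern can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `gop_pattern` and `coding_order` are hypothetical, and the rule "frame 0 is I, every M-th frame is P, the rest are B" is the conventional textbook layout assumed here, not something a particular codec or 3GPP specification mandates. The second function shows why B-frames add delay: each B-frame can only be coded after the later reference frame it depends on.

```python
def gop_pattern(n, m):
    """Return the frame types of one GOP in display order.

    n: GOP length (distance between I-frames)
    m: distance between reference (I/P) frames
    Assumes the conventional layout: frame 0 is I, every m-th frame is P.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def coding_order(display_types):
    """Return display indices in the order the frames would be coded.

    Each reference frame (I or P) is emitted before the B-frames that
    precede it in display order. Trailing B-frames with no later
    reference in the input are simply appended at the end.
    """
    order, pending_b = [], []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)
    return order


print(gop_pattern(12, 3))       # IBBPBBPBBPBB
print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

With M=1 the pattern contains no B-frames, so coding order equals display order and no reordering delay is introduced.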
Within 3GPP specifications, the GOP concept is central to defining video codec profiles and levels for streaming (e.g., 3GPP PSS), multimedia broadcast/multicast (MBMS), and packet-switched conversational services. Documents such as TR 26.904 (listed under Defining Specifications) address GOP structures for codecs such as MPEG-4 Visual and H.264 to ensure interoperability. The choice of GOP structure directly affects bitrate, latency, error resilience, and channel switching time for video services. For example, a long GOP (large N) increases compression efficiency but lets a transmission error propagate across more frames and increases the time to switch channels, because the decoder must wait for the next I-frame before it can start decoding. 3GPP specs often constrain these parameters to suit the limitations and use cases of mobile networks.
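The link between GOP length and channel switching time can be made concrete with a rough calculation. The sketch below is an approximation under stated assumptions: a constant frame rate, a viewer who tunes in at a uniformly random instant, and a decoder that can only start at an I-frame. It ignores buffering, transport, and decoding latency. The function name `tune_in_delay_seconds` is hypothetical.

```python
def tune_in_delay_seconds(gop_length, frame_rate):
    """Estimate the wait for the next I-frame after tuning in.

    gop_length: N, the number of frames between successive I-frames
    frame_rate: frames per second (assumed constant)
    Returns (mean_wait, worst_case_wait) in seconds.
    """
    if gop_length < 1 or frame_rate <= 0:
        raise ValueError("gop_length must be >= 1 and frame_rate > 0")
    worst_case = gop_length / frame_rate  # just missed an I-frame
    mean = worst_case / 2                 # uniform tune-in instant
    return mean, worst_case


for n in (12, 50, 250):
    mean, worst = tune_in_delay_seconds(n, 25)
    print(f"N={n}: mean {mean:.2f} s, worst case {worst:.2f} s")
```

Under these assumptions the wait grows linearly with N, which is one reason broadcast profiles tend to cap the GOP length.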
Purpose & Motivation
The purpose of the GOP structure in video coding is to achieve an optimal trade-off between three competing goals: high compression efficiency, support for random access (e.g., channel switching, seeking), and resilience to data loss. Without inter-frame prediction (using only I-frames), compression would be very poor, making video impractical for bandwidth-constrained mobile networks. However, using only inter-frame prediction (e.g., all P-frames) would create a long prediction chain, making the stream extremely fragile to errors and impossible to join mid-stream.
The GOP solves this by periodically inserting a large, self-contained I-frame that resets the prediction chain. This allows decoders to start decoding at any GOP boundary, enabling features like fast-forward, rewind, and channel switching in broadcast services. The P- and B-frames between I-frames provide the high compression. In the context of 3GPP's standardization of mobile video services, defining allowed GOP structures was essential to ensure that video content encoded by one entity could be reliably decoded by any compliant UE.
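Random access at GOP boundaries can be illustrated with a small lookup: to display a given frame, a decoder starts at the nearest I-frame at or before it. The sketch below assumes closed GOPs (no frame references a picture from an earlier GOP) and a stream described simply as a string of frame types in display order. The function name `random_access_point` is hypothetical.

```python
def random_access_point(frame_types, target):
    """Return the index of the I-frame from which decoding must start
    in order to display frame `target`.

    frame_types: string of 'I', 'P', 'B' in display order
    Assumes closed GOPs, so the nearest preceding I-frame is sufficient.
    """
    if not 0 <= target < len(frame_types):
        raise IndexError("target frame is outside the stream")
    for idx in range(target, -1, -1):
        if frame_types[idx] == "I":
            return idx
    raise ValueError("no I-frame at or before the target frame")


stream = "IBBPBBPBBPBB" * 3  # three GOPs with N=12, M=3
print(random_access_point(stream, 17))  # 12
print(random_access_point(stream, 24))  # 24
```

A shorter GOP reduces the distance between the requested frame and its starting I-frame, at the cost of more frequent large I-frames.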
Historically, early mobile video services like 3GPP Packet-Switched Streaming (PSS) faced severe bandwidth limitations. The selection of codecs like MPEG-4 Visual and H.264 Baseline Profile with specific, constrained GOP parameters (e.g., short GOP lengths) was motivated by the need to minimize decoding complexity and memory usage on early mobile handsets while providing acceptable quality and error resilience over lossy radio bearers. As device capabilities and network bandwidth improved, later 3GPP releases supported more advanced codecs with flexible GOP structures, enabling High Definition streaming and adaptive bitrate streaming (e.g., DASH), where multiple representations of the same content with different GOP structures might be offered.
Evolution Across Releases
Introduced detailed specifications for advanced video codecs like H.264/AVC within 3GPP services such as MBMS and PSS. Defined constraints on GOP structures (e.g., use of closed GOPs, maximum N size) for the H.264 Baseline and Main profiles to ensure reliable decoding on mobile devices and efficient use of broadcast channels.
Enhanced support for Evolved MBMS (eMBMS) and streaming, including specifications for HEVC (H.265) codec trials. GOP structure considerations were extended to these more efficient codecs, balancing the higher compression of HEVC with the need for timely random access in broadcast scenarios.
Further evolution with 5G broadcast and streaming, including support for Very High Bitrate media. Specifications continued to reference GOP as a fundamental video parameter, ensuring compatibility with industry-standard encoding practices while leveraging 5G's high throughput and low latency for improved video service quality.
Defining Specifications
3GPP specifications that define or reference GOP, with the latest known release. Sourced from the 3GPP document catalog.
| Specification | Title | Release |
|---|---|---|
| TR 26.904 vj00 | Future video capability requirements for streaming and MBMS | Rel-19 |
| TR 26.906 vj00 | HEVC Evaluation for 3GPP Services | Rel-19 |
| TR 26.948 vj00 | Video enhancements for 3GPP Multimedia Services | Rel-19 |
| TR 26.949 vj00 | TV Service Profiles for 3GPP Networks | Rel-19 |
| TR 26.962 vj00 | ITT4RT Operation and Usage Guidelines | Rel-19 |
| TR 26.999 vj00 | VR Streaming Interoperability Test Material | Rel-19 |