Description
Temporal Motion Vector Prediction (TMVP) is a core prediction tool in the modern video compression standards that 3GPP adopts for its video services, notably HEVC (H.265) and its successor VVC (H.266). It operates on the principle of temporal redundancy: objects tend to move in predictable ways across consecutive frames of a video sequence. Instead of encoding the motion vector (which describes the displacement of a block of pixels from one frame to the next) from scratch for every block in every frame, TMVP lets the coder predict this vector from motion information already encoded for a spatially corresponding block in a previously decoded reference picture, the so-called collocated picture, at a different time instance.
The technical process works as follows. For the block currently being encoded (the 'current block'), the encoder and decoder identify a 'collocated block' at the same spatial position in a pre-defined reference picture, the collocated picture, which is selected from the reference picture lists and signalled to the decoder (in HEVC, via slice-header syntax). The motion vector(s) stored for this collocated block are fetched and then scaled according to the ratio of two temporal distances: the distance between the current picture and its reference picture versus the distance between the collocated picture and its reference picture. The scaled motion vector becomes a candidate in a list of motion vector predictors (MVPs). The encoder selects the best predictor from this list and only needs to encode a small correction (the motion vector difference, MVD) when the prediction is not perfect, saving a significant number of bits.
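As an illustrative sketch, the scaling step can be written in Python following the fixed-point arithmetic style that HEVC specifies for TMVP (POC-distance ratio, clipped intermediate values). The function name and interface here are assumptions for this example, not normative code:

```python
def clip3(lo, hi, x):
    """Clip x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

def scale_mv(mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    """Scale one motion-vector component from the collocated block
    by the ratio of temporal (POC) distances, HEVC-style."""
    # tb: distance current picture -> its reference picture
    # td: distance collocated picture -> its reference picture
    tb = clip3(-128, 127, cur_poc - cur_ref_poc)
    td = clip3(-128, 127, col_poc - col_ref_poc)
    if td == 0 or tb == td:
        return mv  # same distance (or degenerate): reuse as-is
    # Fixed-point reciprocal of td (integer division toward zero)
    tx = (1 if td > 0 else -1) * ((16384 + (abs(td) >> 1)) // abs(td))
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv
    sign = 1 if prod >= 0 else -1
    # Round, restore sign, and clip to the representable MV range
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))

# Current picture at POC 4 predicts from POC 0 (distance 4); the
# collocated block's vector spanned POC 8 -> POC 0 (distance 8),
# so the vector is roughly halved: 64 -> 32.
print(scale_mv(64, 4, 0, 8, 0))
```

The fixed-point reciprocal avoids a division per motion vector: `td` is inverted once, and each component is then scaled with shifts and multiplies only.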
Within the codec architecture, TMVP is a key component of the inter-prediction module. It works alongside spatial motion vector prediction (which uses vectors from neighboring blocks in the same picture) to form a robust set of predictor candidates. The integration of TMVP is particularly effective in video sequences with consistent, linear motion, such as panning shots or scrolling text, where the motion of an object changes little over time. By leveraging this temporal correlation, TMVP significantly reduces the bitrate required to represent motion information, which is a major component of the overall bitstream. This freed-up bit budget can then be reallocated to improve residual coding (texture detail) or to lower the overall file size, directly contributing to higher compression efficiency, which is paramount for mobile video streaming and video telephony services over bandwidth-constrained cellular networks.
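The interplay between spatial candidates, the TMVP candidate, and the signalled MVD can be sketched as a simplified, hypothetical model of AMVP-style predictor-list construction (function names and the pruning/padding details are assumptions for illustration, not the normative derivation):

```python
def build_mvp_list(spatial_mvs, temporal_mv, max_candidates=2):
    """Assemble a predictor list: spatial candidates first, then the
    TMVP candidate, with duplicates pruned and zero-MV padding."""
    candidates = []
    for mv in list(spatial_mvs) + [temporal_mv]:
        if mv is not None and mv not in candidates:
            candidates.append(mv)
        if len(candidates) == max_candidates:
            break
    while len(candidates) < max_candidates:
        candidates.append((0, 0))  # zero-vector padding
    return candidates

def best_predictor(candidates, true_mv):
    """Encoder side: pick the candidate minimizing the MVD magnitude;
    only the candidate index and the MVD are then signalled."""
    def cost(i):
        cx, cy = candidates[i]
        return abs(true_mv[0] - cx) + abs(true_mv[1] - cy)
    idx = min(range(len(candidates)), key=cost)
    mvd = (true_mv[0] - candidates[idx][0],
           true_mv[1] - candidates[idx][1])
    return idx, mvd

# Two identical spatial neighbours plus a scaled temporal candidate:
cands = build_mvp_list([(4, 0), (4, 0)], (5, 1))
print(cands)                      # duplicate spatial MV pruned
print(best_predictor(cands, (5, 2)))
```

When the TMVP candidate is close to the true motion, the MVD shrinks toward zero, which is exactly where the bit savings described above come from.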
Purpose & Motivation
TMVP was developed to address the fundamental challenge of video compression: reducing bitrate while maintaining perceptual quality. Early video codecs like H.263 and MPEG-4 Part 2 relied heavily on simpler forms of motion compensation. As resolution and frame rate demands increased, the bit cost of transmitting motion vectors became a significant overhead. The creation of TMVP was motivated by the observation that motion is often persistent over time; an object moving in one direction in frame N is highly likely to continue moving in a similar direction in frame N+1.
This technique solves the problem of inefficient motion vector coding by exploiting temporal redundancy directly, a dimension not fully utilized by earlier spatial-only prediction methods. Before its adoption, encoders had to code motion vectors more explicitly or rely solely on spatial neighbours, which could fail for objects newly entering a frame or for scenes with complex motion. TMVP provides a powerful, long-range prediction mechanism that is especially valuable in the low-delay and random-access coding configurations used in real-time communication (such as video calls) and adaptive streaming. Its inclusion in HEVC, the codec 3GPP references for its video services, was one contributing factor in HEVC's roughly 50% bitrate reduction over its predecessor H.264/AVC at the same video quality, enabling high-definition and ultra-high-definition video services on bandwidth-constrained mobile networks.
Key Features
- Predicts motion vectors using data from a temporally collocated block
- Uses motion vector scaling based on picture order count (POC) distances
- Generates candidates for the motion vector predictor (MVP) list
- Highly effective for sequences with persistent, linear motion
- Reduces bits required for motion vector difference (MVD) encoding
- Integrated into inter-prediction of modern codecs (HEVC, VVC)
Evolution Across Releases
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.948 | 3GPP TS 26.948 |