Description
Reference Picture Selection (RPS) is an error resilience tool defined in video coding standards such as H.264/Advanced Video Coding (AVC) and H.265/High Efficiency Video Coding (HEVC), both of which are profiled and referenced in 3GPP specifications for multimedia telephony and streaming services. Its primary function is to limit the impact of packet loss or corruption on decoded video quality in real-time communication. In predictive video coding, inter-coded frames (P-frames and B-frames) are encoded relative to one or more previously encoded and decoded 'reference' pictures, typically the most recent decoded picture. If that reference picture is lost or corrupted, the error propagates to every subsequent frame that depends on it, directly or indirectly, causing severe and prolonged visual artifacts.
RPS works by providing the encoder and decoder with a managed list of multiple reference pictures stored in a 'Decoded Picture Buffer' (DPB). The encoder chooses which picture in this buffer to use as the reference for the current frame. Through in-band signaling in the slice header, the encoder tells the decoder which specific picture(s) to use when decoding the current frame and which pictures to keep in or remove from the DPB. If the receiver reports a loss (e.g., via an RTCP Negative Acknowledgment (NACK)), the encoder can react by selecting a reference picture that is known to have been received correctly at the decoder, effectively 'stepping back' to an older, intact frame. This breaks the chain of temporal error propagation.
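The encoder-side decision described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the reference management procedure of H.264/AVC or H.265/HEVC: each picture has exactly one reference, the DPB is a small sliding window, and the names `ReferenceSelector`, `report_loss`, `choose_reference`, and `encode` are hypothetical.

```python
from typing import Dict, Optional, Set


class ReferenceSelector:
    """Toy encoder-side reference chooser; hypothetical, not a real codec API."""

    def __init__(self, dpb_size: int = 4) -> None:
        self.dpb_size = dpb_size
        # Pictures currently buffered, in coding order: picture -> its reference
        # (None marks an intra picture).
        self.dpb: Dict[int, Optional[int]] = {}
        # Pictures the decoder cannot be assumed to hold correctly.
        self.damaged: Set[int] = set()

    def report_loss(self, picture: int) -> None:
        """Record a loss report and taint buffered pictures predicted from it."""
        self.damaged.add(picture)
        for pic, ref in self.dpb.items():  # coding order, so taint flows forward
            if ref in self.damaged:
                self.damaged.add(pic)

    def choose_reference(self) -> Optional[int]:
        """Return the newest undamaged picture, or None to fall back to intra."""
        for pic in reversed(list(self.dpb)):
            if pic not in self.damaged:
                return pic
        return None

    def encode(self, picture: int) -> Optional[int]:
        """Choose a reference for `picture`, buffer it, and return the reference."""
        ref = self.choose_reference()
        self.dpb[picture] = ref
        while len(self.dpb) > self.dpb_size:
            del self.dpb[next(iter(self.dpb))]  # evict the oldest picture
        return ref


selector = ReferenceSelector(dpb_size=4)
chosen = [selector.encode(n) for n in range(4)]  # normal chain: None, 0, 1, 2
selector.report_loss(2)                          # picture 2 lost; 3 depends on it
chosen.append(selector.encode(4))                # steps back to picture 1
chosen.append(selector.encode(5))                # chain resumes from picture 4
print(chosen)  # [None, 0, 1, 2, 1, 4]
```

In this sketch picture 4 is predicted from picture 1, so the damage to pictures 2 and 3 does not spread further. When no undamaged picture remains in the buffer, `choose_reference` returns `None`, which stands in for the intra refresh that RPS otherwise avoids.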
Architecturally, RPS involves coordination between the application layer (video encoder/decoder) and the underlying transport. In a 3GPP context, RPS is often used in Multimedia Telephony Service for IMS (MTSI) and Packet-switched Streaming Service (PSS). The key components are the video codec's reference picture management logic, the DPB, and the signaling mechanism that conveys the reference picture set. The 3GPP specifications (e.g., 26.906 for video codec performance) define the profiles and levels of the video codecs that include RPS capabilities, ensuring interoperability between devices. The role of RPS is to make video services more robust over the variable and sometimes unreliable radio access network, improving the user experience during periods of packet loss without requiring a full intra-frame refresh, which is bandwidth-intensive.
Purpose & Motivation
RPS was developed to address a critical weakness in predictive video coding for real-time communication over packet-switched networks such as the internet and mobile networks. The core problem is temporal error propagation: a single lost packet carrying a reference picture (or part of one) can corrupt many seconds of video, because each new frame is decoded from an already corrupted reference. This is unacceptable for conversational services such as video calls.
Earlier approaches to error resilience included sending frequent Intra-frames (I-frames), which can be decoded independently but consume significantly more bandwidth, reducing the overall video quality for a given bitrate. Other methods, such as Forward Error Correction (FEC), add redundancy overhead whether or not a loss occurs. RPS provides a more efficient, feedback-based solution. It was motivated by the growth of mobile video telephony and streaming in 3G and 4G networks, where radio conditions can change rapidly. By allowing the encoder to switch to a known-good reference picture upon receiving a loss report (via RTCP NACK or similar), RPS localizes the impact of the loss. The decoder experiences a momentary glitch, and coding efficiency drops slightly because the older reference may be less correlated with the current frame, but a catastrophic, prolonged breakdown of the video is avoided. This makes RPS a key enabler for reliable, high-quality video services in 3GPP systems, balancing bandwidth efficiency with robustness.
Key Features
- Manages a list of multiple decoded pictures for potential use as prediction references.
- Allows the encoder to select a reference picture other than the most recent one for encoding.
- Signals the selected reference picture set explicitly in the video bitstream slice headers.
- Enables rapid recovery from packet loss by switching to an older, error-free reference picture.
- Reduces temporal error propagation without the high overhead of frequent intra-frames.
- Requires feedback from the decoder to the encoder (e.g., via RTCP NACK) for optimal operation.
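To illustrate why feedback-driven reference switching limits error propagation, the following toy simulation counts how many pictures the decoder shows with errors after a single loss. It is a sketch under simplifying assumptions: one reference per picture, a DPB large enough to still hold the last clean picture, no periodic intra refresh, and a fixed feedback delay measured in pictures. The function name `damaged_frames` and its parameters are hypothetical.

```python
def damaged_frames(total: int, lost: int, feedback_delay: int, use_rps: bool) -> int:
    """Count pictures decoded with errors after picture `lost` is lost.

    Toy model: the loss becomes known to the encoder `feedback_delay`
    pictures after it happens; until then the encoder keeps referencing
    the previous picture.
    """
    damaged = set()
    for n in range(total):
        loss_known = use_rps and n >= lost + feedback_delay
        if n == 0:
            ref = None  # first picture is intra coded
        elif loss_known:
            # Step back to the newest picture known to be intact;
            # None (intra) if no such picture exists.
            ref = max((p for p in range(n) if p not in damaged), default=None)
        else:
            ref = n - 1  # default: predict from the previous picture
        if n == lost or ref in damaged:
            damaged.add(n)
    return len(damaged)


without_rps = damaged_frames(total=30, lost=5, feedback_delay=3, use_rps=False)
with_rps = damaged_frames(total=30, lost=5, feedback_delay=3, use_rps=True)
print(without_rps, with_rps)  # 25 3
```

In this model the damage without RPS lasts until the end of the sequence, while with RPS it is bounded by the feedback delay, which is why timely decoder-to-encoder feedback matters for the tool's effectiveness.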
Evolution Across Releases
RPS is treated in 3GPP as an error resilience feature of the advanced video codecs used in its services. 3GPP TS 26.906 and related specifications cover H.264/AVC and H.265/HEVC with RPS capabilities for robust video telephony and other conversational video services.
Defining Specifications
| Specification | Title |
|---|---|
| TS 26.906 | 3GPP TS 26.906 |
| TS 26.922 | 3GPP TS 26.922 |
| TS 26.948 | 3GPP TS 26.948 |