Description
The Real-time Transport Protocol (RTP) is an IETF-defined protocol (RFC 3550) that is extensively adopted and profiled within 3GPP standards for carrying real-time multimedia traffic. It is not a 3GPP-invented protocol but a crucial building block used in Packet-Switched Streaming (PSS), Multimedia Broadcast/Multicast Service (MBMS), and IP Multimedia Subsystem (IMS)-based services like Voice over LTE (VoLTE) and Video over LTE (ViLTE). RTP typically runs on top of UDP, which avoids the retransmission and head-of-line blocking delays that TCP would add to real-time media. Its primary functions are payload type identification, sequence numbering, timestamping, and delivery monitoring.
An RTP packet consists of a header and a payload. The header includes critical fields: a sequence number to detect packet loss and reorder packets, a timestamp to enable correct playout timing and synchronization between media streams (e.g., audio and video), a synchronization source (SSRC) identifier to distinguish multiple sources in a session, and a payload type field to identify the codec format (e.g., AMR-WB, EVS, H.264, VP8). The payload contains the compressed media data generated by the codec. RTP itself does not guarantee QoS or timely delivery; it relies on lower-layer protocols and network QoS mechanisms (such as QoS Class Identifiers (QCI) in LTE and 5G QoS Identifiers (5QI) in 5G) for that. Its companion protocol, the RTP Control Protocol (RTCP), provides out-of-band statistics and control information for the session.
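The header fields described above can be illustrated with a minimal Python sketch that unpacks the 12-byte fixed header laid out in RFC 3550. The names `RtpHeader` and `parse_rtp_header` and the sample packet values are illustrative assumptions, not part of any 3GPP specification, and the sketch deliberately ignores header extensions and padding.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Parse the 12-byte fixed RTP header and return (header, remaining bytes)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header = RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
    if header.version != 2:
        raise ValueError("not an RTP version 2 packet")
    # Skip the optional CSRC list (4 bytes per entry). Header extensions and
    # padding are NOT stripped in this sketch.
    offset = 12 + 4 * header.csrc_count
    return header, packet[offset:]


# Hypothetical packet: version 2, dynamic payload type 96, three payload bytes.
sample = struct.pack("!BBHII", 0x80, 96, 1000, 160000, 0x1234ABCD) + b"\x01\x02\x03"
header, payload = parse_rtp_header(sample)
print(header.payload_type, header.sequence_number, header.timestamp, hex(header.ssrc))
```

A production parser would also need to handle the extension bit and trailing padding before handing the payload to a codec depacketizer.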
Within the 3GPP architecture, RTP sessions are negotiated through IMS signaling handled by the Call Session Control Functions (CSCFs); the CSCFs stay on the signaling path and do not carry the media themselves. During a VoLTE call setup, for example, the Session Description Protocol (SDP) within the SIP signaling negotiates the RTP parameters: IP addresses, ports, and codecs. The media path for the RTP stream then flows between the UEs (or through media gateways/anchors like the IMS Media Resource Function) over the LTE or 5G data bearer, which is configured with appropriate QoS to prioritize the real-time traffic. The Packet Data Convergence Protocol (PDCP) layer in the radio stack ciphers these IP packets and can compress their IP/UDP/RTP headers using Robust Header Compression (ROHC), which reduces per-packet overhead on the air interface.
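To make the SDP negotiation step concrete, the following sketch pulls the connection address, media port, and `a=rtpmap` codec mappings out of an SDP body. The SDP text is a made-up example using a documentation address (192.0.2.10), and `parse_sdp_media` is a hypothetical helper; a real IMS client would use a full SDP parser and the offer/answer rules.

```python
def parse_sdp_media(sdp: str) -> dict:
    """Extract the connection address, media lines, and rtpmap entries from SDP text."""
    result = {"address": None, "media": []}
    current = None  # media section currently being parsed, if any
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            address = line[2:].split()[2]
            if current is None:
                result["address"] = address      # session-level address
            else:
                current["address"] = address     # media-level override
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *formats = line[2:].split()
            current = {
                "kind": kind,
                "port": int(port.split("/")[0]),
                "proto": proto,
                "payload_types": [int(f) for f in formats],
                "codecs": {},
            }
            result["media"].append(current)
        elif line.startswith("a=rtpmap:") and current is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            name, clock_rate, *_ = encoding.split("/")
            current["codecs"][int(pt)] = (name, int(clock_rate))
    return result


# Illustrative SDP body only; the values are assumptions, not taken from a real call.
example_sdp = "\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49152 RTP/AVP 97 98",
    "a=rtpmap:97 AMR-WB/16000/1",
    "a=rtpmap:98 AMR/8000/1",
    "a=sendrecv",
])
parsed = parse_sdp_media(example_sdp)
print(parsed["address"], parsed["media"][0]["port"], parsed["media"][0]["codecs"])
```

The dynamic payload type numbers (97 and 98 here) only have meaning through the `a=rtpmap` lines, which is why the receiver needs the negotiated SDP to interpret the payload type field of incoming RTP packets.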
RTP's role is to provide a standardized, interoperable envelope for real-time media, enabling equipment from different vendors to exchange voice and video. Its timestamp mechanism is vital for managing jitter buffers at the receiver, which smooth out network delay variations. The sequence number allows the receiver to detect lost packets, which can be hidden using error concealment algorithms or, where the session has negotiated it, recovered through redundancy or retransmission. RTCP reports give the sender feedback on loss and jitter, and the media may be secured using the Secure Real-time Transport Protocol (SRTP) as specified in 3GPP for media plane security.
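As a rough illustration of loss detection from sequence numbers, the sketch below counts gaps in a stream while tolerating reordering and the 16-bit wraparound of the sequence number field. The function names are assumptions made for this example, and the approach (expected minus received over an extended sequence space) is a simplification of the bookkeeping described in RFC 3550, not a complete implementation of it.

```python
def seq_delta(current: int, previous: int) -> int:
    """Signed distance from previous to current in 16-bit sequence space."""
    return ((current - previous + 0x8000) & 0xFFFF) - 0x8000


def count_lost(sequence_numbers) -> int:
    """Count packets missing from a list of sequence numbers in arrival order.

    Each 16-bit number is mapped onto an extended (non-wrapping) number line
    relative to the highest value seen so far, so reordered packets and
    wraparound from 65535 to 0 are both handled.
    """
    extended = set()
    highest = None
    for seq in sequence_numbers:
        if highest is None:
            ext = seq
        else:
            ext = highest + seq_delta(seq, highest & 0xFFFF)
        extended.add(ext)  # a set also discards duplicates
        if highest is None or ext > highest:
            highest = ext
    if not extended:
        return 0
    expected = max(extended) - min(extended) + 1
    return expected - len(extended)


# 65535 and 2 never arrive; the stream wraps from 65534 to 0.
print(count_lost([65533, 65534, 0, 1, 3]))
# Reordered but complete stream.
print(count_lost([10, 12, 11, 13]))
```

A receiver would typically run this kind of accounting per SSRC and feed the result into its concealment logic and RTCP receiver reports.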
Purpose & Motivation
RTP exists to solve the fundamental problem of transporting time-sensitive audio and video data over best-effort IP networks, which were originally designed for non-real-time, reliable data transfer. Before the widespread adoption of RTP and VoIP, real-time communication relied on circuit-switched networks (like the traditional phone network), which reserved dedicated end-to-end paths guaranteeing constant delay and bandwidth but were inefficient for data. The rise of the internet and IP networking created a need for a packet-based method to handle interactive media, leading to the development of RTP.
RTP addresses the limitations of using raw UDP or TCP for media. UDP provides no sequencing or timing information, while TCP's reliability mechanisms (retransmissions, in-order delivery) introduce unacceptable and variable delay for real-time playout. RTP introduces just enough structure—sequence numbers and timestamps—to allow receivers to reconstruct timing and detect loss, without imposing a reliability mechanism that would harm latency. This enables adaptive jitter buffers and synchronization between multiple media streams (lip-sync).
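The way timestamps let a receiver reconstruct timing can be sketched with the interarrival jitter estimator from RFC 3550, which smooths the difference in transit time between consecutive packets with a gain of 1/16. The function name, the `(arrival_seconds, rtp_timestamp)` input format, and the sample values are assumptions for illustration; timestamp wraparound is ignored here.

```python
def interarrival_jitter(packets, clock_rate: int) -> float:
    """Estimate interarrival jitter in RTP timestamp units.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) in arrival order.
    clock_rate: RTP clock rate of the payload format in Hz (e.g. 8000).
    """
    jitter = 0.0
    prev_transit = None
    for arrival_s, rtp_ts in packets:
        # Relative transit time: arrival time expressed in timestamp units
        # minus the sender's RTP timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as in RFC 3550.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# 20 ms frames at an 8 kHz clock advance the timestamp by 160 per packet.
on_time = [(0.000, 0), (0.020, 160), (0.040, 320)]
one_late = [(0.000, 0), (0.020, 160), (0.045, 320)]  # third packet 5 ms late

jitter_on_time = interarrival_jitter(on_time, 8000)
jitter_one_late = interarrival_jitter(one_late, 8000)
print(jitter_on_time, jitter_one_late)
```

An adaptive jitter buffer could use an estimate like this to decide how much playout delay to add: the larger the jitter, the deeper the buffer needs to be to avoid underruns.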
In the 3GPP context, the adoption of RTP was driven by the move to all-IP networks, starting with 3G and fully realized in 4G LTE and 5G NR. For IMS-based services like VoLTE, a standard, widely supported media transport protocol was essential for interoperability between mobile handsets, network equipment, and fixed-line VoIP systems. 3GPP profiles and constrains the use of RTP (and related codecs) to ensure consistent service quality, efficient use of radio resources, and compatibility with network-based policy control, charging, and security (via SRTP). It is the linchpin that allows cellular networks to transition from circuit-switched voice to high-quality, feature-rich IP-based multimedia communication.
Key Features
- Payload type identification for dynamic codec negotiation
- Sequence numbering for packet loss detection and reordering
- Timestamping for synchronization and jitter buffer management
- Synchronization source (SSRC) identifiers for multi-source sessions
- Designed to work with companion RTCP for control and feedback
- Extensible header format for profile-specific extensions
Evolution Across Releases
Initial adoption of RTP within 3GPP for Packet-switched Streaming Service (PSS). Defined basic usage profiles for transporting audio and video over UMTS packet-switched bearers. Established RTP as the media transport for early multimedia services in 3G networks.
Enhanced support for RTP within the Multimedia Messaging Service (MMS) and continued evolution of PSS. Started work on IP Multimedia Subsystem (IMS) concepts, which would later heavily rely on RTP for media transport.
Formalized IMS as the core architecture for IP-based multimedia services. RTP became the mandated media transport protocol for IMS real-time sessions, including voice and video calls. Defined integration with SIP for session control and SDP for media negotiation.
Introduced support for the Secure Real-time Transport Protocol (SRTP, RFC 3711) to provide confidentiality, integrity, and authentication for media streams. Enhanced codec support and interworking with circuit-switched networks.
Further IMS enhancements and standardization of Voice over IMS (VoIMS). Refined RTP usage for emergency calls and location-based services. Work on Multimedia Broadcast/Multicast Service (MBMS) utilizing RTP for streaming delivery.
Critical for LTE/SAE. Defined Voice over LTE (VoLTE) based on IMS, with RTP as the core media protocol. Specified QoS mechanisms (QCI) to prioritize RTP traffic on the LTE bearer. Profiled specific codecs like AMR-WB for high-definition voice.
Enhanced VoLTE features (Single Radio Voice Call Continuity - SRVCC). Continued profiling of RTP and RTCP for IMS. Introduced support for Enhanced Voice Services (EVS) codec, transported via RTP.
IMS Centralized Services (ICS) and further Rich Communication Suite (RCS) enhancements utilizing RTP for video share and file transfer. Refinements to media handling and resource optimization.
Support for machine-type communication and early IoT considerations. Maintained and updated RTP profiles for ongoing IMS service evolution.
Introduced Wi-Fi calling (VoWiFi) via IMS, extending RTP-based VoLTE to untrusted non-3GPP access. Enhanced codec support and operational efficiency.
Enhanced Voice Services (EVS) codec standardization completed, with defined RTP payload formats. Work on WebRTC integration, which also uses RTP as its media transport.
Mission Critical Services (MCPTT, MCVideo, MCData) over LTE, utilizing RTP for mission-critical audio and video streams with specific reliability and priority requirements.
Foundation for 5G. Defined Voice over NR (VoNR) as the 5G native voice solution, inheriting the IMS and RTP framework from VoLTE. Supported new 5G QoS framework for RTP streams.
5G phase 2. Enhanced support for Industrial IoT and Ultra-Reliable Low Latency Communications (URLLC), impacting real-time media transport requirements. Integrated Access and Backhaul (IAB) considerations for RTP traffic.
Expanded 5G to reduced capability devices, non-terrestrial networks, and enhanced positioning. RTP usage adapted for new device categories and challenging network conditions (e.g., high latency in satellite links).
5G-Advanced studies. Focus on immersive media (Extended Reality - XR) which places new demands on RTP for low-latency, high-throughput, and synchronized multi-stream media delivery.
Future evolution expected to focus on AI/ML optimizations for media transport, further enhancements for XR, and integration with network sensing capabilities.
Defining Specifications
| Specification | Title |
|---|---|
| TS 21.905 | 3GPP TS 21.905 |
| TS 22.401 | 3GPP TS 22.401 |
| TS 22.827 | 3GPP TS 22.827 |
| TS 22.977 | 3GPP TS 22.977 |
| TS 23.107 | 3GPP TS 23.107 |
| TS 23.207 | 3GPP TS 23.207 |
| TS 23.231 | 3GPP TS 23.231 |
| TS 23.279 | 3GPP TS 23.279 |
| TS 23.333 | 3GPP TS 23.333 |
| TS 23.334 | 3GPP TS 23.334 |
| TS 23.701 | 3GPP TS 23.701 |
| TS 23.722 | 3GPP TS 23.722 |
| TS 23.979 | 3GPP TS 23.979 |
| TS 24.173 | 3GPP TS 24.173 |
| TS 24.229 | 3GPP TS 24.229 |
| TS 24.281 | 3GPP TS 24.281 |
| TS 24.282 | 3GPP TS 24.282 |
| TS 24.379 | 3GPP TS 24.379 |
| TS 24.380 | 3GPP TS 24.380 |
| TS 24.404 | 3GPP TS 24.404 |
| TS 24.504 | 3GPP TS 24.504 |
| TS 24.581 | 3GPP TS 24.581 |
| TS 24.604 | 3GPP TS 24.604 |
| TS 25.323 | 3GPP TS 25.323 |
| TS 25.410 | 3GPP TS 25.410 |
| TS 25.414 | 3GPP TS 25.414 |
| TS 25.415 | 3GPP TS 25.415 |
| TS 25.444 | 3GPP TS 25.444 |
| TS 25.993 | 3GPP TS 25.993 |
| TS 26.114 | 3GPP TS 26.114 |
| TS 26.142 | 3GPP TS 26.142 |
| TS 26.179 | 3GPP TS 26.179 |
| TS 26.223 | 3GPP TS 26.223 |
| TS 26.233 | 3GPP TS 26.233 |
| TS 26.234 | 3GPP TS 26.234 |
| TS 26.235 | 3GPP TS 26.235 |
| TS 26.236 | 3GPP TS 26.236 |
| TS 26.237 | 3GPP TS 26.237 |
| TS 26.244 | 3GPP TS 26.244 |
| TS 26.247 | 3GPP TS 26.247 |
| TS 26.254 | 3GPP TS 26.254 |
| TS 26.256 | 3GPP TS 26.256 |
| TS 26.281 | 3GPP TS 26.281 |
| TS 26.346 | 3GPP TS 26.346 |
| TS 26.348 | 3GPP TS 26.348 |
| TS 26.448 | 3GPP TS 26.448 |
| TS 26.453 | 3GPP TS 26.453 |
| TS 26.517 | 3GPP TS 26.517 |
| TS 26.802 | 3GPP TS 26.802 |
| TS 26.804 | 3GPP TS 26.804 |
| TS 26.806 | 3GPP TS 26.806 |
| TS 26.812 | 3GPP TS 26.812 |
| TS 26.847 | 3GPP TS 26.847 |
| TS 26.857 | 3GPP TS 26.857 |
| TS 26.880 | 3GPP TS 26.880 |
| TS 26.902 | 3GPP TS 26.902 |
| TS 26.905 | 3GPP TS 26.905 |
| TS 26.907 | 3GPP TS 26.907 |
| TS 26.914 | 3GPP TS 26.914 |
| TS 26.923 | 3GPP TS 26.923 |
| TS 26.926 | 3GPP TS 26.926 |
| TS 26.927 | 3GPP TS 26.927 |
| TS 26.928 | 3GPP TS 26.928 |
| TS 26.935 | 3GPP TS 26.935 |
| TS 26.936 | 3GPP TS 26.936 |
| TS 26.937 | 3GPP TS 26.937 |
| TS 26.946 | 3GPP TS 26.946 |
| TS 26.947 | 3GPP TS 26.947 |
| TS 26.955 | 3GPP TS 26.955 |
| TS 26.956 | 3GPP TS 26.956 |
| TS 26.962 | 3GPP TS 26.962 |
| TS 26.982 | 3GPP TS 26.982 |
| TS 26.998 | 3GPP TS 26.998 |
| TS 29.163 | 3GPP TS 29.163 |
| TS 29.332 | 3GPP TS 29.332 |
| TS 29.380 | 3GPP TS 29.380 |
| TS 29.412 | 3GPP TS 29.412 |
| TS 29.414 | 3GPP TS 29.414 |
| TS 29.415 | 3GPP TS 29.415 |
| TS 29.424 | 3GPP TS 29.424 |
| TS 29.514 | 3GPP TS 29.514 |
| TS 29.561 | 3GPP TS 29.561 |
| TS 29.582 | 3GPP TS 29.582 |
| TS 32.272 | 3GPP TS 32.272 |
| TS 33.303 | 3GPP TS 33.303 |
| TS 33.328 | 3GPP TS 33.328 |
| TR 33.790 | 3GPP TR 33.790 |
| TR 33.871 | 3GPP TR 33.871 |
| TR 33.880 | 3GPP TR 33.880 |
| TS 36.323 | 3GPP TS 36.323 |
| TS 36.401 | 3GPP TS 36.401 |
| TS 36.579 | 3GPP TS 36.579 |
| TR 36.750 | 3GPP TR 36.750 |
| TS 37.579 | 3GPP TS 37.579 |
| TR 37.901 | 3GPP TR 37.901 |
| TS 38.323 | 3GPP TS 38.323 |
| TS 43.051 | 3GPP TS 43.051 |
| TS 43.129 | 3GPP TS 43.129 |
| TS 43.318 | 3GPP TS 43.318 |
| TR 43.901 | 3GPP TR 43.901 |
| TR 43.902 | 3GPP TR 43.902 |
| TS 44.060 | 3GPP TS 44.060 |
| TS 44.065 | 3GPP TS 44.065 |
| TS 44.318 | 3GPP TS 44.318 |
| TS 48.103 | 3GPP TS 48.103 |