MMSP

Multimodal Synchronization Protocol

Protocol
Introduced in Rel-2
MMSP is a protocol defined by 3GPP for synchronizing multiple media components (like audio, video, and text) in a multimedia presentation or service. It ensures that different media streams are presented to the user in a coordinated, time-aligned manner. This is essential for delivering a coherent and immersive user experience in applications like synchronized slide shows with audio narration.

Description

The Multimodal Synchronization Protocol (MMSP) is a key protocol within the 3GPP Multimedia Broadcast/Multicast Service (MBMS) framework, specifically designed to handle the temporal alignment of multiple, independent media streams that constitute a single multimedia session. Rather than multiplexing everything into a single container format, MMSP treats each media component (e.g., an audio track, a video track, a timed text stream, or an image sequence) as a separate entity that is delivered, often via broadcast or multicast, and must be precisely synchronized at the receiver. The protocol operates at the application layer, working in conjunction with lower-layer transport protocols to achieve lip-sync, slide synchronization, and other time-critical presentations.
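The core idea of aligning independently delivered streams can be illustrated with a minimal sketch. The names below (`MediaUnit`, `to_common_timeline`, the offset table) are illustrative assumptions, not structures defined by the MMSP specification:

```python
from dataclasses import dataclass

@dataclass
class MediaUnit:
    stream_id: str     # e.g. "audio", "video", "text"
    media_time: float  # timestamp on the stream's own clock, in seconds

def to_common_timeline(unit: MediaUnit, clock_offsets: dict) -> float:
    """Map a stream-local timestamp onto the session's common timeline.

    clock_offsets holds, per stream, the offset between that stream's
    local clock and the shared clock reference carried in the timing
    metadata.
    """
    return unit.media_time + clock_offsets[unit.stream_id]

# Two units that should be presented together even though their
# local clocks differ:
offsets = {"audio": 0.0, "video": -1.5}
a = MediaUnit("audio", 10.0)
v = MediaUnit("video", 11.5)
print(to_common_timeline(a, offsets) == to_common_timeline(v, offsets))  # True
```

Once every unit carries a position on the shared timeline, the receiver can order and present units from any number of streams without caring which stream they came from.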

MMSP works by embedding synchronization markers and timing information within each media stream. A central concept is a common timeline, typically derived from a shared clock reference. The protocol defines structures such as synchronization channels and synchronization items that carry this timing data. The MBMS client on the receiving device uses MMSP to collect these streams, buffer them appropriately, and present them in a synchronized fashion according to the received timing instructions. It compensates for network jitter and differential delay between streams so that, for example, an audio commentary matches the displayed slide change exactly, regardless of how the individual data packets arrived.
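The buffering and jitter-compensation step can be sketched as a simple playout buffer that holds units until their scheduled presentation time plus a fixed delay budget. This is a minimal illustration only; the class and method names are assumptions rather than MMSP-defined terms:

```python
import heapq

class SyncBuffer:
    """Re-orders arriving media units on the common timeline and releases
    them only once a fixed playout delay has absorbed network jitter."""

    def __init__(self, delay_budget: float):
        self.delay_budget = delay_budget  # fixed playout delay, in seconds
        self._heap = []                   # (presentation_time, stream_id, payload)

    def receive(self, presentation_time: float, stream_id: str, payload: str):
        # Arrival order and per-stream delay do not matter; the heap
        # keeps units sorted by their common-timeline timestamp.
        heapq.heappush(self._heap, (presentation_time, stream_id, payload))

    def due(self, now: float):
        """Pop every unit whose deadline (timestamp + delay budget) has passed."""
        out = []
        while self._heap and self._heap[0][0] + self.delay_budget <= now:
            out.append(heapq.heappop(self._heap))
        return out

buf = SyncBuffer(delay_budget=0.5)
# Differential delay: the slide arrives before the earlier-scheduled audio.
buf.receive(2.0, "slide", "slide-3")
buf.receive(1.0, "audio", "narration-frame")
print([s for _, s, _ in buf.due(now=1.6)])  # ['audio']
```

The fixed delay budget trades a small amount of startup latency for immunity to jitter: as long as every packet arrives within the budget, playback order and alignment are preserved.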

The protocol's architecture is closely tied to the FLUTE/ALC file delivery protocol used in MBMS for transporting the media files or stream fragments. MMSP provides the necessary metadata and control information to tell the client how to associate and align these delivered files. It supports both streaming and download delivery modes of MBMS. Key components include the synchronization source description, which identifies the media components and their relationships, and the timing metadata that provides the precise presentation schedule. By decoupling the delivery of media objects from their synchronized presentation logic, MMSP enables efficient broadcast of rich, time-sensitive multimedia content to large audiences, which is a cornerstone of services like mobile TV and public warning systems with multimedia alerts.
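The association between delivered objects and the presentation schedule can be pictured as a small metadata structure. The field names and `flute://` URIs below are hypothetical, chosen only to illustrate how a synchronization source description might tie FLUTE-delivered files to timeline entries:

```python
# Hypothetical synchronization source description: it identifies the
# media components and maps delivered objects onto the common timeline.
source_description = {
    "session_id": "mbms-demo-1",
    "components": [
        {"id": "audio-en", "type": "audio", "lang": "en"},
        {"id": "slides",   "type": "image"},
    ],
    "schedule": [
        {"component": "slides",   "object_uri": "flute://session/slide1.jpg", "start": 0.0},
        {"component": "audio-en", "object_uri": "flute://session/narr1.amr",  "start": 0.0},
        {"component": "slides",   "object_uri": "flute://session/slide2.jpg", "start": 12.5},
    ],
}

def objects_active_at(desc: dict, component: str, t: float) -> list:
    """Return the delivered objects of one component scheduled at or before t."""
    return [e["object_uri"] for e in desc["schedule"]
            if e["component"] == component and e["start"] <= t]

print(objects_active_at(source_description, "slides", 5.0))
# ['flute://session/slide1.jpg']
```

Because the description is separate from the delivered files themselves, the same broadcast objects can serve different receiver-side combinations, such as one video stream paired with whichever audio language the user selected.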

Purpose & Motivation

MMSP was created to address a fundamental challenge in broadcast/multicast multimedia services: delivering a cohesive experience from multiple, separately encoded and transported media elements. Early streaming services often combined media into a single stream (like an MPEG transport stream), which was efficient but inflexible. For MBMS, 3GPP needed a method that allowed different media components (e.g., a high-quality video stream, a separate audio track in multiple languages, and auxiliary image slides) to be broadcast efficiently and combined flexibly at the receiver based on user selection or device capability.

The protocol solves the problem of synchronization in an environment where streams may take different network paths or experience different delays. Without MMSP, an audio track and a sequence of images might each arrive correctly yet play back out of step, ruining the presentation. Its development was motivated by the desire to enable sophisticated MBMS applications such as narrated news clips, sports highlights with synchronized statistics, and interactive educational content. MMSP provides the necessary 'glue' at the application layer, allowing network operators and content providers to create rich, TV-like experiences over cellular broadcast networks. It also optimizes bandwidth usage, for instance by letting users subscribe only to the audio track in their language while all users receive the common video stream.

Key Features

  • Provides temporal synchronization for multiple independent media streams in MBMS
  • Utilizes a common timeline and synchronization markers across streams
  • Works with FLUTE/ALC protocol for efficient file delivery over broadcast
  • Supports both streaming and download delivery modes of MBMS
  • Enables lip-sync, slide-audio synchronization, and other timed presentations
  • Allows flexible combination of media components (e.g., choice of audio language) at the receiver

Evolution Across Releases

Rel-2 (Initial)

Introduced the initial concept of MMSP as part of the MBMS work item in 3GPP TSG SA. Defined the basic principles and requirements for synchronizing media components in a broadcast/multicast environment, establishing its role within the broader MBMS architecture.

Fully specified the MMSP protocol in detail within the MBMS framework (TS 26.346). Defined the synchronization channel, synchronization items, and the procedures for aligning streaming media and file delivery sessions, enabling the first commercial implementations of synchronized MBMS services.

Enhanced MMSP to support improved error resilience and recovery mechanisms for synchronization data. Introduced optimizations for handling longer duration sessions and more complex media component relationships.

Updated MMSP to align with the evolved MBMS (eMBMS) architecture for LTE. Ensured compatibility with new LTE transport layers and introduced support for more advanced synchronization scenarios required for enhanced mobile TV services.

Further evolved MMSP as part of the FeMBMS (Further evolved MBMS) work for LTE and 5G broadcast. Enhanced efficiency and scalability for large-scale terrestrial broadcast deployments, such as those for public warning and automotive applications.

Defining Specifications

Specification    Title
TS 22.977        3GPP TS 22.977