Description
The Multi-Scale Structural Similarity Index (MS-SSIM) is a perceptual quality metric referenced in 3GPP specifications for objective video quality assessment. Unlike simple pixel-based metrics such as Peak Signal-to-Noise Ratio (PSNR), MS-SSIM is designed to reflect how the human visual system perceives image fidelity. It is a full-reference metric: it requires both the original, undistorted source video and the processed or degraded video under test. The metric is defined on images, so for video it is typically computed frame by frame and the per-frame scores are pooled, for example by averaging.
The algorithm analyzes structural information, on the principle that the human visual system is highly adapted to extracting structure from a scene. The 'Multi-Scale' aspect is key: the reference and processed frames are repeatedly low-pass filtered and downsampled by a factor of two, producing a pyramid of images at successively coarser resolutions. At each scale, local statistics are computed within a sliding window and compared between the reference and test frames. The luminance comparison uses the local means, the contrast comparison uses the local standard deviations, and the structure comparison uses the correlation coefficient between the two image patches. In the original formulation, contrast and structure are compared at every scale, while luminance is compared only at the coarsest scale. The per-scale results are then combined as a weighted product. The published weights are largest at the middle scales and smallest at the finest scale, reflecting the eye's peak contrast sensitivity at mid spatial frequencies.
The final output is a single score, typically between 0 and 1, where 1 indicates that the two inputs are identical. The score is not itself a Mean Opinion Score (MOS), but it is used because it correlates better than PSNR with the MOS a panel of human viewers would give. Within 3GPP, MS-SSIM is used to evaluate the quality impact of video processing steps such as compression (e.g., with AVC or HEVC), transmission over error-prone channels, and error concealment, providing a standardized, repeatable way to quantify perceptual quality.
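The multi-scale computation described above can be illustrated with a minimal NumPy sketch for a single pair of grayscale frames. It is an illustration under stated assumptions, not the procedure mandated by any 3GPP document: the 11x11 Gaussian window (sigma 1.5), the constants K1 = 0.01 and K2 = 0.03, and the five per-scale weights are taken from the original MS-SSIM publication by Wang, Simoncelli and Bovik (2003). The 2x2 average pooling used as the low-pass filter and downsampler is a simplification, and the function names (`ms_ssim`, `_ssim_terms`, and so on) are hypothetical. Following that publication, luminance enters only at the coarsest scale.

```python
import numpy as np

# Per-scale exponents from Wang, Simoncelli and Bovik (2003), finest scale first.
# Note that the middle scales carry the most weight.
WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])


def _gaussian_kernel(size=11, sigma=1.5):
    """1-D Gaussian window, normalized to sum to 1 (assumed window parameters)."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def _blur(img, kernel):
    """Separable 'valid' Gaussian filtering: rows first, then columns."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def _ssim_terms(ref, test, kernel, data_range):
    """Return (mean luminance term, mean contrast-structure term) for one scale."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_r, mu_t = _blur(ref, kernel), _blur(test, kernel)
    var_r = _blur(ref * ref, kernel) - mu_r ** 2
    var_t = _blur(test * test, kernel) - mu_t ** 2
    cov = _blur(ref * test, kernel) - mu_r * mu_t
    luminance = (2 * mu_r * mu_t + c1) / (mu_r ** 2 + mu_t ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_r + var_t + c2)
    return luminance.mean(), contrast_structure.mean()


def _downsample(img):
    """2x2 average pooling: a simplified low-pass filter plus decimation by 2."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def ms_ssim(ref, test, data_range=255.0):
    """Simplified MS-SSIM between two equally sized 2-D grayscale frames."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    if ref.ndim != 2 or ref.shape != test.shape:
        raise ValueError("inputs must be 2-D arrays of identical shape")
    kernel = _gaussian_kernel()
    # The coarsest scale must still be at least as large as the window.
    min_side = len(kernel) * 2 ** (len(WEIGHTS) - 1)
    if min(ref.shape) < min_side:
        raise ValueError(f"each side must be at least {min_side} pixels")

    score = 1.0
    for i, weight in enumerate(WEIGHTS):
        lum, cs = _ssim_terms(ref, test, kernel, data_range)
        coarsest = i == len(WEIGHTS) - 1
        term = lum * cs if coarsest else cs
        # Clamp at zero so a negative term cannot produce an invalid power.
        score *= max(float(term), 0.0) ** weight
        if not coarsest:
            ref, test = _downsample(ref), _downsample(test)
    return score
```

For a video sequence, one would typically call `ms_ssim` on the luma plane of each reference/test frame pair and average the resulting scores. Production measurements should rely on a validated implementation rather than this sketch.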
Purpose & Motivation
MS-SSIM was standardized to address the inadequacy of traditional metrics like PSNR for evaluating perceptual video quality in telecommunications. PSNR, while computationally simple, often correlates poorly with human subjective judgments, especially for modern codecs and complex distortions. As video became a primary service over 3G, 4G, and 5G networks, there was a strong need for an objective, automated quality assessment tool that could reliably predict viewer satisfaction during codec development, network planning, and Quality of Experience (QoE) monitoring.
The motivation for its inclusion in 3GPP specs (starting in Release 12) was to provide a common, industry-agreed methodology for performance testing of video codecs and transmission systems, allowing fair and comparable benchmarking between different vendor implementations and technologies. Subjective testing with human viewers was, and remains, the gold standard, but it is expensive, time-consuming, and hard to repeat at scale. MS-SSIM offers a computational model whose scores correlate well with subjective results. It specifically accounts for the multi-scale nature of human vision, where distortions at different spatial frequencies have different perceptual impacts. This makes it more robust than single-scale metrics such as the original SSIM across the range of resolutions and viewing conditions relevant to mobile streaming.
Evolution Across Releases
MS-SSIM was initially standardized as an objective perceptual video quality assessment method within 3GPP specifications. This defined the algorithm's application for evaluating video coding performance, providing a standardized alternative to PSNR that correlates better with human judgment of quality in mobile video services.
Defining Specifications
3GPP specifications that define or reference MS-SSIM, with the latest known release. Sourced from the 3GPP document catalog.
| Specification | Title | Release |
|---|---|---|
| TR 26.938 vj00 | DASH Deployment Guidelines for 3GPP Networks | Rel-19 |
| TR 26.955 vj00 | Video Codec Analysis for 5G Services | Rel-19 |