MS-SSIM

Multi-Scale Structural Similarity Index

Other
Introduced in Rel-12
MS-SSIM (Multi-Scale Structural Similarity Index) is an objective, full-reference quality assessment algorithm adopted by 3GPP for evaluating video. It estimates perceptual quality by comparing a processed video sequence with its original source across multiple spatial scales, producing a score that correlates well with human subjective opinion.

Description

The Multi-Scale Structural Similarity Index (MS-SSIM) is a perceptual quality measurement model adopted by 3GPP for objective video quality assessment. Unlike simple pixel-based metrics such as Peak Signal-to-Noise Ratio (PSNR), MS-SSIM is designed to reflect how the human visual system perceives image fidelity. It is a full-reference metric: it requires both the original, undistorted source video and the processed or degraded video under test. The algorithm compares structural information in the two signals, on the premise that the human visual system is highly adapted to extracting structure from a scene.

The 'Multi-Scale' aspect is key. The algorithm repeatedly low-pass filters and downsamples both the reference and the test frames, producing a pyramid of images at progressively lower resolutions. At each scale it computes local statistics within a sliding window and compares them between the reference and test images. The luminance comparison is based on local mean intensities, the contrast comparison on local standard deviations, and the structure comparison on the correlation coefficient between the two image patches. In the original formulation of the metric, contrast and structure are compared at every scale, while luminance is compared only at the coarsest scale. The per-scale results are then combined as a weighted product. The exponents were calibrated in a psychophysical experiment and, in the original formulation, give the most weight to the middle scales rather than to the finest one, consistent with the band-pass character of human contrast sensitivity.

The final output is a single score, typically between 0 and 1, where 1 indicates that the two inputs are identical. The score is not itself a Mean Opinion Score (MOS), but it is intended to correlate with the MOS that a panel of human viewers would give. For video, the metric is typically computed frame by frame and the per-frame scores are pooled, for example by averaging.

Within 3GPP, MS-SSIM is used in performance testing specifications to evaluate the quality impact of video processing steps such as compression (e.g., with AVC or HEVC), transmission over error-prone channels, and error concealment, providing a standardized, repeatable way to quantify perceptual quality.
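For reference, the comparison terms described above can be written out explicitly. The notation and constants below follow the original publication of the metric (Wang, Simoncelli and Bovik, 2003) and are an assumption here: this entry does not state which parameter values the 3GPP documents use. Here x and y are co-located reference and test patches, \mu and \sigma are their local means and standard deviations, \sigma_{xy} is their covariance, L is the dynamic range of the pixel values (255 for 8-bit video), and the subscript j indexes the scale, with j = 1 the full resolution and j = M the coarsest.

```latex
\begin{align}
l(x,y) &= \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, &
c(x,y) &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, &
s(x,y) &= \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \\
C_1 &= (K_1 L)^2, & C_2 &= (K_2 L)^2, & C_3 &= C_2 / 2,
\end{align}

\begin{equation}
\text{MS-SSIM}(x,y) = \bigl[l_M(x,y)\bigr]^{\alpha_M}
  \prod_{j=1}^{M} \bigl[c_j(x,y)\bigr]^{\beta_j}\,\bigl[s_j(x,y)\bigr]^{\gamma_j}
\end{equation}
```

In the original publication, K_1 = 0.01, K_2 = 0.03, M = 5, and the exponents are equal within each scale (\beta_j = \gamma_j, with \alpha_M equal to both at the coarsest scale): 0.0448, 0.2856, 0.3001, 0.2363, 0.1333 from finest to coarsest. With C_3 = C_2 / 2, the product c(x,y) s(x,y) simplifies to (2\sigma_{xy} + C_2) / (\sigma_x^2 + \sigma_y^2 + C_2).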

Purpose & Motivation

MS-SSIM was adopted to address the inadequacy of traditional metrics such as PSNR for evaluating perceptual video quality in modern telecommunications. PSNR is computationally simple but often correlates poorly with human subjective judgments, especially for modern codecs and complex distortions. As video became a primary service over 3G, 4G, and 5G networks, there was a strong need for an objective, automated quality assessment tool that could reliably predict viewer satisfaction during codec development, network planning, and Quality of Experience (QoE) monitoring.

The motivation for its inclusion in 3GPP specifications (starting in Release 12) was to provide a common, industry-agreed methodology for performance testing of video codecs and transmission systems, allowing fair and comparable benchmarking between different vendor implementations and technologies. Before its adoption, subjective testing with human viewers was the gold standard, but it was expensive, time-consuming, and hard to repeat at scale. MS-SSIM addresses this with a computational model whose scores track subjective ratings more closely than PSNR does. It specifically accounts for the multi-scale nature of human vision, in which distortions at different spatial frequencies have different perceptual impacts. This makes it more accurate than single-scale metrics such as the original SSIM when assessing video quality across the range of resolutions and viewing conditions relevant to mobile streaming.

Key Features

  • Full-reference metric comparing processed video to original source
  • Multi-scale analysis using a Gaussian pyramid to assess quality at different resolutions
  • Computes local luminance, contrast, and structure comparisons
  • Produces a single perceptual quality score highly correlated with subjective MOS
  • Standardized computational model ensuring consistent results across implementations
  • Used for objective testing of video codec performance and transmission robustness
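The features listed above can be illustrated with a minimal pure-Python sketch for a single pair of grayscale frames. It is not the procedure defined in any 3GPP document; several choices are assumptions made for brevity: the per-scale exponents come from the original MS-SSIM publication, the low-pass filter is a 2x2 box average, the local window is a uniform 8x8 block moved with a stride of 4 instead of a Gaussian-weighted window at every pixel, and the function names (`downsample`, `scale_terms`, `ms_ssim`) are hypothetical.

```python
# Per-scale exponents from the original MS-SSIM publication, finest scale first.
# Assumption: this entry does not say which values the 3GPP documents use.
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)


def downsample(img):
    """Low-pass filter with a 2x2 box average, then decimate by 2."""
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def scale_terms(ref, test, win=8, stride=4, peak=255.0):
    """Return (mean luminance term, mean contrast*structure term) for one scale."""
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    n = win * win
    lum_sum = cs_sum = 0.0
    count = 0
    for y in range(0, len(ref) - win + 1, stride):
        for x in range(0, len(ref[0]) - win + 1, stride):
            a = [ref[y + j][x + i] for j in range(win) for i in range(win)]
            b = [test[y + j][x + i] for j in range(win) for i in range(win)]
            mu_a = sum(a) / n
            mu_b = sum(b) / n
            var_a = sum((p - mu_a) * (p - mu_a) for p in a) / n
            var_b = sum((q - mu_b) * (q - mu_b) for q in b) / n
            cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
            lum_sum += (2 * mu_a * mu_b + c1) / (mu_a * mu_a + mu_b * mu_b + c1)
            cs_sum += (2 * cov + c2) / (var_a + var_b + c2)
            count += 1
    return lum_sum / count, cs_sum / count


def ms_ssim(ref, test, win=8, stride=4, peak=255.0):
    """MS-SSIM of two equally sized grayscale frames given as lists of rows."""
    min_side = win * 2 ** (len(WEIGHTS) - 1)
    if len(ref) < min_side or len(ref[0]) < min_side:
        raise ValueError(f"frames must be at least {min_side}x{min_side}")
    score = 1.0
    for level, weight in enumerate(WEIGHTS):
        lum, cs = scale_terms(ref, test, win, stride, peak)
        cs = max(cs, 0.0)  # a negative base with a fractional exponent is not real
        if level == len(WEIGHTS) - 1:
            score *= (lum * cs) ** weight  # luminance enters only at the coarsest scale
        else:
            score *= cs ** weight
            ref, test = downsample(ref), downsample(test)
    return score


# Synthetic 128x128 frames: a wrapped gradient and a copy with a checkerboard offset.
ref = [[(2 * x + 3 * y) % 200 for x in range(128)] for y in range(128)]
noisy = [[ref[y][x] + 20 * ((x + y) % 2) for x in range(128)] for y in range(128)]
print(ms_ssim(ref, ref))    # identical inputs score 1.0
print(ms_ssim(ref, noisy))  # the distorted frame scores below 1.0
```

For a video sequence, a common approach is to compute this score for each frame pair and pool the results, for example with an arithmetic mean. Production measurements would normally rely on an optimized, validated implementation rather than a sketch like this one.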

Evolution Across Releases

Rel-12 Initial

Release 12 first introduced MS-SSIM as an objective perceptual video quality assessment method within 3GPP specifications. It defined how the algorithm is applied to evaluate video coding performance, providing a standardized alternative to PSNR that correlates better with human judgment of quality in mobile video services.

Defining Specifications

Specification    Title
TS 26.938        3GPP TS 26.938
TS 26.955        3GPP TS 26.955