MUSHRA

Multiple Stimulus with Hidden Reference and Anchors method

Services
Introduced in Rel-8
A standardized subjective audio quality assessment methodology defined in ITU-R Recommendation BS.1534 and adopted by 3GPP. Listeners rate multiple processed versions of an audio sample against a hidden reference and explicit band-limited anchor samples (high and low quality). It is the primary method for evaluating the perceptual quality of speech and audio codecs at intermediate to high quality levels.

Description

The Multiple Stimulus with Hidden Reference and Anchors (MUSHRA) method is a rigorous, controlled procedure for subjectively evaluating the perceptual quality of intermediate to high-quality audio codecs and processing systems. In a MUSHRA test, a panel of listeners with normal hearing is presented with a series of audio excerpts. For each test item, the listener hears several versions (stimuli) of the same source audio: the hidden, unprocessed reference (the original high-quality signal), the codec/processing outputs under test, and explicit anchor stimuli, typically band-limited versions of the reference (e.g., 3.5 kHz and 7 kHz low-pass filtered signals). All stimuli, including the hidden reference, are presented in randomized order under anonymous labels (e.g., A, B, C), while a known, labeled copy of the reference is also available for direct comparison. The listener rates each stimulus on a continuous scale from 0 (bad) to 100 (excellent). The hidden reference serves as an internal control to check listener reliability, and the anchors provide fixed quality points that keep scores consistent across different tests and laboratories. The final result for each system under test is its mean score across all listeners and test items, which reliably reflects its perceptual performance.
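The scoring and listener-screening logic described above can be sketched as follows. This is a minimal illustration in Python/NumPy: the score-matrix layout and all numbers are synthetic, and the screening criterion follows the ITU-R BS.1534-3 rule of thumb (exclude a listener who rates the hidden reference below 90 on more than 15% of test items).

```python
import numpy as np

# Hypothetical MUSHRA score matrix (listeners x items x stimuli);
# column 0 holds the hidden-reference ratings, the remaining columns
# are the systems under test and the anchors. All values are synthetic.
rng = np.random.default_rng(0)
scores = rng.uniform(40, 95, size=(10, 8, 5))
scores[:, :, 0] = rng.uniform(92, 100, size=(10, 8))  # reference rated near the top
scores[3, :, 0] = 70.0  # one unreliable listener down-rates the reference

HIDDEN_REF = 0

def post_screen(scores, threshold=90.0, max_fail_fraction=0.15):
    """Return a boolean mask of listeners to keep, per the BS.1534-3
    post-screening rule: exclude any listener who rates the hidden
    reference below 90 on more than 15% of test items."""
    fails = scores[:, :, HIDDEN_REF] < threshold
    return fails.mean(axis=1) <= max_fail_fraction

keep = post_screen(scores)
# Mean score per stimulus over the remaining listeners and all items.
mean_per_stimulus = scores[keep].mean(axis=(0, 1))
print(mean_per_stimulus.round(1))
```

In this synthetic run the third listener is screened out, and the hidden reference averages near the top of the scale, as expected of a reliable panel.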

Purpose & Motivation

MUSHRA was developed to address the limitations of simpler listening test methods, like the Absolute Category Rating (ACR), which are inadequate for assessing high-quality audio where impairments are often subtle. Before MUSHRA, comparing advanced wideband or full-band codecs was challenging due to a lack of sensitivity and context in scoring. The method was created to provide a highly reliable and repeatable way to rank the performance of speech and audio codecs, such as EVS, AMR-WB, and 3GPP audio standards for multimedia services. It solves the problem of subjective bias by hiding the reference and including calibrated anchors, which stabilize the rating scale across different listener panels and test sessions. This is critical for 3GPP standardization, where objective metrics (like PESQ) are insufficient, and definitive, human-centric quality decisions are needed to select the best codec among competing proposals for inclusion in the specifications, ensuring optimal quality of experience for end-users.

Key Features

  • Uses a hidden, unprocessed reference signal for listener calibration
  • Includes explicit high and low-quality anchors to stabilize the rating scale
  • Employs a continuous rating scale from 0 to 100 for fine-grained assessment
  • Stimuli are presented in randomized, blind order to prevent bias
  • Designed for evaluating intermediate to high-quality audio systems (audio bandwidth beyond narrowband, i.e., > 3.5 kHz)
  • Produces robust per-condition mean scores on the 0-100 scale for reliable codec comparison
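The band-limited anchors listed above are commonly generated by low-pass filtering the reference signal. A minimal sketch using a windowed-sinc FIR filter, assuming the 3.5 kHz cutoff ITU-R BS.1534 prescribes for the low anchor (the tap count and test tones are illustrative):

```python
import numpy as np

def lowpass_anchor(signal, fs, cutoff_hz=3500.0, numtaps=101):
    """Band-limit a reference signal with a windowed-sinc FIR low-pass
    filter, a simple stand-in for generating the MUSHRA low anchor
    (3.5 kHz cutoff per ITU-R BS.1534; tap count is illustrative)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.hamming(numtaps)
    h /= h.sum()  # normalize for unity gain at DC
    return np.convolve(signal, h, mode="same")

fs = 48_000  # full-band sampling rate
t = np.arange(fs) / fs
in_band = np.sin(2 * np.pi * 1_000 * t)    # 1 kHz: inside the anchor passband
out_band = np.sin(2 * np.pi * 10_000 * t)  # 10 kHz: removed by the anchor filter
anchor = lowpass_anchor(in_band + out_band, fs)
```

The filtered output retains the 1 kHz component at nearly full level while the 10 kHz component is attenuated deep into the stopband, mimicking the severe bandwidth limitation of the low anchor.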

Evolution Across Releases

Rel-8 Initial

Formally adopted and specified within 3GPP as the recommended method for subjective testing of wideband speech codecs and audio systems. Established the core test procedure, requirements for listeners, equipment, and environment, solidifying its role in codec qualification.

Defining Specifications

Specification   Title
TS 26.818       3GPP TS 26.818
TS 26.936       3GPP TS 26.936
TS 26.950       3GPP TS 26.950
TS 26.996       3GPP TS 26.996