SP-MIDI (Scalable Polyphony MIDI) — 3GPP Glossary

A 3GPP-adapted version of the MIDI (Musical Instrument Digital Interface) protocol for mobile networks. It enables efficient transmission of polyphonic musical data, allowing ringtones and audio services to scale the richness of playback to the receiving device's capabilities.

Description

Scalable Polyphony MIDI (SP-MIDI) is a mobile-optimized audio service technology specified for 3GPP services in specifications such as TS 26.140 (MMS media formats and codecs), TS 26.141 (IMS Messaging and Presence media formats and codecs), and TS 26.234 (PSS protocols and codecs). It is based on the industry-standard MIDI protocol but adapts it to the constrained and variable environment of wireless communication. SP-MIDI encodes a musical performance as a sequence of events (note-ons, note-offs, program changes, control changes), which is far more compact than sampled audio. Its core innovation is scalability: a single musical composition contains multiple prioritized instrument tracks or channels, so a device can play as many of them as its synthesizer supports.
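To give a feel for the compactness claim, the following back-of-the-envelope sketch compares event data with uncompressed sampled audio. The figures are illustrative assumptions, not taken from the specifications: a 30-second clip containing 600 notes, a one-byte delta-time before every event, no running status, and 8 kHz 16-bit mono PCM as the sampled-audio baseline. A MIDI note-on or note-off message occupies three bytes (status, note number, velocity).

```python
# Rough size comparison: MIDI note events vs. uncompressed PCM (illustrative numbers).

NOTE_EVENT_BYTES = 3   # status byte + note number + velocity
DELTA_TIME_BYTES = 1   # assumption: every delta-time fits in a single byte


def midi_event_bytes(note_count: int) -> int:
    """Bytes needed for note_count notes, each stored as a note-on plus a note-off."""
    events = 2 * note_count
    return events * (DELTA_TIME_BYTES + NOTE_EVENT_BYTES)


def pcm_bytes(seconds: int, sample_rate_hz: int = 8000, bytes_per_sample: int = 2) -> int:
    """Bytes of mono, uncompressed PCM audio of the given duration."""
    return seconds * sample_rate_hz * bytes_per_sample


midi_size = midi_event_bytes(note_count=600)
pcm_size = pcm_bytes(seconds=30)
print(f"MIDI events: {midi_size} B, PCM: {pcm_size} B, ratio: {pcm_size // midi_size}x")
# prints "MIDI events: 4800 B, PCM: 480000 B, ratio: 100x"
```

Real files add headers, program changes, and control changes, and compressed audio codecs narrow the gap, so the ratio above is only an order-of-magnitude indication.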

The architecture of an SP-MIDI service involves a content creator, a network server, and a receiving UE. The content is authored using the General MIDI (GM) or Mobile DLS (Downloadable Sounds) Level 2 sound set as a baseline. The scalability is implemented through a 'Channel Priority Table' embedded in the MIDI file. This table assigns a priority level (e.g., 0-15) to each of the 16 MIDI channels. The receiving device, which may have limited polyphony (the number of simultaneous notes it can synthesize, e.g., 4, 16, or 24 voices), uses this table to adapt playback dynamically. If the device's polyphony is lower than the number of active notes in the composition, it mutes notes on the lowest-priority channels first, so that the most important melodic and rhythmic elements remain audible.
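The muting rule described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the encoding or algorithm defined in the specifications: the table is modeled as a plain dictionary, a lower priority value is assumed to mean a less important channel, and the function name `channels_to_mute` and the sample data are hypothetical.

```python
def channels_to_mute(priority_table, active_notes, max_voices):
    """Return the channels to mute so the remaining notes fit within max_voices.

    priority_table: dict channel -> priority (assumption: lower value = less important)
    active_notes:   dict channel -> number of notes sounding at this moment
    """
    total = sum(active_notes.values())
    muted = []
    # Visit channels from least to most important and mute until the notes fit.
    for channel in sorted(active_notes, key=lambda ch: priority_table[ch]):
        if total <= max_voices:
            break
        total -= active_notes[channel]
        muted.append(channel)
    return muted


# Hypothetical composition: melody on channel 0, drums on 9, chords on 1, pad on 2.
priority_table = {0: 15, 9: 12, 1: 10, 2: 5}
active_notes = {0: 1, 9: 2, 1: 3, 2: 4}   # 10 notes sounding at once

print(channels_to_mute(priority_table, active_notes, max_voices=4))   # [2, 1]
print(channels_to_mute(priority_table, active_notes, max_voices=16))  # []
```

With four voices the pad and chord channels are muted and the melody and drums survive; a 16-voice device plays everything unchanged.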

How it works in practice: When an SP-MIDI file is delivered to a UE (e.g., as a ringtone download via MMS or streaming), the device's media player parses the file header to read the Channel Priority Table and determines its own hardware polyphony limit. During playback, the synthesizer engine tracks the number of concurrently active notes. When this number would exceed the device's limit, the scheduler drops notes on the channels with the lowest priority numbers first. This happens in real time, so audio richness degrades gracefully instead of playback failing or distorting. The protocol also supports downloadable sound banks (DLS) to ensure consistent timbre across different devices, although the primary focus is on structural scalability of polyphony.
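The playback-time behavior can be pictured as a priority-aware voice allocator. The sketch below is one possible design under stated assumptions, not the behavior mandated by the specifications: the class `VoiceAllocator` and its methods are hypothetical, a lower priority value is assumed to mean a less important channel, and when the limit is reached the allocator either replaces a sounding note on a less important channel or drops the incoming note.

```python
class VoiceAllocator:
    """Toy voice allocator that drops notes on the least important channels first."""

    def __init__(self, priority_table, max_voices):
        self.priority = priority_table   # channel -> priority (assumed: lower = less important)
        self.max_voices = max_voices     # hardware polyphony limit of the device
        self.active = []                 # (channel, note) pairs currently sounding

    def note_on(self, channel, note):
        """Try to start a note; return True if it sounds, False if it is dropped."""
        if len(self.active) < self.max_voices:
            self.active.append((channel, note))
            return True
        # At the limit: find the sounding note on the least important channel.
        victim = min(self.active, key=lambda voice: self.priority[voice[0]])
        if self.priority[victim[0]] < self.priority[channel]:
            self.active.remove(victim)   # drop the less important sounding note
            self.active.append((channel, note))
            return True
        return False                     # the incoming note is the least important one

    def note_off(self, channel, note):
        """Release a note if it is currently sounding."""
        if (channel, note) in self.active:
            self.active.remove((channel, note))


# Two-voice device; channel 0 (melody) outranks channel 1 (accompaniment).
synth = VoiceAllocator({0: 15, 1: 5}, max_voices=2)
results = [
    synth.note_on(1, 60),   # True: a voice is free
    synth.note_on(1, 64),   # True: a voice is free
    synth.note_on(0, 72),   # True: replaces an accompaniment note
    synth.note_on(1, 67),   # False: no less important note to replace
]
print(results, synth.active)
```

Replacing an already sounding note is a design choice made here for illustration; an implementation could instead mute whole channels up front, as long as the lowest-priority material is sacrificed first.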

Purpose & Motivation

SP-MIDI was created in 3GPP Release 8 to solve the problem of delivering rich, polyphonic ringtones and audio services across a highly fragmented ecosystem of mobile handsets with vastly different audio capabilities. Before SP-MIDI, monophonic ringtones or proprietary formats led to inconsistent user experiences, and polyphonic MIDI files could sound broken or silent on devices with insufficient synthesizer voices. The mobile industry needed a standardized, forward-compatible format that would allow content creators to author a single piece of music that could sound acceptable on both a basic 4-voice phone and a premium 64-voice smartphone.

The motivation stemmed from the booming market for personalized ringtones in the 2000s. Network operators and content providers wanted a reliable service where a purchased ringtone would work predictably on any subscriber's phone. SP-MIDI addressed the limitations of static MIDI by introducing intelligence at the playback device. Instead of the network needing to know every device's capability and transcode content accordingly, the intelligence was pushed to the edge: the file itself carried the prioritization rules, and the device performed the adaptation. This reduced network processing complexity and removed the need to store multiple versions of each asset. It also ensured a baseline of acceptable playback, which protected revenue streams for content services.

Key Features

Dynamic polyphony adaptation based on device capability
Embedded Channel Priority Table (0-15) for graceful degradation
Based on General MIDI and Mobile DLS Level 2 sound sets
Extremely low bandwidth consumption compared to sampled audio
Support for downloadable sound banks for timbre consistency
Standardized transport over 3GPP packet-switched streaming (PSS) and MMS

Evolution Across Releases

Rel-8 Initial

Initial standardization of the Scalable Polyphony MIDI codec and its integration into 3GPP services. Defined the core Channel Priority Table mechanism, file format structure, and conformance points, establishing it as the recommended format for polyphonic ringtones and audio clips in mobile networks.

TS 26.140, TS 26.141, TS 26.234

Defining Specifications

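Specification	Title
TS 26.140	Multimedia Messaging Service (MMS); Media formats and codecs
TS 26.141	IP Multimedia System (IMS) Messaging and Presence; Media formats and codecs
TS 26.234	Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs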