DSI (Digital Speech Interpolation) — 3GPP Glossary

Digital Speech Interpolation (DSI) is a transmission technique that increases the capacity of voice channels by exploiting the silent periods in human speech. It allocates bandwidth to a conversation only while speech is active, so that several conversations can share a single channel. This makes more efficient use of expensive transmission resources in telecommunication networks.

Description

Digital Speech Interpolation (DSI) operates on the principle that a typical two-way telephone conversation contains significant periods of silence, such as pauses between sentences and the time each party spends listening. The system uses voice activity detection (VAD) to distinguish active speech from silence or background noise. During active speech, the digital speech samples are packetized and transmitted over the channel. During silent periods, no channel is allocated to that conversation, which frees bandwidth for other users.
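
The VAD step can be illustrated with a minimal sketch. It assumes a simple energy threshold with a short hangover period, which is only one of many possible detectors and is not taken from any 3GPP specification. The function names, the threshold value, and the hangover length are all hypothetical.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_activity(
    frames: Sequence[Sequence[int]],
    threshold: float = 1.0e4,  # assumed energy threshold, illustrative only
    hangover: int = 2,         # assumed number of frames kept active after speech
) -> List[bool]:
    """Return one flag per frame: True if the frame should be treated as speech."""
    flags: List[bool] = []
    remaining = 0  # hangover frames still to be marked active
    for frame in frames:
        if frame_energy(frame) >= threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            # Below threshold, but still inside the hangover window.
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


def frames_to_send(frames: Sequence[Sequence[int]], **kwargs) -> List[int]:
    """Indices of the frames that would be transmitted; the rest are suppressed."""
    return [i for i, active in enumerate(detect_activity(frames, **kwargs)) if active]
```

With one loud frame followed by quiet frames, the hangover keeps the next two frames active so that trailing low-energy speech is less likely to be cut off. Only the flagged frames would occupy the channel.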

The core architecture involves a speech detector, a buffer, and a control unit. The speech detector analyzes the incoming signal to identify talk spurts. These spurts are then placed into packets with appropriate addressing information. A key component is the assignment logic, which dynamically assigns available channel slots to active speakers from a pool of users. This statistical multiplexing allows the number of supported users to exceed the number of physical transmission channels, provided that the number of users speaking at the same instant rarely exceeds the number of channels available.
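
The assignment logic described above can be sketched as a small slot allocator. This is an illustrative model only: the class and method names are hypothetical, and the handling of the overload case (counting a talk spurt that finds no free slot as "frozen out") is an assumption, since the text does not say what happens when all slots are busy.

```python
from typing import Dict, List, Optional


class ChannelAssigner:
    """Hypothetical model of DSI slot assignment for a pool of users."""

    def __init__(self, num_channels: int) -> None:
        self.free: List[int] = list(range(num_channels))  # unassigned slots
        self.assigned: Dict[str, int] = {}                # user -> slot
        self.frozen_out = 0                               # spurts that found no slot

    def talk_spurt_start(self, user: str) -> Optional[int]:
        """Assign a slot when a user starts speaking; None if none is free."""
        if user in self.assigned:
            return self.assigned[user]
        if not self.free:
            # Assumption: an unserved talk spurt is simply counted and dropped.
            self.frozen_out += 1
            return None
        channel = self.free.pop(0)
        self.assigned[user] = channel
        return channel

    def talk_spurt_end(self, user: str) -> None:
        """Release the user's slot when the talk spurt ends."""
        channel = self.assigned.pop(user, None)
        if channel is not None:
            self.free.append(channel)
```

For example, with two slots and three users, the third speaker is served only after one of the first two falls silent:

```python
assigner = ChannelAssigner(num_channels=2)
assigner.talk_spurt_start("A")   # gets slot 0
assigner.talk_spurt_start("B")   # gets slot 1
assigner.talk_spurt_start("C")   # no slot free, returns None
assigner.talk_spurt_end("A")     # slot 0 is released
assigner.talk_spurt_start("C")   # now gets slot 0
```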

In the 3GPP context, DSI is referenced in specifications such as 21.905 (Vocabulary for 3GPP Specifications) and 43.050 (Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system). Its role is primarily in optimizing the use of transmission resources in the core network and backhaul links, particularly for circuit-switched voice services. It improves the trunking efficiency between network nodes, reducing the required number of physical E1/T1 lines and associated costs.
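
The effect on trunk count can be estimated with simple arithmetic. The sketch below assumes 30 voice channels per E1 and 24 per T1, and takes the DSI gain (the ratio of calls carried to channels used) as an input. The gain of 2.0 in the usage note is purely illustrative and is not a figure from the specifications; the function and constant names are hypothetical.

```python
import math

E1_VOICE_CHANNELS = 30  # 32 timeslots, 30 typically available for voice
T1_VOICE_CHANNELS = 24  # 24 DS0 channels


def trunks_required(calls: int, channels_per_trunk: int, dsi_gain: float = 1.0) -> int:
    """Number of trunks needed to carry `calls` simultaneous calls.

    dsi_gain is the assumed ratio of calls carried to channels used;
    1.0 means no DSI (one dedicated channel per call).
    """
    if dsi_gain < 1.0:
        raise ValueError("dsi_gain must be at least 1.0")
    channels = math.ceil(calls / dsi_gain)
    return math.ceil(channels / channels_per_trunk)
```

Under these assumptions, 120 simultaneous calls need 4 E1 lines without DSI (`trunks_required(120, E1_VOICE_CHANNELS)`) and 2 with an assumed gain of 2.0 (`trunks_required(120, E1_VOICE_CHANNELS, dsi_gain=2.0)`). The gain achievable in practice depends on speech activity and on how much clipping is acceptable.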

Purpose & Motivation

DSI was created to address the high cost and limited availability of transmission lines, especially in long-distance and international telephony. Without techniques like DSI, each voice call requires a dedicated 64 kbit/s channel (DS0) for its entire duration, regardless of actual speech activity. This is inefficient, because the channel carries no useful information while the speaker is silent.

The historical context lies in the transition from analog to digital telephony, where maximizing the return on investment for expensive digital transmission infrastructure became paramount. DSI solved this by applying statistical multiplexing to voice traffic, allowing carriers to serve more subscribers with the same physical resources. It directly addressed the limitation of fixed channel allocation, enabling significant cost savings and increased capacity without degrading perceived voice quality for the end user.

Key Features

Voice Activity Detection (VAD) to distinguish speech from silence
Dynamic bandwidth allocation based on speech activity
Statistical multiplexing of multiple voice channels
Packetization of active speech spurts
Increased trunking efficiency for transmission links
Transparent operation to end-users without perceived quality loss

Evolution Across Releases

Rel-5 Initial

Introduced as a defined term and concept within 3GPP standards for GSM/UMTS networks. Specified as a transmission planning technique in GERAN to optimize the utilization of A-interface and other core network transmission links. The initial architecture focused on circuit-switched voice traffic efficiency.

Related specifications: 21.905, 43.050

Defining Specifications

Specification	Title
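TR 21.905	Vocabulary for 3GPP Specifications
TR 43.050	Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system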