DSR

Distributed Speech Recognition

DSR is a service architecture for mobile networks where speech recognition is split between a device that extracts and compresses speech features and a network server that performs the final recognition.

Category: Services
Introduced: R99
Where: Radio Access Network › NG-RAN (5G)
Also touches: 1 segment
Specifications: 13 specs

Description

Distributed Speech Recognition (DSR) is a client-server architecture designed to provide accurate speech recognition services over mobile networks. It operates by dividing the recognition task between the User Equipment (UE) and a remote recognition server in the network. The UE runs the 'front-end' processing: it captures the audio via the microphone, applies acoustic preprocessing (noise suppression, echo cancellation), and then extracts a set of compact parametric representations (features) of the speech signal, typically Mel-Frequency Cepstral Coefficients (MFCCs). These features are then encoded using a standardized, bit-efficient codec and transmitted over the data channel to the network.
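The front-end steps described above (pre-emphasis, windowing, spectrum, mel filterbank, log, cosine transform) can be sketched for a single frame as follows. This is a generic textbook MFCC computation, not the standardized DSR front-end: the function names, the 8 kHz sample rate, the 256-point transform, the 23 filters, the 13 coefficients, and the 0.97 pre-emphasis factor are all illustrative assumptions. Noise suppression and the feature compression stage are left out.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, converted to spectrum bin indices.
    edges = [int((n_fft + 1) * mel_to_hz(top * i / (n_filters + 1)) / sample_rate)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):          # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum; slow but dependency-free."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        re = sum(x * math.cos(w * n) for n, x in enumerate(padded))
        im = -sum(x * math.sin(w * n) for n, x in enumerate(padded))
        spec.append(re * re + im * im)
    return spec

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(n_out)]

def mfcc_frame(samples, sample_rate=8000, n_fft=256, n_filters=23, n_ceps=13):
    """Cepstral coefficients for one frame (assumed parameters, see lead-in)."""
    # Pre-emphasis boosts high frequencies before analysis.
    pre = [samples[0]] + [samples[i] - 0.97 * samples[i - 1]
                          for i in range(1, len(samples))]
    n = len(pre)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(pre)]
    spec = power_spectrum(windowed, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_energy = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                  for row in bank]
    return dct2(log_energy, n_ceps)

# One 25 ms frame of a 1 kHz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 8000.0) for n in range(200)]
features = mfcc_frame(tone)
print(len(features))  # 13 coefficients for this one frame
```

The naive DFT keeps the sketch free of third-party dependencies; a real front-end would use an FFT and process overlapping frames continuously.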

In the network, a dedicated DSR server receives the feature stream. This server hosts the 'back-end' recognition engine, which includes the acoustic models, pronunciation dictionaries, and language models. The server decodes the feature stream and uses statistical pattern matching (such as Hidden Markov Models or deep neural networks) to convert the features into a text string or a semantic command. The result is then sent back to the UE or to another application server. This separation is the key design choice: the computationally intensive and memory-heavy modeling and search processes reside on powerful, updatable servers, while the UE handles the lighter, standardized front-end.
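To make the back-end role concrete, the sketch below matches an incoming feature sequence against stored word templates using dynamic time warping (DTW). DTW is a deliberately simple stand-in chosen for illustration; it is not the HMM or neural-network decoding a production server would use, and the function names and the tiny two-dimensional feature vectors are hypothetical.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Cost of the cheapest monotonic alignment between two feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the best of: skip in a, skip in b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(features, templates):
    """Return the label of the template closest to the received features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Toy vocabulary: each word is a short sequence of 2-dimensional features.
templates = {
    "yes": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "no": [[5.0, 5.0], [4.0, 4.0], [3.0, 3.0]],
}
received = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [2.0, 2.1]]
print(recognize(received, templates))  # yes
```

Because the templates and the matching logic live only on the server, the vocabulary can be changed without touching the device, which is the point of the split.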

DSR's role is to deliver a consistent, high-accuracy recognition experience independent of the UE's processing power and the varying quality of the audio channel. By transmitting only features (a few kbps) instead of the full audio stream (e.g., 64 kbps for PCM), it conserves bandwidth. It also avoids the distortions of low-bitrate voice codecs, which degrade accuracy when the server has to recognize decoded audio, and it is more robust to transmission errors. It is a service enabler for network-based voice assistants, automated voice dialing, and voice-controlled services in vehicles.
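The bandwidth comparison can be checked with simple arithmetic. The 64 kbps PCM figure comes from 8000 samples per second at 8 bits each. The feature-stream figures (100 frames per second, 44 bits per compressed frame) are assumptions picked to land in the "few kbps" range mentioned above, not values taken from a specification.

```python
def bitrate_bps(units_per_second, bits_per_unit):
    """Raw payload bit rate, ignoring packet and framing overhead."""
    return units_per_second * bits_per_unit

pcm_bps = bitrate_bps(8000, 8)      # 8 kHz sampling, 8 bits per sample
feature_bps = bitrate_bps(100, 44)  # assumed: 100 frames/s, 44 bits per frame
saving = pcm_bps / feature_bps

print(pcm_bps, feature_bps, round(saving, 1))  # 64000 4400 14.5
```

Under these assumptions the feature stream needs roughly one fourteenth of the PCM payload rate; protocol overhead would narrow the gap somewhat in practice.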

Purpose & Motivation

DSR was created to solve the problem of providing high-quality, server-based speech recognition in the variable and sometimes constrained environment of early mobile networks (2G, 3G). Traditional 'server-only' recognition, where the UE sends compressed audio (e.g., using AMR), suffered because the voice codecs were optimized for human listening, not machine recognition. Codec artifacts and transmission errors could significantly degrade recognition accuracy.

The purpose of DSR was to standardize the interface between the mobile device and the recognition server, ensuring interoperability. It also addressed the limitations of device-only recognition, where the UE's limited processing power and memory made it impractical to host large vocabularies or complex models. By distributing the process, DSR used the network's computational resources to provide a more powerful and updatable service. The standardized front-end ensured that the features sent to the server were optimized for recognition rather than listening, which improved accuracy and reliability across different networks and devices.

Classification

Part of: IMS
Related approaches: QoE

Detected Changes Across Releases

from 3GPP Change Requests

Specific changes extracted from the "Change history" tables of 3GPP specifications (2 CRs across 2 releases). Complements the general historical overview above with the evidence-based evolution of this function.

Rel-18 (1 change)

Release 18 includes one change request whose title mentions DSR. It concerns data volume calculation when DSR is associated with at least two RLC entities. In the NR radio protocol specifications TS 38.323 (PDCP) and TS 38.321 (MAC), however, the abbreviation DSR stands for Delay Status Report, not Distributed Speech Recognition. This CR and the Rel-19 CR below were most likely matched on the shared abbreviation and should not be read as changes to the speech recognition service or its codec.

  • Data volume calculation for DSR when associated with at least two RLC entities (TS 38.323, CR 0133)

Rel-19 (1 change)

Release 19 includes one correction to the DSR triggering procedure in the MAC specification, which refines the conditions under which a DSR is triggered.

  • Correction on DSR triggering (TS 38.321, CR 2140)

Defining Specifications

3GPP specifications that define or reference DSR, with the latest known release. Sourced from the 3GPP document catalog.

Specification  Version  Title                                                Release
TR 22.977      vj00     Speech Enabled Services and Multimodal Framework     Rel-19
TS 26.177      vj00     DSR Extended Advanced Front-end Test Sequences       Rel-19
TS 26.235      vc00     Default Codecs for 3GPP IP Multimedia Subsystem      Rel-12
TS 26.236      vc00     Packet Switched Conversational Multimedia Protocols  Rel-12
TS 26.243      vj00     DSR Extended Advanced Front-end C Code               Rel-19
TR 26.943      vj00     SES Codec Selection Report                           Rel-19
TS 38.300      vj00     NG-RAN Overall Description                           Rel-19
TS 38.306      vj00     NR UE Radio Access Capability Parameters             Rel-19
TS 38.321      vj00     NR MAC Protocol Specification                        Rel-19
TS 38.322      vj00     NR Radio Link Control (RLC) Protocol                 Rel-19
TS 38.323      vj00     Packet Data Convergence Protocol (PDCP)              Rel-19
TS 38.331      vj00     NR Radio Resource Control (RRC) Protocol Specification  Rel-19
TR 45.912      vj00     GERAN Evolution Feasibility Study                    Rel-19