Description
Distributed Speech Recognition (DSR) is a client-server architecture designed to provide accurate speech recognition services over mobile networks. It operates by dividing the recognition task between the User Equipment (UE) and a remote recognition server in the network. The UE runs the 'front-end' processing: it captures the audio via the microphone, applies acoustic preprocessing (noise suppression, echo cancellation), and then extracts a set of compact parametric representations (features) of the speech signal, typically Mel-Frequency Cepstral Coefficients (MFCCs). These features are then encoded using a standardized, bit-efficient codec and transmitted over the data channel to the network.
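The front-end steps described above can be sketched in a few lines. This is a minimal illustration, not the standardized front-end: it computes a plain cepstrum (a DCT of the log power spectrum) and omits the mel filterbank, noise suppression, and echo cancellation. The function names, the 8 kHz sampling rate, the 25 ms frame length, the 10 ms hop, and the 13 coefficients per frame are all assumptions chosen for the example.

```python
import math

def split_frames(samples, frame_len=200, hop=80):
    """Cut the signal into overlapping frames (25 ms long, 10 ms apart at 8 kHz)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_power_spectrum(frame):
    """Hamming-window the frame and return log power for DFT bins 0..N/2."""
    n = len(frame)
    win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
           for k, s in enumerate(frame)]
    spectrum = []
    for b in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * b * k / n) for k, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * b * k / n) for k, x in enumerate(win))
        # Small constant avoids log(0) on silent bins.
        spectrum.append(math.log(re * re + im * im + 1e-10))
    return spectrum

def cepstrum(log_spec, n_coeffs=13):
    """DCT-II of the log spectrum; keep only the first n_coeffs coefficients."""
    total = len(log_spec)
    return [sum(v * math.cos(math.pi * m * (j + 0.5) / total)
                for j, v in enumerate(log_spec))
            for m in range(n_coeffs)]

def extract_features(samples):
    """Simplified front-end: one short cepstral vector per frame."""
    return [cepstrum(log_power_spectrum(f)) for f in split_frames(samples)]

# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
features = extract_features(tone)
print(len(features), "frames x", len(features[0]), "coefficients")
```

The point of the sketch is the data reduction: each 200-sample frame becomes a vector of 13 numbers, which is what gets quantized and sent instead of the audio itself.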
In the network, a dedicated DSR server receives the feature stream. This server hosts the 'back-end' recognition engine, which includes the acoustic models, pronunciation dictionaries, and language models. The server decodes the feature stream and uses statistical pattern matching (like Hidden Markov Models or deep neural networks) to convert the features into a text string or a semantic command. The result is then sent back to the UE or to another application server. This separation is key; it allows the computationally intensive and memory-heavy modeling and search processes to reside on powerful, updatable servers, while the UE handles the lighter, standardized front-end.
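The division of labour between UE and server can be illustrated with a toy round trip. Everything here is a stand-in under stated assumptions: uniform one-byte scalar quantization replaces the standardized feature codec, and a nearest-template match replaces the HMM or neural-network back-end. The function names, the value range, and the template vectors are hypothetical.

```python
import math

def quantize(vector, lo=-50.0, hi=50.0):
    """UE side: map each feature to one byte (uniform scalar quantization)."""
    step = (hi - lo) / 255
    return bytes(min(255, max(0, round((v - lo) / step))) for v in vector)

def dequantize(payload, lo=-50.0, hi=50.0):
    """Server side: recover approximate feature values from the bytes."""
    step = (hi - lo) / 255
    return [lo + b * step for b in payload]

def recognize(vector, templates):
    """Server side: return the label of the closest template (Euclidean distance)."""
    return min(templates, key=lambda label: math.dist(vector, templates[label]))

# Hypothetical models held only on the server; the UE never sees them.
templates = {"yes": [10.0, -3.0, 4.0], "no": [-8.0, 6.0, -1.0]}

payload = quantize([9.2, -2.5, 3.7])               # UE encodes the features
result = recognize(dequantize(payload), templates)  # server decodes and matches
print(result)  # prints "yes"
```

Because the templates live only on the server, they can be enlarged or retrained without touching the UE, which is the benefit the separation is meant to provide.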
DSR's role is to deliver a consistent, high-accuracy recognition experience independent of the UE's processing power and the varying quality of the audio channel. By transmitting only features (a few kbps) instead of the full audio stream (e.g., 64 kbps for PCM), it conserves bandwidth. It is also more robust to transmission errors, and it avoids the low-bitrate voice codec distortions that degrade recognition when the server has to work from decoded audio. It is a service enabler for network-based voice assistants, automated voice dialing, and voice-controlled services in vehicles.
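The bandwidth comparison is simple arithmetic. The sketch below assumes 8 kHz, 8-bit PCM for the 64 kbps figure, and a hypothetical feature stream of 100 frames per second at 48 bits per frame; the frame rate and bits per frame are illustrative values, not taken from a specification.

```python
def bitrate_bps(units_per_second, bits_per_unit):
    """Bitrate in bits per second for a stream of fixed-size units."""
    return units_per_second * bits_per_unit

pcm_bps = bitrate_bps(8000, 8)      # 8000 samples/s at 8 bits each
feature_bps = bitrate_bps(100, 48)  # assumed: 100 frames/s at 48 bits each
ratio = round(pcm_bps / feature_bps, 1)
print(pcm_bps, feature_bps, ratio)  # prints "64000 4800 13.3"
```

Under these assumptions the feature stream needs roughly one thirteenth of the PCM bitrate.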
Purpose & Motivation
DSR was created to solve the problem of providing high-quality, server-based speech recognition in the variable and sometimes constrained environment of early mobile networks (2G, 3G). Traditional 'server-only' recognition, where the UE sends compressed audio (e.g., using AMR), suffered because the voice codecs were optimized for human listening, not machine recognition. Codec artifacts and transmission errors could significantly degrade recognition accuracy.
The purpose of DSR was to standardize the interface between the mobile device and the recognition server, ensuring interoperability. It also addressed the limitations of device-only recognition: the UE's limited processing power and memory made it impractical to host large vocabularies or complex models. By distributing the process, DSR used the network's computational resources to provide a more powerful and updatable service. The standardized front-end, in turn, ensured that the features sent to the server were clean and optimized for recognition rather than listening, which improved accuracy and reliability across different networks and devices.
Detected Changes Across Releases
From 3GPP Change Requests. Specific changes extracted from the "Change history" tables of 3GPP specifications (2 CRs across 2 releases). This complements the general historical overview above with the evidence-based evolution of this function.
In Release 18, a specific enhancement was introduced for the Distributed Speech Recognition (DSR) function concerning data volume calculation. The change addresses the scenario where DSR is associated with at least two RLC entities, ensuring accurate accounting of the uplink encoded speech data. This update provides necessary clarification for the DSR Optimised Codec operation within the Speech Recognition Framework.
- Data volume calculation for DSR when associated with at least two RLC entities (TS 38.323, CR 0133)
In Release 19, the primary update for the Distributed Speech Recognition (DSR) function was a correction to the DSR triggering procedure. This change refined the mechanism that initiates the use of the DSR Optimised Codec on the UE, which extracts and encodes acoustic features for uplink transmission to server-side speech engines. The adjustment ensures the framework for automated voice services activates the optimized speech recognition path more reliably.
- Correction on DSR triggering (TS 38.321, CR 2140)
Defining Specifications
3GPP specifications that define or reference DSR, with the latest known release. Sourced from the 3GPP document catalog.
| Specification | Title | Release |
|---|---|---|
| TR 22.977 vj00 | Speech Enabled Services and Multimodal Framework | Rel-19 |
| TS 26.177 vj00 | DSR Extended Advanced Front-end Test Sequences | Rel-19 |
| TS 26.235 vc00 | Default Codecs for 3GPP IP Multimedia Subsystem | Rel-12 |
| TS 26.236 vc00 | Packet Switched Conversational Multimedia Protocols | Rel-12 |
| TS 26.243 vj00 | DSR Extended Advanced Front-end C Code | Rel-19 |
| TR 26.943 vj00 | SES Codec Selection Report | Rel-19 |
| TS 38.300 vj00 | NG-RAN Overall Description | Rel-19 |
| TS 38.306 vj00 | NR UE Radio Access Capability Parameters | Rel-19 |
| TS 38.321 vj00 | NR MAC Protocol Specification | Rel-19 |
| TS 38.322 vj00 | NR Radio Link Control (RLC) Protocol | Rel-19 |
| TS 38.323 vj00 | Packet Data Convergence Protocol (PDCP) | Rel-19 |
| TS 38.331 vj00 | NR Radio Resource Control (RRC) Protocol Specification | Rel-19 |
| TR 45.912 vj00 | GERAN Evolution Feasibility Study | Rel-19 |