NLSML

Natural Language Semantics Markup Language

Services
Introduced in Rel-7
NLSML is an XML-based markup language, originally defined by the W3C and adopted in 3GPP speech-enabled services, for representing the semantic results of natural language processing (NLP). It standardizes how a spoken user query is interpreted into structured data (such as commands or search intents) for processing by network applications, enabling voice control and interactive voice response (IVR) systems.

Description

The Natural Language Semantics Markup Language (NLSML) is an XML format referenced by 3GPP TS 23.333. It serves as a standardized data format for conveying the semantic interpretation of a user's natural language utterance, typically produced by a speech recognition system. When a user speaks a command or query to a network-based service (e.g., "call John Smith" or "what is the weather in Paris?"), automatic speech recognition (ASR) converts the audio to text, and natural language understanding (NLU) components then analyze that text to extract meaning and intent. NLSML provides a common structure for representing the extracted semantics in a machine-readable way.

An NLSML document is rooted in a 'result' element that contains one or more 'interpretation' elements, each representing a possible reading of the utterance and each carrying a confidence score. Within an interpretation, an 'input' element records the recognized text and its modality, while an 'instance' element carries the application-specific semantics, such as the identified action (e.g., 'dial', 'search') and its data slots (like the contact name 'John Smith' for a dial action). This structured representation allows the receiving application server (e.g., a voice call control server or an information service) to understand the user's request unambiguously without re-parsing the raw text, enabling reliable execution of the intended service.
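As a minimal sketch, the "call John Smith" example might be conveyed in an NLSML result like the following. The grammar URI and the element names inside 'instance' ('action', 'contact') are illustrative assumptions: the instance payload is application-defined rather than mandated by the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative NLSML result for a voice-dialing request.
     The instance payload (action, contact) is application-defined. -->
<result grammar="session:dial-command">
  <interpretation confidence="0.92">
    <input mode="speech">call John Smith</input>
    <instance>
      <action>dial</action>
      <contact>John Smith</contact>
    </instance>
  </interpretation>
</result>
```

An application server receiving such a document would read the 'action' slot to select its call-control logic and the 'contact' slot to resolve the dialed party, without ever touching the raw ASR output.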

NLSML operates within the broader 3GPP IP Multimedia Subsystem (IMS) service architecture for speech-enabled applications. It forms a crucial part of the interface between the Media Resource Function (MRF), which may host speech processing resources, and the application servers that provide the actual service logic. By standardizing this semantic interface, NLSML ensures interoperability between speech recognition engines from different vendors and the many potential application services, fostering an ecosystem for advanced voice services in mobile networks.

Purpose & Motivation

NLSML was created to solve the problem of interoperability in speech-enabled services. Before standardization, voice application developers and network equipment vendors would use proprietary formats to represent the results of speech recognition and understanding. This locked service providers into specific vendor solutions and made it difficult to mix and match best-of-breed components for ASR, NLU, and application logic. The proliferation of interactive voice services and voice-controlled features in mobile networks demanded a common language for semantics.

Its development, beginning in Release 7, was motivated by the growth of IMS and the desire to offer rich, network-based voice services such as voice dialing, voice search, and voice navigation in a vendor-neutral manner. NLSML provides a clear contract between the component that understands the user's speech (often in the network) and the component that acts upon that understanding (the service application). This separation of concerns allows innovation and specialization in both speech technology and service design while maintaining a stable interface. It addresses the limitations of earlier ad-hoc approaches by providing a robust, extensible, standardized markup that can represent complex user intents and their associated data slots.

Key Features

  • XML-based schema for representing semantic interpretations of natural language
  • Supports multiple alternative interpretations with confidence scores
  • Encodes user intent (action), relevant entities (instances), and modality
  • Enables interoperability between speech processors and application servers
  • Integrates with 3GPP IMS architecture and Media Resource Function (MRF)
  • Facilitates the development of vendor-independent voice-enabled services
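The support for alternative interpretations can be sketched as follows: a single result carries multiple candidate readings of an ambiguous utterance, ordered by confidence, and the application picks among them. As before, the grammar URI and the elements inside 'instance' are illustrative assumptions, not part of the schema itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative NLSML result with two alternative interpretations
     of an ambiguous utterance, highest confidence first. -->
<result grammar="session:request">
  <interpretation confidence="0.81">
    <input mode="speech">what is the weather in Paris</input>
    <instance>
      <action>search</action>
      <topic>weather</topic>
      <location>Paris</location>
    </instance>
  </interpretation>
  <!-- Lower-confidence alternative the application may fall back to
       or use to prompt the user for confirmation. -->
  <interpretation confidence="0.34">
    <input mode="speech">what is the weather in Perris</input>
    <instance>
      <action>search</action>
      <topic>weather</topic>
      <location>Perris</location>
    </instance>
  </interpretation>
</result>
```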

Evolution Across Releases

Rel-7 Initial

First referenced in TS 23.333 as part of the IMS Multimedia Telephony service and other speech-enabled applications. This established the core XML structure for representing speech recognition semantics, enabling basic voice commands and interactive services.

Defining Specifications

Specification   Title
TS 23.333       Multimedia Resource Function Controller (MRFC) - Multimedia Resource Function Processor (MRFP) Mp interface: Procedures descriptions