VXML

Voice Extensible Markup Language

Introduced in Rel-7

VXML is an XML-based markup language standardized by the W3C and adopted by 3GPP for creating voice-driven applications such as IVR in the IMS. It separates application logic from media processing, so voice services can be deployed in a web-like way.

Category
Services
Introduced
Rel-7
Where
Core Network › 5G Core
Specifications
1 spec

Description

Voice Extensible Markup Language (VXML), standardized by the W3C and adopted by 3GPP in TS 23.333, is a key technology for developing voice-based services in telecommunications networks, particularly within the IP Multimedia Subsystem (IMS). It is an application-layer markup language that defines the dialog flow between a user and a voice service. A VXML document, or script, is processed by an interpreter called a Voice Browser, which resides in a media server (e.g., a Multimedia Resource Function Processor, MRFP). The browser executes the script, controls audio playback (synthesized speech or pre-recorded audio), processes user input (speech or DTMF tones), and makes logic decisions to navigate the call flow.
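To make the browser's execution loop concrete, the Python sketch below runs a greatly simplified version of the VoiceXML Form Interpretation Algorithm: it visits each <field> of a <form>, "plays" the field's prompt, and accepts the first simulated DTMF digit that matches one of the field's <option> elements, recording nomatch and noinput events otherwise. It is an illustration under stated assumptions, not a conformant interpreter: speech recognition, TTS, <filled> execution, and real event dispatch are omitted, and the names run_form and DOC, as well as the limit of three attempts, are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}  # VoiceXML 2.x namespace

# Hypothetical VoiceXML 2.1 document: one field that accepts DTMF 1 or 2.
DOC = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="choice">
      <prompt>Press 1 for balance or 2 for an agent.</prompt>
      <option dtmf="1" value="balance">balance</option>
      <option dtmf="2" value="agent">agent</option>
    </field>
  </form>
</vxml>"""


def run_form(doc: str, dtmf_inputs: list[str], max_attempts: int = 3):
    """Greatly simplified Form Interpretation Algorithm (illustrative only).

    Visits each <field> in document order, "plays" its prompt, and fills the
    field with the value of the first <option> whose dtmf attribute matches a
    simulated key press. Returns (transcript, filled_values).
    """
    root = ET.fromstring(doc)
    transcript: list[str] = []     # what the caller would hear, plus events
    values: dict[str, str] = {}    # field name -> collected value
    keys = iter(dtmf_inputs)       # simulated DTMF digits from the caller
    for field in root.iterfind("v:form/v:field", NS):
        prompt = field.findtext("v:prompt", default="", namespaces=NS).strip()
        options = {o.get("dtmf"): o.get("value") for o in field.iterfind("v:option", NS)}
        for _ in range(max_attempts):
            transcript.append(f"PROMPT: {prompt}")
            digit = next(keys, None)
            if digit is None:            # no input left: treat as a noinput event
                transcript.append("NOINPUT")
            elif digit in options:       # input matches the grammar: field filled
                values[field.get("name")] = options[digit]
                break
            else:                        # input outside the grammar: nomatch event
                transcript.append(f"NOMATCH: {digit}")
    return transcript, values
```

Calling run_form(DOC, ["9", "2"]) fills the field with "agent" after one nomatch and a re-prompt. A real Voice Browser would instead throw the nomatch event and let the document's <catch> handlers decide what happens next.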

The architecture involves several key components. The VoiceXML Forum's architecture, referenced by 3GPP, includes the Voice Browser, which fetches VXML documents from an Application Server (AS) via HTTP. The AS hosts the service logic and business rules and generates VXML pages dynamically. The media server provides the actual speech recognition (ASR), speech synthesis (TTS), and audio playback resources. A VXML script is composed of dialogs, either <form> or <menu> elements. A form contains <field> elements that collect input, <prompt> elements that play audio, and <filled> blocks that define the actions to take once input is received. Event handlers (<catch>) deal with errors and unexpected input, such as silence or unrecognized responses. This declarative model separates the service logic on the AS from the media processing details, letting developers concentrate on conversational design.
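As an example of an AS generating a VXML page dynamically, the sketch below uses Python's standard xml.etree.ElementTree to build a one-field menu that combines the elements described above: <prompt>, <option>, a <catch> for the nomatch and noinput events, and a <filled> block that returns the caller's choice to the AS with <submit>. The function names build_menu_page and q, the form and field names, and the submit URL are hypothetical; the output is a minimal illustration and has not been validated against the W3C VoiceXML schema.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # serialize VoiceXML as the default namespace


def q(tag: str) -> str:
    """Return the namespace-qualified ElementTree name for a VoiceXML tag."""
    return f"{{{VXML_NS}}}{tag}"


def build_menu_page(greeting: str, options: dict[str, str], submit_url: str) -> str:
    """Build a one-field VoiceXML 2.1 menu as a string (hypothetical AS helper)."""
    vxml = ET.Element(q("vxml"), {"version": "2.1"})
    form = ET.SubElement(vxml, q("form"), {"id": "menu"})
    field = ET.SubElement(form, q("field"), {"name": "choice"})
    ET.SubElement(field, q("prompt")).text = greeting
    for digit, value in options.items():
        # Each <option> adds a DTMF choice; its text doubles as the speech phrase.
        ET.SubElement(field, q("option"), {"dtmf": digit, "value": value}).text = value
    # Event handler: on unrecognized or missing input, apologize and replay the prompt.
    catch = ET.SubElement(field, q("catch"), {"event": "nomatch noinput"})
    ET.SubElement(catch, q("prompt")).text = "Sorry, I did not get that."
    ET.SubElement(catch, q("reprompt"))
    # Once the field is filled, send its value back to the AS over HTTP.
    filled = ET.SubElement(field, q("filled"))
    ET.SubElement(filled, q("submit"), {"next": submit_url, "namelist": "choice"})
    return ET.tostring(vxml, encoding="unicode")
```

For example, build_menu_page("Press 1 for balance or 2 for an agent.", {"1": "balance", "2": "agent"}, "https://as.example.com/menu") returns a VoiceXML 2.1 document whose <filled> block submits the variable choice to the illustrative URL. Building the tree with an XML library rather than string concatenation keeps the markup well-formed and escapes caller-supplied text automatically.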

In the 3GPP IMS network, VXML enables standardized, network-agnostic voice applications. When an IMS subscriber initiates a voice call to a service (such as a voice portal, automated customer service, or conference system), the Serving-Call Session Control Function (S-CSCF) routes the SIP request to an appropriate Application Server based on initial Filter Criteria (iFC). This AS can act as a VXML interpreter itself or, more commonly, direct a dedicated Multimedia Resource Function (MRF) that hosts the Voice Browser to fetch VXML documents from a web server and execute them. The MRF establishes a media session with the user's device, with the media carried over RTP, and runs the VXML dialog. This supports interactive services such as voice-activated dialing, voice messaging, audio conferencing controls, and natural language voice portals, all delivered over packet-switched IMS networks alongside other multimedia services.
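The S-CSCF's routing decision can be sketched as an ordered match over filter criteria. The Python below is a deliberately simplified, hypothetical model: each criterion matches only on the SIP method and a Request-URI pattern, whereas real iFCs (defined in 3GPP TS 29.228) combine several Service Point Triggers and also carry session-case and default-handling information. The function returns every matching AS in ascending priority order, reflecting that the S-CSCF contacts matching Application Servers in priority sequence. The class and function names and all URIs are illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCriterion:
    """Hypothetical, heavily simplified initial Filter Criterion (iFC)."""
    priority: int      # lower value is evaluated first
    method: str        # SIP method the trigger matches, e.g. "INVITE"
    uri_pattern: str   # regular expression applied to the whole Request-URI
    app_server: str    # SIP URI of the Application Server to invoke


def matching_servers(criteria: list[FilterCriterion], method: str, request_uri: str) -> list[str]:
    """Return matching AS URIs in the order the S-CSCF would contact them."""
    ordered = sorted(criteria, key=lambda c: c.priority)
    return [
        c.app_server
        for c in ordered
        if c.method == method and re.fullmatch(c.uri_pattern, request_uri)
    ]


# Illustrative subscriber profile: a VXML voice portal and a conference service.
IFC = [
    FilterCriterion(20, "INVITE", r"sip:conference@.*", "sip:conf-as.example.net"),
    FilterCriterion(10, "INVITE", r"sip:voiceportal@.*", "sip:vxml-as.example.net"),
]
```

With this profile, an INVITE to sip:voiceportal@ims.example.net is routed to sip:vxml-as.example.net, which would then engage the MRF's Voice Browser; a request matching several criteria would yield several AS URIs, lowest priority value first.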

Purpose & Motivation

VXML was created to solve the historical problem of proprietary, complex, and costly development of interactive voice response (IVR) systems. Before VXML, IVR applications were typically built using low-level, vendor-specific programming languages and tools that tightly coupled the application logic with the telephony hardware and media resources. This made applications difficult to port, expensive to develop and maintain, and limited innovation to a small pool of specialized developers.

3GPP's adoption of VXML, beginning in Release 7, was motivated by the move towards all-IP networks and the IMS. IMS aimed to provide a standardized service-creation environment for multimedia, and voice services needed a web-inspired model to fit into it. VXML provided exactly that: it applied the proven web development paradigm (client-server, markup languages, HTTP) to the voice world. Because dialogs are written in XML, web application servers can generate them dynamically, which opened telephony application development to the large community of web developers. This addressed the limitations of the old approach by promoting interoperability, reducing development time, fostering a tools ecosystem, and making it easy to integrate voice services with web data and business logic. It thereby helped deliver consistent, advanced voice services as networks evolved towards LTE and 5G.

Classification

Part of: IVR
Related approaches: SIP, RTP

Evolution Across Releases

Rel-7 Initial

Initial adoption of VoiceXML 2.0/2.1 into the 3GPP IMS service framework. Defined the architecture for VXML-based services, specifying the role of the Application Server (AS) and Media Resource Function (MRF) with an integrated Voice Browser. Established the basic mechanisms for call routing from the S-CSCF to a VXML service and the execution of voice dialogs over the IMS Media Plane.


Defining Specifications

3GPP specifications that define or reference VXML, with the latest known release, sourced from the 3GPP document catalog.

Specification
TS 23.333 (vj00)
Title
MRFC-MRFP Mp Interface Requirements
Release
Rel-19