UCS2

Universal two byte coded Character Set

Other
Introduced in R99
A 16-bit character encoding standard used in 3GPP networks for representing text, particularly for SMS and control plane signaling. It enables consistent international text handling by providing a fixed two-byte representation for each character, supporting a wide range of languages and scripts.

Description

UCS2, the Universal two byte coded Character Set, is a character encoding scheme used throughout 3GPP specifications. It is based on the ISO/IEC 10646 Universal Character Set (UCS) and uses a fixed 16-bit (two-byte) code unit for every character. Its repertoire corresponds to the Basic Multilingual Plane (BMP) of Unicode, code points U+0000 to U+FFFF; the surrogate range U+D800 to U+DFFF is reserved and does not encode characters. In the 3GPP architecture, UCS2 is used mainly for Short Message Service (SMS) user data and for text carried in Non-Access Stratum (NAS) signaling, such as network names. This ensures that text is transmitted in a consistent format that network elements and user equipment from different manufacturers can all interpret.

Technically, UCS2 assigns each character a single 16-bit code unit whose value equals the character's code point. This contrasts with variable-length encodings such as UTF-8, where a character occupies one to four bytes. To send text, a device or network element converts the string into a sequence of these 16-bit values. In SMS, the TP-Data-Coding-Scheme (TP-DCS) field indicates UCS2 and the encoded characters are carried in the TP-User-Data (TP-UD) field. A message can therefore contain Arabic, Chinese, Greek, or Cyrillic text, provided every character lies within the BMP. The conversion is performed by the protocol layers that adapt user data and construct signaling messages.
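A minimal Python sketch of the conversion step described above. It assumes big-endian (most significant octet first) byte order, which is how UCS2 SMS payloads are commonly transmitted. The names `encode_ucs2`, `decode_ucs2`, and `MAX_TP_UD_OCTETS` are hypothetical helpers, not a 3GPP-defined API, and TP-DCS signalling and user data headers are omitted.

```python
# Hypothetical helpers: assume big-endian (most significant octet first) UCS2.
MAX_TP_UD_OCTETS = 140  # octet capacity of TP-User-Data in a single SMS


def encode_ucs2(text: str) -> bytes:
    """Encode text as UCS2: one big-endian 16-bit code unit per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Only BMP code points are representable; surrogates are not characters.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} cannot be encoded in UCS2")
        out += cp.to_bytes(2, "big")
    return bytes(out)


def decode_ucs2(data: bytes) -> str:
    """Decode big-endian UCS2 octets back into a string."""
    if len(data) % 2:
        raise ValueError("UCS2 data must contain an even number of octets")
    chars = []
    for i in range(0, len(data), 2):
        cp = int.from_bytes(data[i:i + 2], "big")
        # A strict UCS2 decoder rejects surrogate code units.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code unit U+{cp:04X} is not valid UCS2")
        chars.append(chr(cp))
    return "".join(chars)


payload = encode_ucs2("Привет")  # six Cyrillic characters
assert len(payload) <= MAX_TP_UD_OCTETS
print(payload.hex())  # 041f04400438043204350442
print(decode_ucs2(payload))  # Привет
```

Because the code unit equals the code point, encoding needs no lookup table. The only validation is the BMP range check.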

UCS2 is important for interoperability and global roaming. Core network nodes such as the MSC, SGSN, and HSS, as well as the UE, must use the same character representation for subscriber names, SMS content, and other service-related strings to be handled and displayed correctly, and UCS2 gives them that shared representation. However, UCS2 cannot represent characters outside the BMP, such as many emoji and some historic scripts, which led to the adoption of UTF-16 in later releases. The encoding is specified across multiple 3GPP technical specifications (TSs) governing vocabulary, services, and secure applications.
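To illustrate the BMP limitation, the sketch below shows how UTF-16 represents a character beyond U+FFFF (here U+1F600, an emoji) as two 16-bit surrogate code units. A strict UCS2 decoder would see these as two reserved code units rather than one character. The helper names `fits_in_ucs2` and `utf16_surrogate_pair` are hypothetical; Python's built-in `utf-16-be` codec is used only as a cross-check.

```python
def fits_in_ucs2(text: str) -> bool:
    """True if every character lies in the BMP and outside the surrogate range."""
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def utf16_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


emoji = "\U0001F600"  # U+1F600, outside the BMP
print(fits_in_ucs2(emoji))                                 # False
print([hex(u) for u in utf16_surrogate_pair(ord(emoji))])  # ['0xd83d', '0xde00']
print(emoji.encode("utf-16-be").hex())                     # d83dde00
```

The same character therefore needs four octets in UTF-16, while UCS2's fixed two-octet width has no way to express it.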

Purpose & Motivation

UCS2 was introduced to solve the problem of inconsistent and limited character encoding in early mobile telecommunications systems. Before its standardization, networks relied on 7-bit or 8-bit character sets, such as the GSM 7-bit default alphabet or proprietary regional variants. These severely restricted the range of representable characters and hindered global interoperability. The growth of international roaming and the demand for multilingual services made a unified, extensive character set necessary.

The primary motivation was to enable mobile services, especially SMS, to support virtually any written language used worldwide, thereby making GSM and UMTS networks truly global. By adopting a 16-bit encoding aligned with the Unicode standard, 3GPP ensured that text data could be exchanged between any compliant devices and networks without corruption or loss of meaning. This was a foundational step for service personalization, international subscriber identity, and the development of value-added services reliant on text.

Key Features

  • Fixed 16-bit encoding for each character
  • Direct alignment with Unicode Basic Multilingual Plane (BMP)
  • Essential for SMS (TP-User-Data) and control plane signaling
  • Enables consistent multilingual text representation
  • Foundation for global interoperability and roaming
  • Specified across core service and vocabulary specifications
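Because every UCS2 character occupies two octets, the fixed width directly limits how much text the SMS TP-User-Data field can carry. A small arithmetic sketch follows. It assumes the usual 140-octet TP-User-Data limit and a 6-octet user data header for concatenated messages (8-bit reference number). `max_chars` is a hypothetical helper, and 7-bit fill bits are handled only by integer division.

```python
TP_UD_OCTETS = 140     # maximum TP-User-Data length of one SMS
CONCAT_UDH_OCTETS = 6  # assumed header size: concatenation with an 8-bit reference


def max_chars(bits_per_char: int, header_octets: int = 0) -> int:
    """Characters that fit in TP-User-Data after reserving header octets."""
    available_bits = (TP_UD_OCTETS - header_octets) * 8
    return available_bits // bits_per_char


for name, bits in (("GSM 7-bit", 7), ("8-bit", 8), ("UCS2", 16)):
    single = max_chars(bits)
    part = max_chars(bits, CONCAT_UDH_OCTETS)
    print(f"{name:<9} single SMS: {single:>3}  per concatenated part: {part:>3}")
```

This yields 160 and 153 characters for GSM 7-bit, 140 and 134 for 8-bit data, and 70 and 67 for UCS2. Halving the capacity relative to 8-bit data is the practical cost of UCS2's wider repertoire.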

Evolution Across Releases

R99 Initial

Initially standardized as the universal 16-bit character encoding for 3GPP systems, complementing the GSM 7-bit default alphabet, particularly for SMS and network signaling. It provided a fixed two-byte-per-character representation based on ISO/IEC 10646, supporting a wide array of international characters within the Basic Multilingual Plane.

Defining Specifications

Specification    Title
TS 21.905        3GPP TS 21.905
TS 22.042        3GPP TS 22.042
TS 22.112        3GPP TS 22.112
TS 31.113        3GPP TS 31.113