UCS-2 (Universal Character Set (the two octet form)) — 3GPP Glossary

A fixed-width, 16-bit character encoding form of the UCS/Unicode standard. Each character is represented by exactly two octets (16 bits), covering the Basic Multilingual Plane (BMP). It was widely used in early 3GPP systems for SMS and signaling but is largely superseded by UTF-16.

Description

UCS-2, short for 'Universal Character Set, two-octet form,' is a fixed-width character encoding specified in ISO/IEC 10646. It represents each character's code point in the UCS repertoire directly, using exactly two octets (16 bits). The encoding is therefore limited to the Basic Multilingual Plane (BMP) of Unicode/UCS, code points U+0000 through U+FFFF. The BMP includes most commonly used characters for modern languages but excludes supplementary characters, such as many historic scripts, symbols, and emojis, that reside in other planes. In 3GPP systems, UCS-2 was historically used as a text encoding, particularly for the Short Message Service (SMS) and certain signaling parameters.

The mechanics of UCS-2 are straightforward: each 16-bit code unit corresponds to exactly one BMP character. For example, the Latin capital letter 'A' (U+0041) is encoded as the two-octet sequence 0x00 0x41 in big-endian order. This fixed-width property simplified processing on early systems: the character count is the byte length divided by two, and the character at index n starts at byte offset 2n. Within 3GPP, specifications such as those for Multimedia Messaging Service (MMS) and Packet-Switched Streaming Service (PSS) referenced UCS-2 as a permissible or required character encoding for content and metadata, supporting a broader range of languages than the original GSM 7-bit alphabet.

However, the role of UCS-2 has evolved. Its major limitation is that it cannot encode any character outside the BMP (code points above U+FFFF). To address this, the variable-length UTF-16 encoding was developed. UTF-16 produces the same code units as UCS-2 for BMP characters but uses surrogate pairs (two 16-bit code units drawn from the reserved range U+D800 to U+DFFF) to represent supplementary characters. Consequently, in modern 3GPP specifications, 'UCS-2' often appears either as a historical term or interchangeably with UTF-16BE (Big Endian) when only BMP characters are in use. For full coverage of the Unicode repertoire, UTF-16 is now the recommended encoding, but understanding UCS-2 remains important for interoperability with legacy devices and networks that strictly implemented the fixed two-octet form.
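The relationship between the two encodings can be sketched in Python. The helper names surrogate_pair, utf16_code_units, and fits_in_ucs2 are hypothetical; the surrogate arithmetic follows the standard UTF-16 definition, and the sketch is an illustration rather than a reference implementation.

```python
from typing import List, Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogate pairs only apply above U+FFFF")
    value = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (value >> 10)         # top 10 bits
    low = 0xDC00 + (value & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_code_units(text: str) -> List[int]:
    """List the 16-bit code units UTF-16 would use for the text."""
    units: List[int] = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            units.append(cp)              # identical to the UCS-2 code unit
        else:
            units.extend(surrogate_pair(cp))
    return units


def fits_in_ucs2(text: str) -> bool:
    """True when every character lies in the BMP, so one unit per character."""
    return all(ord(ch) <= 0xFFFF for ch in text)
```

For BMP-only text the code units match UCS-2 one for one, while U+1F600 becomes the pair 0xD83D 0xDE00, which a strict UCS-2 receiver would not interpret as a single character.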

Purpose & Motivation

UCS-2 was introduced to provide a significant upgrade from the limited 7-bit and 8-bit character sets used in early GSM. It solved the immediate problem of supporting a wide array of international languages, including those with large character sets like Chinese, Japanese, and Korean (CJK), within mobile messaging services like SMS. Before UCS-2, sending an SMS in these languages was impossible or required proprietary, non-interoperable extensions.

The motivation for adopting it in 3GPP specifications (this entry lists Release 8 as the initial release for TS 26.234 and TS 26.246) was to establish a concrete, implementable subset of the full UCS for bandwidth-constrained and processing-limited environments. UCS-2 offered a practical compromise: it supported tens of thousands of characters in a simple, fixed-width format that was easier to handle in device firmware and network signaling than a variable-length encoding. It enabled the first wave of true internationalization for text-based mobile services.

However, the limitations of UCS-2 became apparent as demand grew for characters beyond the BMP, such as ancient scripts, specialized symbols, and later emojis. Because UCS-2 cannot represent these characters, UTF-16 was developed and eventually preferred. UCS-2 therefore served a critical transitional purpose in globalizing mobile text communication, and it remains a foundational but largely historical step in 3GPP's text handling strategy.

Key Features

Fixed-width encoding using exactly two octets (16 bits) per character
Encodes characters from the Unicode Basic Multilingual Plane (BMP, Plane 0)
Simple processing model due to consistent character size
Historically used for SMS and MMS to support international character sets
Backward-compatible base for the UTF-16 encoding scheme
Defined with specific byte order (often Big Endian) in 3GPP transport
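Byte order matters because the same two octets decode to different characters depending on which octet is read first. The following Python sketch shows this; the function name ucs2_decode and its byteorder parameter are hypothetical, and the strict rejection of surrogate code units is an assumption about how a fixed two-octet receiver might behave.

```python
def ucs2_decode(data: bytes, byteorder: str = "big") -> str:
    """Decode UCS-2 octets, one character per 16-bit code unit."""
    if len(data) % 2 != 0:
        raise ValueError("UCS-2 data must contain an even number of octets")
    chars = []
    for offset in range(0, len(data), 2):
        unit = int.from_bytes(data[offset:offset + 2], byteorder)
        if 0xD800 <= unit <= 0xDFFF:
            # Assumption: a strict UCS-2 decoder treats surrogates as invalid.
            raise ValueError(f"surrogate code unit 0x{unit:04X} is not a UCS-2 character")
        chars.append(chr(unit))
    return "".join(chars)
```

Decoding the octets 0x4E 0x2D as big-endian yields "中" (U+4E2D), whereas reading the same octets as little-endian yields U+2D4E, a different character. Sender and receiver therefore need to agree on the order in advance.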

Evolution Across Releases

Rel-8 Initial

Specified UCS-2 as a key character encoding format within 3GPP, particularly for services such as MMS and PSS. It provided a standardized 16-bit-per-character method for supporting a wide range of global languages beyond the limits of earlier GSM alphabets, enabling basic international text interoperability.

TS 26.234, TS 26.246

Defining Specifications

Specification	Title
TS 26.234	3GPP TS 26.234
TS 26.246	3GPP TS 26.246