RAS (Reliability, Availability and Survivability) — 3GPP Glossary

A set of principles and requirements for ensuring network services remain operational and resilient against failures, disasters, or attacks. It encompasses design, operational, and architectural measures to maintain service continuity, minimize downtime, and recover quickly from disruptions in 3GPP networks.

Description

Reliability, Availability and Survivability (RAS) is a framework within 3GPP standards that defines the requirements and mechanisms for maintaining network service integrity under failure conditions. It is addressed across multiple 3GPP documents, including those for network management (32-series) and radio access (36- and 38-series). RAS is not a single protocol but a collection of design goals, architectural features, and operational procedures that help the network meet stringent service level agreements (SLAs). Reliability is the probability that a system performs its intended functions without failure over a specified period. Availability is the proportion of time a system is in an operable state, usually expressed as a percentage (e.g., 99.999% or 'five nines'). Survivability is the capability of a network to maintain service continuity during and after major failures, such as natural disasters, hardware faults, or cyber-attacks.
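The reliability and availability definitions above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from any 3GPP document: it assumes a constant failure rate (exponential model) for reliability and the common steady-state formula MTBF / (MTBF + MTTR) for availability. The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumption: constant failure rate, so R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is operable."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # 'Five nines' allows roughly 5.26 minutes of downtime per year.
    print(round(downtime_minutes_per_year(0.99999), 2))
```

Under these assumptions, a system with an MTBF of 9,999 hours and an MTTR of 1 hour has an availability of 99.99%, which corresponds to a little under an hour of downtime per year.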

Architecturally, RAS is implemented through redundancy, fault tolerance, and self-healing mechanisms at multiple network layers. In the core network, this includes geographically distributed data centers, redundant network functions (e.g., dual-homed MMEs, SGWs, PGWs), and stateful failover procedures. For the Radio Access Network (RAN), RAS involves features like dual connectivity, robust fronthaul/backhaul links, and base station resilience. Key components include redundancy managers, fault detection systems (fed by OAM fault and performance data), and automated recovery scripts. The framework also encompasses disaster recovery plans, where backup sites take over operations if primary sites are compromised.
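To illustrate why redundancy raises availability, the sketch below computes the probability that at least k of n redundant units are up at the same time. It is an illustration only and rests on a strong assumption: units fail independently and share the same availability, which real deployments with common power, software, or site dependencies may not satisfy. The function name is hypothetical.

```python
from math import comb


def k_of_n_availability(n: int, k: int, unit_availability: float) -> float:
    """Probability that at least k of n units are operable.

    Assumption: units fail independently with identical availability.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    a = unit_availability
    # Sum the binomial probabilities of exactly i units being up, for i >= k.
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Two sites, either one sufficient (simple geo-redundancy).
    print(k_of_n_availability(2, 1, 0.99))
    # Four units needed plus one spare (an N+k scheme with N=4, k=1).
    print(k_of_n_availability(5, 4, 0.99))
```

With two independent units at 99% availability each, where either one can carry the service, the pair reaches about 99.99%. Correlated failures would lower this figure, which is one reason for dispersing sites geographically.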

Operationally, RAS is managed through the Operations, Administration and Maintenance (OAM) system, which continuously monitors network health using performance and fault management data. When a failure is detected, the system triggers predefined mitigation actions, such as rerouting traffic, restarting software instances, or switching to backup hardware. In 5G networks, RAS principles are extended through cloud-native designs using microservices and container orchestration (e.g., Kubernetes), which enable rapid scaling and healing. The framework also includes requirements for regular testing, such as disaster recovery drills and resilience audits, to ensure that RAS mechanisms function as intended under real failure scenarios.
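The detect-then-mitigate flow described above can be sketched as a simple dispatch table from fault type to mitigation action. This is a hypothetical illustration, not a 3GPP-defined interface: the `Alarm` fields, the fault type strings, and the action functions are all invented for the example, and a real OAM system would invoke orchestration or element-management APIs rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Alarm:
    source: str      # hypothetical name of the affected network function
    fault_type: str  # hypothetical classification, e.g. "link_down"


def reroute_traffic(alarm: Alarm) -> str:
    return f"reroute traffic away from {alarm.source}"


def restart_instance(alarm: Alarm) -> str:
    return f"restart software instance {alarm.source}"


def switch_to_backup(alarm: Alarm) -> str:
    return f"switch {alarm.source} to backup hardware"


# Predefined mitigation actions, keyed by fault type (assumed names).
MITIGATIONS: Dict[str, Callable[[Alarm], str]] = {
    "link_down": reroute_traffic,
    "process_crash": restart_instance,
    "hardware_fault": switch_to_backup,
}


def handle_alarm(alarm: Alarm) -> str:
    """Pick the predefined action for a fault, or escalate if none exists."""
    action = MITIGATIONS.get(alarm.fault_type)
    if action is None:
        return f"escalate {alarm.fault_type} on {alarm.source} to operator"
    return action(alarm)


if __name__ == "__main__":
    print(handle_alarm(Alarm("upf-1", "link_down")))
    print(handle_alarm(Alarm("amf-2", "unknown_fault")))
```

Keeping the mapping in data rather than in branching logic makes it straightforward to review and test the mitigation policy during resilience audits.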

Purpose & Motivation

RAS requirements were formalized in 3GPP to address the critical need for telecom networks to be highly dependable, especially as society becomes increasingly reliant on mobile connectivity for essential services. Prior to explicit RAS standards, network resilience was often vendor-specific or implemented ad hoc, leading to inconsistent service quality during failures. The framework provides a unified set of objectives and methods to design and operate networks that can withstand faults, disasters, and attacks while maintaining acceptable service levels.

The motivation stems from the evolution of mobile networks into critical infrastructure supporting emergency communications, financial transactions, healthcare, and industrial automation. These applications demand very high availability and rapid recovery from disruptions. RAS targets problems such as single points of failure, prolonged outages, and inadequate disaster preparedness. By embedding RAS principles into network architecture from the start, operators can reduce downtime, improve customer satisfaction, and meet regulatory obligations for service continuity.

Historically, RAS gained prominence with the move to all-IP networks in Release 8, which brought failure modes not present in traditional circuit-switched systems. It was further emphasized in 5G (Release 15 onward) because of support for ultra-reliable low-latency communications (URLLC) and network slicing, where specific slices (e.g., for public safety) require exceptional resilience. The framework addresses limitations of earlier networks by mandating systematic approaches to redundancy, automated fault management, and geographic dispersion, so that modern 3GPP networks can deliver the robustness expected of foundational digital infrastructure.

Key Features

Defines targets for reliability (e.g., MTBF), availability (e.g., uptime percentage), and survivability
Architectural redundancy at node, link, and site levels (N+k redundancy, geo-redundancy)
Automated fault detection, isolation, and recovery (self-healing) mechanisms
Integration with OAM systems for continuous monitoring and proactive maintenance
Disaster recovery plans including backup site activation and data replication
Support for service continuity during maintenance and software upgrades
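As an illustration of automated fault detection and recovery with a backup site, the sketch below marks a node as failed when its heartbeat is older than a timeout, then selects the first healthy node from a preference list. The class, function, and node names are hypothetical, and timestamps are passed in explicitly so the logic stays deterministic; a real deployment would rely on its OAM or orchestration platform for liveness checks.

```python
from typing import Dict, Iterable, List, Optional, Set


class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (timestamps in seconds)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat received from node at time now."""
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> List[str]:
        """Nodes whose most recent heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


def select_active(preferred_order: Iterable[str], failed: Set[str]) -> Optional[str]:
    """Return the first node in preference order that has not failed."""
    for node in preferred_order:
        if node not in failed:
            return node
    return None  # no healthy node left; escalate to disaster recovery


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=3.0)
    monitor.beat("site-a", now=0.0)
    monitor.beat("site-b", now=0.0)
    monitor.beat("site-b", now=4.0)   # site-a has gone silent
    failed = set(monitor.failed_nodes(now=5.0))
    print(select_active(["site-a", "site-b"], failed))
```

In this run the primary site misses its heartbeat window, so the backup site is selected as active.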

Evolution Across Releases

Rel-8 Initial

Introduced foundational RAS requirements for EPS/LTE networks, focusing on core network resilience and basic redundancy schemes. Defined initial availability targets and fault management procedures within OAM frameworks, addressing the transition to all-IP architecture.

TR 32.808, TR 36.755, TR 38.807, TR 38.820, TR 38.860, TR 38.892

Rel-15

Enhanced RAS for 5G networks, incorporating cloud-native principles and network slicing. Introduced requirements for ultra-reliable services (URLLC) and automated resilience in software-defined infrastructure. Expanded survivability scenarios to include edge computing failures.

Rel-16

Strengthened RAS for industrial IoT and vertical applications, adding specifications for time-sensitive communication survivability. Defined enhanced redundancy models for 5G RAN, including integrated access and backhaul (IAB) resilience.

Rel-17

Further refined RAS for non-terrestrial networks (NTN) and aerial vehicles, addressing unique failure modes in satellite and airborne systems. Introduced metrics for service continuity in mobility scenarios across heterogeneous networks.

Rel-18

Extended RAS frameworks to support AI-driven network management for predictive fault detection and recovery. Added requirements for energy efficiency and resilience in green networks, ensuring RAS objectives align with sustainability goals.

Defining Specifications

Specification	Title
TR 32.808	3GPP TR 32.808
TR 36.755	3GPP TR 36.755
TR 38.807	3GPP TR 38.807
TR 38.820	3GPP TR 38.820
TR 38.860	3GPP TR 38.860
TR 38.892	3GPP TR 38.892