STSA (Stepwise Temporal Sub-layer Access) — 3GPP Glossary

Stepwise Temporal Sub-layer Access (STSA) is a media streaming technique defined by 3GPP for Dynamic Adaptive Streaming over HTTP (DASH). It allows a streaming client to progressively access higher temporal sub-layers of video content, enabling a smoother quality transition and improved user experience when network conditions fluctuate.

Description

STSA is specified within the 3GPP DASH standards, particularly for content encoded using Scalable Video Coding (SVC). SVC encodes a video stream into multiple layers: a base layer, which provides the lowest quality, and one or more enhancement layers, which improve spatial resolution, quality, or temporal frame rate. STSA deals specifically with temporal enhancement layers, which increase the frame rate (e.g., from 15 fps to 30 fps).

In a typical SVC-DASH scenario, a Media Presentation Description (MPD) file describes the available video representations (bitrates, resolutions, frame rates) and their dependency relationships. Without STSA, switching to a representation with a higher temporal layer (a higher frame rate) might require the client to download a large segment that contains both the base layer and the new temporal enhancement layer data. That is inefficient when the client only wants a modest frame rate improvement. STSA addresses this by structuring the media segments so that the data for each temporal sub-layer (e.g., the frames that raise the frame rate from 15 fps to 30 fps) is accessible in a stepwise, incremental fashion.
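The dependency relationships described above can be illustrated with a small model. This is a minimal sketch, not a rendering of any real MPD schema: the `Representation` class, its field names, the layer ids, and all bitrates are hypothetical values chosen for illustration. It shows how a client might resolve which layers it needs before it can decode a given temporal sub-layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Representation:
    """Hypothetical, simplified view of one temporal layer in a manifest."""
    rep_id: str                # illustrative identifier
    frame_rate: float          # frame rate reached once this layer is decoded
    bandwidth_bps: int         # bitrate of this layer's own data only
    depends_on: Optional[str]  # id of the layer this one builds on, if any


# Illustrative values only; not taken from any real MPD.
REPRESENTATIONS: Dict[str, Representation] = {
    "t0": Representation("t0", 15.0, 600_000, None),   # base layer
    "t1": Representation("t1", 30.0, 250_000, "t0"),   # first temporal step
    "t2": Representation("t2", 60.0, 300_000, "t1"),   # second temporal step
}


def dependency_chain(rep_id: str,
                     reps: Dict[str, Representation] = REPRESENTATIONS
                     ) -> List[Representation]:
    """Return every layer needed to decode rep_id, base layer first."""
    chain: List[Representation] = []
    current: Optional[str] = rep_id
    while current is not None:
        rep = reps[current]
        chain.append(rep)
        current = rep.depends_on
    chain.reverse()
    return chain


def total_bandwidth(rep_id: str,
                    reps: Dict[str, Representation] = REPRESENTATIONS) -> int:
    """Sum the bitrates of all layers required for rep_id."""
    return sum(rep.bandwidth_bps for rep in dependency_chain(rep_id, reps))
```

With these assumed numbers, reaching 30 fps needs layers `t0` and `t1`, but a client that already holds `t0` only has to fetch the 250 kbit/s of `t1`.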

Operationally, the MPD indicates which representations support STSA. When a DASH client decides to adapt upwards to a higher frame rate, it can first request only the incremental temporal enhancement sub-layer for the next segment or segments, rather than the full high-frame-rate representation. The client then combines this sub-layer data with the base layer data it has already downloaded to reconstruct the higher-frame-rate video. This reduces the initial bitrate "spike" during an upward switch, which lowers the risk of buffer drain and lets the client respond sooner to improving network conditions. It also allows finer granularity in quality adaptation, specifically on the temporal axis.
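The stepwise behaviour described above can be sketched as a simple adaptation rule. This is an assumption-laden illustration rather than logic taken from any specification: the function name, the `safety` margin, and the example bitrates are all hypothetical. The rule climbs at most one temporal sub-layer per decision, and drops as many layers as needed when throughput falls.

```python
from typing import Sequence


def next_temporal_layer(current_index: int,
                        layer_bitrates_bps: Sequence[int],
                        throughput_bps: float,
                        safety: float = 0.8) -> int:
    """Pick the temporal layer index to use for the next segment.

    layer_bitrates_bps[i] is the bitrate of layer i alone; decoding layer i
    requires layers 0..i, so the cost of index i is the cumulative sum.
    The safety factor (hypothetical) leaves headroom below the measured
    throughput.
    """
    budget = throughput_bps * safety

    def cumulative(index: int) -> int:
        return sum(layer_bitrates_bps[:index + 1])

    # Step down until the layers in use fit the budget (never below the base).
    index = current_index
    while index > 0 and cumulative(index) > budget:
        index -= 1
    if index < current_index:
        return index

    # Otherwise attempt a single upward step, even if more would fit.
    if index + 1 < len(layer_bitrates_bps) and cumulative(index + 1) <= budget:
        return index + 1
    return index
```

For example, with layers of 600, 250, and 300 kbit/s and a measured throughput of 2 Mbit/s, a client on the base layer moves to layer 1 only, although all three layers would fit. Limiting each decision to one step is a design choice made here to mirror the "stepwise" idea; a real client could use a different policy.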

Purpose & Motivation

STSA was created to improve the efficiency and quality of experience (QoE) of adaptive video streaming, particularly in mobile environments where bandwidth is variable and scarce. Traditional adaptive streaming (for example, with AVC/H.264) switches between entirely separate encoded bitstreams, which can be inefficient and cause noticeable quality jumps. The adoption of SVC promised more efficient switching, but early implementations still adapted in coarse steps.

The specific problem STSA solves is the inefficient transition to higher temporal resolutions (frame rates). A higher frame rate significantly improves perceptual quality, especially for sports or action content, but requesting a full high-frame-rate segment causes an immediate jump in bandwidth demand. In poor or fluctuating network conditions, this can deplete the buffer and cause rebuffering. STSA enables a "softer" upgrade path: the client first raises the frame rate incrementally by fetching only the additional temporal data, which is a smaller download. This lowers the cost of each adaptation step and lets the client react quickly while protecting its buffer, which leads to more stable playback.
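A short calculation makes the buffer argument concrete. The numbers below are purely illustrative assumptions (a 2-second segment, 1 Mbit/s of throughput, a 600 kbit/s base layer, and a 250 kbit/s temporal enhancement layer), and the helper name is hypothetical. The sketch compares fetching the full high-frame-rate segment with fetching only the temporal enhancement data when the base layer for that segment is already buffered.

```python
def download_seconds(bitrate_bps: float,
                     segment_seconds: float,
                     throughput_bps: float) -> float:
    """Time needed to fetch one segment's worth of data at a given bitrate."""
    return bitrate_bps * segment_seconds / throughput_bps


# Illustrative assumptions, not measurements.
SEGMENT_SECONDS = 2.0
THROUGHPUT_BPS = 1_000_000
BASE_BPS = 600_000
ENHANCEMENT_BPS = 250_000

# Full switch: base and enhancement data are fetched together.
full_switch = download_seconds(BASE_BPS + ENHANCEMENT_BPS,
                               SEGMENT_SECONDS, THROUGHPUT_BPS)

# Stepwise switch: the base layer is already buffered, so only the
# enhancement data is fetched.
stepwise_switch = download_seconds(ENHANCEMENT_BPS,
                                   SEGMENT_SECONDS, THROUGHPUT_BPS)

# Playback time gained or lost per segment: positive means the buffer grows.
full_margin = SEGMENT_SECONDS - full_switch
stepwise_margin = SEGMENT_SECONDS - stepwise_switch
```

Under these assumptions the full switch takes about 1.7 s of download per 2 s segment, leaving roughly 0.3 s of margin, while the stepwise fetch takes about 0.5 s and leaves roughly 1.5 s. Real margins depend on segment structure, request overhead, and throughput variation, none of which are modelled here.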

Introduced in 3GPP Release 12 as part of the evolving DASH and SVC standards, STSA was motivated by the need for more sophisticated adaptation logic to support high-quality mobile video services like LTE Broadcast (eMBMS) and later 5G media delivery. It allows content providers to encode once with SVC and STSA markers, enabling clients to make smarter, stepwise adaptation decisions, ultimately conserving network resources while maximizing user-perceived video smoothness.

Key Features

Enables incremental fetching of temporal enhancement layers in SVC-encoded DASH content
Reduces bitrate overhead when switching to a higher frame rate representation
Facilitates smoother quality transitions and more responsive adaptation to improving bandwidth
Requires specific signaling in the DASH Media Presentation Description (MPD)
Improves client buffer management by allowing finer-grained adaptation steps
Optimized for mobile streaming scenarios with fluctuating network conditions

Evolution Across Releases

Rel-12 Initial

Introduced Stepwise Temporal Sub-layer Access as a new feature for 3GPP DASH. Defined the MPD descriptors and segment formats necessary to signal and deliver temporally scalable content in a stepwise manner, enabling clients to access higher frame rates incrementally.

TS 26.906

Rel-14

Enhanced STSA support for more complex codec profiles and integration with other DASH advanced features like content steering and server/network-assisted DASH (SAND), improving its utility in managed network environments.

TS 26.906

Rel-16

Further refined STSA for use with immersive media formats and 5G media streaming, ensuring compatibility with high-efficiency video coding (HEVC) and exploring its role in adaptive streaming for 360-degree video and virtual reality applications.

TS 26.906

Rel-18

Explored AI/ML-based enhancements for adaptation logic that can leverage STSA structures to predict optimal stepwise transitions, optimizing QoE for new video applications in 5G-Advanced networks.

TS 26.906

Defining Specifications

Specification	Title
TS 26.906	3GPP TS 26.906