STSA

Stepwise Temporal Sub-layer Access

Introduced in Rel-12

STSA is a 3GPP media streaming technique for DASH that allows a client to progressively access higher temporal video sub-layers for smoother quality transitions during network fluctuations.

Category: Services
Introduced: Rel-12
Where: Services › Codecs
Specifications: 1 spec

Description

Stepwise Temporal Sub-layer Access (STSA) is a feature specified within the 3GPP DASH (Dynamic Adaptive Streaming over HTTP) standards, particularly for content encoded using Scalable Video Coding (SVC). The term itself comes from HEVC (H.265), which defines STSA pictures as points where a decoder can begin decoding the next higher temporal sub-layer. SVC encodes a video stream into multiple layers: a base layer (providing the lowest quality) and one or more enhancement layers (which improve spatial resolution, fidelity, or frame rate). STSA specifically deals with temporal enhancement layers, which increase the frame rate (e.g., from 15 fps to 30 fps).
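The relationship between temporal sub-layers and frame rate can be illustrated with a minimal Python sketch. It assumes a dyadic hierarchy in which each added sub-layer doubles the frame rate; that structure, and the function names `temporal_id`, `frames_up_to`, and `frame_rate`, are illustrative assumptions and do not come from any 3GPP specification.

```python
def temporal_id(frame_index: int, num_layers: int) -> int:
    """Temporal sub-layer of a frame in a dyadic hierarchy (0 = base layer)."""
    period = 1 << (num_layers - 1)  # distance between base-layer frames
    offset = frame_index % period
    if offset == 0:
        return 0
    # Each extra factor of two in the offset places the frame one layer lower.
    trailing_zeros = (offset & -offset).bit_length() - 1
    return (num_layers - 1) - trailing_zeros


def frames_up_to(layer: int, total_frames: int, num_layers: int) -> list[int]:
    """Indices of the frames needed to play sub-layers 0..layer."""
    return [i for i in range(total_frames) if temporal_id(i, num_layers) <= layer]


def frame_rate(layer: int, full_rate: float, num_layers: int) -> float:
    """Frame rate obtained when decoding sub-layers 0..layer."""
    return full_rate / (1 << (num_layers - 1 - layer))


if __name__ == "__main__":
    # Two sub-layers over a 30 fps stream: base layer alone plays at 15 fps.
    print(frames_up_to(0, 8, 2), frame_rate(0, 30.0, 2))
    print(frames_up_to(1, 8, 2), frame_rate(1, 30.0, 2))
```

With two sub-layers, the base layer holds every other frame, and adding the single enhancement sub-layer restores the full frame rate.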

In a typical SVC-DASH scenario, a Media Presentation Description (MPD) file describes the available video representations (bitrates, resolutions, frame rates) and their dependency relationships. Without STSA, switching to a representation with a higher temporal layer (higher frame rate) might require the client to download a large segment that includes both the base layer and the new temporal enhancement layer data, which is inefficient when the client only needs the additional frames. STSA addresses this by structuring the media segments so that the data for each temporal sub-layer (e.g., frames that increase the frame rate from 15 fps to 30 fps) is accessible in a stepwise, incremental fashion.
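The dependency relationships mentioned above can be sketched as a small lookup. In DASH, a representation can name the representations it depends on (the `dependencyId` attribute); the sketch below models that as a plain dictionary rather than parsing a real MPD. The representation IDs `t0` and `t1` and the helper `download_set` are hypothetical.

```python
def download_set(rep_id: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return rep_id plus every representation it depends on, base first."""
    ordered: list[str] = []

    def visit(rid: str) -> None:
        if rid in ordered:
            return  # already scheduled
        for dep in depends_on.get(rid, []):
            visit(dep)  # dependencies must be decoded first
        ordered.append(rid)

    visit(rep_id)
    return ordered


# Hypothetical two-layer presentation: "t1" adds frames on top of "t0".
DEPENDS_ON = {"t0": [], "t1": ["t0"]}

if __name__ == "__main__":
    print(download_set("t1", DEPENDS_ON))
```

A client that already holds `t0` for a segment would only need to fetch the remaining entries of the returned list.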

Operationally, the MPD indicates that certain representations support STSA. When a DASH client decides to adapt upwards to a higher frame rate, it can first request and download only the incremental temporal enhancement sub-layer for the next segment(s), rather than the full high-frame-rate representation. This sub-layer data is then combined with the already downloaded base layer data to reconstruct the higher frame rate video. This reduces the initial bitrate "spike" during an upward switch, leading to a smoother transition, less risk of buffer drain, and a more responsive adaptation to improving network conditions. It allows for finer granularity in quality adaptation, specifically on the temporal axis.
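A rough sketch of the stepwise up-switch decision described above, in Python. The bitrate figures, the `safety` margin, and the function names are assumptions made for illustration; real DASH clients use their own adaptation logic, and nothing here is taken from a 3GPP specification.

```python
def next_layer(current: int, throughput_kbps: float,
               cumulative_kbps: list[float], safety: float = 0.8) -> int:
    """Pick the temporal sub-layer for the next segment.

    cumulative_kbps[i] is the bitrate needed to play sub-layers 0..i and is
    assumed to be increasing. The client steps up by at most one sub-layer
    per decision, but steps down as far as the budget requires.
    """
    budget = throughput_kbps * safety
    # Step down until the operating point fits the budget.
    while current > 0 and cumulative_kbps[current] > budget:
        current -= 1
    # Step up a single sub-layer if it is affordable.
    top = len(cumulative_kbps) - 1
    if current < top and cumulative_kbps[current + 1] <= budget:
        return current + 1
    return current


def incremental_kbps(old: int, new: int, cumulative_kbps: list[float]) -> float:
    """Extra bitrate to fetch when lower sub-layers are already buffered."""
    return max(0.0, cumulative_kbps[new] - cumulative_kbps[old])


if __name__ == "__main__":
    ladder = [400.0, 600.0, 900.0]  # hypothetical cumulative bitrates
    layer = next_layer(0, 2000.0, ladder)
    print(layer, incremental_kbps(0, layer, ladder))
```

Limiting each upward move to one sub-layer is what keeps the extra download small: in this hypothetical ladder the client fetches an additional 200 kbps instead of jumping straight to the 900 kbps operating point.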

Purpose & Motivation

STSA was created to enhance the efficiency and quality of experience (QoE) for adaptive video streaming, particularly in mobile environments where bandwidth is variable and scarce. Traditional adaptive streaming (using AVC/H.264) requires switching between entirely different encoded bitstreams, which can be inefficient and cause noticeable quality jumps. The adoption of Scalable Video Coding (SVC) promised more efficient switching, but early implementations still had coarse adaptation steps.

The specific problem STSA solves is the inefficient transition to higher temporal resolutions (frame rates). Increasing frame rate significantly improves perceptual quality, especially for sports or action content, but requesting a full high-frame-rate segment immediately demands substantially more bandwidth. In poor or fluctuating network conditions, this could cause buffer depletion and rebuffering. STSA enables a "softer" upgrade path: the client can first upgrade the frame rate incrementally by fetching only the additional temporal data, which is a smaller download. This reduces the adaptation overhead and lets the client react quickly to improving bandwidth while protecting its buffer, leading to more stable playback.

Introduced in 3GPP Release 12 as part of the evolving DASH and SVC standards, STSA was motivated by the need for more sophisticated adaptation logic to support high-quality mobile video services like LTE Broadcast (eMBMS) and later 5G media delivery. It allows content providers to encode once with SVC and STSA markers, enabling clients to make smarter, stepwise adaptation decisions, ultimately conserving network resources while maximizing user-perceived video smoothness.

Classification

Related approaches: DASH

Evolution Across Releases

Rel-12 Initial

Introduced Stepwise Temporal Sub-layer Access as a new feature for 3GPP DASH. Defined the MPD descriptors and segment formats necessary to signal and deliver temporally scalable content in a stepwise manner, enabling clients to access higher frame rates incrementally.

Enhanced STSA support for more complex codec profiles and integration with other DASH advanced features like content steering and server/network-assisted DASH (SAND), improving its utility in managed network environments.

Further refined STSA for use with immersive media formats and 5G media streaming, ensuring compatibility with high-efficiency video coding (HEVC) and exploring its role in adaptive streaming for 360-degree video and virtual reality applications.

Explored AI/ML-based enhancements for adaptation logic that can leverage STSA structures to predict optimal stepwise transitions, optimizing QoE for new video applications in 5G-Advanced networks.

Defining Specifications

3GPP specifications that define or reference STSA, with the latest known release. Sourced from the 3GPP document catalog.

Specification | Title | Release
TR 26.906 vj00 | HEVC Evaluation for 3GPP Services | Rel-19