3GPP TS 46.060 Enhanced Full Rate (EFR) speech transcoding
3GPP TS 46.060 V8.0.0 (2008-12)
Technical Specification
3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Enhanced Full Rate (EFR) speech transcoding (Release 8)
The present document has been developed within the 3rd Generation Partnership Project (3GPP™) and may be further elaborated for the purposes of 3GPP.
Keywords: GSM, speech, codec
3GPP postal address
3GPP support office address: 650 Route des Lucioles - Sophia Antipolis, Valbonne - FRANCE. Tel.: +33 4 92 94 42 00. Fax: +33 4 93 65 47 16. Internet: http://www.3gpp.org
Contents
Foreword
1 Scope
2 References
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Outline description
4.1 Functional description of audio parts
4.2 Preparation of speech samples
4.2.1 PCM format conversion
4.3 Principles of the GSM enhanced full rate speech encoder
4.4 Principles of the GSM enhanced full rate speech decoder
4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
5.1 Pre-processing
5.2 Linear prediction analysis and quantization
5.2.1 Windowing and auto-correlation computation
5.2.2 Levinson-Durbin algorithm
5.2.3 LP to LSP conversion
5.2.4 LSP to LP conversion
5.2.5 Quantization of the LSP coefficients
5.2.6 Interpolation of the LSPs
5.3 Open-loop pitch analysis
5.4 Impulse response computation
5.5 Target signal computation
5.6 Adaptive codebook search
5.7 Algebraic codebook structure and search
5.8 Quantization of the fixed codebook gain
5.9 Memory update
6 Functional description of the decoder
6.1 Decoding and speech synthesis
6.2 Post-processing
6.2.1 Adaptive post-filtering
6.2.2 Up-scaling
7 Variables, constants and tables in the C-code of the GSM EFR codec
7.1 Description of the constants and variables used in the C code
8 Homing sequences
8.1 Functional description
8.2 Definitions
8.3 Encoder homing
8.4 Decoder homing
8.5 Encoder home state
8.6 Decoder home state
9 Bibliography
Annex A (informative): Change history
Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).
The present document describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples within the digital cellular telecommunications system.
The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x   the first digit:
1   presented to TSG for information;
2   presented to TSG for approval;
3   or greater indicates TSG approved document under change control.
y   the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z   the third digit is incremented when editorial only changes have been incorporated in the document.
1 Scope
The present document describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8 000 sample/s leading to a bit rate for the encoded bit stream of 12,2 kbit/s. The coding scheme is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP.
The present document also specifies the conversion between A-law or μ-law (PCS 1900) PCM and 13-bit uniform PCM. Performance requirements for the audio input and output parts are included only to the extent that they affect the transcoder performance. This part also describes the codec down to the bit level, thus enabling the verification of compliance to the part to a high degree of confidence by use of a set of digital test sequences. These test sequences are described in GSM 06.54 [7] and are available on disks.
In case of discrepancy between the requirements described in the present document and the fixed point computational description (ANSI-C code) of these requirements contained in GSM 06.53 [6], the description in GSM 06.53 [6] will prevail.
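The relation between block size, sampling rate and bit rate can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative and are not taken from the specification or its C code.

```python
# Figures quoted in the text above.
SAMPLE_RATE_HZ = 8000     # sampling rate in sample/s
SAMPLES_PER_FRAME = 160   # speech samples per input block
BITS_PER_SAMPLE = 13      # resolution of the uniform PCM input
BITS_PER_FRAME = 244      # bits per encoded block


def frame_duration_s():
    """Duration of one block of speech in seconds."""
    return SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


def encoded_bit_rate_bps():
    """Bit rate of the encoded bit stream in bit/s."""
    return BITS_PER_FRAME * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME


def pcm_bit_rate_bps():
    """Bit rate of the 13-bit uniform PCM input in bit/s."""
    return BITS_PER_SAMPLE * SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(frame_duration_s())      # 0.02
    print(encoded_bit_rate_bps())  # 12200.0
    print(pcm_bit_rate_bps())      # 104000
```

One block therefore covers 20 ms of speech, and 244 bits every 20 ms gives the stated 12,2 kbit/s.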
The transcoding procedure specified in the present document is applicable for the enhanced full rate speech traffic channel (TCH) in the GSM system.
In GSM 06.51 [5], a reference configuration for the speech transmission chain of the GSM enhanced full rate (EFR) system is shown. According to this reference configuration, the speech encoder takes its input as a 13-bit uniform PCM signal either from the audio part of the Mobile Station or on the network side, from the PSTN via an 8-bit/A-law or μ-law (PCS 1900) to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is delivered to a channel encoder unit which is specified in GSM 05.03 [3]. In the receive direction, the inverse operations take place.
2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
· References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
· For a specific reference, subsequent revisions do not apply.
· For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
[1] GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".
[2] GSM 03.50: "Digital cellular telecommunications system (Phase 2+); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system".
[3] GSM 05.03: "Digital cellular telecommunications system (Phase 2+); Channel coding".
[4] GSM 06.32: "Digital cellular telecommunications system (Phase 2+); Voice Activity Detection (VAD)".
[5] GSM 06.51: "Digital cellular telecommunications system (Phase 2+); Enhanced Full Rate (EFR) speech processing functions General description".
[6] GSM 06.53: "Digital cellular telecommunications system (Phase 2+); ANSI-C code for the GSM Enhanced Full Rate (EFR) speech codec".
[7] GSM 06.54: "Digital cellular telecommunications system (Phase 2+); Test vectors for the GSM Enhanced Full Rate (EFR) speech codec".
[8] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse code modulation Pulse code modulation (PCM) of voice frequencies".
[9] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM)".
3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
adaptive codebook: adaptive codebook contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived from the long term filter state. The lag value can be viewed as an index into the adaptive codebook.
adaptive postfilter: this filter is applied to the output of the short term synthesis filter to enhance the perceptual quality of the reconstructed speech. In the GSM enhanced full rate codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and a tilt compensation filter.
algebraic codebook: fixed codebook where algebraic code is used to populate the excitation vectors (innovation vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions.
closed-loop pitch analysis: this is the adaptive codebook search, i.e., a process of estimating the pitch (lag) value from the weighted input speech and the long term filter state. In the closed-loop search, the lag is searched using an error minimization loop (analysis-by-synthesis). In the GSM enhanced full rate codec, closed-loop pitch search is performed for every subframe.
direct form coefficients: one of the formats for storing the short term filter parameters. In the GSM enhanced full rate codec, all filters which are used to modify speech samples use direct form coefficients.
fixed codebook: fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook are non-adaptive (i.e., fixed). In the GSM enhanced full rate codec, the fixed codebook is implemented using an algebraic codebook.
fractional lags: set of lag values having sub-sample resolution. In the GSM enhanced full rate codec a sub-sample resolution of 1/6th of a sample is used.
frame: time interval equal to 20 ms (160 samples at an 8 kHz sampling rate).
integer lags: set of lag values having whole sample resolution.
interpolating filter: FIR filter used to produce an estimate of sub-sample resolution samples, given an input sampled with integer sample resolution.
inverse filter: this filter removes the short term correlation from the speech signal. The filter models an inverse frequency response of the vocal tract.
lag: long term filter delay. This is typically the true pitch period, or a multiple or sub-multiple of it.
Line Spectral Frequencies: (see Line Spectral Pair).
Line Spectral Pair: transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse filter transfer function A(z) to a set of two transfer functions, one having even symmetry and the other having odd symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the unit circle in the z-plane.
LP analysis window: for each frame, the short term filter coefficients are computed using the high pass filtered speech samples within the analysis window. In the GSM enhanced full rate codec, the length of the analysis window is 240 samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients. No samples of the future frames are used (no lookahead).
LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a generic descriptive term for the short term filter coefficients.
open-loop pitch search: process of estimating the near optimal lag directly from the weighted speech input. This is done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the open-loop estimated lags. In the GSM enhanced full rate codec, open-loop pitch search is performed every 10 ms.
residual: output signal resulting from an inverse filtering operation.
short term synthesis filter: this filter introduces, into the excitation signal, short term correlation which models the impulse response of the vocal tract.
perceptual weighting filter: this filter is employed in the analysis-by-synthesis search of the codebooks. The filter exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near the formant frequencies and more in regions away from them.
subframe: time interval equal to 5 ms (40 samples at an 8 kHz sampling rate).
vector quantization: method of grouping several parameters into a vector and quantizing them simultaneously.
zero input response: output of a filter due to past inputs, i.e. due to the present state of the filter, given that an input of zeros is applied.
zero state response: output of a filter due to the present input, given that no past inputs have been applied, i.e., given the state information in the filter is all zeroes.
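The definitions of zero input response and zero state response above can be illustrated with a small all-pole filter: by linearity, the complete output is the sum of the two. The sketch below is in Python; the filter coefficients, filter state and input are arbitrary illustrative values, and the sign convention 1/(1 + sum of a_i z^-i) is an assumption of the example.

```python
def all_pole_filter(a, x, past_outputs):
    """Filter x through 1 / (1 + sum_i a[i-1] * z**-i).

    past_outputs holds the len(a) most recent outputs, newest first;
    this list is the filter state.
    """
    hist = list(past_outputs)
    out = []
    for xn in x:
        yn = xn - sum(ai * yi for ai, yi in zip(a, hist))
        out.append(yn)
        hist = [yn] + hist[:-1]  # shift the new output into the state
    return out


# Illustrative values only (stable filter with poles at 0.5 and 0.4).
A = [-0.9, 0.2]
PAST = [1.0, 0.5]
X = [1.0, 0.0, -0.5, 0.25]

# Zero input response: present state, input of zeros.
zero_input = all_pole_filter(A, [0.0] * len(X), PAST)
# Zero state response: present input, state all zeros.
zero_state = all_pole_filter(A, X, [0.0] * len(A))
# Complete response: present input and present state together.
complete = all_pole_filter(A, X, PAST)
```

Sample by sample, `complete` equals `zero_input` plus `zero_state` up to floating-point rounding.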
3.2 Symbols
For the purposes of the present document, the following symbols apply:
$A(z)$    The inverse filter with unquantized coefficients
$\hat{A}(z)$    The inverse filter with quantified coefficients
$H(z) = \frac{1}{\hat{A}(z)}$    The speech synthesis filter with quantified coefficients
$a_i$    The unquantized linear prediction parameters (direct form coefficients)
$\hat{a}_i$    The quantified linear prediction parameters
$m$    The order of the LP model
$\frac{1}{B(z)}$    The long-term synthesis filter
$W(z)$    The perceptual weighting filter (unquantized coefficients)
$\gamma_1, \gamma_2$    The perceptual weighting factors
$F_E(z)$    Adaptive pre-filter
$T$    The nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe
$\beta$    The adaptive pre-filter coefficient (the quantified pitch gain)
$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}$    The formant postfilter
$\gamma_n$    Control coefficient for the amount of the formant post-filtering
$\gamma_d$    Control coefficient for the amount of the formant post-filtering
$H_t(z)$    Tilt compensation filter
$\gamma_t$    Control coefficient for the amount of the tilt compensation filtering
$\mu = \gamma_t k_1'$    A tilt factor, with $k_1'$ being the first reflection coefficient
$h_f(n)$    The truncated impulse response of the formant postfilter
$L_h$    The length of $h_f(n)$
$r_h(i)$    The auto-correlations of $h_f(n)$
$\hat{A}(z/\gamma_n)$    The inverse filter (numerator) part of the formant postfilter
$1/\hat{A}(z/\gamma_d)$    The synthesis filter (denominator) part of the formant postfilter
$\hat{r}(n)$    The residual signal of the inverse filter $\hat{A}(z/\gamma_n)$
$h_t(n)$    Impulse response of the tilt compensation filter
$\beta_{sc}(n)$    The AGC-controlled gain scaling factor of the adaptive postfilter
$\alpha$    The AGC factor of the adaptive postfilter
$H_{h1}(z)$    Pre-processing high-pass filter
$w_I(n), w_{II}(n)$    LP analysis windows
$L_1^{(I)}$    Length of the first part of the LP analysis window $w_I(n)$
$L_2^{(I)}$    Length of the second part of the LP analysis window $w_I(n)$
$L_1^{(II)}$    Length of the first part of the LP analysis window $w_{II}(n)$
$L_2^{(II)}$    Length of the second part of the LP analysis window $w_{II}(n)$
$r_{ac}(k)$    The auto-correlations of the windowed speech $s'(n)$
$w_{lag}(i)$    Lag window for the auto-correlations (60 Hz bandwidth expansion)
$f_0$    The bandwidth expansion in Hz
$f_s$    The sampling frequency in Hz
$r'_{ac}(k)$    The modified (bandwidth expanded) auto-correlations
$E_{LD}(i)$    The prediction error in the ith iteration of the Levinson algorithm
$k_i$    The ith reflection coefficient
$a_j^{(i)}$    The jth direct form coefficient in the ith iteration of the Levinson algorithm
$F_1'(z)$    Symmetric LSF polynomial
$F_2'(z)$    Antisymmetric LSF polynomial
$F_1(z)$    Polynomial $F_1'(z)$ with root $z = -1$ eliminated
$F_2(z)$    Polynomial $F_2'(z)$ with root $z = 1$ eliminated
$q_i$    The line spectral pairs (LSPs) in the cosine domain
$\mathbf{q}$    An LSP vector in the cosine domain
$\hat{\mathbf{q}}_i^{(n)}$    The quantified LSP vector at the ith subframe of the frame n
$\omega_i$    The line spectral frequencies (LSFs)
$T_m(x)$    An $m$th order Chebyshev polynomial
$f_1(i), f_2(i)$    The coefficients of the polynomials $F_1(z)$ and $F_2(z)$
$f_1'(i), f_2'(i)$    The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$
$f(i)$    The coefficients of either $F_1(z)$ or $F_2(z)$
$C(x)$    Sum polynomial of the Chebyshev polynomials
$x$    Cosine of angular frequency $\omega$
$\lambda_k$    Recursion coefficients for the Chebyshev polynomial evaluation
$f_i$    The line spectral frequencies (LSFs) in Hz
$\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$    The vector representation of the LSFs in Hz
$\mathbf{z}^{(1)}(n), \mathbf{z}^{(2)}(n)$    The mean-removed LSF vectors at frame n
$\mathbf{r}^{(1)}(n), \mathbf{r}^{(2)}(n)$    The LSF prediction residual vectors at frame n
$\mathbf{p}(n)$    The predicted LSF vector at frame n
$\hat{\mathbf{r}}^{(2)}(n-1)$    The quantified second residual vector at the past frame
$\hat{\mathbf{f}}^k$    The quantified LSF vector at quantization index k
$E_{LSP}$    The LSP quantization error
$w_i,\ i = 1, \ldots, 10$    LSP-quantization weighting factors
$d_i$    The distance between the line spectral frequencies $f_{i+1}$ and $f_{i-1}$
$h(n)$    The impulse response of the weighted synthesis filter
$O_k$    The correlation maximum of open-loop pitch analysis at delay k
$O_{t_i},\ i = 1, \ldots, 3$    The correlation maxima at delays $t_i,\ i = 1, \ldots, 3$
$(M_i, t_i),\ i = 1, \ldots, 3$    The normalized correlation maxima $M_i$ and the corresponding delays $t_i$
$H(z)W(z) = \frac{A(z/\gamma_1)}{\hat{A}(z) A(z/\gamma_2)}$    The weighted synthesis filter
$A(z/\gamma_1)$    The numerator of the perceptual weighting filter
$1/A(z/\gamma_2)$    The denominator of the perceptual weighting filter
$T_1$    The nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe
$s'(n)$    The windowed speech signal
$s_w(n)$    The weighted speech signal
$\hat{s}(n)$    Reconstructed speech signal
$\hat{s}'(n)$    The gain-scaled post-filtered signal
$\hat{s}_f(n)$    Post-filtered speech signal (before scaling)
$x(n)$    The target signal for adaptive codebook search
$x_2(n), \mathbf{x}_2^t$    The target signal for algebraic codebook search
$res_{LP}(n)$    The LP residual signal
$c(n)$    The fixed codebook vector
$v(n)$    The adaptive codebook vector
$y(n) = v(n) * h(n)$    The filtered adaptive codebook vector
$y_k(n)$    The past filtered excitation
$u(n)$    The excitation signal
$\hat{u}(n)$    The emphasized adaptive codebook vector
$\hat{u}'(n)$    The gain-scaled emphasized excitation signal
$T_{op}$    The best open-loop lag
$t_{min}$    Minimum lag search value
$t_{max}$    Maximum lag search value
$R(k)$    Correlation term to be maximized in the adaptive codebook search
$b_{24}$    The FIR filter for interpolating the normalized correlation term $R(k)$
$R(k)_t$    The interpolated value of $R(k)$ for the integer delay k and fraction t
$b_{60}$    The FIR filter for interpolating the past excitation signal $u(n)$ to yield the adaptive codebook vector $v(n)$
$A_k$    Correlation term to be maximized in the algebraic codebook search at index k
$C_k$    The correlation in the numerator of $A_k$ at index k
$E_{D_k}$    The energy in the denominator of $A_k$ at index k
$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$    The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward filtered target
$\mathbf{H}$    The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$
$\mathbf{\Phi} = \mathbf{H}^t \mathbf{H}$    The matrix of correlations of $h(n)$
$d(n)$    The elements of the vector d
$\phi(i, j)$    The elements of the symmetric matrix $\mathbf{\Phi}$
$\mathbf{c}_k$    The innovation vector
$C$    The correlation in the numerator of $A_k$
$m_i$    The position of the i th pulse
$\vartheta_i$    The amplitude of the i th pulse
$N_p$    The number of pulses in the fixed codebook excitation
$E_D$    The energy in the denominator of $A_k$
$res_{LTP}(n)$    The normalized long-term prediction residual
$b(n)$    The sum of the normalized $d(n)$ vector and normalized long-term prediction residual $res_{LTP}(n)$
$s_b(n)$    The sign signal for the algebraic codebook search
$d'(n)$    Sign extended backward filtered target
$\phi'(i, j)$    The modified elements of the matrix $\mathbf{\Phi}$, including sign information
$\mathbf{z}^t, z(n)$    The fixed codebook vector convolved with $h(n)$
$E(n)$    The mean-removed innovation energy (in dB)
$\bar{E}$    The mean of the innovation energy
$\tilde{E}(n)$    The predicted energy
$[b_1\ b_2\ b_3\ b_4]$    The MA prediction coefficients
$\hat{R}(k)$    The quantified prediction error at subframe k
$E_I$    The mean innovation energy
$R(n)$    The prediction error of the fixed-codebook gain quantization
$E_Q$    The quantization error of the fixed-codebook gain quantization
$e(n)$    The states of the synthesis filter $1/\hat{A}(z)$
$e_w(n)$    The perceptually weighted error of the analysis-by-synthesis search
$\eta$    The gain scaling factor for the emphasized excitation
$g_c$    The fixed-codebook gain
$g_c'$    The predicted fixed-codebook gain
$\hat{g}_c$    The quantified fixed codebook gain
$g_p$    The adaptive codebook gain
$\hat{g}_p$    The quantified adaptive codebook gain
$\gamma_{gc} = g_c / g_c'$    A correction factor between the gain $g_c$ and the estimated one $g_c'$
$\hat{\gamma}_{gc}$    The optimum value for $\gamma_{gc}$
$\gamma_{sc}$    Gain scaling factor
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply. Further GSM related abbreviations may be found in GSM 01.04 [1].
ACELP    Algebraic Code Excited Linear Prediction
AGC    Adaptive Gain Control
CELP    Code Excited Linear Prediction
FIR    Finite Impulse Response
ISPP    Interleaved Single-Pulse Permutation
LP    Linear Prediction
LPC    Linear Predictive Coding
LSF    Line Spectral Frequency
LSP    Line Spectral Pair
LTP    Long Term Predictor (or Long Term Prediction)
MA    Moving Average
4 Outline description
The present document is structured as follows.
Subclause 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Subclause 4.2 describes the conversion between 13-bit uniform and 8-bit A-law or μ-law (PCS 1900) samples. Subclauses 4.3 and 4.4 present a simplified description of the principles of the GSM EFR encoding and decoding process respectively. In subclause 4.5, the sequence and subjective importance of encoded parameters are given.
Clause 5 presents the functional description of the GSM EFR encoding, whereas clause 6 describes the decoding procedures. Clause 7 describes variables, constants and tables of the C-code of the GSM EFR codec.
4.1 Functional description of audio parts
The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following elements:
1)  analogue to uniform digital PCM:
-    microphone;
-    input level adjustment device;
-    input anti-aliasing filter;
-    sample-hold device sampling at 8 kHz;
-    analogue-to-uniform digital conversion to 13-bit representation.
     The uniform format shall be represented in two's complement.
2)  uniform digital PCM to analogue:
-    conversion from 13-bit/8 kHz uniform PCM to analogue;
-    a hold device;
-    reconstruction filter including x/sin(x) correction;
-    output level adjustment device;
-    earphone or loudspeaker.
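The x/sin(x) correction in the reconstruction filter compensates for the amplitude droop of the hold device. The sketch below, in Python, evaluates that correction under the assumption of an ideal zero-order hold running at 8 kHz, for which x = pi * f / fs; the function names are illustrative and the specification does not prescribe this calculation.

```python
import math


def hold_correction_gain(freq_hz, rate_hz=8000.0):
    """Gain x / sin(x) that undoes the droop of an ideal zero-order hold.

    Assumes x = pi * freq_hz / rate_hz and 0 <= freq_hz < rate_hz.
    """
    x = math.pi * freq_hz / rate_hz
    if x == 0.0:
        return 1.0  # limit of x / sin(x) as x approaches 0
    return x / math.sin(x)


def gain_db(gain):
    """Express a linear amplitude gain in decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    for f in (300.0, 1000.0, 3400.0):
        print(f, round(gain_db(hold_correction_gain(f)), 2))
```

Under these assumptions the required boost grows with frequency and stays below 3 dB at 3 400 Hz.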
     In the terminal equipment, the A/D function may be achieved either:
-    by direct conversion to 13-bit uniform PCM format;
-    or by conversion to 8-bit/A-law or μ-law (PCS 1900) companded format, based on a standard A-law or μ-law (PCS 1900) codec/filter according to ITU-T Recommendations G.711 [8] and G.714, followed by the 8-bit to 13-bit conversion as specified in clause 4.2.1.
For the D/A operation, the inverse operations take place.
In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are concerned with PCM equipment located in the central parts of the network. When used in the terminal equipment, the present document does not on its own ensure sufficient out-of-band attenuation. The specification of out-of-band signals is defined in GSM 03.50 [2] in clause 2.
4.2 Preparation of speech samples
The encoder is fed with data comprising samples with a resolution of 13 bits left justified in a 16-bit word. The three least significant bits are set to '0'. The decoder outputs data in the same format. Outside the speech codec further processing must be applied if the traffic data occurs in a different representation.
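The sample format described above can be sketched as a pair of helper functions. This Python example assumes two's complement representation, as required for the uniform format in subclause 4.1; the function names `pack_13bit` and `unpack_13bit` are hypothetical and do not appear in the specification or its C code.

```python
def pack_13bit(sample):
    """Left-justify a signed 13-bit sample in a 16-bit word.

    Returns the 16-bit two's complement bit pattern as an unsigned
    integer; the three least significant bits are always 0.
    """
    if not -4096 <= sample <= 4095:
        raise ValueError("sample does not fit in 13 bits")
    return (sample << 3) & 0xFFFF


def unpack_13bit(word):
    """Recover the signed 13-bit sample from a 16-bit word."""
    word &= 0xFFF8          # ignore the three least significant bits
    if word & 0x8000:       # sign bit set: negative value
        word -= 0x10000
    return word >> 3


if __name__ == "__main__":
    print(hex(pack_13bit(-1)))     # 0xfff8
    print(hex(pack_13bit(4095)))   # 0x7ff8
    print(unpack_13bit(0x8000))    # -4096
```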
4.2.1 PCM format conversion
The conversion between 8-bit A-law or μ-law (PCS 1900) compressed data and linear data with 13-bit resolution at the speech encoder input shall be as defined in ITU-T Rec. G.711 [8].
ITU-T Recommendation G.711 [8] specifies the A-law or μ-law (PCS 1900) to linear conversion and vice versa by providing table entries. Examples on how to perform the conversion by fixed-point arithmetic can be found in ITU-T Recommendation G.726 [9]. Subclause 4.2.1 of G.726 [9] describes A-law and μ-law (PCS 1900) to linear expansion and clause 4.2.7 of G.726 [9] provides a solution for linear to A-law and μ-law (PCS 1900) compression.
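As an illustration of the expansion direction, the following Python sketch converts one 8-bit A-law code word to a linear value left-justified in 16 bits. It follows a widely used table-free formulation of the G.711 A-law decoder and is not the normative table of G.711 [8] nor the G.726 [9] procedure; μ-law is not covered, and the function name is hypothetical.

```python
def alaw_to_linear16(code):
    """Expand one 8-bit A-law code word to a 16-bit linear value.

    The result has 13-bit resolution: it is always a multiple of 8,
    i.e. the three least significant bits are 0.
    """
    code = (code & 0xFF) ^ 0x55        # undo the even-bit inversion
    mantissa = (code & 0x0F) << 4      # 4-bit step within the segment
    segment = (code & 0x70) >> 4       # 3-bit segment (chord) number
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit denotes a positive sample.
    return magnitude if code & 0x80 else -magnitude


if __name__ == "__main__":
    print(alaw_to_linear16(0xD5))  # 8
    print(alaw_to_linear16(0x55))  # -8
    print(alaw_to_linear16(0xAA))  # 32256
```

Because every output is a multiple of 8, the values fit the 13-bit left-justified sample format of subclause 4.2 without further scaling.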
4.3 Principles of the GSM enhanced full rate speech encoder
The codec is based on the code-excited linear predictive (CELP) coding model. A 10th order linear prediction (LP), or short-term, synthesis filter is used which is given by:
$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}} \qquad (1)$$
where $\hat{a}_i$, $i = 1, \ldots, m$, are the (quantified) linear prediction (LP) parameters, and $m = 10$ is the predictor order. The long-term, or pitch, synthesis filter is given by:
$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}} \qquad (2)$$
where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called adaptive codebook approach.
The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure.
The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (3)$$
where $A(z)$ is the unquantized LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The values $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ are used. The weighting filter uses the unquantized LP parameters while the formant synthesis filter uses the quantified ones.
The coder operates on speech frames of 20 ms, corresponding to 160 samples at the sampling frequency of 8 000 sample/s. For each frame of 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame. The two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantified using split matrix quantization (SMQ) with 38 bits. The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The two sets of quantified and unquantized LP filters are used for the second and fourth subframes while in the first and third subframes interpolated LP filters are used (both quantified and unquantized). An open-loop pitch lag is estimated twice per frame (every 10 ms) based on the perceptually weighted speech signal.
Then the following operations are repeated for each subframe:
- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter $W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between LP residual and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted synthesis filter from the weighted speech signal).
- The impulse response, $h(n)$, of the weighted synthesis filter is computed.
- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and impulse response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with 1/6th of a sample resolution is used. The pitch lag is encoded with 9 bits in the first and third subframes and relatively encoded with 6 bits in the second and fourth subframes.
- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum innovation). An algebraic codebook with 35 bits is used for the innovative excitation.
- The gains of the adaptive and fixed codebook are scalar quantified with 4 and 5 bits respectively (with moving average (MA) prediction applied to the fixed codebook gain).
- Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.
The bit allocation of the codec is shown in table 1. In each 20 ms speech frame, 244 bits are produced, corresponding to a bit rate of 12.2 kbit/s. More detailed bit allocation is available in table 6. Note that the most significant bits (MSB) are always sent first.
Table 1: Bit allocation of the 12.2 kbit/s coding algorithm for 20 ms frame
Parameter | 1st & 3rd subframes | 2nd & 4th subframes | Total per frame |
2 LSP sets | | | 38 |
Pitch delay | 9 | 6 | 30 |
Pitch gain | 4 | 4 | 16 |
Algebraic code | 35 | 35 | 140 |
Codebook gain | 5 | 5 | 20 |
Total | | | 244 |
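The totals of table 1 and the resulting bit rate can be checked with a few lines of arithmetic. The sketch below only restates the figures of table 1; the variable names are illustrative.

```python
# (bits in each of subframes 1 and 3, bits in each of subframes 2 and 4), from table 1
PER_SUBFRAME_BITS = {
    "pitch delay": (9, 6),
    "pitch gain": (4, 4),
    "algebraic code": (35, 35),
    "codebook gain": (5, 5),
}
LSP_BITS = 38            # two LSP sets, quantified jointly once per frame
FRAME_SECONDS = 0.020    # 20 ms speech frame

# Each subframe type occurs twice per frame.
totals = {name: 2 * first + 2 * second
          for name, (first, second) in PER_SUBFRAME_BITS.items()}
frame_bits = LSP_BITS + sum(totals.values())
bit_rate = frame_bits / FRAME_SECONDS   # bit/s
```

With these figures `frame_bits` is 244 and `bit_rate` is 12 200 bit/s, i.e. the 12.2 kbit/s stated above.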
4.4 Principles of the GSM enhanced full rate speech decoder
The signal flow at the decoder is shown in figure 4. At the decoder, the transmitted indices are extracted from the received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These parameters are the two LSP vectors, the 4 fractional pitch lags, the 4 innovative codevectors, and the 4 sets of pitch and innovative gains. The LSP vectors are converted to the LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 40-sample subframe:
- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains;
- the speech is reconstructed by filtering the excitation through the LP synthesis filter.
Finally, the reconstructed speech signal is passed through an adaptive postfilter.
4.5 Sequence and subjective importance of encoded parameters
The encoder will produce the output information in a unique sequence and format, and the decoder must receive the same information in the same way. In table 6, the sequence of output bits s1 to s244 and the bit allocation for each parameter is shown.
The different parameters of the encoded speech and their individual bits have unequal importance with respect to subjective quality. Before being submitted to the channel encoding function the bits have to be rearranged in the sequence of importance as given in table 6 in 05.03 [3].
5 Functional description of the encoder
In this clause, the different functions of the encoder represented in figure 3 are described.
5.1 Pre-processing
Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal down-scaling.
Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point implementation.
The high-pass filter serves as a precaution against undesired low frequency components. A filter with a cut-off frequency of 80 Hz is used, and it is given by:
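$$H_{h1}(z) = \frac{0.92727435 - 1.8544941 z^{-1} + 0.92727435 z^{-2}}{1 - 1.9059465 z^{-1} + 0.9114024 z^{-2}} \qquad (4)$$
Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of $H_{h1}(z)$ by 2.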
5.2 Linear prediction analysis and quantization
Short-term prediction, or linear prediction (LP), analysis is performed twice per speech frame using the auto-correlation approach with 30 ms asymmetric windows. No lookahead is used in the auto-correlation computation.
The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm. Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for quantization and interpolation purposes. The interpolated quantified and unquantized filter coefficients are converted back to the LP filter coefficients (to construct the synthesis and weighting filters at each subframe).
5.2.1 Windowing and auto-correlation computation
LP analysis is performed twice per frame using two different asymmetric windows. The first window has its weight concentrated at the second subframe and it consists of two halves of Hamming windows with different sizes. The window is given by:
$$w_I(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\ 0.54 + 0.46 \cos\left(\dfrac{\pi (n - L_1^{(I)})}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1. \end{cases} \qquad (5)$$
The values $L_1^{(I)} = 160$ and $L_2^{(I)} = 80$ are used. The second window has its weight concentrated at the fourth subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of a cosine function cycle. The window is given by:
$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{2L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\ \cos\left(\dfrac{2\pi (n - L_1^{(II)})}{4L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1, \end{cases} \qquad (6)$$
where the values $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$ are used.
Note that both LP analyses are performed on the same set of speech samples. The windows are applied to 80 samples from the past speech frame in addition to the 160 samples of the present speech frame. No samples from future frames are used (no lookahead). A diagram of the two LP analysis windows is depicted below.
Figure 1: LP analysis windows
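The auto-correlations of the windowed speech $s'(n)$, $n = 0, \ldots, 239$, are computed by:
$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \qquad k = 0, \ldots, 10, \qquad (7)$$
and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window:
$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \qquad i = 1, \ldots, 10, \qquad (8)$$
where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. Further, $r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor at -40 dB.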
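The auto-correlation, lag windowing and white noise correction steps can be sketched in floating point as follows. This is an illustration only, not the bit-exact fixed-point procedure; the function name is hypothetical and the Gaussian form of the lag window is assumed from the description of equation (8).

```python
import math


def lag_windowed_autocorr(windowed, order=10, f0=60.0, fs=8000.0):
    """Return modified auto-correlations r'(0..order) of an already windowed frame."""
    n_samples = len(windowed)
    # Plain auto-correlation, as in equation (7).
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, n_samples))
         for k in range(order + 1)]
    r[0] *= 1.0001                      # white noise correction (noise floor at -40 dB)
    for k in range(1, order + 1):       # lag window for bandwidth expansion of f0 Hz
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
    return r


frame = [math.sin(0.3 * n) for n in range(240)]   # stand-in for 240 windowed samples
r_mod = lag_windowed_autocorr(frame)
```

The lag window decays slowly: at lag 10 it still weights the auto-correlation by roughly 0.89, which is why the spectral smoothing is mild.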
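5.2.2 Levinson-Durbin algorithm
The modified auto-correlations $r'_{ac}(0) = 1.0001\, r_{ac}(0)$ and $r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k)$, $k = 1, \ldots, 10$, are used to obtain the direct form LP filter coefficients $a_k$, $k = 1, \ldots, 10$, by solving the set of equations:
$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i-k|) = -r'_{ac}(i), \qquad i = 1, \ldots, 10. \qquad (9)$$
The set of equations in (9) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:
$E_{LD}(0) = r'_{ac}(0)$
for $i = 1$ to $10$ do
    $a_0^{(i-1)} = 1$
    $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i-j)\right] / E_{LD}(i-1)$
    $a_i^{(i)} = k_i$
    for $j = 1$ to $i-1$ do
        $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
    end
    $E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i-1)$
end
The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.
The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. The conversions to the LSP domain and back to the LP filter coefficient domain are described in the next two clauses.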
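A minimal floating-point sketch of the Levinson-Durbin recursion is given below, assuming the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ used for equation (9). The function name is hypothetical, and the sketch does not reproduce the fixed-point scaling of an actual implementation.

```python
def levinson_durbin(r, order):
    """Solve sum_k a_k r(|i-k|) = -r(i), i = 1..order.

    Returns (a, err) where a[0] = 1 and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the order (i-1) solution.
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]                       # coefficients of order i-1
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


# Auto-correlation r(k) = 0.5**k corresponds to a first-order predictor.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion yields $a_1 = -0.5$ and $a_2 = 0$ with a residual energy of 0.75, which can be verified by substitution into the two normal equations.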
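5.2.3 LP to LSP conversion
The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the sum and difference polynomials:
$$F_1'(z) = A(z) + z^{-11} A(z^{-1}) \qquad (10)$$
and
$$F_2'(z) = A(z) - z^{-11} A(z^{-1}), \qquad (11)$$
respectively. The polynomials $F_1'(z)$ and $F_2'(z)$ are symmetric and anti-symmetric, respectively. It can be proven that all roots of these polynomials are on the unit circle and that they alternate with each other. $F_1'(z)$ has a root $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials:
$$F_1(z) = F_1'(z) / (1 + z^{-1}) \qquad (12)$$
and
$$F_2(z) = F_2'(z) / (1 - z^{-1}). \qquad (13)$$
Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), therefore, the polynomials can be written as
$$F_1(z) = \prod_{i = 1, 3, \ldots, 9} \left(1 - 2 q_i z^{-1} + z^{-2}\right) \qquad (14)$$
and
$$F_2(z) = \prod_{i = 2, 4, \ldots, 10} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \qquad (15)$$
where $q_i = \cos(\omega_i)$ with $\omega_i$ being the line spectral frequencies (LSF), which satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain.
Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric only the first 5 coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations (for $i = 0$ to 4):
$$f_1(i+1) = a_{i+1} + a_{m-i} - f_1(i), \qquad f_2(i+1) = a_{i+1} - a_{m-i} + f_2(i), \qquad (16)$$
where $m = 10$ is the predictor order, with $f_1(0) = f_2(0) = 1$.
The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root and the sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as:
$$F(\omega) = 2 e^{-j5\omega}\, C(x),$$
with:
$$C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \qquad (17)$$
where $T_m(x) = \cos(m\omega)$ is the $m$th order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (16). The polynomial $C(x)$ is evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:
for $k = 4$ down to $1$
    $\lambda_k = 2x\lambda_{k+1} - \lambda_{k+2} + f(5-k)$
end
$C(x) = x\lambda_1 - \lambda_2 + f(5)/2$
with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$. The details of the Chebyshev polynomial evaluation method are found in P. Kabal and R.P. Ramachandran [6].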
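The recursive evaluation of $C(x)$ can be sketched as follows. The recursion (with initial values $\lambda_5 = 1$, $\lambda_6 = 0$) and the closed form used for comparison are written out here as assumptions consistent with equation (17); the function names are hypothetical. The example coefficients correspond to the flat filter $A(z) = 1$, for which $F_1(z) = 1 - z^{-1} + z^{-2} - \ldots + z^{-10}$ has its first root at $\omega = \pi/11$.

```python
import math


def chebyshev_eval(x, f):
    """Evaluate C(x) for coefficients f[1..5] (f[0] is unused, kept for 1-based indexing)."""
    lam_next, lam_next2 = 1.0, 0.0              # lambda_5 = 1, lambda_6 = 0
    for k in range(4, 0, -1):
        lam = 2.0 * x * lam_next - lam_next2 + f[5 - k]
        lam_next, lam_next2 = lam, lam_next     # shift: now lambda_k and lambda_(k+1)
    return x * lam_next - lam_next2 + f[5] / 2.0


def direct_eval(omega, f):
    """Same quantity from the cosine series, used only as a cross-check."""
    return (math.cos(5 * omega) + f[1] * math.cos(4 * omega)
            + f[2] * math.cos(3 * omega) + f[3] * math.cos(2 * omega)
            + f[4] * math.cos(omega) + f[5] / 2.0)


f1_flat = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # F1(z) coefficients for A(z) = 1
value_at_root = chebyshev_eval(math.cos(math.pi / 11.0), f1_flat)
```

Because the recursion works directly on $x = \cos(\omega)$, the root search never needs to evaluate trigonometric functions inside its inner loop.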
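5.2.4 LSP to LP conversion
Once the LSPs are quantified and interpolated, they are converted back to the LP coefficient domain $\{a_k\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ or $F_2(z)$ are found by expanding equations (14) and (15) knowing the quantified and interpolated LSPs $q_i$, $i = 1, \ldots, 10$. The following recursive relation is used to compute $f_1(i)$:
for $i = 1$ to $5$
    $f_1(i) = -2 q_{2i-1} f_1(i-1) + 2 f_1(i-2)$
    for $j = i-1$ down to $1$
        $f_1(j) = f_1(j) - 2 q_{2i-1} f_1(j-1) + f_1(j-2)$
    end
end
with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$.
Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is:
$$f_1'(i) = f_1(i) + f_1(i-1), \qquad f_2'(i) = f_2(i) - f_2(i-1), \qquad i = 1, \ldots, 5. \qquad (18)$$
Finally the LP coefficients are found by:
$$a_i = \begin{cases} 0.5 f_1'(i) + 0.5 f_2'(i), & i = 1, \ldots, 5, \\ 0.5 f_1'(11-i) - 0.5 f_2'(11-i), & i = 6, \ldots, 10. \end{cases} \qquad (19)$$
This is directly derived from the relation $A(z) = (F_1'(z) + F_2'(z))/2$, and considering the fact that $F_1'(z)$ and $F_2'(z)$ are symmetric and anti-symmetric polynomials, respectively.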
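The conversion from LSPs back to LP coefficients can be sketched in floating point as below. The recursion and the final combination are written out as assumptions consistent with equations (14), (15), (18) and (19); the function names are hypothetical. As a check, LSFs equally spaced at $\omega_i = i\pi/11$ correspond to the flat filter $A(z) = 1$, so all coefficients $a_1 \ldots a_{10}$ should come out as zero.

```python
import math


def expand_lsp_poly(qs):
    """First six coefficients f(0..5) of prod_i (1 - 2 q_i z^-1 + z^-2) over five q_i."""
    f = [1.0] + [0.0] * 5
    for i in range(1, 6):
        q = qs[i - 1]
        # New centre-side coefficient; symmetry of the partial product gives the factor 2.
        f[i] = -2.0 * q * f[i - 1] + 2.0 * (f[i - 2] if i >= 2 else 0.0)
        # Update lower coefficients from high to low so old values are still available.
        for j in range(i - 1, 0, -1):
            f[j] += -2.0 * q * f[j - 1] + (f[j - 2] if j >= 2 else 0.0)
    return f


def lsp_to_lp(q):
    """q: ten LSPs in the cosine domain (q[0] = q_1, ...). Returns [1, a_1, ..., a_10]."""
    f1 = expand_lsp_poly(q[0::2])                       # q_1, q_3, ..., q_9
    f2 = expand_lsp_poly(q[1::2])                       # q_2, q_4, ..., q_10
    f1p = [f1[i] + f1[i - 1] for i in range(1, 6)]      # multiply by (1 + z^-1)
    f2p = [f2[i] - f2[i - 1] for i in range(1, 6)]      # multiply by (1 - z^-1)
    a = [1.0] + [0.0] * 10
    for i in range(1, 6):
        a[i] = 0.5 * (f1p[i - 1] + f2p[i - 1])
        a[11 - i] = 0.5 * (f1p[i - 1] - f2p[i - 1])
    return a


flat = lsp_to_lp([math.cos(i * math.pi / 11.0) for i in range(1, 11)])
```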
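5.2.5 Quantization of the LSP coefficients
The two sets of LP filter coefficients per frame are quantified using the LSP representation in the frequency domain; that is:
$$f_i = \frac{f_s}{2\pi} \arccos(q_i), \qquad i = 1, \ldots, 10, \qquad (20)$$
where $f_i$ are the line spectral frequencies (LSF) in Hz [0,4000] and $f_s = 8000$ is the sampling frequency. The LSF vector is given by $\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$, with $t$ denoting transpose.
A 1st order MA prediction is applied, and the two residual LSF vectors are jointly quantified using split matrix quantization (SMQ). The prediction and quantization are performed as follows. Let $\mathbf{z}^{(1)}(n)$ and $\mathbf{z}^{(2)}(n)$ denote the mean-removed LSF vectors at frame $n$. The prediction residual vectors $\mathbf{r}^{(1)}(n)$ and $\mathbf{r}^{(2)}(n)$ are given by:
$$\mathbf{r}^{(1)}(n) = \mathbf{z}^{(1)}(n) - \mathbf{p}(n), \qquad \mathbf{r}^{(2)}(n) = \mathbf{z}^{(2)}(n) - \mathbf{p}(n), \qquad (21)$$
where $\mathbf{p}(n)$ is the predicted LSF vector at frame $n$. First order moving-average (MA) prediction is used where:
$$\mathbf{p}(n) = 0.65\, \hat{\mathbf{r}}^{(2)}(n-1), \qquad (22)$$
where $\hat{\mathbf{r}}^{(2)}(n-1)$ is the quantified second residual vector at the past frame.
The two LSF residual vectors $\mathbf{r}^{(1)}$ and $\mathbf{r}^{(2)}$ are jointly quantified using split matrix quantization (SMQ). The matrix $(\mathbf{r}^{(1)}\ \mathbf{r}^{(2)})$ is split into 5 submatrices of dimension 2 x 2 (two elements from each vector). For example, the first submatrix consists of the elements $r_1^{(1)}$, $r_2^{(1)}$, $r_1^{(2)}$, and $r_2^{(2)}$. The 5 submatrices are quantified with 7, 8, 8+1, 8, and 6 bits, respectively. The third submatrix uses a 256-entry signed codebook (8-bit index plus 1-bit sign).
A weighted LSP distortion measure is used in the quantization process. In general, for an input LSP vector $\mathbf{f}$ and a quantified vector at index $k$, $\hat{\mathbf{f}}^k$, the quantization is performed by finding the index $k$ which minimizes:
$$E_{LSP} = \sum_{i=1}^{10} \left[f_i w_i - \hat{f}_i^k w_i\right]^2. \qquad (23)$$
The weighting factors $w_i$, $i = 1, \ldots, 10$, are given by
$$w_i = \begin{cases} 3.347 - \dfrac{1.547}{450}\, d_i, & d_i < 450, \\ 1.8 - \dfrac{0.8}{1050}\, (d_i - 450), & \text{otherwise}, \end{cases} \qquad (24)$$
where $d_i = f_{i+1} - f_{i-1}$ with $f_0 = 0$ and $f_{11} = 4000$. Here, two sets of weighting coefficients are computed for the two LSF vectors. In the quantization of each submatrix, two weighting coefficients from each set are used with their corresponding LSFs.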
5.2.6 Interpolation of the LSPs
The two sets of quantified (and unquantized) LP parameters are used for the second and fourth subframes whereas the first and third subframes use a linear interpolation of the parameters in the adjacent subframes. The interpolation is performed on the LSPs in the $q$ domain. Let $\hat{\mathbf{q}}_4^{(n)}$ be the LSP vector at the 4th subframe of the present frame $n$, $\hat{\mathbf{q}}_2^{(n)}$ be the LSP vector at the 2nd subframe of the present frame $n$, and $\hat{\mathbf{q}}_4^{(n-1)}$ the LSP vector at the 4th subframe of the past frame $n-1$. The interpolated LSP vectors at the 1st and 3rd subframes are given by:
$$\hat{\mathbf{q}}_1^{(n)} = 0.5\, \hat{\mathbf{q}}_4^{(n-1)} + 0.5\, \hat{\mathbf{q}}_2^{(n)}, \qquad \hat{\mathbf{q}}_3^{(n)} = 0.5\, \hat{\mathbf{q}}_2^{(n)} + 0.5\, \hat{\mathbf{q}}_4^{(n)}. \qquad (25)$$
The interpolated LSP vectors are used to compute a different LP filter at each subframe (both quantified and unquantized coefficients) using the LSP to LP conversion method described in clause 5.2.4.
5.3 Open-loop pitch analysis
Open-loop pitch analysis is performed twice per frame (each 10 ms) to find two estimates of the pitch lag in each frame. This is done in order to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the open-loop estimated lags.
Open-loop pitch estimation is based on the weighted speech signal $s_w(n)$ which is obtained by filtering the input speech signal through the weighting filter $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$. That is, in a subframe of size $L$, the weighted speech is given by:
$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \qquad n = 0, \ldots, L-1. \qquad (26)$$
Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation:
$$O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k) \qquad (27)$$
are found in the three ranges:
    $i = 3$: $18, \ldots, 35$,
    $i = 2$: $36, \ldots, 71$,
    $i = 1$: $72, \ldots, 143$.
The retained maxima $O_{t_i}$, $i = 1, \ldots, 3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n - t_i)}$, $i = 1, \ldots, 3$, respectively. The normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1, \ldots, 3$. The winner, $T_{op}$, among the three normalized correlations is selected by favouring the delays with the values in the lower range. This is performed by weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined as follows:
    $T_{op} = t_1$
    $M(T_{op}) = M_1$
    if $M_2 > 0.85\, M(T_{op})$
        $M(T_{op}) = M_2$
        $T_{op} = t_2$
    end
    if $M_3 > 0.85\, M(T_{op})$
        $M(T_{op}) = M_3$
        $T_{op} = t_3$
    end
This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing pitch multiples.
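The selection logic can be sketched in floating point as follows. The three delay ranges (18 to 35, 36 to 71, 72 to 143) and the weighting factor 0.85 are assumptions of this sketch, as are the function and parameter names; the sketch is not the bit-exact procedure.

```python
import math


def open_loop_pitch(sw, start, frame_len=80, weight=0.85):
    """Pick an open-loop delay for the half-frame beginning at sw[start].

    sw must hold at least 143 past samples before index `start`.
    """
    def corr(k):
        return sum(sw[start + n] * sw[start + n - k] for n in range(frame_len))

    candidates = []
    for low, high in ((72, 143), (36, 71), (18, 35)):      # longest delays first
        t = max(range(low, high + 1), key=corr)             # maximum of the raw correlation
        energy = sum(sw[start + n - t] ** 2 for n in range(frame_len))
        m = corr(t) / math.sqrt(energy) if energy > 0.0 else 0.0
        candidates.append((m, t))

    best_m, best_t = candidates[0]
    for m, t in candidates[1:]:
        # A shorter delay wins if its normalized maximum is close enough to the current best.
        if m > weight * best_m:
            best_m, best_t = m, t
    return best_t


# Periodic test signal with a true period of 40 samples.
signal = [math.sin(2.0 * math.pi * n / 40.0) for n in range(143 + 80)]
delay = open_loop_pitch(signal, 143)
```

For this signal the correlation is equally strong at delays 40, 80 and 120; favouring the lower range returns 40 rather than a multiple of the true period.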
5.4 Impulse response computation
The impulse response, $h(n)$, of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1)/[\hat{A}(z) A(z/\gamma_2)]$ is computed each subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The impulse response $h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$ extended by zeros through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.
5.5 Target signal computation
The target signal for adaptive codebook search is usually computed by subtracting the zero input response of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1)/[\hat{A}(z) A(z/\gamma_2)]$ from the weighted speech signal $s_w(n)$. This is performed on a subframe basis.
An equivalent procedure for computing the target signal, which is used in the present document, is the filtering of the LP residual signal $res_{LP}(n)$ through the combination of synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the LP residual and excitation. The memory update of these filters is explained in clause 5.9.
The residual signal $res_{LP}(n)$ which is needed for finding the target vector is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the subframe size of 40 as will be explained in the next clause. The LP residual is given by:
$$res_{LP}(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n-i). \qquad (28)$$
5.6 Adaptive codebook search
Adaptive codebook search is performed on a subframe basis. It consists of performing closed-loop pitch search, and then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag.
The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search.
In the first and third subframes, a fractional pitch delay is used with resolutions: 1/6 in the range $[17\tfrac{3}{6}, 94\tfrac{3}{6}]$ and integers only in the range [95, 143]. For the second and fourth subframes, a pitch resolution of 1/6 is always used in the range $[T_1 - 5\tfrac{3}{6}, T_1 + 4\tfrac{3}{6}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 18...143.
Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 18...143, is searched. For the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.
The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term:
$$R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \qquad (29)$$
where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around the open-loop pitch as explained earlier.
The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation:
$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \qquad (30)$$
where $u(n)$, $n = -(143 + 11), \ldots, 39$, is the excitation buffer. Note that in the search stage, the samples $u(n)$, $n = 0, \ldots, 39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ in order to make the relation in equation (30) valid for all delays.
Once the optimum integer pitch delay is determined, the fractions from $-\tfrac{3}{6}$ to $\tfrac{3}{6}$ with a step of $\tfrac{1}{6}$ around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (29) and searching for its maximum. The interpolation is performed using an FIR filter $b_{24}$ based on a Hamming windowed $\sin(x)/x$ function truncated at ±23 and padded with zeros at ±24 ($b_{24}(24) = 0$). The filter has its cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain. The interpolated values of $R(k)$ for the fractions $-\tfrac{3}{6}$ to $\tfrac{3}{6}$ are obtained using the interpolation formula:
$$R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{24}(t + i \cdot 6) + \sum_{i=0}^{3} R(k+1+i)\, b_{24}(6 - t + i \cdot 6), \qquad t = 0, \ldots, 5, \qquad (31)$$
where $t = 0, \ldots, 5$ corresponds to the fractions 0, $\tfrac{1}{6}$, $\tfrac{2}{6}$, $\tfrac{3}{6}$, $-\tfrac{2}{6}$, and $-\tfrac{1}{6}$, respectively. Note that it is necessary to compute the correlation terms in equation (29) using a range $t_{min} - 4$, $t_{max} + 4$ to allow for the proper interpolation.
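Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and phase (fraction) $t$:
$$v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + i \cdot 6) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + i \cdot 6), \qquad n = 0, \ldots, 39, \quad t = 0, \ldots, 5. \qquad (32)$$
The interpolation filter $b_{60}$ is based on a Hamming windowed $\sin(x)/x$ function truncated at ±59 and padded with zeros at ±60 ($b_{60}(60) = 0$). The filter has a cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.
The adaptive codebook gain is then found by:
$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \qquad \text{bounded by } 0 \le g_p \le 1.2, \qquad (33)$$
where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero state response of $H(z)W(z)$ to $v(n)$).
The computed adaptive codebook gain is quantified using 4-bit non-uniform scalar quantization in the range [0.0,1.2].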
5.7 Algebraic codebook structure and search
The algebraic codebook structure is based on interleaved single-pulse permutation (ISPP) design. In this codebook, the innovation vector contains 10 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 positions in a subframe are divided into 5 tracks, where each track contains two pulses, as shown in table 2.
Table 2: Potential positions of individual pulses in the algebraic codebook
Track | Pulse | Positions |
1 | i0, i5 | 0, 5, 10, 15, 20, 25, 30, 35 |
2 | i1, i6 | 1, 6, 11, 16, 21, 26, 31, 36 |
3 | i2, i7 | 2, 7, 12, 17, 22, 27, 32, 37 |
4 | i3, i8 | 3, 8, 13, 18, 23, 28, 33, 38 |
5 | i4, i9 | 4, 9, 14, 19, 24, 29, 34, 39 |
The two pulse positions in each track are encoded with 6 bits (total of 30 bits, 3 bits for the position of every pulse), and the sign of the first pulse in the track is encoded with 1 bit (total of 5 bits).
For two pulses located in the same track, only one sign bit is needed. This sign bit indicates the sign of the first pulse. The sign of the second pulse depends on its position relative to the first pulse. If the position of the second pulse is smaller, then it has the opposite sign, otherwise it has the same sign as the first pulse.
All the 3-bit pulse positions are Gray coded in order to improve robustness against channel errors. This gives a total of 35 bits for the algebraic code.
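The track structure and the sign rule can be illustrated by building an innovation vector from per-track parameters. The sketch below is illustrative only: the function name and the tuple layout are hypothetical, tracks are numbered from 0 instead of 1, and Gray coding of the position indices is left out.

```python
TRACKS = 5
BITS_PER_TRACK = 3 + 3 + 1      # two 3-bit positions plus one sign bit


def build_codevector(track_params):
    """track_params: five (index_a, index_b, sign_a) tuples.

    index_a and index_b are 3-bit position indices (0..7) of the two pulses in a track,
    sign_a is +1 or -1 and applies to the first pulse.
    """
    code = [0] * 40
    for track, (index_a, index_b, sign_a) in enumerate(track_params):
        pos_a = track + TRACKS * index_a
        pos_b = track + TRACKS * index_b
        # Sign rule: opposite sign if the second pulse lies before the first one.
        sign_b = -sign_a if pos_b < pos_a else sign_a
        code[pos_a] += sign_a
        code[pos_b] += sign_b
    return code


total_bits = TRACKS * BITS_PER_TRACK
example = build_codevector([(0, 3, +1), (7, 2, -1), (1, 1, +1), (4, 5, -1), (6, 0, +1)])
```

In the example, the third track places both pulses on position 7, so that sample has amplitude 2; the second track has its second pulse (position 11) before the first (position 36), so the two pulses carry opposite signs.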
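The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is:
$$x_2(n) = x(n) - \hat{g}_p\, y(n), \qquad n = 0, \ldots, 39, \qquad (34)$$
where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $\hat{g}_p$ is the quantified adaptive codebook gain. If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term:
$$A_k = \frac{(C_k)^2}{E_{D_k}} = \frac{(\mathbf{d}^t \mathbf{c}_k)^2}{\mathbf{c}_k^t \mathbf{\Phi}\, \mathbf{c}_k}, \qquad (35)$$
where $\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, $\mathbf{H}$ is the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$, and $\mathbf{\Phi} = \mathbf{H}^t \mathbf{H}$ is the matrix of correlations of $h(n)$. The vector $\mathbf{d}$ (backward filtered target) and the matrix $\mathbf{\Phi}$ are computed prior to the codebook search. The elements of the vector $\mathbf{d}$ are computed by
$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i-n), \qquad n = 0, \ldots, 39, \qquad (36)$$
and the elements of the symmetric matrix $\mathbf{\Phi}$ are computed by:
$$\phi(i,j) = \sum_{n=j}^{39} h(n-i)\, h(n-j), \qquad (j \ge i). \qquad (37)$$
The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector $\mathbf{c}_k$ contains only a few nonzero pulses. The correlation in the numerator of equation (35) is given by:
$$C = \sum_{i=0}^{N_p - 1} \vartheta_i\, d(m_i), \qquad (38)$$
where $m_i$ is the position of the $i$th pulse, $\vartheta_i$ is its amplitude, and $N_p$ is the number of pulses ($N_p = 10$). The energy in the denominator of equation (35) is given by:
$$E_D = \sum_{i=0}^{N_p - 1} \phi(m_i, m_i) + 2 \sum_{i=0}^{N_p - 2} \sum_{j=i+1}^{N_p - 1} \vartheta_i \vartheta_j\, \phi(m_i, m_j). \qquad (39)$$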
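The benefit of precomputing the backward filtered target and the correlation matrix is that the search criterion for any pulse combination needs only table look-ups. The floating-point sketch below assumes the definitions $d(n) = \sum_{i \ge n} x_2(i) h(i-n)$ and $\phi(i,j) = \sum_{n \ge j} h(n-i) h(n-j)$; the function names and the test signals are hypothetical.

```python
import math


def backward_filtered_target(x2, h):
    """d(n): correlation of the target x2 with the impulse response h."""
    size = len(x2)
    return [sum(x2[i] * h[i - n] for i in range(n, size)) for n in range(size)]


def correlation_matrix(h):
    """phi(i, j): correlations of the impulse response, stored as a symmetric matrix."""
    size = len(h)
    phi = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            value = sum(h[n - i] * h[n - j] for n in range(j, size))
            phi[i][j] = phi[j][i] = value
    return phi


def search_criterion(pulses, d, phi):
    """C^2 / E_D for a list of (position, amplitude) pulses, amplitude +1 or -1."""
    c = sum(amp * d[pos] for pos, amp in pulses)
    e = sum(phi[pos][pos] for pos, _ in pulses)
    for a in range(len(pulses) - 1):
        for b in range(a + 1, len(pulses)):
            (pos_a, amp_a), (pos_b, amp_b) = pulses[a], pulses[b]
            e += 2.0 * amp_a * amp_b * phi[pos_a][pos_b]
    return c * c / e


h = [0.9 ** n for n in range(40)]                   # stand-in impulse response
x2 = [math.sin(0.7 * n) for n in range(40)]         # stand-in target
d = backward_filtered_target(x2, h)
phi = correlation_matrix(h)
pulses = [(0, 1), (6, -1), (12, 1), (23, -1), (39, 1)]
value = search_criterion(pulses, d, phi)
```

The same value is obtained by explicitly convolving the codevector with `h` and correlating with `x2`, which costs far more per candidate.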
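To simplify the search procedure, the pulse amplitudes are preset by the mere quantization of an appropriate signal. In this case the signal $b(n)$, which is a sum of the normalized $d(n)$ vector and normalized long-term prediction residual $res_{LTP}(n)$:
$$b(n) = \frac{res_{LTP}(n)}{\sqrt{\sum_{i=0}^{39} res_{LTP}(i)\, res_{LTP}(i)}} + \frac{d(n)}{\sqrt{\sum_{i=0}^{39} d(i)\, d(i)}}, \qquad n = 0, \ldots, 39, \qquad (40)$$
is used. This is simply done by setting the amplitude of a pulse at a certain position equal to the sign of $b(n)$ at that position. The simplification proceeds as follows (prior to the codebook search). First, the sign signal $s_b(n) = \mathrm{sign}[b(n)]$ and the signal $d'(n) = d(n)\, s_b(n)$ are computed. Second, the matrix $\mathbf{\Phi}$ is modified by including the sign information; that is, $\phi'(i,j) = s_b(i)\, s_b(j)\, \phi(i,j)$. The correlation in equation (38) is now given by:
$$C = \sum_{i=0}^{N_p - 1} d'(m_i) \qquad (41)$$
and the energy in equation (39) is given by:
$$E_D = \sum_{i=0}^{N_p - 1} \phi'(m_i, m_i) + 2 \sum_{i=0}^{N_p - 2} \sum_{j=i+1}^{N_p - 1} \phi'(m_i, m_j). \qquad (42)$$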
Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non-exhaustive analysis-by-synthesis search technique. In this technique, the term in equation (35) is tested for a small percentage of position combinations.
First, for each of the five tracks the pulse positions with maximum absolute values of $b(n)$ are searched. From these the global maximum value for all the pulse positions is selected. The first pulse i0 is always set into the position corresponding to the global maximum value.
Next, four iterations are carried out. During each iteration the position of pulse i1 is set to the local maximum of one track. The rest of the pulses are searched in pairs by sequentially searching each of the pulse pairs {i2,i3}, {i4,i5}, {i6,i7} and {i8,i9} in nested loops. Every pulse has 8 possible positions, i.e., there are four 8x8-loops, resulting in 256 different combinations of pulse positions for each iteration.
In each iteration all the 9 pulse starting positions are cyclically shifted, so that the pulse pairs are changed and the pulse i1 is placed in a local maximum of a different track. The rest of the pulses are searched also for the other positions in the tracks. At least one pulse is located in a position corresponding to the global maximum and one pulse is located in a position corresponding to one of the 4 local maxima.
A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0,1.0]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is, $h(n) = h(n) + \beta\, h(n - T)$, $n = T, \ldots, 39$.
The fixed codebook gain is then found by:
$$g_c = \frac{\mathbf{x}_2^t\, \mathbf{z}}{\mathbf{z}^t\, \mathbf{z}}, \qquad (43)$$
where $\mathbf{x}_2$ is the target vector for fixed codebook search and $\mathbf{z}$ is the fixed codebook vector convolved with $h(n)$,
$$z(n) = \sum_{i=0}^{n} c(i)\, h(n-i), \qquad n = 0, \ldots, 39. \qquad (44)$$
5.8Â Â Â Â Â Â Â Quantization of the fixed codebook gain
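The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th order MA prediction is performed on the innovation energy as follows. Let $E(n)$ be the mean-removed innovation energy (in dB) at subframe $n$, and given by:
$$E(n) = 10 \log_{10} \left( \frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i) \right) - \bar{E}$$ (45)
where $N = 40$ is the subframe size, $c(i)$ is the fixed codebook excitation, and $\bar{E} = 36$ dB is the mean of the innovation energy. The predicted energy is given by:
$$\tilde{E}(n) = \sum_{i=1}^{4} b_i \hat{R}(n-i),$$ (46)
where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA prediction coefficients, and $\hat{R}(k)$ is the quantified prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed-codebook gain $g_c'$ as in equation (45) (by substituting $E(n)$ by $\tilde{E}(n)$ and $g_c$ by $g_c'$). This is done as follows. First, the mean innovation energy is found by:
$$E_I = 10 \log_{10} \left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right)$$ (47)
and then the predicted gain $g_c'$ is found by:
$$g_c' = 10^{0.05 (\tilde{E}(n) + \bar{E} - E_I)}$$ (48)
A correction factor between the gain $g_c$ and the estimated one $g_c'$ is given by:
$$\gamma_{gc} = g_c / g_c'.$$ (49)
Note that the prediction error is given by:
$$R(n) = E(n) - \tilde{E}(n) = 20 \log_{10} (\gamma_{gc})$$ (50)
The correction factor $\gamma_{gc}$ is quantified using a 5-bit codebook. The quantization table search is performed by minimizing the error:
$$E_Q = (g_c - \hat{\gamma}_{gc}\, g_c')^2$$ (51)
Once the optimum value $\hat{\gamma}_{gc}$ is chosen, the quantified fixed codebook gain is given by $\hat{g}_c = \hat{\gamma}_{gc}\, g_c'$.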
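The MA prediction of equation (46) and the table search of equation (51) can be sketched in floating point as below. Several things are assumptions of this sketch: the coefficient values 0.68, 0.58, 0.34 and 0.19 (they are close to pred[] = 44, 37, 22, 12 of table 8 divided by 64), the error measure $(g_c - \hat{\gamma}_{gc} g_c')^2$, and the function names. The real 32-entry table qua_gain_code[ ] is not reproduced here; the caller supplies the table to search.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed MA prediction coefficients b1..b4 */
const float B_PRED[4] = {0.68f, 0.58f, 0.34f, 0.19f};

/* Predicted energy: sum_{i=1..4} b_i * R^(n-i).
 * past_err[0] holds the most recent quantified prediction error R^(n-1). */
float predicted_energy(const float past_err[4])
{
    float e = 0.0f;
    int i;

    assert(past_err != NULL);
    for (i = 0; i < 4; i++)
        e += B_PRED[i] * past_err[i];
    return e;
}

/* Return the index of the correction factor gamma that minimises
 * (gc - gamma * gc_pred)^2 over the supplied table. */
size_t search_correction_factor(const float *table, size_t size,
                                float gc, float gc_pred)
{
    size_t k, best = 0;
    float best_err = 0.0f;

    assert(table != NULL && size > 0);
    for (k = 0; k < size; k++) {
        float d = gc - table[k] * gc_pred;
        float err = d * d;
        if (k == 0 || err < best_err) {
            best_err = err;
            best = k;
        }
    }
    return best;
}
```

With the chosen index `k`, the quantified gain is `table[k] * gc_pred`. The step from predicted energy to predicted gain (equation (48)) needs a power of ten and is left out of the sketch.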
5.9 Memory update
An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next subframe.
After the two gains are quantified, the excitation signal, $u(n)$, in the present subframe is found by:
$$u(n) = \hat{g}_p v(n) + \hat{g}_c c(n), \quad n = 0, \ldots, 39,$$ (52)
where $\hat{g}_p$ and $\hat{g}_c$ are the quantified adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code including pitch sharpening). The states of the filters can be updated by filtering the signal $res_{LP}(n) - u(n)$ (difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. This would require 3 filterings. A simpler approach which requires only one filtering is as follows. The local synthesized speech, $\hat{s}(n)$, is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input $res_{LP}(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 30, \ldots, 39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, the signal $e_w(n)$ can be equivalently found by:
$$e_w(n) = x(n) - \hat{g}_p y(n) - \hat{g}_c z(n).$$ (53)
Since the signals $x(n)$, $y(n)$ and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ as in equation (53) for $n = 30, \ldots, 39$. This saves two filterings.
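The shortcut can be illustrated with a floating-point sketch. It assumes equation (52) is the gain-weighted sum of the adaptive and fixed codebook vectors, and equation (53) is the target $x(n)$ minus the two gain-scaled filtered vectors $y(n)$ and $z(n)$, evaluated for the last 10 samples of the subframe (the filter order M). The function names are hypothetical; mem_w0 follows the naming of table 8.

```c
#include <assert.h>
#include <stddef.h>

#define L_SUBFR 40  /* subframe size (table 3) */
#define M 10        /* LP order (table 3) */

/* u(n) = gp * v(n) + gc * c(n), n = 0..39 */
void build_excitation(const float v[L_SUBFR], const float c[L_SUBFR],
                      float gp, float gc, float u[L_SUBFR])
{
    int n;

    assert(v != NULL && c != NULL && u != NULL);
    for (n = 0; n < L_SUBFR; n++)
        u[n] = gp * v[n] + gc * c[n];
}

/* Weighting-filter states without extra filtering:
 * e_w(n) = x(n) - gp * y(n) - gc * z(n), kept for n = 30..39 only. */
void update_weighting_memory(const float x[L_SUBFR], const float y[L_SUBFR],
                             const float z[L_SUBFR], float gp, float gc,
                             float mem_w0[M])
{
    int n;

    assert(x != NULL && y != NULL && z != NULL && mem_w0 != NULL);
    for (n = L_SUBFR - M; n < L_SUBFR; n++)
        mem_w0[n - (L_SUBFR - M)] = x[n] - gp * y[n] - gc * z[n];
}
```

Only arithmetic on vectors that the search has already produced is needed, which is why the two additional filterings are avoided.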
The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then post-filtered and upscaled. The signal flow at the decoder is shown in figure 4.
6.1 Decoding and speech synthesis
The decoding process is performed in the following order:
Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the two quantified LSP vectors. The interpolation described in clause 5.2.6 is performed to obtain 4 interpolated LSP vectors (corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to the LP filter coefficient domain $a_k$, which is used for synthesizing the reconstructed speech in the subframe.
The following steps are repeated for each subframe:
1) Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using the FIR filter described in clause 5.6.
2) Decoding of the adaptive codebook gain: The received index is used to find the quantified adaptive codebook gain, $\hat{g}_p$, directly from the quantization table.
3) Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag, $T$, is less than the subframe size 40, the pitch sharpening procedure is applied, which translates into modifying $c(n)$ by $c(n) = c(n) + \beta c(n-T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$, bounded by [0.0,1.0].
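4) Decoding of the fixed codebook gain: The received index gives the fixed codebook gain correction factor $\hat{\gamma}_{gc}$. The estimated fixed codebook gain $g_c'$ is found as described in clause 5.8. First, the predicted energy is found by:
$$\tilde{E}(n) = \sum_{i=1}^{4} b_i \hat{R}(n-i)$$ (54)
     and then the mean innovation energy is found by:
$$E_I = 10 \log_{10} \left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right)$$ (55)
     The predicted gain is found by:
$$g_c' = 10^{0.05 (\tilde{E}(n) + \bar{E} - E_I)}.$$ (56)
     The quantified fixed codebook gain is given by:
$$\hat{g}_c = \hat{\gamma}_{gc}\, g_c'$$ (57)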
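5) Computing the reconstructed speech: The excitation at the input of the synthesis filter is given by:
$$u(n) = \hat{g}_p v(n) + \hat{g}_c c(n)$$ (58)
     Before the speech synthesis, a post-processing of excitation elements is performed. This means that the total excitation is modified by emphasizing the contribution of the adaptive codebook vector:
$$\hat{u}(n) = \begin{cases} u(n) + 0.25\, \beta\, \hat{g}_p v(n), & \hat{g}_p > 0.5 \\ u(n), & \hat{g}_p \le 0.5 \end{cases}$$ (59)
     Adaptive gain control (AGC) is used to compensate for the gain difference between the non-emphasized excitation $u(n)$ and emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:
$$\eta = \begin{cases} \sqrt{\dfrac{\sum_{n=0}^{39} u^2(n)}{\sum_{n=0}^{39} \hat{u}^2(n)}}, & \hat{g}_p > 0.5 \\ 1.0, & \hat{g}_p \le 0.5 \end{cases}$$ (60)
     The gain-scaled emphasized excitation signal $\hat{u}'(n)$ is given by:
$$\hat{u}'(n) = \hat{u}(n)\, \eta$$ (61)
     The reconstructed speech for the subframe of size 40 is given by:
$$\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i\, \hat{s}(n-i), \quad n = 0, \ldots, 39,$$ (62)
     where $\hat{a}_i$ are the interpolated LP filter coefficients.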
The synthesized speech $\hat{s}(n)$ is then passed through an adaptive postfilter which is described in the following clause.
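Two parts of step 5 can be sketched in floating point: the excitation emphasis and the all-pole synthesis filter. The sketch assumes the emphasis adds $0.25\,\beta\,\hat{g}_p v(n)$ to $u(n)$ when $\hat{g}_p > 0.5$, with $\beta$ equal to $\hat{g}_p$ bounded to [0.0,1.0], and that the synthesis recursion is $\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i \hat{s}(n-i)$. The gain scaling of equations (60) and (61) is not shown, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define L_SUBFR 40  /* subframe size (table 3) */
#define M 10        /* LP order (table 3) */

/* Emphasise the adaptive codebook contribution when the pitch gain is high. */
void emphasize_excitation(const float u[L_SUBFR], const float v[L_SUBFR],
                          float gp, float u_emph[L_SUBFR])
{
    float beta = gp;
    int n;

    assert(u != NULL && v != NULL && u_emph != NULL);
    if (beta < 0.0f) beta = 0.0f;   /* beta is gp bounded to [0.0, 1.0] */
    if (beta > 1.0f) beta = 1.0f;
    for (n = 0; n < L_SUBFR; n++)
        u_emph[n] = (gp > 0.5f) ? u[n] + 0.25f * beta * gp * v[n] : u[n];
}

/* All-pole synthesis: s(n) = exc(n) - sum_{i=1..10} a[i] * s(n-i).
 * a[0] is unused (implicitly 1). mem holds the last M outputs of the
 * previous subframe, oldest first, and is updated on return. */
void synthesis_filter(const float a[M + 1], const float exc[L_SUBFR],
                      float s[L_SUBFR], float mem[M])
{
    float buf[M + L_SUBFR];
    int n, i;

    assert(a != NULL && exc != NULL && s != NULL && mem != NULL);
    for (i = 0; i < M; i++)
        buf[i] = mem[i];
    for (n = 0; n < L_SUBFR; n++) {
        float acc = exc[n];
        for (i = 1; i <= M; i++)
            acc -= a[i] * buf[M + n - i];
        buf[M + n] = acc;
        s[n] = acc;
    }
    for (i = 0; i < M; i++)
        mem[i] = buf[L_SUBFR + i];
}
```

The updated `mem` plays the role of mem_syn in table 9: it carries the filter state into the next subframe.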
6.2 Post-processing
Post-processing consists of two functions: adaptive post-filtering and signal up-scaling.
6.2.1 Adaptive post-filtering
The adaptive postfilter is the cascade of two filters: a formant postfilter, and a tilt compensation filter. The postfilter is updated every subframe of 5 ms.
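The formant postfilter is given by:
$$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}$$ (63)
where $\hat{A}(z)$ is the received quantified (and interpolated) LP inverse filter (LP analysis is not performed at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of the formant post-filtering.
Finally, the filter $H_t(z)$ compensates for the tilt in the formant postfilter $H_f(z)$ and is given by:
$$H_t(z) = 1 - \mu z^{-1}$$ (64)
where $\mu = \gamma_t k_1'$ is a tilt factor, with $k_1'$ being the first reflection coefficient calculated on the truncated ($L_h = 22$) impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k_1'$ is given by:
$$k_1' = \frac{r_h(1)}{r_h(0)}; \quad r_h(i) = \sum_{j=0}^{L_h-i-1} h_f(j)\, h_f(j+i)$$ (65)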
The post-filtering process is performed as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is filtered by the synthesis filter $1/\hat{A}(z/\gamma_d)$. Finally, the signal at the output of the synthesis filter $1/\hat{A}(z/\gamma_d)$ is passed to the tilt compensation filter $H_t(z)$, resulting in the post-filtered speech signal $\hat{s}_f(n)$.
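Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$ and the post-filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:
$$\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}}$$ (66)
The gain-scaled post-filtered signal $\hat{s}'(n)$ is given by:
$$\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n)$$ (67)
where $\beta_{sc}(n)$ is updated on a sample-by-sample basis and given by:
$$\beta_{sc}(n) = \alpha\, \beta_{sc}(n-1) + (1 - \alpha)\, \gamma_{sc}$$ (68)
where $\alpha$ is an AGC factor with a value of 0.9.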
The adaptive post-filtering factors $\gamma_n$, $\gamma_d$ and $\gamma_t$ are given by:
                                                                             .                                                                     (69)
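Two pieces of the postfilter arithmetic can be sketched in floating point: the first reflection coefficient of the truncated impulse response and the sample-by-sample AGC smoothing. The sketch assumes $k_1' = r_h(1)/r_h(0)$ with $r_h(i)$ the autocorrelation of $h_f(n)$ over $L_h$ samples, and $\beta_{sc}(n) = \alpha \beta_{sc}(n-1) + (1-\alpha)\gamma_{sc}$. The subframe gain $\gamma_{sc}$ of equation (66) involves a square root and is passed in precomputed. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define L_SUBFR 40  /* subframe size (table 3) */

/* k1' = r_h(1) / r_h(0), with r_h(i) = sum_{j=0..Lh-i-1} hf(j) * hf(j+i).
 * Returns 0 when the impulse response carries no energy. */
float first_reflection_coeff(const float *hf, int Lh)
{
    float r0 = 0.0f, r1 = 0.0f;
    int j;

    assert(hf != NULL && Lh > 1);
    for (j = 0; j < Lh; j++)
        r0 += hf[j] * hf[j];
    for (j = 0; j < Lh - 1; j++)
        r1 += hf[j] * hf[j + 1];
    return (r0 > 0.0f) ? r1 / r0 : 0.0f;
}

/* Gain-scale the post-filtered subframe sf[] with a smoothed gain.
 * *beta_sc carries the gain state across subframes. */
void agc_smooth(const float sf[L_SUBFR], float out[L_SUBFR],
                float gamma_sc, float alpha, float *beta_sc)
{
    int n;

    assert(sf != NULL && out != NULL && beta_sc != NULL);
    for (n = 0; n < L_SUBFR; n++) {
        *beta_sc = alpha * (*beta_sc) + (1.0f - alpha) * gamma_sc;
        out[n] = (*beta_sc) * sf[n];
    }
}
```

A starting value of 1.0 for the gain state would correspond to past_gain = 4096 in table 9, if that variable is read as Q12; this reading is an assumption.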
6.2.2 Up-scaling
Up-scaling consists of multiplying the post-filtered speech by a factor of 2 to compensate for the down-scaling by 2 which is applied to the input signal.
The various components of the 12,2 kbit/s GSM enhanced full rate codec are described in the form of a fixed-point bit-exact ANSI C code, which is found in GSM 06.53 [6]. This C simulation is an integrated software of the speech codec, VAD/DTX, comfort noise and bad frame handler functions. In the fixed-point ANSI C simulation, all the computations are performed using a predefined set of basic operators.
Two types of variables are used in the fixed-point implementation. These two types are signed integers in 2's complement representation, defined by:
            Word16  16 bit variables
            Word32  32 bit variables
The variables of the Word16 type are denoted var1, var2,..., varn, and those of type Word32 are denoted L_var1, L_var2,..., L_varn.
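The two types and the naming convention can be illustrated with a short sketch. The typedefs below use the fixed-width types of `<stdint.h>` as one plausible mapping, and `sat_add16` is a hypothetical helper showing a saturating 16-bit addition; it is assumed here, not stated in the text above, that the basic operators saturate rather than wrap.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;  /* 16 bit variables */
typedef int32_t Word32;  /* 32 bit variables */

/* Hypothetical saturating addition of two Word16 values.
 * The sum is formed in 32 bits and clipped to the Word16 range. */
Word16 sat_add16(Word16 var1, Word16 var2)
{
    Word32 L_sum = (Word32)var1 + (Word32)var2;

    if (L_sum > 32767)
        return 32767;
    if (L_sum < -32768)
        return -32768;
    return (Word16)L_sum;
}
```

The local names follow the convention above: `var1` and `var2` are Word16, and the 32-bit intermediate carries the `L_` prefix.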
7.1 Description of the constants and variables used in the C code
The ANSI C code simulation of the codec is, to a large extent, self-documented. However, a description of the variables and constants used in the code is given to facilitate the understanding of the code. The fixed-point precision (in terms of Q format, double precision (DP), or normalized precision) of the vectors and variables is given, along with the vector dimensions and constant values.
Table 3 gives the coder global constants and table 4 describes the variables and vectors used in the encoder routine with their precision. Table 5 describes the fixed tables in the codec.
Table 3: Codec global constants
Parameter | Value | Description
L_TOTAL | 240 | size of speech buffer
L_WINDOW | 240 | size of LP analysis window
L_FRAME | 160 | size of speech frame
L_FRAME_BY2 | 80 | half the speech frame size
L_SUBFR | 40 | size of subframe
M | 10 | order of LP analysis
MP1 | 11 | M+1
AZ_SIZE | 44 | 4*M+4
PIT_MAX | 143 | maximum pitch lag
PIT_MIN | 18 | minimum pitch lag
L_INTERPOL | 10 | order of sinc filter for interpolating the excitations is 2*L_INTERPOL*6+1
PRM_SIZE | 57 | size of vector of analysis parameters
SERIAL_SIZE | 245 | number of speech bits + bfi
MU | 26214 | tilt compensation filter factor (0.8 in Q15)
AGC_FAC | 29491 | automatic gain control factor (0.9 in Q15)
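The derived constants and the Q15 values in table 3 can be checked with a few lines of C. The sketch assumes that "Q15" means the value multiplied by $2^{15}$ and truncated to an integer; `to_q15` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define M          10
#define MP1        (M + 1)                  /* 11 */
#define AZ_SIZE    (4 * M + 4)              /* 44 */
#define L_INTERPOL 10
#define SINC_LEN   (2 * L_INTERPOL * 6 + 1) /* 121; SINC_LEN is a name made up here */

/* Convert a value in [0, 1) to Q15 by truncation: floor(x * 2^15). */
int16_t to_q15(double x)
{
    assert(x >= 0.0 && x < 1.0);
    return (int16_t)(x * 32768.0);
}
```

Under that assumption `to_q15(0.8)` gives 26214 (MU) and `to_q15(0.9)` gives 29491 (AGC_FAC), matching the table.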
Table 4: Description of the coder vectors and variables
Parameter | Size | Precision | Description
speech | -80..159 | Q0 | speech buffer
wsp | -143..159 | Q0 | weighted speech buffer
exc | -(143+11)..159 | Q0 | LP excitation
F_gamma1 | 0..9 | Q15 | spectral expansion factors
F_gamma2 | 0..9 | Q15 | spectral expansion factors
lsp_old | 0..9 | Q15 | LSP vector in past frame
lsp_old_q | 0..9 | Q15 | quantified LSP vector in past frame
mem_syn | 0..9 | Q0 | memory of synthesis filter
mem_w | 0..9 | Q0 | memory of weighting filter (applied to input)
mem_w0 | 0..9 | Q0 | memory of weighting filter (applied to error)
error | -10..39 | Q0 | error signal (input minus synthesized speech)
r_l & r_h | 0..10 | normalized DP | correlations of windowed speech (low and hi)
A_t | 11x4 | Q12 | LP filter coefficients in 4 subframes
Aq_t | 11x4 | Q12 | quantified LP filter coefficients in 4 subframes
Ap1 | 0..10 | Q12 | LP coefficients with spectral expansion
Ap2 | 0..10 | Q12 | LP coefficients with spectral expansion
lsp_new | 0..9 | Q15 | LSP vector in 4th subframe
lsp_new_q | 0..9 | Q15 | quantified LSP vector in 4th subframe
lsp_mid | 0..9 | Q15 | LSP vector in 2nd subframe
lsp_mid_q | 0..9 | Q15 | quantified LSP vector in 2nd subframe
code | 0..39 | Q12 | fixed codebook excitation vector
h1 | 0..39 | Q12 | impulse response of weighted synthesis filter
xn | 0..39 | Q0 | target vector in pitch search
xn2 | 0..39 | Q0 | target vector in algebraic codebook search
dn | 0..39 | scaled max < 8192 | backward filtered target vector
y1 | 0..39 | Q0 | filtered adaptive codebook vector
y2 | 0..39 | Q12 | filtered fixed codebook vector
zero | 0..39 | | zero vector
res2 | 0..39 | | long-term prediction residual
gain_pit | scalar | Q12 | adaptive codebook gain
gain_code | scalar | Q0 | algebraic codebook gain
Table 5: Codec fixed tables
Parameter | Size | Precision | Description
grid [ ] | 61 | Q15 | grid points at which Chebyshev polynomials are evaluated
lag_h [ ] and lag_l [ ] | 10 | DP | higher and lower parts of the lag window table
window_160_80 [ ] | 240 | Q15 | 1st LP analysis window
window_232_8 [ ] | 240 | Q15 | 2nd LP analysis window
table [ ] in Lsf_lsp ( ) | 65 | Q15 | table to compute cos(x) in Lsf_lsp ( )
slope [ ] in Lsp_lsf ( ) | 64 | Q12 | table to compute acos(x) in Lsp_lsf ( )
table [ ] in Inv_sqrt ( ) | 49 | | table used in inverse square root computation
table [ ] in Log2 ( ) | 33 | | table used in base 2 logarithm computation
table [ ] in Pow2 ( ) | 33 | | table used in 2 to the power computation
mean_lsf [ ] | 10 | Q15 | LSF means in normalized frequency [0.0, 0.5]
dico1_lsf [ ] | 128 x 4 | Q15 | 1st LSF quantizer in normalized frequency [0.0, 0.5]
dico2_lsf [ ] | 256 x 4 | Q15 | 2nd LSF quantizer in normalized frequency [0.0, 0.5]
dico3_lsf [ ] | 256 x 4 | Q15 | 3rd LSF quantizer in normalized frequency [0.0, 0.5]
dico4_lsf [ ] | 256 x 4 | Q15 | 4th LSF quantizer in normalized frequency [0.0, 0.5]
dico5_lsf [ ] | 64 x 4 | Q15 | 5th LSF quantizer in normalized frequency [0.0, 0.5]
qua_gain_pitch [ ] | 16 | Q14 | quantization table of adaptive codebook gain
qua_gain_code [ ] | 32 | Q11 | quantization table of fixed codebook gain
inter_6 [ ] in Interpol_6 ( ) | 25 | Q15 | interpolation filter coefficients in Interpol_6 ( )
inter_6 [ ] in Pred_lt_6 ( ) | 61 | Q15 | interpolation filter coefficients in Pred_lt_6 ( )
b [ ] | 3 | Q12 | HP filter coefficients (numerator) in Pre_Process ( )
a [ ] | 3 | Q12 | HP filter coefficients (denominator) in Pre_Process ( )
bitno [ ] | 57 | Q0 | number of bits corresponding to transmitted parameters
Table 6: Source Encoder output parameters in order of occurrence and bit allocation within the speech frame of 244 bits/20 ms
Bits (MSB-LSB) | Description
s1 - s7 | index of 1st LSF submatrix
s8 - s15 | index of 2nd LSF submatrix
s16 - s23 | index of 3rd LSF submatrix
s24 | sign of 3rd LSF submatrix
s25 - s32 | index of 4th LSF submatrix
s33 - s38 | index of 5th LSF submatrix
subframe 1 |
s39 - s47 | adaptive codebook index
s48 - s51 | adaptive codebook gain
s52 | sign information for 1st and 6th pulses
s53 - s55 | position of 1st pulse
s56 | sign information for 2nd and 7th pulses
s57 - s59 | position of 2nd pulse
s60 | sign information for 3rd and 8th pulses
s61 - s63 | position of 3rd pulse
s64 | sign information for 4th and 9th pulses
s65 - s67 | position of 4th pulse
s68 | sign information for 5th and 10th pulses
s69 - s71 | position of 5th pulse
s72 - s74 | position of 6th pulse
s75 - s77 | position of 7th pulse
s78 - s80 | position of 8th pulse
s81 - s83 | position of 9th pulse
s84 - s86 | position of 10th pulse
s87 - s91 | fixed codebook gain
subframe 2 |
s92 - s97 | adaptive codebook index (relative)
s98 - s141 | same description as s48 - s91
subframe 3 |
s142 - s194 | same description as s39 - s91
subframe 4 |
s195 - s244 | same description as s92 - s141
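The bit allocation of table 6 can be cross-checked by adding up the field widths. The sketch below only uses the widths that follow from the bit ranges in the table; the function names are hypothetical.

```c
#include <assert.h>

/* LSF part, s1..s38: five submatrix indices plus one sign bit. */
int lsf_bits(void)
{
    return 7 + 8 + 8 + 1 + 8 + 6;
}

/* One subframe. lag_bits is 9 for subframes 1 and 3 (absolute index)
 * and 6 for subframes 2 and 4 (relative index). */
int subframe_bits(int lag_bits)
{
    int gain_pit  = 4;       /* adaptive codebook gain            */
    int signs     = 5;       /* one sign bit per pulse pair       */
    int positions = 10 * 3;  /* ten pulse positions, 3 bits each  */
    int gain_code = 5;       /* fixed codebook gain               */

    assert(lag_bits == 9 || lag_bits == 6);
    return lag_bits + gain_pit + signs + positions + gain_code;
}

/* Whole 20 ms frame. */
int frame_bits(void)
{
    return lsf_bits() + 2 * subframe_bits(9) + 2 * subframe_bits(6);
}
```

The total is 38 + 53 + 50 + 53 + 50 = 244 bits per 20 ms frame, which at 50 frames per second gives 12 200 bit/s.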
8.1 Functional description
The enhanced full rate speech codec is described in a bit-exact arithmetic to allow for easy type approval as well as general testing of the codec.
The response of the codec to a predefined input sequence can only be foreseen if the internal state variables of the codec are in a predefined state at the beginning of the experiment. Therefore, the codec has to be put in a so-called home state before a bit-exact test can be performed. This is usually done by a reset (a procedure in which the internal state variables of the codec are set to their defined initial values).
To allow a reset of the codec in remote locations, special homing frames have been defined for the encoder and the decoder, thus enabling a codec homing by inband signalling.
The codec homing procedure is defined in such a way that, in either direction (encoder or decoder), the homing functions are called after processing the homing frame that is input. The output corresponding to the first homing frame is therefore dependent on the codec state when receiving that frame and hence usually not known. The response to any further homing frame in one direction is by definition a homing frame of the other direction. This procedure allows homing of both the encoder and the decoder from either side, if a loop back configuration is implemented, taking proper framing into account.
8.2 Definitions
Encoder homing frame: The encoder homing frame consists of 160 identical samples, each 13 bits long, with the least significant bit set to "one" and all other bits set to "zero". When written to 16-bit words with left justification, the samples have a value of 0008 hex. The speech decoder has to produce this frame as a response to the second and any further decoder homing frame if at least two decoder homing frames were input to the decoder consecutively.
Decoder homing frame: The decoder homing frame has a fixed set of speech parameters as described in table 7. It is the natural response of the speech encoder to the second and any further encoder homing frame if at least two encoder homing frames were input to the encoder consecutively.
Table 7: Parameter values for the decoder homing frame
Parameter | Value (LSB=b0)
LPC 1 | 0x0004
LPC 2 | 0x002F
LPC 3 | 0x00B4
LPC 4 | 0x0090
LPC 5 | 0x003E
LTP-LAG 1 | 0x0156
LTP-LAG 2 | 0x0036
LTP-LAG 3 | 0x0156
LTP-LAG 4 | 0x0036
LTP-GAIN 1 | 0x000B
LTP-GAIN 2 | 0x0001
LTP-GAIN 3 | 0x0000
LTP-GAIN 4 | 0x000B
FCB-GAIN 1 | 0x0003
FCB-GAIN 2 | 0x0000
FCB-GAIN 3 | 0x0000
FCB-GAIN 4 | 0x0000
PULSE 1_1 | 0x0000
PULSE 1_2 | 0x0001
PULSE 1_3 | 0x000F
PULSE 1_4 | 0x0001
PULSE 1_5 | 0x000D
PULSE 1_6 | 0x0000
PULSE 1_7 | 0x0003
PULSE 1_8 | 0x0000
PULSE 1_9 | 0x0003
PULSE 1_10 | 0x0000
PULSE 2_1 | 0x0008
PULSE 2_2 | 0x0008
PULSE 2_3 | 0x0005
PULSE 2_4 | 0x0008
PULSE 2_5 | 0x0001
PULSE 2_6 | 0x0000
PULSE 2_7 | 0x0000
PULSE 2_8 | 0x0001
PULSE 2_9 | 0x0001
PULSE 2_10 | 0x0000
PULSE 3_1 | 0x0000
PULSE 3_2 | 0x0000
PULSE 3_3 | 0x0000
PULSE 3_4 | 0x0000
PULSE 3_5 | 0x0000
PULSE 3_6 | 0x0000
PULSE 3_7 | 0x0000
PULSE 3_8 | 0x0000
PULSE 3_9 | 0x0000
PULSE 3_10 | 0x0000
PULSE 4_1 | 0x0000
PULSE 4_2 | 0x0000
PULSE 4_3 | 0x0000
PULSE 4_4 | 0x0000
PULSE 4_5 | 0x0000
PULSE 4_6 | 0x0000
PULSE 4_7 | 0x0000
PULSE 4_8 | 0x0000
PULSE 4_9 | 0x0000
PULSE 4_10 | 0x0000
8.3 Encoder homing
Whenever the enhanced full rate speech encoder receives at its input an encoder homing frame exactly aligned with its internal speech frame segmentation, the following events take place:
Step 1: The speech encoder performs its normal operation including VAD and DTX and produces a speech parameter frame at its output which is in general unknown. But if the speech encoder was in its home state at the beginning of that frame, then the resulting speech parameter frame is identical to the decoder homing frame (this is how the decoder homing frame was constructed).
Step 2: After successful completion of that operation, the speech encoder invokes the homing functions for all sub-modules including VAD and DTX and sets all state variables into their home state. On the reception of the next input frame, the speech encoder will start from its home state.
NOTE: Applying a sequence of N encoder homing frames will cause at least N-1 decoder homing frames at the output of the speech encoder.
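The two steps and the N-1 property in the NOTE can be illustrated with a toy model. The sketch tests a frame of 160 left-justified 16-bit samples against the value 0008 hex and tracks only whether the encoder is in its home state; no real encoding takes place, and `ToyEncoder`, `EHF_SAMPLE` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L_FRAME    160     /* speech frame size (table 3) */
#define EHF_SAMPLE 0x0008  /* homing sample, left-justified in 16 bits */

/* Return 1 if all 160 samples equal the encoder homing value. */
int is_encoder_homing_frame(const int16_t pcm[L_FRAME])
{
    int i;

    assert(pcm != NULL);
    for (i = 0; i < L_FRAME; i++)
        if (pcm[i] != EHF_SAMPLE)
            return 0;
    return 1;
}

/* Toy model: tracks the home state only. */
typedef struct {
    int in_home_state;
} ToyEncoder;

/* Process one frame. Returns 1 when the output is known to be the decoder
 * homing frame, i.e. the encoder started the frame in its home state and
 * the input was an encoder homing frame. */
int toy_encoder_step(ToyEncoder *enc, const int16_t pcm[L_FRAME])
{
    int homing = is_encoder_homing_frame(pcm);
    int output_is_dhf = enc->in_home_state && homing;  /* step 1 */

    enc->in_home_state = homing;                       /* step 2 */
    return output_is_dhf;
}
```

Feeding three homing frames to an encoder in an unknown state yields an unknown first output followed by two decoder homing frames, which is the "at least N-1" behaviour.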
8.4 Decoder homing
Whenever the speech decoder receives at its input a decoder homing frame, the following events take place:
Step 1: The speech decoder performs its normal operation and produces a speech frame at its output which is in general unknown. But if the speech decoder was in its home state at the beginning of that frame, then the resulting speech frame is replaced by the encoder homing frame. This would not naturally be the case but is forced by this definition here.
Step 2: After successful completion of that operation, the speech decoder invokes the homing functions for all sub-modules including the comfort noise generator and sets all state variables into their home state. On the reception of the next input frame, the speech decoder will start from its home state.
NOTE 1: Applying a sequence of N decoder homing frames will cause at least N-1 encoder homing frames at the output of the speech decoder.
NOTE 2: By definition (!) the first frame of each decoder test sequence must differ from the decoder homing frame at least in one bit position within the parameters for LPC and first subframe. Therefore, if the decoder is in its home state, it is sufficient to check only these parameters to detect a subsequent decoder homing frame. This definition is made to support a delay-optimized implementation in the TRAU uplink direction.
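The check described in NOTE 2 can be sketched as a comparison against the values of table 7. The ordering of the 57 parameters in the array below is an assumption: LPC 1 to 5 first, then for each subframe the LTP lag, LTP gain, ten pulse parameters and FCB gain, following the order of occurrence in table 6. `DHF_PARAMS`, `DHF_QUICK` and `matches_dhf` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRM_SIZE  57  /* size of vector of analysis parameters (table 3) */
#define DHF_QUICK 18  /* 5 LPC parameters + 13 parameters of subframe 1 */

/* Table 7 values in the assumed transmission order. */
const int16_t DHF_PARAMS[PRM_SIZE] = {
    /* LPC 1..5 */
    0x0004, 0x002F, 0x00B4, 0x0090, 0x003E,
    /* subframe 1: lag, gain, pulses 1..10, FCB gain */
    0x0156, 0x000B, 0x0000, 0x0001, 0x000F, 0x0001, 0x000D,
    0x0000, 0x0003, 0x0000, 0x0003, 0x0000, 0x0003,
    /* subframe 2 */
    0x0036, 0x0001, 0x0008, 0x0008, 0x0005, 0x0008, 0x0001,
    0x0000, 0x0000, 0x0001, 0x0001, 0x0000, 0x0000,
    /* subframe 3 */
    0x0156, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
    /* subframe 4 */
    0x0036, 0x000B, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
};

/* Compare the first `count` parameters with the decoder homing frame.
 * Use DHF_QUICK when the decoder is already in its home state (NOTE 2),
 * PRM_SIZE for the full comparison. */
int matches_dhf(const int16_t *parm, int count)
{
    int i;

    assert(parm != NULL && count > 0 && count <= PRM_SIZE);
    for (i = 0; i < count; i++)
        if (parm[i] != DHF_PARAMS[i])
            return 0;
    return 1;
}
```

The short comparison lets a decoder that is already in its home state decide after the first subframe has arrived, which is the delay optimization the note refers to.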
8.5 Encoder home state
Table 8 lists all the encoder state variables with their predefined values in the home state.
Table 8: Initial values of the encoder state variables
File | Variable | Initial value
cod_12k2.c | old_speech[0:319] | All set to 0
 | old_exc[0:153] | All set to 0
 | old_wsp[0:142] | All set to 0
 | mem_syn[0:9] | All set to 0
 | mem_w[0:9] | All set to 0
 | mem_w0[0:9] | All set to 0
 | mem_err[0:9] | All set to 0
 | ai_zero[11:50] | All set to 0
 | hvec[0:39] | All set to 0
 | lsp_old[0], lsp_old_q[0] | 30000
 | lsp_old[1], lsp_old_q[1] | 26000
 | lsp_old[2], lsp_old_q[2] | 21000
 | lsp_old[3], lsp_old_q[3] | 15000
 | lsp_old[4], lsp_old_q[4] | 8000
 | lsp_old[5], lsp_old_q[5] | 0
 | lsp_old[6], lsp_old_q[6] | -8000
 | lsp_old[7], lsp_old_q[7] | -15000
 | lsp_old[8], lsp_old_q[8] | -21000
 | lsp_old[9], lsp_old_q[9] | -26000
levinson.c | old_A[0] | 4096
 | old_A[1:10] | All set to 0
pre_proc.c | y2_hi, y2_lo, y1_hi, y1_lo, x1, x0 | All set to 0
q_plsf_5.c | past_r2_q[0:9] | All set to 0
q_gains.c | past_qua_en[0:3] | All set to -2381
 | pred[0] | 44
 | pred[1] | 37
 | pred[2] | 22
 | pred[3] | 12
dtx.c | txdtx_hangover | 7
 | txdtx_N_elapsed | 0x7fff
 | txdtx_ctrl | 0x0003
 | old_CN_mem_tx[0:5] | All set to 0
 | lsf_old_tx[0:6][0] | 1384
 | lsf_old_tx[0:6][1] | 2077
 | lsf_old_tx[0:6][2] | 3420
 | lsf_old_tx[0:6][3] | 5108
 | lsf_old_tx[0:6][4] | 6742
 | lsf_old_tx[0:6][5] | 8122
 | lsf_old_tx[0:6][6] | 9863
 | lsf_old_tx[0:6][7] | 11092
 | lsf_old_tx[0:6][8] | 12714
 | lsf_old_tx[0:6][9] | 13701
 | gain_code_old_tx[0:27] | All set to 0
 | L_pn_seed_tx | 0x70816958
 | buf_p_tx | 0
Initial values for variables used by the VAD algorithm are listed in GSM 06.32 [4].
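A homing function for a few of the encoder state variables of table 8 might look like the sketch below. The struct gathers only a subset of the variables for illustration; in the table they are spread over several source files, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define M 10  /* LP order (table 3) */

/* Illustrative subset of the encoder state listed in table 8. */
typedef struct {
    int16_t mem_syn[M], mem_w[M], mem_w0[M], mem_err[M];
    int16_t lsp_old[M], lsp_old_q[M];
    int16_t old_A[M + 1];
    int16_t past_qua_en[4];
} EncoderStateSubset;

/* Put the subset into its home state using the values of table 8. */
void reset_encoder_subset(EncoderStateSubset *st)
{
    static const int16_t lsp_init[M] = {
        30000, 26000, 21000, 15000, 8000, 0, -8000, -15000, -21000, -26000
    };
    int i;

    assert(st != NULL);
    memset(st, 0, sizeof *st);       /* filter memories: all set to 0 */
    for (i = 0; i < M; i++) {
        st->lsp_old[i]   = lsp_init[i];
        st->lsp_old_q[i] = lsp_init[i];
    }
    st->old_A[0] = 4096;             /* old_A[1:10] stay at 0 */
    for (i = 0; i < 4; i++)
        st->past_qua_en[i] = -2381;
}
```

A complete reset would cover every row of table 8 as well as the VAD variables referenced above.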
8.6 Decoder home state
Table 9 lists all the decoder state variables with their predefined values in the home state.
Table 9: Initial values of the decoder state variables
File | Variable | Initial value
decoder.c | synth_buf[0:9] | All set to 0
dec_12k2.c | old_exc[0:153] | All set to 0
 | mem_syn[0:9] | All set to 0
 | lsp_old[0] | 30000
 | lsp_old[1] | 26000
 | lsp_old[2] | 21000
 | lsp_old[3] | 15000
 | lsp_old[4] | 8000
 | lsp_old[5] | 0
 | lsp_old[6] | -8000
 | lsp_old[7] | -15000
 | lsp_old[8] | -21000
 | lsp_old[9] | -26000
 | prev_bf | 0
 | state | 0
agc.c | past_gain | 4096
d_plsf_5.c | past_r2_q[0:9] | All set to 0
 | past_lsf_q[0], lsf_p_CN[0], lsf_old_CN[0], lsf_new_CN[0] | 1384
 | past_lsf_q[1], lsf_p_CN[1], lsf_old_CN[1], lsf_new_CN[1] | 2077
 | past_lsf_q[2], lsf_p_CN[2], lsf_old_CN[2], lsf_new_CN[2] | 3420
 | past_lsf_q[3], lsf_p_CN[3], lsf_old_CN[3], lsf_new_CN[3] | 5108
 | past_lsf_q[4], lsf_p_CN[4], lsf_old_CN[4], lsf_new_CN[4] | 6742
 | past_lsf_q[5], lsf_p_CN[5], lsf_old_CN[5], lsf_new_CN[5] | 8122
 | past_lsf_q[6], lsf_p_CN[6], lsf_old_CN[6], lsf_new_CN[6] | 9863
 | past_lsf_q[7], lsf_p_CN[7], lsf_old_CN[7], lsf_new_CN[7] | 11092
 | past_lsf_q[8], lsf_p_CN[8], lsf_old_CN[8], lsf_new_CN[8] | 12714
 | past_lsf_q[9], lsf_p_CN[9], lsf_old_CN[9], lsf_new_CN[9] | 13701
d_gains.c | pbuf[0:4] | All set to 410
 | gbuf[0:4] | All set to 1
 | past_gain_pit | 0
 | past_gain_code | 0
 | prev_gp | 4096
 | prev_gc | 1
 | gcode0_CN | 0
 | gain_code_old_CN | 0
 | gain_code_new_CN | 0
 | gain_code_muting_CN | 0
 | past_qua_en[0:3] | All set to -2381
 | pred[0] | 44
 | pred[1] | 37
 | pred[2] | 22
 | pred[3] | 12
dtx.c | rxdtx_aver_period | 7
 | rxdtx_N_elapsed | 0x7fff
 | rxdtx_ctrl | 0x0001
 | lsf_old_rx[0:6][0] | 1384
 | lsf_old_rx[0:6][1] | 2077
 | lsf_old_rx[0:6][2] | 3420
 | lsf_old_rx[0:6][3] | 5108
 | lsf_old_rx[0:6][4] | 6742
 | lsf_old_rx[0:6][5] | 8122
 | lsf_old_rx[0:6][6] | 9863
 | lsf_old_rx[0:6][7] | 11092
 | lsf_old_rx[0:6][8] | 12714
 | lsf_old_rx[0:6][9] | 13701
 | gain_code_old_rx[0:27] | All set to 0
 | L_pn_seed_rx | 0x70816958
 | rx_dtx_state | 23
 | prev_SID_frames_lost | 0
 | buf_p_rx | 0
dec_lag6.c | old_T0 | 40
preemph.c | mem_pre | 0
pstfilt2.c | mem_syn_pst[0:9] | All set to 0
 | res2[0:39] | All set to 0
Figure 2: Simplified block diagram of the CELP synthesis model
Figure 3: Simplified block diagram of the GSM enhanced full rate encoder
Figure 4: Simplified block diagram of the GSM enhanced full rate decoder
1) M.R. Schroeder and B.S. Atal, "Code-Excited Linear Prediction (CELP): High quality speech at very low bit rates," Proc. ICASSP'85, pp. 937-940, 1985.
2) Y. Tohkura and F. Itakura, "Spectral smoothing technique in PARCOR speech analysis-synthesis," IEEE Trans. on ASSP, vol. 26, no. 6, pp. 587-596, Dec. 1978.
3) L.R. Rabiner and R.W. Schafer, Digital processing of speech signals. Prentice-Hall Int., 1978.
4) F. Itakura, "Line spectral representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer., vol. 57, Supplement no. 1, S35, 1975.
5) F.K. Soong and B.H. Juang, "Line spectrum pair (LSP) and speech data compression," Proc. ICASSP'84, pp. 1.10.1-1.10.4, 1984.
6) P. Kabal and R.P. Ramachandran, "The computation of line spectral frequencies using Chebyshev polynomials," IEEE Trans. on ASSP, vol. 34, no. 6, pp. 1419-1426, Dec. 1986.
7) C. Laflamme, J-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau, "16 kbps wideband speech coding technique based on algebraic CELP," Proc. ICASSP'91, pp. 13-16.
SMG# | SPEC | CR | PHASE | VERS | NEW_VERS | SUBJECT
s23 | 06.60 | A003 | 2 | 4.0.0 | 4.0.1 | Vote 115 comments
s25 | 06.60 | A005 | 2 | 4.0.1 | 4.1.0 | Corrections to GSM 06.60
s28 | 06.60 | | | 4.1.0 | 6.0.0 | Release 1997 version
s28 | 06.60 | A007 | | 6.0.0 | 7.0.0 | Addition of mu-Law (PCS 1900)
 | 06.60 | | | 7.0.1 | 7.0.2 | Update to Version 7.0.2 for Publication
s31 | 06.60 | | | 7.0.2 | 8.0.0 | Release 1999 version
 | 06.60 | | | 8.0.0 | 8.0.1 | Update to Version 8.0.1 for Publication
Change history
Date | TSG # | TSG Doc. | CR | Rev | Subject/Comment | Old | New
03-2001 | 11 | | | | Version for Release 4 | | 4.0.0
06-2002 | 16 | | | | Version for Release 5 | 4.0.0 | 5.0.0
12-2004 | 26 | | | | Version for Release 6 | 5.0.0 | 6.0.0
06-2007 | 36 | | | | Version for Release 7 | 6.0.0 | 7.0.0
12-2008 | 42 | | | | Version for Release 8 | 7.0.0 | 8.0.0