Voice Activity Detector (VAD) for full rate speech traffic channels

Specification: 46032


Summary

This document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) for the digital cellular telecommunications system. The VAD is used to indicate whether each 20 ms frame produced by the speech encoder contains speech or not.


3GPP TS 46.032 V9.0.0 (2009-12)
Technical Specification
3rd Generation Partnership Project;
Technical Specification Group Services and System Aspects;
Full rate speech;
Voice Activity Detector (VAD)
for full rate speech traffic channels
(Release 9)

The present document has been developed within the 3rd Generation Partnership Project (3GPP™) and may be further elaborated for the purposes of 3GPP.
The present document has not been subject to any approval process by the 3GPP Organizational Partners and shall not be implemented.	 
This Specification is provided for future development work within 3GPP only. The Organizational Partners accept no liability for any use of this Specification.
Specifications and reports for implementation of the 3GPP™ system should be obtained via the 3GPP Organizational Partners' Publications Offices.



Keywords
GSM, speech, codec

3GPP
Postal address

3GPP support office address
650 Route des Lucioles - Sophia Antipolis
Valbonne - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Internet
http://www.3gpp.org

Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© 2009, 3GPP Organizational Partners (ARIB, ATIS, CCSA, ETSI, TTA, TTC).
All rights reserved.

UMTS™ is a Trade Mark of ETSI registered for the benefit of its members
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
LTE™ is a Trade Mark of ETSI currently being registered for the benefit of its Members and of the 3GPP Organizational Partners
GSM® and the GSM logo are registered and owned by the GSM Association

Contents
Foreword
1	Scope
2	References
3	Abbreviations
4	General
5	Functional description
5.1	Overview and principles of operation
5.2	Algorithm description
5.2.1	Adaptive filtering and energy computation
5.2.2	ACF averaging
5.2.3	Predictor values computation
5.2.4	Spectral comparison
5.2.5	Periodicity detection
5.2.6	Information tone detection
5.2.7	Threshold adaptation
5.2.8	VAD decision
5.2.9	VAD hangover addition
6	Computational details
6.1	Adaptive filtering and energy computation
6.2	ACF averaging
6.3	Predictor values computation
6.3.1	Schur recursion to compute reflection coefficients
6.3.2	Step-up procedure to obtain the aav1[0..8]
6.3.3	Computation of the rav1[0..8]
6.4	Spectral comparison
6.5	Periodicity detection
6.6	Threshold adaptation
6.7	VAD decision
6.8	VAD hangover addition
6.9	Periodicity updating
6.10	Tone detection
6.10.1	Windowing
6.10.2	Auto-correlation
6.10.3	Computation of the reflection coefficients
6.10.4	Filter coefficient calculation
6.10.5	Pole Frequency Test
6.10.6	Prediction gain test
7	Digital test sequences
7.1	Test configuration
7.2	Test sequences
Annex A (informative):
A.1	Simplified block filtering operation
A.2	Description of digital test sequences
A.2.1	Test sequences
A.2.2	File format description
A.3	VAD performance
A.4	Pole frequency calculation
Annex B (normative):	Test sequences
Annex C (informative):	Change history

Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).
The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) for the digital cellular telecommunications system.
Archive en_300965v080000p0.zip, which accompanies the present document, contains test sequences, as described in clause A.2.
en_300965v080000p0.zip	Annex B: Test sequences for the GSM Full Rate speech codec; Test sequences files *.inp, *.cod, *.vad.
The specification from which the present document has been derived was originally based on CEPT documentation, hence the presentation of the present document may not be entirely in accordance with the ETSI/PNE Rules.
The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x	the first digit:
1	presented to TSG for information;
2	presented to TSG for approval;
3	or greater indicates TSG approved document under change control.
y	the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
z	the third digit is incremented when editorial only changes have been incorporated in the document.
1	Scope
The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the technical specification.
The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations (MS)s or Base Station Systems (BSS)s.
2	References
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non‑specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies.  In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
[1]	GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".
[2]	GSM 06.10: "Digital cellular telecommunications system (Phase 2+); Full rate speech; Transcoding".
[3]	GSM 06.12: "Digital cellular telecommunications system (Phase 2+); Full rate speech; Comfort noise aspect for full rate speech traffic channels".
[4]	GSM 06.31: "Digital cellular telecommunications system (Phase 2+); Full rate speech; Discontinuous Transmission (DTX) for full rate speech traffic channels".
3	Abbreviations
Abbreviations used in the present document are listed in GSM 01.04 [1].
4	General
The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder contains speech or not. The output is a binary flag which is used by the TX DTX handler defined in GSM 06.31 [4].
The present document is organized as follows.
Clause 5 describes the principles of operation of the VAD.
Clause 6 gives the computational details necessary for the fixed point implementation of the VAD algorithm. It uses the same notation as the computational details in GSM 06.10 [2].
The verification of the VAD is based on the use of digital test sequences. Clause 7 defines the input and output signals and the test configuration, whereas the detailed description of the test sequences is contained in clause A.2.
The performance of the VAD algorithm is characterized by the amount of audible speech clipping it introduces and the percentage activity it indicates. These characteristics for the VAD defined in the present document have been established by extensive testing under a wide range of operating conditions. The results are summarized in clause A.3.
5	Functional description
The purpose of this clause is to give the reader an understanding of the principles of operation of the VAD, whereas the detailed description is given in clause 6. In case of discrepancy between the two descriptions, the detailed description of clause 6 shall prevail.
In the following subclauses of clause 5, a Pascal programming type of notation has been used to describe the algorithm.
5.1	Overview and principles of operation
The function of the VAD is to distinguish between noise with speech present and noise without speech present. The biggest difficulty for detecting speech in a mobile environment is the very low speech/noise ratios which are often encountered. The accuracy of the VAD is improved by using filtering to increase the speech/noise ratio before the decision is made.
For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been found that the noise is relatively stationary for quite long periods in a mobile environment. It is therefore possible to use an adaptive filter with coefficients obtained during noise, to remove much of the vehicle noise.
The VAD is basically an energy detector. The energy of the filtered signal is compared with a threshold; speech is indicated whenever the threshold is exceeded.
The noise encountered in mobile environments may be constantly changing in level. The spectrum of the noise can also change, and varies greatly over different vehicles. Because of these changes the VAD threshold and adaptive filter coefficients must be constantly adapted. To give reliable detection the threshold must be sufficiently above the noise level to avoid noise being identified as speech but not so far above it that low level parts of speech are identified as noise. The threshold and the adaptive filter coefficients are only updated when speech is not present. It is, of course, potentially dangerous for a VAD to update these values on the basis of its own decision. This adaptation therefore only occurs when the signal seems stationary in the frequency domain but does not have the pitch component inherent in voiced speech. A tone detector is also used to prevent adaptation during information tones.
A further mechanism is used to ensure that low level noise (which is often not stationary over long periods) is not detected as speech. Here, an additional fixed threshold is used.
A VAD hangover period is used to eliminate mid‑burst clipping of low level speech. Hangover is only added to speech‑bursts which exceed a certain duration to avoid extending noise spikes.
5.2	Algorithm description
The block diagram of the VAD algorithm is shown in figure 2.1. The individual blocks are described in the following subclauses. ACF, N and sof are calculated in the speech encoder.
Figure 2.1: Functional block diagram of the VAD
The global variables shown in the block diagram are described as follows:
‑	ACF are auto‑correlation coefficients which are calculated in the speech encoder defined in GSM 06.10 (subclause 3.1.4, see also clause A.1). The inputs to the speech encoder are 16 bit 2's complement numbers, as described in GSM 06.10, subclause 4.2.0;
‑	av0 and av1 are averaged ACF vectors;
‑	rav1 are autocorrelated predictor values obtained from av1;
‑	rvad are the autocorrelated predictor values of the adaptive filter;
‑	N is the long term predictor lag value which is obtained every sub‑segment in the speech coder defined in GSM 06.10;
‑	ptch indicates whether the signal has a steady periodic component;
‑	sof is the offset compensated signal frame obtained in the speech coder defined in GSM 06.10;
‑	pvad is the energy in the current frame of the input signal after filtering;
‑	thvad is an adaptive threshold;
‑	stat indicates spectral stationarity;
‑	vvad indicates the VAD decision before hangover is added;
‑	vad is the final VAD decision with hangover included.
5.2.1	Adaptive filtering and energy computation
pvad is computed as follows:
$$\mathrm{pvad} = \mathrm{rvad}[0]\,\mathrm{ACF}[0] + 2\sum_{i=1}^{8} \mathrm{rvad}[i]\,\mathrm{ACF}[i]$$
This corresponds to performing an 8th order block filtering on the input samples to the speech encoder, after zero offset compensation and pre‑emphasis. This is explained in clause A.1.
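As an informal illustration only, the energy computation can be sketched in floating-point Python. The sketch assumes the form pvad = rvad[0]*ACF[0] + 2*SUM(i=1..8) rvad[i]*ACF[i], which is the quantity the fixed-point procedure in the computational details evaluates. The function name `filtered_energy` and the sample ACF vector are hypothetical.

```python
def filtered_energy(rvad, acf):
    """Energy of the adaptively filtered frame (floating-point sketch).

    rvad: 9 autocorrelated predictor values of the adaptive filter.
    acf:  9 auto-correlation values ACF[0..8] of the current frame.
    """
    if len(rvad) != 9 or len(acf) != 9:
        raise ValueError("rvad and acf must each hold 9 values (lags 0..8)")
    # Lag 0 counts once, lags 1..8 count twice (symmetry of the auto-correlation).
    return rvad[0] * acf[0] + 2.0 * sum(r * c for r, c in zip(rvad[1:], acf[1:]))


# Initial rvad values taken from table 2.5; the ACF vector is made-up data
# in which only lag 0 is non-zero.
rvad_init = [6.0, -4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
acf_example = [1000.0] + [0.0] * 8
pvad_example = filtered_energy(rvad_init, acf_example)
```

With only lag 0 populated, the result reduces to rvad[0]*ACF[0].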
5.2.2	ACF averaging
Spectral characteristics of the input signal have to be obtained using blocks that are larger than one 20 ms frame. This is done by averaging the auto‑correlation values for several consecutive frames. This averaging is given by the following equations:
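$$\mathrm{av0}_{n}[i] = \sum_{j=0}^{\mathrm{frames}-1} \mathrm{ACF}_{n-j}[i], \quad i = 0..8$$
$$\mathrm{av1}_{n}[i] = \mathrm{av0}_{n-\mathrm{frames}}[i], \quad i = 0..8$$
where n represents the current frame, n-1 represents the previous frame, etc. The values of the constants are given in table 2.1.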
Table 2.1: Constants and variables for ACF averaging
Constant   Value   Variable                     Initial value
frames     4       previous ACF's, av0 & av1    All set to 0
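The averaging can be sketched informally as follows. The sketch assumes that av0 is the sum of the ACF vectors of the last `frames` frames (current frame included) and that av1 is av0 delayed by `frames` frames, which is how the delay lines of the fixed-point description behave. The class name `AcfAverager` is hypothetical.

```python
from collections import deque

FRAMES = 4  # constant "frames" from table 2.1


class AcfAverager:
    """Floating-point sketch of the ACF averaging delay lines."""

    def __init__(self, order=8):
        zero = [0.0] * (order + 1)
        # Previous ACF vectors and previous av0 vectors, all initialised to 0.
        self._acf_hist = deque([list(zero) for _ in range(FRAMES - 1)],
                               maxlen=FRAMES - 1)
        self._av0_hist = deque([list(zero) for _ in range(FRAMES)],
                               maxlen=FRAMES)

    def update(self, acf):
        """Return (av0, av1) for the current frame."""
        av0 = list(acf)
        for old in self._acf_hist:
            av0 = [a + b for a, b in zip(av0, old)]
        av1 = self._av0_hist[0]          # av0 computed FRAMES frames ago
        self._acf_hist.append(list(acf))  # oldest entry drops out automatically
        self._av0_hist.append(av0)
        return av0, av1
```

Because av1 lags av0 by a full averaging window, the two vectors never share an input frame.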

5.2.3	Predictor values computation
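The filter predictor values aav1 are obtained from the auto-correlation values av1 according to the equation:
$$a = R^{-1} p$$
where:
$$R = \begin{pmatrix}
\mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \mathrm{av1}[3] & \mathrm{av1}[4] & \mathrm{av1}[5] & \mathrm{av1}[6] & \mathrm{av1}[7] \\
\mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \mathrm{av1}[3] & \mathrm{av1}[4] & \mathrm{av1}[5] & \mathrm{av1}[6] \\
\mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \mathrm{av1}[3] & \mathrm{av1}[4] & \mathrm{av1}[5] \\
\mathrm{av1}[3] & \mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \mathrm{av1}[3] & \mathrm{av1}[4] \\
\mathrm{av1}[4] & \mathrm{av1}[3] & \mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] & \mathrm{av1}[3] \\
\mathrm{av1}[5] & \mathrm{av1}[4] & \mathrm{av1}[3] & \mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] & \mathrm{av1}[2] \\
\mathrm{av1}[6] & \mathrm{av1}[5] & \mathrm{av1}[4] & \mathrm{av1}[3] & \mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0] & \mathrm{av1}[1] \\
\mathrm{av1}[7] & \mathrm{av1}[6] & \mathrm{av1}[5] & \mathrm{av1}[4] & \mathrm{av1}[3] & \mathrm{av1}[2] & \mathrm{av1}[1] & \mathrm{av1}[0]
\end{pmatrix}$$
and:
$$p = \begin{pmatrix} \mathrm{av1}[1] \\ \mathrm{av1}[2] \\ \mathrm{av1}[3] \\ \mathrm{av1}[4] \\ \mathrm{av1}[5] \\ \mathrm{av1}[6] \\ \mathrm{av1}[7] \\ \mathrm{av1}[8] \end{pmatrix}
\qquad
a = \begin{pmatrix} \mathrm{aav1}[1] \\ \mathrm{aav1}[2] \\ \mathrm{aav1}[3] \\ \mathrm{aav1}[4] \\ \mathrm{aav1}[5] \\ \mathrm{aav1}[6] \\ \mathrm{aav1}[7] \\ \mathrm{aav1}[8] \end{pmatrix}$$
aav1[0] = -1
av1 is used in preference to av0 as av0 may contain speech.
The autocorrelated predictor values rav1 are then obtained:
$$\mathrm{rav1}[i] = \sum_{k=0}^{8-i} \mathrm{aav1}[k]\,\mathrm{aav1}[k+i], \quad i = 0..8$$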
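A floating-point sketch of this step is given below for illustration. Note that it swaps in the Levinson-Durbin recursion to solve R a = p, whereas the fixed-point description in the computational details uses a Schur recursion followed by a step-up procedure. It also assumes that rav1[i] is the auto-correlation of the sequence aav1[0..8], as computed there. The function name `predictor_values` is hypothetical.

```python
def predictor_values(av1, order=8):
    """Return (aav1, rav1) from the averaged auto-correlation vector av1[0..8]."""
    a = [0.0] * (order + 1)          # a[1..m]: predictor coefficients of order m
    if av1[0] > 0.0:
        err = av1[0]                 # prediction error energy of order m-1
        for m in range(1, order + 1):
            k = (av1[m] - sum(a[j] * av1[m - j] for j in range(1, m))) / err
            prev = list(a)
            a[m] = k
            for j in range(1, m):
                a[j] = prev[j] - k * prev[m - j]
            err *= 1.0 - k * k
            if err <= 0.0:           # numerically singular: stop the recursion
                break
    aav1 = [-1.0] + a[1:]            # aav1[0] = -1 by definition
    rav1 = [sum(aav1[k] * aav1[k + i] for k in range(order + 1 - i))
            for i in range(order + 1)]
    return aav1, rav1


# Made-up first-order example: av1[k] = 0.5**k is matched by a single coefficient.
aav1_example, rav1_example = predictor_values([0.5 ** k for k in range(9)])
```

For this input only aav1[1] is non-zero, so rav1 has energy at lags 0 and 1 only.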

5.2.4	Spectral comparison
The spectra represented by the autocorrelated predictor values rav1 and the averaged auto‑correlation values av0 are compared using the distortion measure dm defined below. This measure is used to produce a Boolean value stat every 20 ms, as given by these equations:
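$$\mathrm{dm} = \frac{\mathrm{rav1}[0]\,\mathrm{av0}[0] + 2\sum_{i=1}^{8} \mathrm{rav1}[i]\,\mathrm{av0}[i]}{\mathrm{av0}[0]}$$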
	difference = |dm ‑ lastdm|
	lastdm = dm
	stat = difference < thresh
The values of constants and initial values are given in table 2.2.
Table 2.2: Constants and variables for spectral comparison
Constant   Value   Variable   Initial value
thresh     0.05    lastdm     0
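The comparison can be sketched informally in Python. The sketch assumes dm = (rav1[0]*av0[0] + 2*SUM(i=1..8) rav1[i]*av0[i]) / av0[0], which is what the fixed-point description evaluates. The zero-energy branch mirrors that description, which substitutes a constant vector when the lag-0 value is 0. The class name `SpectralComparison` is hypothetical.

```python
THRESH = 0.05  # constant "thresh" from table 2.2


class SpectralComparison:
    """Floating-point sketch of the stationarity flag stat."""

    def __init__(self):
        self.lastdm = 0.0  # initial value from table 2.2

    def update(self, rav1, av0):
        if av0[0] == 0.0:
            # Assumption: use a flat vector for an all-zero input, as the
            # fixed-point text does when L_av0[0] is 0.
            av0 = [1.0] * 9
        dm = (rav1[0] * av0[0]
              + 2.0 * sum(r * c for r, c in zip(rav1[1:], av0[1:]))) / av0[0]
        difference = abs(dm - self.lastdm)
        self.lastdm = dm
        return difference < THRESH  # stat
```

Two consecutive frames with the same spectrum give the same dm, so stat becomes true on the second frame.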

5.2.5	Periodicity detection
The frequency spectrum of mobile noise is relatively stationary over quite long periods. The Inverse Filter Autocorrelated Predictor coefficients of the adaptive filter (rvad) are only updated when this stationarity is detected. Vowel sounds also have this stationarity, but they can be excluded by detecting their periodicity using the long term predictor lag values (Nj), which are obtained every sub-segment from the speech codec defined in GSM 06.10. Consecutive lag values are compared. Cases in which one lag value is a factor of the other are catered for; cases in which both lag values share a common factor, but neither is a factor of the other, are not. The latter case is not important for speech input, but this method of periodicity detection may fail for some sine waves. The Boolean variable ptch is updated every 20 ms and is true when periodicity is detected. It is calculated according to the following equation:
	ptch = oldlagcount + veryoldlagcount >= nthresh
The following operations are done after the VAD decision, when the current LTP lag values (N0 .. N3) are available; this reduces the delay of the VAD decision. (N{-1} = N3 of previous segment.)
  lagcount = 0

  for j = 0 to 3 do
  begin
    smallag = maximum(Nj,N{j‑1}) mod minimum(Nj,N{j‑1})
    if minimum(smallag,minimum(Nj,N{j‑1})‑smallag) < lthresh
      then increment(lagcount)
  end

  veryoldlagcount = oldlagcount

  oldlagcount = lagcount

The values of constants and initial values are given in table 2.3.
Table 2.3: Constants and variables for periodicity detection
Constant   Value   Variable          Initial value
lthresh    2       oldlagcount       0
nthresh    4       veryoldlagcount   0
                   N3                40
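The Pascal-style description above translates almost line for line into Python. The following is a sketch for illustration; the class name `PeriodicityDetector` and the attribute `oldlag` (holding N3 of the previous frame) are naming choices of the sketch.

```python
LTHRESH = 2  # table 2.3
NTHRESH = 4  # table 2.3


class PeriodicityDetector:
    """Sketch of the ptch flag and of the lag counter update."""

    def __init__(self):
        self.oldlagcount = 0
        self.veryoldlagcount = 0
        self.oldlag = 40  # initial value of N3 from table 2.3

    def ptch(self):
        """Periodicity flag used for the current frame."""
        return self.oldlagcount + self.veryoldlagcount >= NTHRESH

    def update(self, lags):
        """Update the counters with the four LTP lags N0..N3 of the frame."""
        lagcount = 0
        prev = self.oldlag
        for lag in lags:
            small = min(lag, prev)
            smallag = max(lag, prev) % small
            # Close to a multiple from below or from above counts as periodic.
            if min(smallag, small - smallag) < LTHRESH:
                lagcount += 1
            prev = lag
        self.veryoldlagcount = self.oldlagcount
        self.oldlagcount = lagcount
        self.oldlag = prev
```

Since `update` runs after the VAD decision, `ptch()` always reflects the two preceding frames rather than the current one.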

5.2.6	Information tone detection
The tone flag is only evaluated in the downlink VAD. In the uplink VAD, tone detection is not performed and tone = false.
Computation of the tone flag is complex. It is therefore evaluated after the processing of the current speech encoder frame. In this way transmission of the speech or SID frame is not delayed.
Information tones and environmental noise can be classified by inspecting the short term prediction gain, information tones resulting in higher prediction gains than environmental noise. Tones can therefore be detected by comparing the prediction gain to a fixed threshold. By limiting the prediction gain calculation to a fourth order analysis, information signals consisting of one or two tones can be detected whilst minimizing the prediction gain for environmental noise.
The prediction gain decision is implemented by comparing the normalized prediction error with a threshold. This measure is used to evaluate the Boolean variable tone every 20 ms. The signal is classified as a tone if the prediction error is smaller than the threshold predth. This is equivalent to a prediction gain threshold of 13,5 dB.
Mobile noise can contain very strong resonances at low frequencies, resulting in a high prediction gain. A further test is therefore made to determine the pole frequency of a second order analysis of the signal frame. The signal is classified as noise if the frequency of the pole is less than 385 Hz. The pole frequency calculation is described in clause A.4.
The algorithm for detecting information tones is as follows:
	tone = false
	
	den = a[1]*a[1]
	num = 4*a[2] ‑ a[1]*a[1]

	if ( num <= 0 ) 
	  return

	if (( a[1] < 0 ) AND ( num / den < freqth ))
	  return
	prederr = PRODUCT(i = 1..4) of (1 - rc[i]*rc[i])

	if (prederr < predth)
	  tone = true

	return

The values of the constants are given in table 2.4. The coefficients a[1..2] are transversal filter coefficients calculated from rc[1..2]. The calculation of the reflection coefficients rc[1..4] is described below.
The offset compensated signal frame sof[0..159] is multiplied by the Hanning window to give the windowed frame sofh[0..159]:
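$$\mathrm{sofh}[i] = \mathrm{hann}[i]\,\mathrm{sof}[i], \quad i = 0..159$$
where
$$\mathrm{hann}[i] = 0.5\left(1 - \cos\frac{2\pi i}{159}\right), \quad i = 0..159$$
The auto-correlation acfh[0..4] of the windowed signal frame is then calculated:
$$\mathrm{acfh}[i] = \sum_{j=i}^{159} \mathrm{sofh}[j]\,\mathrm{sofh}[j-i], \quad i = 0..4$$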
rc[1..4] are then calculated from acfh[0..4] using the Schur recursion described in the RPE‑LTP codec.
Table 2.4: Constants for information tone detection
Constant   Value
freqth     0,0973
predth     0,0158

NOTE:	Reflection coefficients are available in the RPE‑LTP codec.  However, they are calculated after pre‑emphasis using a rectangular window and do not give good tone detection results.
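The complete tone test can be sketched in floating-point Python as follows. Several details are assumptions of the sketch rather than statements of the present document: the window is taken as 0.5*(1 - cos(2*pi*i/(n-1))), the reflection coefficients are obtained with a Levinson-Durbin recursion instead of the Schur recursion, and the second order coefficients are derived as a[1] = rc[1] + rc[1]*rc[2] and a[2] = rc[2]. The function names are hypothetical.

```python
import math

FREQTH = 0.0973  # table 2.4
PREDTH = 0.0158  # table 2.4


def reflection_coefficients(frame, order=4):
    """Return rc[0..order] (rc[0] unused) of the Hann-windowed frame."""
    n = len(frame)
    sofh = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) * s
            for i, s in enumerate(frame)]
    acfh = [sum(sofh[j] * sofh[j - i] for j in range(i, n))
            for i in range(order + 1)]
    rc = [0.0] * (order + 1)
    if acfh[0] <= 0.0:
        return rc
    a = [1.0] + [0.0] * order        # inverse filter 1 + a[1]z^-1 + ...
    err = acfh[0]
    for m in range(1, order + 1):
        k = -sum(a[j] * acfh[m - j] for j in range(m)) / err
        prev = list(a)
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        rc[m] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return rc


def is_tone(rc):
    """Apply the pole frequency test and the prediction gain test."""
    a1 = rc[1] + rc[1] * rc[2]
    a2 = rc[2]
    den = a1 * a1
    num = 4.0 * a2 - a1 * a1
    if num <= 0.0:
        return False
    if a1 < 0.0 and num < FREQTH * den:   # same test as num / den < freqth
        return False
    prederr = 1.0
    for i in range(1, 5):
        prederr *= 1.0 - rc[i] * rc[i]
    return prederr < PREDTH


def sine_frame(freq_hz, fs=8000.0, n=160):
    """Made-up test signal: one 20 ms frame of a pure sine wave."""
    return [1000.0 * math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n)]
```

In this sketch a 1 kHz sine passes both tests, whereas a 200 Hz sine is rejected by the pole frequency test.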

5.2.7	Threshold adaptation
A check is made every 20 ms to determine whether the VAD decision threshold (thvad) should be changed. This adaptation is carried out according to the flowchart shown in figure 2.2. The constants used are given in table 2.5.
Adaptation takes place in two different situations: firstly whenever ACF[0] is very low and secondly whenever there is a very high probability that speech and information tones are not present.
In the first case, the threshold is adapted if the energy of the input signal is less than pth. The threshold is set to plev without carrying out any further tests because at these very low levels the effect of the signal quantization makes it impossible to obtain reliable results from these tests.
In the second case, the decision threshold (thvad) and the adaptive filter coefficients (rvad) are only updated with the rav1 values when there is a very high probability that speech and information tones are not present. Adaptation occurs if the following conditions are met over a number (adp) of signal frames:
‑	stationarity is detected in the frequency domain;
‑	the signal does not contain a periodic component;
‑	information tones are not present.
The step‑size by which the threshold is adapted is not constant but a proportion of the current value (determined by constants dec and inc). The adaptation begins by experimentally multiplying the threshold by a factor of (1‑1/dec). If the new threshold is now higher than or equal to Pvad times fac then the threshold needed to be decreased and it is left at this new lower level. If, on the other hand, the new threshold level is less than Pvad times fac then the threshold either needed to be increased or kept constant. In this case it is set to Pvad times fac unless this would mean multiplying it by more than a factor of (1+1/inc) (in which case it is multiplied by a factor of (1+1/inc)). The threshold is never allowed to be greater than Pvad+margin.
Table 2.5: Constants and variables for threshold adaptation
Constant   Value        Variable             Initial value
pth        300 000      adaptcount           0
plev       800 000      thvad                1 000 000
fac        3.0          rvad[0]              6
adp        8            rvad[1]              -4
inc        16           rvad[2]              1
dec        32           rvad[3] to rvad[8]   All 0
margin     80 000 000

Figure 2.2: Flow diagram for threshold adaptation
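The adaptation rules described in the text can be sketched in floating-point Python as shown below. The handling of adaptcount is an assumption of the sketch: the counter is reset whenever one of the three conditions fails, adaptation starts once it exceeds adp, and it is then held at adp+1. The class name `ThresholdAdapter` is hypothetical.

```python
PTH = 300_000.0
PLEV = 800_000.0
FAC = 3.0
ADP = 8
INC = 16
DEC = 32
MARGIN = 80_000_000.0


class ThresholdAdapter:
    """Floating-point sketch of the thvad / rvad adaptation (table 2.5)."""

    def __init__(self):
        self.adaptcount = 0
        self.thvad = 1_000_000.0
        self.rvad = [6.0, -4.0, 1.0] + [0.0] * 6

    def update(self, acf0, pvad, rav1, stat, ptch, tone):
        # First case: very low input energy, no further tests are made.
        if acf0 < PTH:
            self.thvad = PLEV
            return
        # Second case: adapt only during stationary, non-periodic, tone-free input.
        if not stat or ptch or tone:
            self.adaptcount = 0          # assumption: restart the frame count
            return
        self.adaptcount += 1
        if self.adaptcount <= ADP:       # assumption: wait for more than adp frames
            return
        th = self.thvad * (1.0 - 1.0 / DEC)       # tentative decrease
        target = pvad * FAC
        if th < target:                            # threshold was too low
            th = min(th * (1.0 + 1.0 / INC), target)
        self.thvad = min(th, pvad + MARGIN)        # never above pvad + margin
        self.rvad = list(rav1)
        self.adaptcount = ADP + 1                  # keep the counter bounded
```

The asymmetric step sizes (1/dec down, 1/inc up) follow the constants of table 2.5.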

5.2.8	VAD decision
Prior to hangover the VAD decision condition is:
	vvad = pvad > thvad

5.2.9	VAD hangover addition
VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The Boolean variable vad indicates the decision of the VAD with hangover included. The values of the constants are given in table 2.6. The hangover algorithm is as follows:
  if vvad then increment(burstcount) else burstcount = 0

  if burstcount >= burstconst then
  begin
    hangcount = hangconst;
    burstcount = burstconst
  end

  vad = vvad or (hangcount >= 0)

  if hangcount >= 0 then decrement(hangcount)

Table 2.6: Constants and variables for VAD hangover addition
Constant     Value   Variable     Initial value
burstconst   3       burstcount   0
hangconst    5       hangcount    -1
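The hangover algorithm above maps directly onto a small Python class. This is a sketch for illustration only; the class name `Hangover` is hypothetical.

```python
BURSTCONST = 3  # table 2.6
HANGCONST = 5   # table 2.6


class Hangover:
    """Sketch of the VAD hangover addition."""

    def __init__(self):
        self.burstcount = 0
        self.hangcount = -1

    def update(self, vvad):
        """Return vad (decision with hangover) for one frame."""
        if vvad:
            self.burstcount += 1
        else:
            self.burstcount = 0
        if self.burstcount >= BURSTCONST:
            # A long enough burst (re)arms the hangover counter.
            self.hangcount = HANGCONST
            self.burstcount = BURSTCONST
        vad = vvad or self.hangcount >= 0
        if self.hangcount >= 0:
            self.hangcount -= 1
        return vad
```

With these constants, a burst of three speech frames is followed by five hangover frames, while a single isolated frame is not extended.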

6	Computational details
In the following subclauses, the detailed description of the VAD algorithm follows the preceding high level description. This detailed description is divided into ten parts related to the blocks of figure 2.1 (except periodicity updating) in the high level description of the VAD algorithm.
Those parts are:
1)	adaptive filtering and energy computation;
2)	ACF averaging;
3)	predictor values computation;
4)	spectral comparison;
5)	periodicity detection;
6)	threshold adaptation;
7)	VAD decision;
8)	VAD hangover addition;
9)	periodicity updating;
10)	information tone detection.
The VAD algorithm takes as input the following variables of the RPE‑LTP encoder (see the detailed description of the RPE‑LTP encoder GSM 06.10):
‑	L_ACF[0..8], auto‑correlation function (GSM 06.10/4.2.4);
‑	scalauto, scaling factor to compute the L_ACF[0..8] (GSM 06.10/4.2.4);
‑	Nc, LTP lag (one for each sub‑segment, GSM 06.10/4.2.11);
‑	sof, offset compensated signal frame (GSM 06.10/4.2.2).
So four Nc values are needed for the VAD algorithm.
The VAD computation can start as soon as the L_ACF[0..8] and scalauto variables are known. This means that the VAD computation can take place after part 4.2.4 of GSM 06.10 (Auto-correlation) of the LPC analysis clause of the RPE-LTP encoder. This scheme reduces the delay to yield the VAD information. The periodicity updating (included in subclause 5.2.5) and information tone detection are done after the processing of the current speech encoder frame.
All the arithmetic operations and names of the variables follow the RPE‑LTP detailed description. To increase the precision within the fixed point implementation, a pseudo‑floating point representation of some variables is used. This stands for the following variables (and related constants) of the VAD algorithm:
pvad:	Energy of filtered signal;
thvad:	Threshold of the VAD decision;
acf0:	Energy of input signal.
For the representation of these variables, two integers (16 bits) are needed:
‑	one for the exponent (e_pvad, e_thvad, e_acf0);
‑	one for the mantissa (m_pvad, m_thvad, m_acf0).
The value e_pvad is the exponent of the lowest power of 2 that is greater than or equal to the actual value of pvad, and m_pvad is an integer which is always greater than or equal to 16384 (normalized mantissa). The pvad value is therefore equal to:
$$\mathrm{pvad} = 2^{\mathrm{e\_pvad}} \cdot \frac{\mathrm{m\_pvad}}{32768}$$
This scheme guarantees a large dynamic range for the pvad value and always keeps a precision of 16 bits. All the comparisons are easy to make by comparing the exponents of two variables, and the VAD algorithm needs only one pseudo-floating point addition. All the computations related to the pseudo-floating point variables require only very simple 16 or 32 bit arithmetic operations defined in the detailed description of the RPE-LTP encoder. This pseudo-floating point arithmetic is only used in subclauses 6.1 and 6.6.
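The representation can be illustrated with a short Python sketch, assuming value = 2^e * m / 32768 with a mantissa normalized to the range 16384..32767. The helper names are hypothetical, and the zero encoding (-32768, 0) follows the values assigned in the energy computation when L_ACF[0] is 0.

```python
import math


def pf_encode(value):
    """Return (exponent, mantissa) of a non-negative value."""
    if value <= 0.0:
        return (-32768, 0)
    frac, exp = math.frexp(value)        # value = frac * 2**exp, 0.5 <= frac < 1
    return (exp, int(frac * 32768.0))    # mantissa is truncated to an integer


def pf_decode(exp, mant):
    """Return the floating-point value of a pseudo-floating point pair."""
    return math.ldexp(mant / 32768.0, exp)


def pf_greater(a, b):
    """Compare two (exponent, mantissa) pairs: exponents first, then mantissas."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] > b[1])
```

Encoding the initial threshold of 1 000 000 this way gives an exponent of 20 and a mantissa of 31 250, the same pair as the e_thvad and m_thvad entries of table 3.1.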
Table 3.1 gives a list of all the variables of the VAD algorithm that must be initialized in the reset procedure and kept in memory for processing the subsequent frame of the RPE-LTP encoder. The types (16 or 32 bits) and initial values of all these variables are clearly indicated and their related subclause is also mentioned. The bit exact implementation uses other temporary variables that are introduced in the detailed description whenever it is needed.
Table 3.1: Initial values for variables to be stored in memory
Names of variables    Type (# of bits)   Initialization   Subclause
Adaptive filter coefficients:
  rvad[0]             16                 24 576           6.1, 6.6
  rvad[1]             16                 -16 384          6.1, 6.6
  rvad[2]             16                 4 096            6.1, 6.6
  rvad[3..8]          16                 0                6.1, 6.6
Scaling factor of rvad[0..8]:
  normrvad            16                 7                6.1, 6.6
Delay line of the auto-correlation coefficients:
  L_sacf[0..26]       32                 0                6.2
  L_sav0[0..35]       32                 0                6.2
Pointers on the delay lines:
  pt_sacf             16                 0                6.2
  pt_sav0             16                 0                6.2
Distance measure:
  L_lastdm            32                 0                6.4
Periodicity counters:
  oldlagcount         16                 0                6.5, 6.9
  veryoldlagcount     16                 0                6.5, 6.9
Adaptive threshold:
  e_thvad (exponent)  16                 20               6.6
  m_thvad (mantissa)  16                 31 250           6.6
Counter for adaptation:
  adaptcount          16                 0                6.6
Hangover flags:
  burstcount          16                 0                6.8
  hangcount           16                 -1               6.8
LTP lag memory:
  oldlag              16                 40               6.9
Tone detection:
  tone                16                 0                6.10

6.1	Adaptive filtering and energy computation
This subclause computes the e_pvad and m_pvad variables which represent the pvad value. It needs the L_ACF[0..8] and scalauto variables of the RPE-LTP algorithm and the rvad[0..8] and normrvad variables produced by subclause 6.6 of the VAD algorithm. It also computes a pseudo-floating point representation of L_ACF[0] (e_acf0 and m_acf0) used in subclause 6.6.
Test if L_ACF[0] is equal to 0:
IF ( scalauto < 0 ) THEN scalvad = 0;
ELSE scalvad = scalauto;  / keep scalvad for use in subclause 6.2 /

IF ( L_ACF[0] == 0 ) THEN
                          | e_pvad = -32768;
                          | m_pvad = 0;
                          | e_acf0 = -32768;
                          | m_acf0 = 0;
                          | EXIT  /continue with subclause 6.2/

Re‑normalization of the L_ACF[0..8]:

normacf = norm( L_ACF[0] );

| FOR i = 0 to 8:
|  sacf[i] = ( L_ACF[i] << normacf ) >> 19;
| NEXT i:

Computation of e_acf0 and m_acf0:

e_acf0 = add( 32, (scalvad << 1 ) );
e_acf0 = sub( e_acf0, normacf);
m_acf0 = sacf[0] << 3;

Computation of e_pvad and m_pvad:

e_pvad = add( e_acf0, 14 );
e_pvad = sub( e_pvad, normrvad );

L_temp = 0;

| FOR i = 1 to 8:
|  L_temp = L_add( L_temp, L_mult( sacf[i], rvad[i] ) );
| NEXT i:

L_temp = L_add( L_temp, L_mult( sacf[0], rvad[0] ) >> 1 );

IF ( L_temp <= 0 ) THEN L_temp = 1;

normprod = norm( L_temp );
e_pvad = sub( e_pvad, normprod );
m_pvad = ( L_temp << normprod ) >> 16;
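For experimentation, the procedure above can be transcribed into Python as sketched below. The helpers `norm`, `L_mult` and `L_add` follow a reading of the GSM 06.10 basic operations and are assumptions of the sketch. The 16 bit add() and sub() are written as plain integer arithmetic, so 16 bit saturation is not modelled. The function name `energy_fixed` is hypothetical.

```python
def sat32(x):
    """Saturate to the signed 32 bit range."""
    return max(-2 ** 31, min(2 ** 31 - 1, x))


def L_mult(a, b):
    """32 bit fractional product of two 16 bit values (assumed semantics)."""
    return sat32((a * b) << 1)


def L_add(a, b):
    """Saturating 32 bit addition (assumed semantics)."""
    return sat32(a + b)


def norm(l_var):
    """Left shifts needed to bring a positive 32 bit value to at least 2**30."""
    if l_var <= 0:
        raise ValueError("norm() is only used on positive values here")
    shifts = 0
    while l_var < 0x40000000:
        l_var <<= 1
        shifts += 1
    return shifts


def energy_fixed(L_ACF, scalauto, rvad, normrvad):
    """Return ((e_pvad, m_pvad), (e_acf0, m_acf0), scalvad)."""
    scalvad = 0 if scalauto < 0 else scalauto
    if L_ACF[0] == 0:
        return (-32768, 0), (-32768, 0), scalvad
    # Re-normalization of L_ACF[0..8]
    normacf = norm(L_ACF[0])
    sacf = [(v << normacf) >> 19 for v in L_ACF]
    # Pseudo-floating point representation of the input energy
    e_acf0 = 32 + (scalvad << 1) - normacf
    m_acf0 = sacf[0] << 3
    # Filtered energy
    e_pvad = e_acf0 + 14 - normrvad
    L_temp = 0
    for i in range(1, 9):
        L_temp = L_add(L_temp, L_mult(sacf[i], rvad[i]))
    L_temp = L_add(L_temp, L_mult(sacf[0], rvad[0]) >> 1)
    if L_temp <= 0:
        L_temp = 1
    normprod = norm(L_temp)
    e_pvad -= normprod
    m_pvad = (L_temp << normprod) >> 16
    return (e_pvad, m_pvad), (e_acf0, m_acf0), scalvad
```

With the initial rvad values of table 3.1 and an input whose only non-zero auto-correlation value is at lag 0, the decoded pvad is 6 times the decoded acf0, in line with rvad[0] = 6 in table 2.5.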

6.2	ACF averaging
This subclause uses the L_ACF[0..8] and the scalvad variables to compute the arrays L_av0[0..8] and L_av1[0..8] used in subclauses 6.3 and 6.4.
Computation of the scaling factor:

scal = sub( 10, (scalvad << 1) );

Computation of the arrays L_av0[0..8] and L_av1[0..8]:

| FOR i = 0 to 8:
|  L_temp = L_ACF[i] >> scal;
|  L_av0[i] = L_add( L_sacf[i], L_temp );
|  L_av0[i] = L_add( L_sacf[i+9], L_av0[i] );
|  L_av0[i] = L_add( L_sacf[i+18], L_av0[i] );
|  L_sacf[ pt_sacf + i ] = L_temp;
|  L_av1[i] = L_sav0[ pt_sav0 + i ];
|  L_sav0[ pt_sav0 + i] = L_av0[i];
| NEXT i:

Update of the array pointers:

IF ( pt_sacf == 18 ) THEN pt_sacf = 0;
ELSE pt_sacf = add( pt_sacf, 9);

IF ( pt_sav0 == 27 ) THEN pt_sav0 = 0;
ELSE pt_sav0 = add( pt_sav0, 9);

6.3	Predictor values computation
This subclause computes the array rav1[0..8] needed for the spectral comparison and the threshold adaptation. It uses the L_av1[0..8] computed in subclause 6.2, and is divided into the following three subclauses:
‑	Schur recursion to compute reflection coefficients.
‑	Step up procedure to obtain the aav1[0..8].
‑	Computation of the rav1[0..8].

6.3.1	Schur recursion to compute reflection coefficients
This subclause is identical to the one used in the RPE‑LTP algorithm. The array vpar[1..8] is computed with the array L_av1[0..8] as an input.
Schur recursion with 16 bits arithmetic:

IF( L_av1[0] == 0 ) THEN
                         |== FOR i = 1 to 8:
                         |    vpar[i] = 0;
                         |== NEXT i:
                         |    EXIT; /continue with subclause 6.3.2/
temp = norm( L_av1[0] );
|== FOR k=0 to 8:
|    sacf[k] = ( L_av1[k] << temp ) >> 16;
|== NEXT k:

Initialize array P[..] and  K[..] for the recursion:

|== FOR i=1 to 7:
|    K[9‑i] = sacf[i];
|== NEXT i:

|== FOR i=0 to 8:
|    P[i] = sacf[i];
|== NEXT i:

Compute reflection coefficients:

|== FOR n=1 to 8:
|    IF( P[0] < abs( P[1] ) ) THEN
|                                    |== FOR i = n to 8:
|                                    |    vpar[i] = 0;
|                                    |== NEXT i:
|                                    | EXIT; /continue with
|                                    |        subclause 6.3.2/
|    vpar[n] = div( abs( P[1] ), P[0] );
|    IF ( P[1] > 0 ) THEN vpar[n] = sub( 0, vpar[n] );
|    IF ( n == 8 ) THEN EXIT; /continue with subclause 3.3.2/
|
|  Schur recursion:
|
|    P[0] = add( P[0], mult_r( P[1], vpar[n] ) );
|==== FOR m=1 to 8‑n:
|      P[m] = add( P[m+1], mult_r( K[9‑m], vpar[n] ) );
|      K[9‑m] = add( K[9‑m], mult_r( P[m+1], vpar[n] ) );
|==== NEXT m:
|
|== NEXT n:
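
As an illustration only, the recursion above can be sketched in Python. The helper functions `sat16`, `mult_r`, `div` and `norm` are written here from the usual definitions of the fixed-point primitives (16-bit saturation, rounded Q15 product, 15-bit fractional division, normalization shift count) and are assumptions of this sketch, as is the `order` parameter, which lets the same routine serve an order-4 analysis. `norm` only handles positive arguments.

```python
def sat16(x):
    """Clamp to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def mult_r(a, b):
    """Rounded Q15 product with 16-bit saturation."""
    return sat16((a * b + 16384) >> 15)

def div(num, den):
    """15-bit fractional division; expects 0 <= num <= den and den > 0."""
    q = 0
    for _ in range(15):
        q <<= 1
        num <<= 1
        if num >= den:
            num -= den
            q += 1
    return q

def norm(l_x):
    """Left shifts needed to bring a positive 32-bit value up to bit 30."""
    n = 0
    while l_x < 0x40000000:
        l_x <<= 1
        n += 1
    return n

def schur(l_acf, order=8):
    """Return refl[0..order]; refl[0] is unused to keep the 1-based indexing."""
    refl = [0] * (order + 1)
    if l_acf[0] == 0:
        return refl
    shift = norm(l_acf[0])
    sacf = [(v << shift) >> 16 for v in l_acf[:order + 1]]
    K = [0] * (order + 2)
    for i in range(1, order):
        K[order + 1 - i] = sacf[i]
    P = list(sacf)
    for n in range(1, order + 1):
        if P[0] < sat16(abs(P[1])):
            break  # remaining coefficients stay at zero
        refl[n] = div(sat16(abs(P[1])), P[0])
        if P[1] > 0:
            refl[n] = -refl[n]
        if n == order:
            break
        P[0] = sat16(P[0] + mult_r(P[1], refl[n]))
        for m in range(1, order - n + 1):
            p_next = P[m + 1]
            P[m] = sat16(p_next + mult_r(K[order + 1 - m], refl[n]))
            K[order + 1 - m] = sat16(K[order + 1 - m] + mult_r(p_next, refl[n]))
    return refl
```

For a geometric auto-correlation sequence with ratio 0,5 the sketch returns a first coefficient of ‑16384 (‑0,5 in Q15) and zeros elsewhere.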

6.3.2	Step‑up procedure to obtain the aav1[0..8]
Initialization of the step‑up recursion:

L_coef[0] = 16384 << 15;
L_coef[1] = vpar[1] << 14;

Loop on the LPC analysis order:

|= FOR m = 2 to 8:
|== FOR i = 1 to m‑1:
|==  temp = L_coef[m‑i] >> 16;  / takes the msb /
|==  L_work[i] = L_add( L_coef[i], L_mult( vpar[m], temp ) );
|== NEXT i
|=
|== FOR i = 1 to m‑1:
|==  L_coef[i] = L_work[i];
|== NEXT i
|=
|= L_coef[m] = vpar[m] << 14;
|= NEXT m:

Keep the aav1[0..8] on 13 bits for next clause:

| FOR i = 0 to 8:
|  aav1[i] = L_coef[i] >> 19;
| NEXT i:
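
The step‑up procedure can be sketched as follows. This is a non-normative Python illustration; `l_mult` and `l_add` are assumed to behave as the usual saturating 32-bit primitives, and the function name `step_up` is hypothetical.

```python
def sat32(x):
    """Clamp to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, x))

def l_mult(a, b):
    """32-bit fractional product of two Q15 values (doubled integer product)."""
    return sat32(2 * a * b)

def l_add(a, b):
    """Saturating 32-bit addition."""
    return sat32(a + b)

def step_up(vpar):
    """vpar[1..8] are Q15 reflection coefficients; vpar[0] is unused."""
    l_coef = [0] * 9
    l_coef[0] = 16384 << 15
    l_coef[1] = vpar[1] << 14
    for m in range(2, 9):
        l_work = [0] * 9
        for i in range(1, m):
            temp = l_coef[m - i] >> 16          # take the msb
            l_work[i] = l_add(l_coef[i], l_mult(vpar[m], temp))
        l_coef[1:m] = l_work[1:m]
        l_coef[m] = vpar[m] << 14
    # keep the result on 13 bits: 1,0 is represented by 1024
    return [c >> 19 for c in l_coef]
```

With vpar[1] = 0,5 and vpar[2] = 0,25 the sketch gives aav1[1] = 640 (0,625) and aav1[2] = 256 (0,25), matching a1 = k1 + k1*k2 and a2 = k2.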

6.3.3	Computation of the rav1[0..8]
|= FOR i= 0 to 8:
|= L_work[i] = 0;
|== FOR k = 0 to 8‑i:
|==  L_work[i] = L_add( L_work[i], L_mult( aav1[k], aav1[k+i] ) );
|== NEXT k:
|= NEXT i:

IF ( L_work[0] == 0 ) THEN normrav1 =0;
ELSE normrav1 = norm( L_work[0] );

|= FOR i= 0 to 8:
|= rav1[i] = ( L_work[i] << normrav1 ) >> 16;
|= NEXT i:

Keep the normrav1 for use in subclause 3.4 and 3.6.
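
A minimal Python sketch of the rav1 computation follows, for illustration only. The helpers `l_mult`, `l_add` and `norm` are assumed forms of the fixed-point primitives, and the function name is hypothetical.

```python
def sat32(x):
    return max(-2**31, min(2**31 - 1, x))

def l_mult(a, b):
    return sat32(2 * a * b)

def l_add(a, b):
    return sat32(a + b)

def norm(l_x):
    """Left shifts needed to bring a positive 32-bit value up to bit 30."""
    n = 0
    while l_x < 0x40000000:
        l_x <<= 1
        n += 1
    return n

def rav1_from_aav1(aav1):
    """Auto-correlate the filter coefficients aav1[0..8] and normalize."""
    l_work = [0] * 9
    for i in range(9):
        for k in range(9 - i):
            l_work[i] = l_add(l_work[i], l_mult(aav1[k], aav1[k + i]))
    normrav1 = 0 if l_work[0] == 0 else norm(l_work[0])
    rav1 = [(v << normrav1) >> 16 for v in l_work]
    return rav1, normrav1
```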

6.4	Spectral comparison
This subclause computes the variable stat needed for the threshold adaptation. It uses the array L_av0[0..8] computed in subclause 3.2 and the array rav1[0..8] computed in subclause 3.3.3.
Re‑normalize L_av0[0..8]:

IF ( L_av0[0] == 0 ) THEN
                          | FOR i = 0 to 8:
                          |  sav0[i] = 4095;
                          | NEXT i:
ELSE
    | shift = norm( L_av0[0] );
    |= FOR i = 0 to 8:
    |=  sav0[i] = ( L_av0[i] << shift‑3 ) >> 16;
    |= NEXT i:

Compute partial Σ of dm:

L_ Σ p = 0;
|= FOR i = 1 to 8:
|= L_ Σ p = L_add( L_ Σ p, L_mult( rav1[i], sav0[i] ) );
|= NEXT i:

Compute the division of partial Σ by sav0[0]:

IF ( L_ Σ p < 0 ) THEN L_temp = L_sub( 0, L_ Σ p );
ELSE L_temp = L_ Σ p;

IF ( L_temp == 0 ) THEN
                       | L_dm  = 0;
                       | shift = 0;
ELSE
     | sav0[0] = sav0[0] << 3;
     | shift = norm( L_temp );
     | temp  = ( L_temp << shift ) >> 16;
     | IF ( sav0[0] >= temp ) THEN
     |                            | divshift = 0;
     |                            | temp = div( temp, sav0[0] );
     | ELSE
     |     | divshift = 1;
     |     | temp = sub( temp, sav0[0] );
     |     | temp = div( temp, sav0[0] );
     |
     | IF( divshift == 1 ) THEN L_dm = 32768;
     | ELSE L_dm = 0;
     |
     | L_dm = L_add( L_dm, temp) << 1;
     | IF( L_ Σ p < 0 ) THEN L_dm = L_sub( 0,  L_dm);

Re‑normalization and final computation of L_dm:

L_dm = ( L_dm << 14 );
L_dm = L_dm >> shift;
L_dm = L_add( L_dm, ( rav1[0] << 11 ) );
L_dm = L_dm >> normrav1;

Compute the difference and save L_dm:

L_temp   = L_sub( L_dm, L_lastdm );
L_lastdm = L_dm;
IF ( L_temp < 0 ) THEN L_temp = L_sub( 0, L_temp );
L_temp = L_sub( L_temp, 3277 );

Evaluation of the stat flag:

IF ( L_temp < 0 ) THEN stat = 1;
ELSE stat = 0;
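
Ignoring the scaling steps, the quantity built up in L_dm corresponds to rav1[0] plus twice the sum of rav1[i]*av0[i] (i = 1..8) divided by av0[0]. The floating-point sketch below illustrates that reading; it is not bit-exact, it assumes av0[0] is non-zero, and the threshold `thresh` is left as a caller-supplied parameter because the sketch makes no claim about the floating-point equivalent of the constant 3277.

```python
def spectral_distortion(rav1, av0):
    """dm = rav1[0] + 2 * sum(rav1[i] * av0[i], i = 1..8) / av0[0]."""
    partial = sum(r * a for r, a in zip(rav1[1:9], av0[1:9]))
    return rav1[0] + 2.0 * partial / av0[0]

def stat_flag(dm, last_dm, thresh):
    """1 when the distortion changed by less than thresh since the last frame."""
    return 1 if abs(dm - last_dm) < thresh else 0
```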

6.5	Periodicity detection
This subclause just sets the ptch flag needed for the threshold adaptation.
temp = add( oldlagcount, veryoldlagcount );
IF ( temp >= 4 ) THEN ptch = 1;
ELSE ptch = 0;

6.6	Threshold adaptation
This subclause uses the variables e_pvad, m_pvad, e_acf0 and m_acf0 computed in subclause 3.1. It also uses the flags stat (see subclause 3.4) and ptch (see subclause 3.5). It follows the flowchart represented on figure 2.2.
Some constants, represented in a floating point format, are needed; a symbolic name (in capital letters) is used for the exponent and the mantissa of each. Table 3.2 lists these constants with their symbolic names and numerical values.
Table 3.2: List of constants
Constant    Exponent         Mantissa
pth         E_PTH = 19       M_PTH = 18 750
margin      E_MARGIN = 27    M_MARGIN = 19 531
plev        E_PLEV = 20      M_PLEV = 25 000

NOTE:	Floating point representation of constants used in subclause 3.6:
pth	= 2^(E_PTH) x (M_PTH/32768).
margin	= 2^(E_MARGIN) x (M_MARGIN/32768).
plev	= 2^(E_PLEV) x (M_PLEV/32768).
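
As a worked check of the representation in the NOTE, the following Python lines evaluate the three constants. The function name `pseudo_float` is chosen for this illustration only.

```python
def pseudo_float(exponent, mantissa):
    """Value of a constant stored as 2**exponent * (mantissa / 32768)."""
    return 2 ** exponent * mantissa / 32768

PTH = pseudo_float(19, 18750)      # 300000.0
MARGIN = pseudo_float(27, 19531)   # 79998976.0
PLEV = pseudo_float(20, 25000)     # 800000.0
```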
Test if acf0 < pth; if yes set thvad to plev:

comp = 0;
IF ( e_acf0 < E_PTH ) THEN  comp = 1;
IF ( e_acf0 == E_PTH ) THEN  IF ( m_acf0 < M_PTH ) THEN comp =1;
IF ( comp == 1 ) THEN
                      | e_thvad = E_PLEV;
                      | m_thvad = M_PLEV;
                      | EXIT; /continue with subclause 3.7/

Test if an adaptation is needed:

comp = 0;
IF ( ptch == 1 ) THEN comp = 1;
IF ( stat == 0 ) THEN comp = 1;
IF ( tone == 1 ) THEN comp = 1;
IF ( comp == 1 ) THEN
                      | adaptcount = 0;
                      | EXIT; /continue with subclause 3.7/

Incrementation of adaptcount:

adaptcount = add( adaptcount, 1 );
IF ( adaptcount <= 8 ) THEN EXIT; /continue with subclause 3.7/

Computation of thvad‑(thvad/dec):

m_thvad = sub( m_thvad, (m_thvad >> 5 ) );
IF ( m_thvad < 16384) THEN
                           | m_thvad = m_thvad << 1;
                           | e_thvad = sub( e_thvad, 1 );

Computation of pvad*fac:

L_temp = L_add( m_pvad, m_pvad );
L_temp = L_add( L_temp, m_pvad );
L_temp = L_temp >> 1;
e_temp = add( e_pvad, 1 );
IF ( L_temp > 32767 ) THEN
                          | L_temp = L_temp >> 1;
                          | e_temp = add( e_temp, 1 );
m_temp = L_temp;

Test if thvad < pvad*fac:

comp = 0;
IF ( e_thvad < e_temp) THEN comp = 1;
IF (e_thvad == e_temp) THEN  IF (m_thvad < m_temp) THEN comp =1;

Computation of minimum (thvad+(thvad/inc), pvad*fac) if comp = 1:

IF ( comp == 1 ) THEN
|  Compute thvad +(thvad/inc).
| L_temp = L_add( m_thvad, (m_thvad >> 4 ) );
| IF ( L_temp > 32767 ) THEN
|                           | m_thvad = L_temp >> 1;
|                           | e_thvad = add( e_thvad,1 );
| ELSE m_thvad = L_temp;
| comp2 = 0;
| IF ( e_temp < e_thvad) THEN comp2 = 1;
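| IF (e_temp == e_thvad) THEN IF (m_temp < m_thvad) THEN comp2 = 1;
| IF ( comp2 == 1 ) THEN
|                        | e_thvad = e_temp;
|                        | m_thvad = m_temp;

Computation of pvad + margin:

IF ( e_pvad == E_MARGIN ) THEN
                               | L_temp = L_add( m_pvad, M_MARGIN );
                               | m_temp = L_temp >> 1;
                               | e_temp = add( e_pvad, 1 );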
ELSE
    | IF ( e_pvad > E_MARGIN ) THEN
    |     | temp = sub( e_pvad, E_MARGIN );
    |     | temp = M_MARGIN >> temp;
    |     | L_temp = L_add( m_pvad, temp );
    |     | IF ( L_temp > 32767) THEN
    |     |                           | e_temp = add( e_pvad, 1 );
    |     |                           | m_temp = L_temp >> 1;
    |     | ELSE
    |     |    | e_temp = e_pvad;
    |     |    | m_temp = L_temp;
    | ELSE
    |     | temp = sub( E_MARGIN, e_pvad );
    |     | temp = m_pvad >> temp;
    |     | L_temp = L_add( M_MARGIN, temp );
    |     | IF (L_temp > 32767) THEN
    |     |                          | e_temp = add( E_MARGIN, 1);
    |     |                          | m_temp = L_temp >> 1;
    |     | ELSE
    |     |    | e_temp = E_MARGIN;
    |     |    | m_temp = L_temp;

Test if thvad > pvad + margin:

comp = 0;
IF ( e_thvad > e_temp) THEN comp = 1;
IF (e_thvad == e_temp) THEN  IF (m_thvad > m_temp) THEN comp =1;

IF ( comp == 1 ) THEN
                      | e_thvad = e_temp;
                      | m_thvad = m_temp;

Initialize new rvad[0..8] in memory:

normrvad  = normrav1;

|= FOR i = 0 to 8:
|= rvad[i] = rav1[i];
|= NEXT i:

Set adaptcount to adp + 1:

adaptcount = 9;
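
The control flow of the threshold adaptation can be summarized in a floating-point sketch. It is not bit-exact and is given for orientation only. The values dec = 32, inc = 16, fac = 3 and adp = 8 are read off the shifts and counters in the pseudocode above; the dictionary-based `state` and the function name are assumptions of this sketch.

```python
PTH, PLEV, MARGIN = 300000.0, 800000.0, 79998976.0
DEC, INC, FAC, ADP = 32.0, 16.0, 3.0, 8

def adapt_threshold(state, acf0, pvad, rav1, normrav1, stat, ptch, tone):
    """Update state (keys: thvad, adaptcount, rvad, normrvad) in place."""
    if acf0 < PTH:                       # very low input energy
        state["thvad"] = PLEV
        return
    if ptch == 1 or stat == 0 or tone == 1:
        state["adaptcount"] = 0          # no adaptation on speech-like frames
        return
    state["adaptcount"] += 1
    if state["adaptcount"] <= ADP:
        return
    thvad = state["thvad"]
    thvad -= thvad / DEC
    if thvad < pvad * FAC:
        thvad = min(thvad + thvad / INC, pvad * FAC)
    if thvad > pvad + MARGIN:
        thvad = pvad + MARGIN
    state["thvad"] = thvad
    state["rvad"] = list(rav1)
    state["normrvad"] = normrav1
    state["adaptcount"] = ADP + 1
```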

6.7	VAD decision
This subclause only outputs the result of the comparison between pvad and thvad using the pseudo‑floating point representation of thvad and pvad. The values e_pvad and m_pvad are computed in subclause 3.1 and the values e_thvad and m_thvad are computed in subclause 3.6.
vvad = 0;
IF (e_pvad >  e_thvad) THEN vvad = 1;
IF (e_pvad == e_thvad) THEN IF (m_pvad > m_thvad) THEN vvad =1;
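
The comparison above orders two pseudo-floating point values by exponent first and mantissa second. A minimal Python sketch follows; it assumes both mantissas are normalized, which the sketch does not check, and the function names are hypothetical.

```python
def pf_greater(e_a, m_a, e_b, m_b):
    """True when (e_a, m_a) represents a larger value than (e_b, m_b)."""
    return e_a > e_b or (e_a == e_b and m_a > m_b)

def vad_decision(e_pvad, m_pvad, e_thvad, m_thvad):
    """vvad = 1 when pvad exceeds thvad, otherwise 0."""
    return 1 if pf_greater(e_pvad, m_pvad, e_thvad, m_thvad) else 0
```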

6.8	VAD hangover addition
This subclause finally sets the vad decision for the current frame to be processed.
IF ( vvad == 1 ) THEN burstcount = add( burstcount, 1 );
ELSE burstcount = 0;

IF ( burstcount >= 3 ) THEN 
                            | hangcount =  5;
                            | burstcount = 3;

vad = vvad;
IF ( hangcount >= 0 ) THEN
                           | vad = 1;
                           | hangcount = sub( hangcount, 1 );
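
The hangover logic can be sketched as a small state holder. The initial values burstcount = 0 and hangcount = ‑1 are assumptions of this illustration; the initialization is not given in this subclause.

```python
class HangoverState:
    def __init__(self, burstcount=0, hangcount=-1):
        self.burstcount = burstcount
        self.hangcount = hangcount

    def update(self, vvad):
        """Return the final vad flag for one frame given the raw decision vvad."""
        self.burstcount = self.burstcount + 1 if vvad == 1 else 0
        if self.burstcount >= 3:
            self.hangcount = 5
            self.burstcount = 3
        vad = vvad
        if self.hangcount >= 0:
            vad = 1
            self.hangcount -= 1
        return vad
```

With these initial values, a burst of three speech frames followed by silence keeps vad at 1 for five further frames.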

6.9	Periodicity updating
This subclause must be delayed until the LTP lags are computed by the RPE‑LTP algorithm. The LTP lags called Nc in the speech encoder are renamed lags[0..3] (index 0 for the first sub‑segment of the frame, 1 for the second and so on).
Loop on sub‑segments for the frame:
lagcount = 0;

|= FOR i = 0 to 3:
|=  Search the maximum and minimum of consecutive lags.
|= IF ( oldlag > lags[i] ) THEN
|=                              | minlag = lags[i];
|=                              | maxlag = oldlag;
|= ELSE
|=      | minlag = oldlag;
|=      | maxlag = lags[i] ;
|=
|= Compute smallag (modulo operation not defined ):
|=
|= smallag = maxlag;
|== | FOR j = 0 to 2:
|== |  IF (smallag >= minlag) THEN smallag =sub( smallag, minlag);
|== | NEXT j;
|=
|= Minimum of smallag and minlag ‑ smallag:
|=
|= temp = sub( minlag, smallag );
|= IF ( temp < smallag ) THEN smallag = temp;
|= IF ( smallag < 2 ) THEN lagcount = add( lagcount, 1 );
|= Save the current LTP lag.
|= oldlag = lags[i];
|= NEXT i:

Update the veryoldlagcount and oldlagcount:
veryoldlagcount = oldlagcount;
oldlagcount     = lagcount;
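
For illustration, the lag counting above and the ptch flag of the periodicity detection can be written in Python as below. The function names are hypothetical, and the sketch assumes non-zero lags.

```python
def count_periodic_lags(lags, oldlag):
    """Return (lagcount, new oldlag) for the four LTP lags of one frame."""
    lagcount = 0
    for lag in lags:
        if oldlag > lag:
            minlag, maxlag = lag, oldlag
        else:
            minlag, maxlag = oldlag, lag
        # remainder of maxlag / minlag by repeated subtraction (three steps)
        smallag = maxlag
        for _ in range(3):
            if smallag >= minlag:
                smallag -= minlag
        temp = minlag - smallag
        if temp < smallag:
            smallag = temp
        if smallag < 2:
            lagcount += 1
        oldlag = lag
    return lagcount, oldlag

def ptch_flag(oldlagcount, veryoldlagcount):
    """ptch = 1 when the two previous frames together reach four periodic lags."""
    return 1 if oldlagcount + veryoldlagcount >= 4 else 0
```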

6.10	Tone detection
This subclause computes the tone variable needed for the threshold adaptation. Tone is only calculated for the VAD in the downlink. In the uplink VAD tone=0.
To reduce delay, this subclause should be calculated after the processing of the current speech encoder frame.
6.10.1	Windowing
This subclause applies a Hanning window to the input frame sof[0..159] to form the output frame sofh[0..159]. The input frame is the current offset compensated signal frame calculated in the RPE‑LTP codec. The array of constants hann[i] is defined in table 3.2.
Multiply signal frame by Hanning window:
|== FOR i = 0 to 79:
|    sofh[i] = mult_r( sof[i], hann[i] );
|    sofh[159‑i] = mult_r( sof[159‑i], hann[i] );
|== NEXT i;

6.10.2	Auto‑correlation
This subclause computes the auto‑correlation vector L_acfh[0..4] from the windowed input frame sofh[0..159]. The input frame must be scaled in order to avoid an overflow situation. This subclause is identical to the one used in the RPE‑LTP algorithm, with the exception that only five auto‑correlation values are calculated.
Dynamic scaling of the array sofh[0..159]:
Search for the maximum:
smax = 0;

|== FOR k = 0 to 159:
|    temp = abs( sofh[k] );
|    IF ( temp > smax ) THEN smax = temp;
|== NEXT k;

Computation of the scaling factor:
IF ( smax == 0 ) THEN scalauto = 0;
ELSE scalauto = sub( 4, norm( smax << 16));

Scaling of the array sofh[0..159]:
IF ( scalauto > 0 ) THEN
                        | temp = 16384 >> sub( scalauto,1);
                        |== FOR k = 0 to 159:
                        |    sofh[k] = mult_r( sofh[k], temp);
                        |== NEXT k:

Compute the L_acfh[..]:
|== FOR k=0 to 4:
|    L_acfh[k] = 0;
|==== FOR i=k to 159:
|      L_temp = L_mult( sofh[i], sofh[i‑k] );
|      L_acfh[k] = L_add( L_acfh[k], L_temp );
|==== NEXT i:
|== NEXT k:
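
The scaling and auto‑correlation steps can be sketched in Python as follows. The helper functions are assumed forms of the fixed-point primitives, `norm` handles positive values only, and, unlike the pseudocode, the sketch returns a new list instead of scaling sofh in place.

```python
def sat16(x):
    return max(-32768, min(32767, x))

def sat32(x):
    return max(-2**31, min(2**31 - 1, x))

def mult_r(a, b):
    return sat16((a * b + 16384) >> 15)

def l_mult(a, b):
    return sat32(2 * a * b)

def norm(l_x):
    n = 0
    while l_x < 0x40000000:
        l_x <<= 1
        n += 1
    return n

def autocorr_tone(sofh):
    """Return (L_acfh[0..4], scalauto) for a 160-sample windowed frame."""
    smax = max(sat16(abs(v)) for v in sofh)
    scalauto = 0 if smax == 0 else 4 - norm(smax << 16)
    if scalauto > 0:
        temp = 16384 >> (scalauto - 1)
        sofh = [mult_r(v, temp) for v in sofh]
    l_acfh = []
    for k in range(5):
        acc = 0
        for i in range(k, 160):
            acc = sat32(acc + l_mult(sofh[i], sofh[i - k]))
        l_acfh.append(acc)
    return l_acfh, scalauto
```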

6.10.3	Computation of the reflection coefficients
This subclause calculates the reflection coefficients rc[1..4] from the input array L_acfh[0..4]. This procedure is identical to the one in subclause 3.3.1 and the RPE‑LTP codec, with the exception that only four reflection coefficients are calculated.
Schur recursion with 16 bits arithmetic:
IF( L_acfh[0] == 0 ) THEN
                         |== FOR i = 1 to 4:
                         |    rc[i] = 0;
                         |== NEXT i:
                         | EXIT; /continue with subclause 3.10.4/
temp = norm( L_acfh[0] );
|== FOR k=0 to 4:
|    sacf[k] = ( L_acfh[k] << temp ) >> 16;
|== NEXT k:

Initialize array P[..] and  K[..] for the recursion:
|== FOR i=1 to 3:
|    K[5‑i] = sacf[i];
|== NEXT i:

|== FOR i=0 to 4:
|    P[i] = sacf[i];
|== NEXT i:

Compute reflection coefficients:
|== FOR n=1 to 4:
|    IF( P[0] < abs( P[1] ) ) THEN
|                                    |== FOR i = n to 4:
|                                    |    rc[i] = 0;
|                                    |== NEXT i:
|                                    | EXIT; /continue with subclause 3.10.4/
|    rc[n] = div( abs( P[1] ), P[0] );
|    IF ( P[1] > 0 ) THEN rc[n] = sub( 0, rc[n] );
|    IF ( n == 4 ) THEN EXIT; /continue with subclause 3.10.4/
|
|  Schur recursion:
|    P[0] = add( P[0], mult_r( P[1], rc[n] ) );
|==== FOR m=1 to 4‑n:
|      P[m] = add( P[m+1], mult_r( K[5‑m], rc[n] ) );
|      K[5‑m] = add( K[5‑m], mult_r( P[m+1], rc[n] ) );
|==== NEXT m:
|
|== NEXT n:

6.10.4	Filter coefficient calculation
This subclause calculates the direct form filter coefficients a[1..2] from the reflection coefficients rc[1..4].
Step‑up procedure to obtain the a[1..2]:
temp = rc[1] >> 2;  
a[1] = add( temp, mult_r( rc[2], temp ) );
a[2] = rc[2] >> 2;

6.10.5	Pole Frequency Test
This subclause uses the direct form filter coefficients a[1..2] to determine the pole frequency of the second order LPC analysis. If the pole frequency is less than 385 Hz tone is set to 0 and clause 3 terminates.
L_den = L_mult ( a[1], a[1] );

L_temp = a[2] << 16;
L_num = L_sub ( L_temp, L_den );

If pole is not complex then exit:
IF ( L_num <= 0 ) THEN 
                      | tone = 0;
                      | EXIT; /clause 3 complete/

If pole frequency is less than 385 Hz then exit:
IF ( a[1] < 0) THEN
                     | temp = L_den >> 16;
                     | L_den = L_mult ( temp, 3189 );
                     | L_temp = L_sub ( L_num, L_den );
                     | IF ( L_temp < 0 ) THEN
                                             | tone = 0;
                                             | EXIT; /clause 3 complete/

6.10.6	Prediction gain test
This subclause uses the reflection coefficients rc[1..4] to calculate the prediction gain. If the prediction gain is greater than 13,5 dB then tone is set to 1 otherwise tone is set to 0.
Calculate normalized prediction error:
prederr = 32767;

|== FOR i=1 to 4:
|    temp = mult ( rc[i], rc[i] );
|    temp = sub ( 32767, temp);
|    prederr = mult( prederr, temp );
|== NEXT i;

Test if prediction error is smaller than threshold:
temp = sub ( prederr, 1464 );

IF ( temp < 0 ) THEN tone = 1;
 ELSE tone = 0;
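
The filter coefficient calculation, the pole frequency test and the prediction gain test can be combined into one illustrative Python routine. The helpers are assumed forms of the fixed-point primitives and the function name `tone_flag` is hypothetical. The two example inputs in the usage note are reflection coefficients computed by hand for second-order resonances near 1 000 Hz and 200 Hz with a pole radius squared of 0,98.

```python
def sat16(x):
    return max(-32768, min(32767, x))

def sat32(x):
    return max(-2**31, min(2**31 - 1, x))

def mult(a, b):
    return sat16((a * b) >> 15)

def mult_r(a, b):
    return sat16((a * b + 16384) >> 15)

def l_mult(a, b):
    return sat32(2 * a * b)

def tone_flag(rc):
    """rc[1..4] are Q15 reflection coefficients; rc[0] is unused."""
    # step-up to the second order direct form coefficients
    temp = rc[1] >> 2
    a1 = sat16(temp + mult_r(rc[2], temp))
    a2 = rc[2] >> 2
    # pole frequency test
    l_den = l_mult(a1, a1)
    l_num = sat32((a2 << 16) - l_den)
    if l_num <= 0:
        return 0                      # poles are not complex
    if a1 < 0:
        temp = l_den >> 16
        l_den = l_mult(temp, 3189)
        if sat32(l_num - l_den) < 0:
            return 0                  # pole frequency below 385 Hz
    # prediction gain test
    prederr = 32767
    for i in range(1, 5):
        temp = sat16(32767 - mult(rc[i], rc[i]))
        prederr = mult(prederr, temp)
    return 1 if prederr - 1464 < 0 else 0
```

For example, `tone_flag([0, -23169, 32113, 0, 0])` (about 1 000 Hz) returns 1, while `tone_flag([0, -32363, 32113, 0, 0])` (about 200 Hz) returns 0.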

Table 3.2: Values of the Hanning window array hann[i]
 i   hann     i   hann     i   hann     i   hann
 0      0    20   4856    40  16545    60  28139
 1     12    21   5325    41  17192    61  28581
 2     51    22   5811    42  17838    62  29003
 3    114    23   6314    43  18482    63  29406
 4    204    24   6832    44  19122    64  29789
 5    318    25   7365    45  19758    65  30151
 6    458    26   7913    46  20389    66  30491
 7    622    27   8473    47  21014    67  30809
 8    811    28   9046    48  21631    68  31105
 9   1025    29   9631    49  22240    69  31377
10   1262    30  10226    50  22840    70  31626
11   1523    31  10831    51  23430    71  31852
12   1807    32  11444    52  24009    72  32053
13   2114    33  12065    53  24575    73  32230
14   2444    34  12693    54  25130    74  32382
15   2795    35  13326    55  25670    75  32509
16   3167    36  13964    56  26196    76  32611
17   3560    37  14607    57  26707    77  32688
18   3972    38  15251    58  27201    78  32739
19   4405    39  15898    59  27679    79  32764

7	Digital test sequences
This clause provides information on the digital test sequences that have been designed to help the verification of implementations of the Voice Activity Detector. Copies of these sequences are available (see clause A.2).
7.1	Test configuration
The VAD must be tested in conjunction with the speech encoder defined in GSM 06.10. The test configuration is shown in figure 4.1. The input signal to the speech encoder is the sop[...] signal as defined in GSM 06.10 table 5.1. The relevant parameters produced by the speech encoder are input to the VAD algorithm to produce the VAD output. This output has to be checked against some reference files.
The file format of the encoder output parameters given in GSM 06.10 table 5.1 is extended to carry the VAD information.
The VAD information is placed in the unused bit 15 (MSB) of the first encoded parameter:
    LAR(1): bit 15 = 1 if VAD on
            bit 15 = 0 if VAD off

Furthermore, in order to facilitate approval testing over the air interface, the SP flag generated by the TX DTX handler (see GSM 06.31) on the basis of the VAD flag is placed in the MSB position of the second encoded parameter:

    LAR(2): bit 15 = 1 if SP on
            bit 15 = 0 if SP off

The output file will also contain the SID codeword and the comfort noise parameters as described in GSM 06.12 and GSM 06.31.
Figure 4.1: VAD test configuration
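
As an illustration of the file format extension described above, the following Python sketch separates the VAD and SP flags from one frame of encoder output words. It works on a list of unsigned 16-bit integers that has already been read from the file; reading the file, including its byte order, is outside the sketch, and the function name is hypothetical.

```python
def split_flags(frame):
    """Return (vad, sp, params) with the flag bits cleared from LAR(1) and LAR(2)."""
    vad = (frame[0] >> 15) & 1
    sp = (frame[1] >> 15) & 1
    params = [frame[0] & 0x7FFF, frame[1] & 0x7FFF] + list(frame[2:])
    return vad, sp, params
```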

7.2	Test sequences
The test sequences are described in detail in clause A.2.
Annex A (informative):

A.1	Simplified block filtering operation
Consider an 8th order transversal filter with filter coefficients a0..a8, through which a signal is being passed, the output of the filter being:
	s'_n = \sum_{i=0}^{8} a_i s_{n-i}	(1)
If we apply block filtering over 20 ms segments, then this equation becomes:
	s'_n = \sum_{i=0}^{8} a_i s_{n-i},  n = 0..167,  with s_{n-i} = 0 unless 0 <= n-i <= 159	(2)
If the energy of the filtered signal is then obtained for every 20 ms segment, the equation for this is:
	Pvad = \sum_{n=0}^{167} (s'_n)^2	(3)
We know that (see GSM 06.10, subclause 3.1.4):

	acf_i = \sum_{n=i}^{159} s_n s_{n-i},  i = 0..8	(4)
If equation (3) is expanded and acf0..acf8 are substituted for the sums of products of s_n, then we arrive at the equations:
	Pvad = r_0 acf_0 + 2 \sum_{i=1}^{8} r_i acf_i	(5)
Where:
	r_i = \sum_{k=0}^{8-i} a_k a_{k+i},  i = 0..8	(6)
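
The simplification described in this clause can be checked numerically. The floating-point Python sketch below computes the energy of the block-filtered segment directly and via the auto‑correlation values; both function names are chosen for this illustration, and samples outside the segment are taken as zero.

```python
def block_filter_energy(a, s):
    """Energy of the segment s passed through the transversal filter a."""
    energy = 0.0
    for n in range(len(s) + len(a) - 1):
        y = sum(a[i] * s[n - i] for i in range(len(a)) if 0 <= n - i < len(s))
        energy += y * y
    return energy

def energy_from_acf(a, s):
    """Same energy obtained from acf[0..order] of s and r[0..order] of a."""
    order = len(a) - 1
    acf = [sum(s[n] * s[n - i] for n in range(i, len(s))) for i in range(order + 1)]
    r = [sum(a[k] * a[k + i] for k in range(order + 1 - i)) for i in range(order + 1)]
    return r[0] * acf[0] + 2.0 * sum(r[i] * acf[i] for i in range(1, order + 1))
```

The second form needs only nine multiply-accumulate pairs per frame once the auto‑correlation values are available, which is the saving this clause describes.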

A.2	Description of digital test sequences
A.2.1	Test sequences
The VAD algorithm uses results from the full rate speech encoder defined in GSM 06.10. In the testing of the VAD, it is assumed that the relevant speech encoder functions have been verified by the test sequences defined in GSM 06.10.
The six types of input sequences are briefly described below.

Spectral comparison

The two kinds of statements of the spectral comparison algorithm (subclause 3.4), arithmetic statements and control statements, are tested by separate test sequences.
	Arithmetic statements:
		spec_a1.*	
		spec_a2.*	
	Control statements
		spec_c1.*
		spec_c2.*
		spec_c3.*
		spec_c4.*
Threshold adaptation

There are two types of tests to verify the threshold adaptation described in subclause 3.6:
		adapt_i1.*
		adapt_i2.*

The initial test sequences test the acf0 and VAD decision. A fault in the VAD decision will cause all the other sequences to fail, so it is recommended that this test is run before all other tests.
		adapt_m1.*
		adapt_m2.*

The main test sequences will check the basic threshold adaptation mechanism.
Periodicity detection

		pitch1.*
		pitch2.*

These sequences check the periodicity detection algorithm described in subclause 3.5.
Tone detection

The tone detector test sequences are only required for downlink VAD implementations. There are three types of test to verify the tone detection algorithm described in subclause 3.10. The first test sequence tests the operation of the tone detector by means of a frequency sweep:
		freq_sw.*
The following test sequences test the prediction gain calculation within the tone detector:
		pred1.*
		pred2.*
The following sequences test the second order pole frequency calculation within the tone detector:
		pole1.*
		pole2.*
"Safety" and initialization
		safety.*
This sequence checks that safety tests have been implemented to prevent zero values being passed to the norm function. It checks the functions described in the Adaptive Filtering and Energy Computation subclause (subclause 3.1), and the Predictor Values Computation (subclause 3.3). This sequence also checks the initialization of thvad and the rvad array.
Real speech
		good_sp.*
		bad_sp.*
Because the test sequences cannot be guaranteed to find every possible error, there is a small possibility that an incorrect implementation produces the correct output for the test sequences, but fails with real speech. Because of this, an extra set of sequences is included that consists of barely detectable speech and very clean speech.
There are 3 different file extensions:
	*.inp: speech encoder input sequences, binary files
	*.vad: output flag of the VAD algorithm, ASCII files 
*.cod: TX DTX handler output sequences, binary files for comparison with VAD/DTX handler output.
The *.cod files contain speech coder output information in the format described in clause 4.
It should be noted that there is no requirement in GSM 06.12 for a bit exact implementation of the averaging procedure to calculate the "LAR" and "xmax" parameters in the SID frames. Different implementations are allowed.
The algorithms used for the calculation of the LAR and xmax parameters of the SID frames are therefore reproduced below:
LAR averaging:

| FOR i = 1 to 8:
|  L_Temp = 2;	/* const. for rounding*/
|  | FOR n = 1 to 4:
|  |  L_Temp1 = LAR[j‑n](i);	/*conversion 16 ‑‑> 32 bit*/
|  |  L_Temp  = L_Add( L_Temp , L_Temp1 );
|  | NEXT n
|  L_Temp = L_Temp >> 2;
|  mean (LAR(i)) = L_Temp;	/*conversion 32 ‑‑> 16 bit*/
| NEXT i;

xmax averaging

L_Temp = 8;	/* const. for rounding*/

| FOR n = 1 to 4:
|  | FOR i = 1 to 4:
|  |  L_Temp1 = xmax[j‑n](i);	/*conversion 16 ‑‑> 32 bit*/
|  |  L_Temp  = L_Add( L_Temp , L_Temp1 );
|  | NEXT i
| NEXT n

L_Temp = L_Temp >> 4;

mean (xmax) = L_Temp;	/*conversion 32 ‑‑> 16 bit*/
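
The two averaging procedures above can be sketched in Python as follows. The argument layout (a list holding the four most recent frames) and the function names are assumptions of this illustration.

```python
def mean_lar(lar_frames):
    """lar_frames: four lists of eight LAR values; returns the eight rounded means."""
    return [(2 + sum(frame[i] for frame in lar_frames)) >> 2 for i in range(8)]

def mean_xmax(xmax_frames):
    """xmax_frames: four lists of four xmax values; returns the rounded mean."""
    return (8 + sum(sum(frame) for frame in xmax_frames)) >> 4
```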

A.2.2	File format description
All the *.inp and *.cod files are written in binary using 16 bit words, while all *.vad files are written in ASCII format. The sizes of the files are shown in table A.2.1, A.2.2 and A.2.3. The detailed format of the *.inp and *.cod files is in accordance with the descriptions given in GSM 06.10 clause 5.
Table A.2.1: File sizes for *.inp extension files
File            Frames    Size in bytes
spec_a1.inp     22        7 040
spec_a2.inp     22        7 040
spec_c1.inp     48        15 360
spec_c2.inp     48        15 360
spec_c3.inp     48        15 360
spec_c4.inp     48        15 360
adapt_i1.inp    67        21 440
adapt_i2.inp    48        15 360
adapt_m1.inp    403       128 960
adapt_m2.inp    376       120 320
pitch1.inp      35        11 200
pitch2.inp      35        11 200
freq_sw.inp     560       179 200
pred1.inp       126       40 320
pred2.inp       126       40 320
pole1.inp       97        31 040
pole2.inp       42        13 440
safety.inp      5         1 600
good_sp.inp     312       99 840
bad_sp.inp      312       99 840

Table A.2.2: File sizes for *.cod extension files
File            Frames    Size in bytes
spec_a1.cod     22        3 344
spec_a2.cod     22        3 344
spec_c1.cod     48        7 296
spec_c2.cod     48        7 296
spec_c3.cod     48        7 296
spec_c4.cod     48        7 296
adapt_i1.cod    67        10 184
adapt_i2.cod    48        7 296
adapt_m1.cod    403       61 256
adapt_m2.cod    376       57 152
pitch1.cod      35        5 320
pitch2.cod      35        5 320
freq_sw.cod     560       85 120
pred1.cod       126       19 152
pred2.cod       126       19 152
pole1.cod       97        14 744
pole2.cod       42        6 384
safety.cod      5         760
good_sp.cod     312       47 424
bad_sp.cod      312       47 424

Table A.2.3: File sizes for *.vad extension files
File            Frames    Size in bytes
spec_a1.vad     22        88
spec_a2.vad     22        88
spec_c1.vad     48        192
spec_c2.vad     48        192
spec_c3.vad     48        192
spec_c4.vad     48        192
adapt_i1.vad    67        268
adapt_i2.vad    48        192
adapt_m1.vad    403       1 612
adapt_m2.vad    376       1 504
pitch1.vad      35        140
pitch2.vad      35        140
freq_sw.vad     560       2 240
pred1.vad       126       504
pred2.vad       126       504
pole1.vad       97        388
pole2.vad       42        168
safety.vad      5         20
good_sp.vad     312       1 248
bad_sp.vad      312       1 248
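
The sizes in tables A.2.1 to A.2.3 are all whole multiples of the frame count, which gives a quick sanity check on a downloaded set of sequences. The per-frame figures in the sketch below (320, 152 and 4 bytes) are inferred from the tables themselves and are not stated elsewhere in this clause; the names are hypothetical.

```python
BYTES_PER_FRAME = {"inp": 320, "cod": 152, "vad": 4}

def expected_size(frames, extension):
    """Expected file size in bytes for a sequence with the given frame count."""
    return frames * BYTES_PER_FRAME[extension]
```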

A.3	VAD performance
In optimizing a VAD a difficult trade‑off has to be made between speech clipping which reduces the subjective performance of the system, and the average activity factor. The benefit of DTX is increased as the average activity factor is reduced. However, in general, a reduction of the activity will be associated with a greater risk for audible speech clipping.
In the optimization process, great emphasis has been placed on avoiding unnecessary speech clipping. However, it has been found that a VAD with virtually no audible clipping would result in a very high activity and very little DTX advantage.
The VAD specified in this technical specification introduces audible and possibly objectionable clipping in certain cases, mainly with low input levels. However, in a comprehensive evaluation programme consisting of about 600 individual conversations conducted in a wide range of realistic conditions, it was found that about 90% of the conversations were free from objectionable clipping.
The voice activity performance of the VAD is summarized in table A.3.1. The activity figures are averages of a large number of conversations covering factors like different talkers, noise characteristics and locations. It should be noted that the actual activity of a particular talker in a specific conversation may vary considerably relative to the averages given. This is due both to the variation in talker behaviour and to the level dependency of the VAD (the channel activity has been found to decrease by about 0,5 percentage points per dB of level reduction). However, as mentioned above, a decreased speech input level increases the risk of objectionable speech clipping.
All the values given are activity figures, i.e. the % of time the radio channel has to be on.
Table A.3.1: Summary of channel activity
Telephone instrument    Situation                                                   Typical channel activity factor
Handset                 Quiet location                                              55%
Handset                 Moderate office noise with voice interference              60%
Handset                 Strong voice interference (e.g. airport/railway station)    65‑70%
Handsfree/handset       Variable vehicle noise                                      60%

A.4	Pole frequency calculation
This annex describes the algorithm used to determine whether the pole frequency for a second order analysis of the signal frame is less than 385 Hz.
The filter coefficients for a second order synthesis filter are calculated from the first two unquantized reflection coefficients rc[1..2] obtained from the speech encoder. This is done using the routine described in subclause 3.10.4. If the filter coefficients a[0..2] are defined such that the synthesis filter response is given by:
	H(z) = 1 / (a[0] + a[1] z^{-1} + a[2] z^{-2})	(1)
Then the positions of the poles in the Z‑plane are given by the solutions to the following quadratic:
	a[0] z^2 + a[1] z + a[2] = 0,		a[0] = 1	(2)
The positions of the poles, z, are therefore:
	z = re ± j*sqrt(im),			j^2 = -1	(3)
where:
	re = - a[1] / 2	(4)
	im = (4*a[2] - a[1]^2) / 4	(5)
If im is negative then the poles lie on the real axis of the Z‑plane and the signal is not a tone and the algorithm terminates. If re is negative then the poles lie in the left hand side of the Z‑plane and the frequency is greater than 2 000 Hz and the prediction error test can be performed.
If im is positive and re is positive then the poles are complex and lie in the right hand side of the Z‑plane and the frequency in Hz is related to re and im by the expression:
	freq = arctan (sqrt(im)/re ) * 4 000 / π									(6)
Having ensured that both im and re are positive, the test for a dominant frequency less than 385 Hz can be derived by substituting Equations 4 and 5 into Equation 6 and re‑arranging:
	(4*a[2] - a[1]^2) / a[1]^2 < (tan(π*385/4 000))^2	(7)
or
	(4*a[2] - a[1]^2) / a[1]^2 < 0.0973	(8)
If this test is true then the signal is not a tone and the algorithm terminates, otherwise the prediction error test is performed.
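
The constant in equation (8) can be reproduced with a few lines of Python. The sketch also shows that the constant scaled by 32768 rounds to 3189, the value used in the pole frequency test pseudocode. The function names are hypothetical, and `pole_below` assumes that im and re are both positive.

```python
import math

def low_pole_threshold(freq_hz=385.0, half_rate_hz=4000.0):
    """Right-hand side of equation (7): tan(pi * freq / 4000) squared."""
    return math.tan(math.pi * freq_hz / half_rate_hz) ** 2

def pole_below(a1, a2, freq_hz=385.0):
    """True when the complex pole pair of 1/(1 + a1/z + a2/z^2) lies below freq_hz."""
    return (4.0 * a2 - a1 * a1) / (a1 * a1) < low_pole_threshold(freq_hz)
```

For example, a1 = ‑1,4 and a2 = 0,98 (a resonance near 1 000 Hz) gives False, so the prediction error test would follow.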
Annex B (normative):
Test sequences
The test vectors described in the present document are supplied in archive en_300965v080000p0.zip which accompanies the present document. The files contained in this archive are listed in clause A.2.
The full rate test vectors apply to both GSM Phase 1 and Phase 2. However, the files pole1.* pole2.* pred1.* pred2.* and freq_sw.* are not required for Phase 1 (uplink and downlink) and Phase 2 uplink implementations.
Annex C (informative):
Change history

Change history
SMG No.    TDoc. No.    CR. No.    Section affected    New version    Subject/Comments
SMG#09                                                 4.0.5          ETSI Publication
SMG#17                                                 4.2.1          ETSI Publication
SMG#23                                                 4.3.1          ETSI Publication
SMG#23                                                 5.0.3          Release 1996 version
SMG#27                                                 6.0.0          Release 1997 version
SMG#29                                                 7.0.0          Release 1998 version
                                                       7.0.1          Version update to 7.0.1 for Publication
SMG#31                                                 8.0.0          Release 1999 version

Change history
Date       TSG #    TSG Doc.    CR    Rev    Subject/Comment          Old      New
03-2001    11                                Version for Release 4             4.0.0
06-2002    16                                Version for Release 5    4.0.0    5.0.0
12-2004    26                                Version for Release 6    5.0.0    6.0.0
06-2007    36                                Version for Release 7    6.0.0    7.0.0
12-2008    42                                Version for Release 8    7.0.0    8.0.0
12-2009    46                                Version for Release 9    8.0.0    9.0.0