
Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals

Baris Bozkurt

Supervisor: Prof. Dr. Ir. Thierry Dutoit

Dissertation submitted to the Faculté Polytechnique de Mons for the degree of Doctor of Philosophy in applied sciences


the unit circle are kept inside and pushed further away from the unit circle. This results in a zero-gap on the unit circle and in smooth group delay functions whose characteristics match the mixed-phase model we have presented. Fig. 25 shows the group delay functions obtained from a real speech signal for six different window locations (the equivalent ZZT plots are given in Fig. 12 in Section 3).
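The group delay functions discussed here can be computed directly from a windowed frame. A minimal sketch in Python/NumPy, assuming the standard identity τ(ω) = Re{DFT(n·x(n)) / DFT(x(n))} (not necessarily the implementation used in this work; `group_delay` and `n_fft` are hypothetical names, and the random frame is placeholder data):

```python
import numpy as np

def group_delay(frame, n_fft=1024):
    """Group delay (in samples) of a frame, evaluated on the unit circle.

    Uses tau(w) = Re{ DFT(n*x[n]) / DFT(x[n]) }, which equals -d(phase)/dw
    wherever the spectrum X(w) is non-zero.
    """
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of x[n]
    Xn = np.fft.rfft(n * x, n_fft)   # spectrum of n * x[n]
    # A zero of X(z) close to the unit circle makes |X| tiny at that
    # frequency, which shows up as a spike in tau.
    return np.real(Xn / X)

# Example on placeholder data: a Blackman-windowed random frame.
rng = np.random.default_rng(0)
frame = np.blackman(201) * rng.standard_normal(201)
tau = group_delay(frame)
print(tau.shape)  # (513,)
```

Zeros lying close to the unit circle are exactly what make |X(ω)| small, so a zero-gap around the unit circle is what keeps such a group delay function free of spikes.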

Fig. 25: Effect of window location on the group delay of a real speech signal. Each (Blackman) window position is marked with a reference number on the signal in the top panel. The group delay function obtained for each window position is shown in a separate panel, with the window index in its top-right corner.

Two criteria for reliable phase/group delay estimation therefore follow from these observations: the window center should be synchronized with the glottal closure instant (GCI), and the window boundaries should coincide with zero-crossings of the signal. The second condition may lead to an asymmetric window when the nearest zero-crossings on the two sides of the GCI are at different distances from it. In practice, this did not appear to be an important problem in the examples we studied: a smooth window function with zero-valued boundaries removes such discontinuities. Matching the boundaries to zero-crossings is therefore necessary only for windows with non-zero boundaries; in that case an asymmetric window, whose two sides have different lengths, can be used.
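The two placement criteria above can be sketched as follows, assuming the GCI sample index is already known (e.g. from a separate GCI detector) and using a Hamming window as an example of a window with non-zero boundaries. The function names, the `half_len` parameter and the rule "snap each boundary to the zero-crossing nearest its nominal position" are illustrative assumptions, not the procedure used in this work.

```python
import numpy as np

def zero_crossings(x):
    """Indices i at which the sign of x changes between x[i] and x[i+1]."""
    negative = np.signbit(x)
    return np.nonzero(negative[:-1] != negative[1:])[0]

def asymmetric_window(left_len, right_len, window_fn=np.hamming):
    """Window of length left_len + right_len + 1 peaking at index left_len.

    The rising half comes from a symmetric window of length 2*left_len + 1,
    the falling half from one of length 2*right_len + 1, so the two sides
    may have different lengths.
    """
    rise = window_fn(2 * left_len + 1)[:left_len]
    fall = window_fn(2 * right_len + 1)[right_len:]
    return np.concatenate([rise, fall])

def gci_synchronous_frame(x, gci, half_len, window_fn=np.hamming):
    """Window peaking at the GCI, with boundaries snapped to zero-crossings.

    The boundaries are the zero-crossings closest to gci - half_len (left)
    and gci + half_len (right). Returns the start index and the frame.
    """
    zc = zero_crossings(x)
    left, right = zc[zc < gci], zc[zc > gci]
    if left.size == 0 or right.size == 0:
        raise ValueError("need a zero-crossing on both sides of the GCI")
    start = left[np.argmin(np.abs(left - (gci - half_len)))]
    stop = right[np.argmin(np.abs(right - (gci + half_len)))]
    w = asymmetric_window(gci - start, stop - gci, window_fn)
    return start, x[start:stop + 1] * w

# Toy usage: a sine whose sign changes between samples 24|25, 49|50, ...;
# gci=200 is an arbitrary stand-in for a detected GCI.
n = np.arange(400)
x = np.sin(2 * np.pi * (n + 0.5) / 50)
start, frame = gci_synchronous_frame(x, gci=200, half_len=60)
print(start, len(frame))  # 149 101
```

Here the left side spans 51 samples and the right side 49, so the window is slightly asymmetric. With a zero-boundary window such as `np.blackman`, the snapping step could simply be dropped, in line with the observation above.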

Effects of window size on group delay functions

Window size is also important. In particular, group delay functions obtained with windows shorter than two pitch periods differ markedly from those obtained with windows longer than two pitch periods. A window longer than two pitch periods contains several periods, so the windowed signal can be considered to include an impulse-train component. The ZZT of this impulse train lies close to the unit circle and introduces spikes in the group delay function. This is demonstrated in Fig. 26, where the window center is kept at a GCI and only the window size is varied. A window size between T0 and 2T0 (T0 being the pitch period) appears to be a good choice for group delay processing.
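A small numerical check of why the impulse-train component matters: a frame of K unit impulses spaced P samples apart has the z-transform 1 + z^-P + ... + z^-(K-1)P, whose zeros are the PK-th roots of unity other than the P-th roots, i.e. they lie exactly on the unit circle. The sketch below (the helper name `zzt` and the toy values P = 10, K = 3 and weights 0.1, 1, 0.1 are assumptions for illustration) computes the ZZT with `np.roots` and shows that tapering the impulse amplitudes only moves the zeros to radii |u|^(1/P), where u are the roots of the polynomial formed by the impulse weights.

```python
import numpy as np

def zzt(frame):
    """Zeros of X(z) = sum_n x[n] z^(-n) for a finite frame x[0..N-1].

    z^(N-1) X(z) is a polynomial with coefficients x[0], ..., x[N-1] in
    descending powers of z, which is the coefficient order np.roots expects.
    """
    return np.roots(np.asarray(frame, dtype=float))

P, K = 10, 3                       # toy period (samples) and number of impulses
train = np.zeros((K - 1) * P + 1)
train[::P] = 1.0                   # unit impulses at n = 0, P, 2P
print(np.allclose(np.abs(zzt(train)), 1.0))  # True: every zero is on |z| = 1

# Taper the impulses with weights 0.1, 1, 0.1. With u = z^P the polynomial
# becomes 0.1 u^2 + u + 0.1, so the zeros lie on two rings of radius |u|^(1/P).
tapered = train.copy()
tapered[0] = tapered[-1] = 0.1
radii = np.sort(np.abs(zzt(tapered)))
expected = np.sort(np.repeat(np.abs(np.roots([0.1, 1.0, 0.1])) ** (1 / P), P))
print(np.allclose(radii, expected))  # True
```

For P = 10 the two rings sit at radii of about 0.795 and 1.258; for a more realistic pitch period of P = 160 samples, the inner ring for the same weights is at about 0.986, i.e. very close to the unit circle.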

Fig. 26: Effect of window size on the group delay of a synthetic speech signal. Each (Blackman) window size is indicated on the window waveform in the top panel. The group delay function obtained for each window size is shown in a separate panel, with the window size in its top-left corner.

Effects of window function on group delay functions

The window function also matters, but less than window size and location, provided we restrict ourselves to the commonly used window functions listed above (see Fig. 27 for group delay functions obtained on the same data frame with different window functions). We observed that three window functions give the best group delay functions: Blackman, Gaussian and Hanning-Poisson.
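For reference, the three window families named above can be generated as follows. Blackman uses `np.blackman`; the Gaussian and Hanning-Poisson definitions are common textbook forms (for Hanning-Poisson, a Hann window times exp(-α|N-1-2n|/(N-1))) assumed here, since the exact parameterization used in this work is not given in this section. The values `sigma = length / 6` and `alpha = 2.0` are arbitrary placeholders.

```python
import numpy as np

def gaussian_window(length, sigma):
    """Gaussian window centred on the frame; sigma (samples) sets the decay."""
    n = np.arange(length) - (length - 1) / 2
    return np.exp(-0.5 * (n / sigma) ** 2)

def hann_poisson_window(length, alpha):
    """Hann window multiplied by a two-sided Poisson (exponential) window.

    alpha is the decay coefficient of the Poisson part: alpha = 0 gives a
    plain Hann window, larger alpha gives a faster exponential decay.
    """
    n = np.arange(length)
    m = length - 1
    hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / m))
    poisson = np.exp(-alpha * np.abs(m - 2 * n) / m)
    return hann * poisson

length = 201
candidates = {
    "Blackman": np.blackman(length),
    "Gaussian": gaussian_window(length, sigma=length / 6),
    "Hanning-Poisson": hann_poisson_window(length, alpha=2.0),
}
```

In both parameterized windows, the single parameter (`sigma` or `alpha`) plays the role of the user-controlled decay setting referred to in the text.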

The Hanning-Poisson window gives the smoothest group delay functions because its Poisson component is made of exponential functions. Windowing with a Hanning-Poisson window multiplies the exponentials of the Poisson function with those of the speech signal, so the decay coefficient of the window adds to the decay coefficients of the glottal flow and vocal tract responses (Eq. 3.11 and Eq. 3.12). This shifts the zeros further away from the unit circle, which is why the Hanning-Poisson window is preferable for group-delay-based analysis methods. Hanning-Poisson and Gaussian are also the window functions for which the smoothness of the representation can be adjusted to some extent through the decay coefficient: each has an independent, user-controlled parameter that sets it.
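The effect of exponential weighting on the zeros can be checked numerically for one side of the window. If a causal segment x(n) (for instance the part of the frame after the GCI) is multiplied by e^(-αn), its z-transform becomes X(e^α z), so every zero z_k moves to e^(-α) z_k: zeros inside the unit circle are pushed further inward, and α simply adds to the decay coefficient of any damped resonance in the segment. On the anticausal side, the exponential that rises towards the GCI acts in the mirror-image way and scales zeros outside the unit circle outward. The sketch below uses placeholder zeros and an arbitrary α chosen for illustration.

```python
import numpy as np

# Placeholder causal segment built from known zeros inside the unit circle.
zeros = np.array([0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j), 0.7, -0.6])
x = np.real(np.poly(zeros))    # polynomial coefficients = samples x[0..4]
alpha = 0.05                   # decay coefficient of the exponential taper
n = np.arange(len(x))
y = x * np.exp(-alpha * n)     # one-sided exponential (Poisson-type) weighting

# Y(z) = sum_n x[n] (e^alpha z)^(-n) = X(e^alpha z), so each zero z_k of X
# becomes e^(-alpha) z_k, i.e. it moves radially towards the origin.
zy = np.roots(y)
expected = np.exp(-alpha) * zeros
print(all(np.min(np.abs(zy - e)) < 1e-9 for e in expected))  # True

# The same multiplication adds decay coefficients: a damped resonance
# e^(-sigma m) cos(w m) becomes e^(-(sigma + alpha) m) cos(w m).
sigma, w = 0.02, 0.3
m = np.arange(200)
resonance = np.exp(-sigma * m) * np.cos(w * m)
print(np.allclose(resonance * np.exp(-alpha * m),
                  np.exp(-(sigma + alpha) * m) * np.cos(w * m)))  # True
```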


