
Title: Zeros of the z-transform (ZZT)
Author: Baris Bozkurt
Language: English
File Size: 3.6 MB
Total Pages: 121
Table of Contents
Chapter I: Introduction
	I.1. Motivations
		The (hi)story of this study
	I.2. Original contributions of the thesis
		ZZT Representation of signals
			Chirp group delay processing
			Applications of ZZT and chirp group delay
	I.3. Plan
	Chapter II: State-of-the-art
		II.1. Introduction
		II.2. Glottal flow estimation and voice quality analysis
			Glottal flow signal estimation methods
				Glottal flow parameter estimation methods
				Applications of glottal flow estimation in voice quality analysis for concatenative TTS
		II.3. Formant Tracking
		II.4. Phase Processing of Speech
			Phase processing in sinusoidal/harmonic modeling
				Phase processing in speech perception
				Phase processing in speech analysis
				Phase processing in automatic speech recognition
	First Part: Spectral Representation of Speech by Zeros of the Z-Transform (ZZT) and Chirp Group Delay
		Chapter III: Zeros of the z-transform (ZZT) representation of speech
			III.1. Introduction
			III.2. Definition
				Finding the roots of high degree polynomials
			III.3. ZZT representation of speech signals
				III.3.1. ZZT of some basic signals
				III.3.2. ZZT of the glottal flow signal
				III.3.3.  ZZT representation and source-filter model of speech
				III.3.4. ZZT of windowed synthetic speech signals
				III.3.5. ZZT of aperiodic components in speech
				III.3.6. Conclusion
		Chapter IV: Chirp group delay processing of signals
			IV.1. Introduction
			IV.2. Methods proposed by Yegnanarayana and Murthy for group delay processing
				Terminology
			IV.3. Phase processing of mixed-phase signals
			IV.4. Mixed-phase speech model
			IV.5. Effects of windowing on group delay functions
				Effects of window location on group delay functions
			IV.6. Chirp group delay processing of speech
			IV.7. Conclusion
	Second Part: Applications of ZZT and Chirp Group Delay Processing in Speech Analysis
		Chapter V: Applications of ZZT and Chirp Group Delay Processing in Speech Analysis
			V.1. ZZT-decomposition for source-filter separation of speech
				V.1.1. The ZZT-decomposition algorithm
				V.1.2. Examples and evaluation of the decomposition algorithm
				V.1.3. Mixed-phase decomposition using complex cepstrum
				V.1.4. Conclusions
			V.2. Application to glottal flow parameter estimation
				V.2.1. Testing the Fg estimation algorithm
				V.2.2. Conclusions
			V.3. Application to formant tracking
				V.3.1. Formant tracker – first version
				V.3.2. Formant tracker – second version (DPPT)
				V.3.3. Formant tracker – third version (Fast-DPPT)
			V.4. A Linear Prediction (LP) algorithm to estimate the glottal flow component from speech signals
				V.4.1. The MixLP algorithm
			V.5. Application to speech recognition
				V.5.1. Group delay based features
				V.5.2. ASR experiments
				V.5.3. Discussion and conclusion
		Chapter VI: Conclusion and Future Works
			VI.1. Conclusions
			VI.2. Future works
                        
Document Text Contents
Page 1

Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals


Baris Bozkurt

Supervisor: Prof. Dr. Ir. Thierry Dutoit

Dissertation submitted to the Faculté Polytechnique de Mons
for the degree of Doctor of Philosophy in applied sciences


Page 60

the unit circle are kept inside and further pushed away from the unit circle. This results in a
zero-gap on the unit circle and smooth group delay functions whose characteristics match the
mixed-phase model we have presented. Fig. 25 below shows the group delay functions obtained
from a real speech signal for six different window locations (the equivalent ZZT plots are
presented in Fig. 12 in section 3).


Fig. 25: Effect of window location on the group delay of a real speech signal.

Each (Blackman) window position is marked with a reference number on the signal in the top
panel. The group delay function of the windowed data for each position is shown with its window
index in the top-right corner of the plot.


Therefore, two criteria for reliable phase/group delay function estimation follow from these
observations: the window center should be synchronized with a glottal closure instant (GCI), and
the window boundaries should coincide with zero-crossings of the signal. The second condition may
lead to asymmetric windowing when the distances from the GCI to the zero-crossings on its two
sides are not the same. In the examples we have studied, this does not appear to be an important
problem: a smooth window function that goes to zero at its boundaries removes such
discontinuities. Matching the boundaries to zero-crossings is necessary only for windows with
non-zero boundaries; in that case asymmetric windows can be used, i.e. the two sides of the
window may have different lengths.
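As a rough illustration of the first criterion, the sketch below cuts a Blackman-windowed frame centered on a given GCI index and computes its group delay with the standard identity tau(w) = Re{DFT(n x[n]) / DFT(x[n])}. The function names, the toy resonance signal and the parameter values are assumptions made for this example; GCI detection itself is not covered, and this is not necessarily the implementation used in the thesis.

```python
import numpy as np


def group_delay(frame, n_fft=1024, eps=1e-12):
    """Group delay in samples, from tau(w) = Re{DFT(n*x[n]) / DFT(x[n])}."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    # Re{Y / X} written without a complex division; eps guards spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


def gci_centered_frame(signal, gci, half_len):
    """Blackman-windowed frame of 2*half_len + 1 samples centered on index gci."""
    start, stop = gci - half_len, gci + half_len + 1
    if start < 0 or stop > len(signal):
        raise ValueError("frame exceeds signal bounds")
    return signal[start:stop] * np.blackman(stop - start)


# Toy input (assumption): one decaying resonance excited at sample 200,
# which plays the role of the GCI here.
fs = 16000
t = np.arange(400) / fs
signal = np.zeros(600)
signal[200:] = np.exp(-300 * np.pi * t) * np.sin(2 * np.pi * 700 * t)

frame = gci_centered_frame(signal, gci=200, half_len=100)
tau = group_delay(frame)          # delay measured from the frame start
tau_from_center = tau - 100       # delay measured from the window center (GCI)
```

The delay is measured from the first sample of the frame, so subtracting `half_len` gives values relative to the window center.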

Effects of window size on group delay functions

Window size is also important. In particular, group delay functions obtained with a window
shorter than two pitch periods differ markedly from those obtained with a window longer than two
pitch periods. A window longer than two pitch periods spans several periods, so the windowed
signal can be considered to include an impulse train component. The zeros (ZZT) of this impulse
train appear close to the unit circle and introduce spikes in the group delay function. This is
demonstrated in Fig. 26, where the window center is at the GCI and only the window size is varied.
A window size between one and two pitch periods (T0 to 2T0) appears to be a good choice for
group delay processing.
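The claim about the impulse train can be checked numerically. The following sketch builds an ideal, unwindowed impulse train and computes the zeros of its z-transform with `numpy.roots`; the helper names and the chosen period and pulse count are assumptions for illustration. For N pulses with period T the polynomial is (z^(NT) - 1) / (z^T - 1), so every zero lies exactly on the unit circle. Windowing and the vocal tract response move these zeros, but they stay close to the circle.

```python
import numpy as np


def impulse_train(period, n_pulses):
    """Unit impulses at n = 0, period, 2*period, ... (n_pulses of them)."""
    x = np.zeros(period * (n_pulses - 1) + 1)
    x[::period] = 1.0
    return x


def zzt(x):
    """Zeros of the z-transform of a finite sequence x[0..N-1].

    X(z) = sum_n x[n] z^(-n) has the same non-trivial zeros as the polynomial
    whose coefficients, highest power first, are x[0], x[1], ..., x[N-1].
    """
    return np.roots(x)


zeros = zzt(impulse_train(period=8, n_pulses=3))
radii = np.abs(zeros)
print(len(zeros), radii.min(), radii.max())
```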


Fig. 26: Effect of window size on the group delay of a synthetic speech signal.

Each (Blackman) window size is indicated on the window waveform in the top panel. The group
delay function of the windowed data for each window is shown with the window size indicated in
the top-left corner of the plot.

Effects of window function on group delay functions

The window function also matters, though less than window size and location, provided we limit
ourselves to the commonly used window functions listed above (see Fig. 27 for group delay
functions obtained on the same data frame using different window functions). We observed that
three window functions provide the best group delay functions: Blackman, Gaussian and
Hanning-Poisson.
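For reference, the three windows can be generated as in the sketch below. The Blackman window comes from `numpy.blackman`; the Gaussian and Hanning-Poisson windows use common textbook definitions, and the parameter names `sigma_ratio` and `alpha` and their default values are assumptions, since the parametrization used in the thesis is not given in this excerpt.

```python
import numpy as np


def centered_index(length):
    """Return n = -M..M and M for an odd window length 2*M + 1."""
    if length % 2 == 0 or length < 3:
        raise ValueError("length must be odd and >= 3")
    m = (length - 1) // 2
    return np.arange(-m, m + 1), m


def gaussian_window(length, sigma_ratio=0.25):
    """Gaussian window; sigma is sigma_ratio * M samples (assumed convention)."""
    n, m = centered_index(length)
    return np.exp(-0.5 * (n / (sigma_ratio * m)) ** 2)


def hanning_poisson_window(length, alpha=2.0):
    """Hann window multiplied by a two-sided exponential (Poisson) window."""
    n, m = centered_index(length)
    hann = 0.5 * (1.0 + np.cos(np.pi * n / m))
    poisson = np.exp(-alpha * np.abs(n) / m)   # alpha sets the decay rate
    return hann * poisson


windows = {
    "blackman": np.blackman(201),
    "gaussian": gaussian_window(201),
    "hanning_poisson": hanning_poisson_window(201),
}
```

All three are symmetric and peak at the center sample. With these definitions the Blackman and Hanning-Poisson windows reach zero at the boundaries, whereas the Gaussian only approaches it.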

The Hanning-Poisson window provides the smoothest group delay functions, since its Poisson
component is composed of exponential functions. Windowing with a Hanning-Poisson window
multiplies the exponentials of the Poisson function with those of the speech signal, so the decay
coefficients of the window add to those of the glottal flow and vocal tract responses (Eq. 3.11
and Eq. 3.12). This shifts zeros further away from the unit circle. For this reason, the
Hanning-Poisson window is preferable in group delay based analysis methods. For both the
Hanning-Poisson and Gaussian windows, the smoothness of the representation can be adjusted to
some extent, because each has an independent user-controlled parameter that sets its decay
coefficient.
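The effect of exponential weighting on zero locations can be verified on a small example. If y[n] = a^n x[n], then Y(z) = X(z/a), so every zero z_k of X moves to a z_k. The sketch below checks this with `numpy.roots` on an arbitrary random frame; the helper names, the frame and the value of `a` are assumptions. It covers only a one-sided exponential and does not reproduce the two-sided Poisson term; it only illustrates how exponential weighting moves zeros radially.

```python
import numpy as np


def zzt(x):
    """Zeros of the z-transform of x[0..N-1] (coefficients, highest power first)."""
    return np.roots(x)


def exponential_weight(x, a):
    """Multiply x[n] by a**n, i.e. apply a one-sided exponential window."""
    return x * a ** np.arange(len(x))


rng = np.random.default_rng(0)
x = rng.standard_normal(16)       # arbitrary test frame (assumption)
a = 0.9                           # decay factor per sample (assumption)

radii_before = np.sort(np.abs(zzt(x)))
radii_after = np.sort(np.abs(zzt(exponential_weight(x, a))))

# Every zero radius is scaled by a, so zeros inside the unit circle move
# further inside.
print(np.allclose(radii_after, a * radii_before))
```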



Page 120




[Yegnanarayana & Murthy, 1992] B. Yegnanarayana, H. A. Murthy, ‘Significance of group delay
functions in spectrum estimation.’ IEEE Trans. on Signal Processing, vol. 40, no. 9, Sept. 1992,
pp. 2281-2289.

[Yegnanarayana et al, 1998] B. Yegnanarayana, C. d'Alessandro, V. Darsinos, ‘An iterative algorithm
for decomposition of speech signals into periodic and aperiodic components.’ IEEE Trans. on
Speech and Audio Processing, vol. 6, no. 1, 1998, pp. 1-11.

[You, 2004] H. You, ‘Application of long-term filtering to formant estimation.’ Proc.
Interspeech-ICSLP, Korea, 2004.

[Zheng & Hasegawa-Johnson, 2003] Y. Zheng, M. Hasegawa-Johnson, ‘Particle filtering approach to
Bayesian formant tracking.’ Proc. IEEE Workshop on Statistical Signal Processing, 2003.

[Zhu & Paliwal, 2004] D. Zhu, K. K. Paliwal, ‘Product of power spectrum and group delay function for
speech recognition.’ Proc. ICASSP, Montreal, 2004, pp. 125-128.

[Zolfaghari et al, 2003] P. Zolfaghari, T. Nakatani, T. Irino, H. Kawahara, F. Itakura, ‘Glottal
closure instant synchronous sinusoidal model for high quality speech analysis/synthesis.’ Proc.
Eurospeech, Geneva, 2003, pp. 2441-2444.

[Zolfaghari & Robinson, 1996] P. Zolfaghari, T. Robinson, ‘Formant analysis using mixtures of
Gaussians.’ Proc. ICSLP, Philadelphia, 1996, pp. 1229-1232.

[www-DarpaDBA] http://www.ldc.upenn.edu/readme_files/timit.readme.html

[www-Ellis] D. Ellis, Lecture notes on speech production: http://www.ee.columbia.edu/~dpwe/e6820/

[www-Mbrola] http://tcts.fpms.ac.be/synthesis/mbrola.html

[www-Strut] J.-M. Boite, L. Couvreur, S. Dupont and C. Ris, Speech Training and Recognition Unified
Tool (STRUT), http://tcts.fpms.ac.be/asr/project/strut.

[www-Praat] http://www.praat.org

[www-Voqual03] http://www.limsi.fr/VOQUAL

[www-WinSnoori] http://www.loria.fr/~laprie/WinSnoori/
