ISCApad #261
Saturday, March 14, 2020, by Chris Wellekens
8-1 | George P. Kafentzis, 'Adaptive sinusoidal models for speech with applications in speech modifications and audio analysis' Email address: gkafen@gmail.com
8-2 | Xingyu Na, 'Personalization of HMM-based Speech Synthesis'. Email address: asr.naxingyu@gmail.com
8-3 | Tuomo Raitio, 'Voice source modelling techniques for statistical parametric speech synthesis'
8-4 | Bart Penning de Vries, 'Computerised Speaking Practice: The Role of Automatic Corrective Feedback in Learning L2 Grammar'. Email address: bardtpdv@gmail.com
8-5 | Prasanna Kumar Muthukumar, 'Towards Integrated Acoustic Models for Speech Synthesis', CMU-LTI-15-019. Advisor: Alan W. Black, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Thesis text can be found at http://www.cs.cmu.edu/~pmuthuku/publications/thesis/Prasanna_thesis.pdf
8-6 | Stephen Bodnar, 'Affective L2 learning experiences and ideal L2 selves in spoken CALL practice'
8-7 | Andreas Windmann, 'Optimization-based modeling of suprasegmental speech timing'
8-8 | Catharine Oertel Genannt Bierbach, 'Modeling Engagement in Multi-Party Conversations' (2017)
Gender: female
University: KTH
Supervisor: Prof. Joakim Gustafson
8-9 | Hardik B. Sailor, 'Auditory Representation Learning'
8-10 | Philipp Aichinger, 'Diplophonic Voice - Definitions, models, and detection'. Email address: philipp.aichinger@meduniwien.ac.at
8-11 | Yun Wang, 'Polyphonic Sound Event Detection with Weak Labeling'. Yun Wang completed a PhD at Carnegie Mellon University (CMU) in October 2018. Thesis advisor: Prof. Florian Metze (CMU).
8-12 | Omid Ghahabi, 'Deep Learning for i-Vector Speaker and Language Recognition'. Email address: omid.ghahabi@eml.org
Omid Ghahabi completed his PhD thesis entitled 'Deep Learning for i-Vector Speaker and Language Recognition' at Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. It was supervised by Prof. Javier Hernando at TALP Research Center, Department of Signal Theory and Communications.
Link to the document: https://theses.eurasip.org/theses/798/deep-learning-for-i-vector-speaker-and-language/
8-13 | Neeraj Kumar Sharma, 'Information-rich Sampling of Time-varying Signals'
Thesis Author: Neeraj Kumar Sharma
Current Affiliation:
Post-Doctoral Fellow
Carnegie Mellon University
Pittsburgh 15213, USA
E-mail: neerajww@gmail.com
URL: neerajww.github.in
PhD Granting Institution:
Dept. of Electrical Communication Engineering (ECE)
Indian Institute of Science
Bangalore 560012, India
Thesis Advisor:
Dr. Thippur V. Sreenivas
Professor, Dept. ECE
Indian Institute of Science
Bangalore 560012, India
E-mail: tvsree@iisc.ac.in
Thesis title: Information-rich Sampling of Time-varying Signals
Abstract: This thesis investigates three fundamental concepts of interest in time-varying signal analysis: sampling, modulation, and modelling. The underlying goal is speech/audio signal processing, and the motivation is drawn from how these information-rich signals are represented in the human auditory system. The rich information content of speech naturally requires the signals to be highly time-varying, as is evident in joint time-frequency representations such as the short-time Fourier transform. Although the theoretical bandwidth of such time-varying signals is infinite, the auditory nerves are known to carry only low-rate sampled information of these signals to the cortex, which nevertheless obtains their rich information content. Thus, it may be unnecessary to sample the signals at a uniform Nyquist rate, as is done in all present-day technology applications. Further, the present-day quasi-stationary models of speech/audio, based on a linear time-invariant system, may be inadequate. Instead of these models, the thesis explores signal decomposition using time-varying signal components, namely amplitude and frequency modulations (AM-FM). The contributions are presented in three parts, which together suggest alternative techniques for fine spectro-temporal analysis of time-varying signals.
In Part 1, the thesis analyzes non-uniform, event-triggered samples, namely zero-crossings (ZCs) and extrema of the signal. The extrema are the ZCs of the signal's first derivative; similarly, the ZCs of the d-th derivative of the signal are denoted HoZC-d. Using a sparse signal reconstruction approach, HoZCs of different orders d are compared for their efficiency in reconstructing the signal under different signal models. It is found that HoZC-1 outperforms the others, and that a combination of HoZC-1 and HoZC-2 provides acceptable reconstruction.
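As a rough illustration of this kind of event-triggered sampling, the minimal Python sketch below (not taken from the thesis; the chirp test signal and the 8 kHz sampling rate are arbitrary choices for the example) locates the zero-crossings of a signal and of its first and second derivatives, i.e. HoZC-1 and HoZC-2:

```python
import numpy as np

def zero_crossings(x):
    """Indices of samples just before a sign change of x."""
    s = np.signbit(x).astype(int)
    return np.where(np.diff(s) != 0)[0]

# Hypothetical test signal: a linear chirp sampled at an assumed 8 kHz rate.
fs = 8000
t = np.arange(0, 0.05, 1.0 / fs)
x = np.sin(2 * np.pi * (200 * t + 4000 * t ** 2))

zc = zero_crossings(x)                  # ZCs of the signal itself
hozc1 = zero_crossings(np.diff(x))      # HoZC-1: extrema (ZCs of the 1st derivative)
hozc2 = zero_crossings(np.diff(x, 2))   # HoZC-2: ZCs of the 2nd derivative

print(len(zc), len(hozc1), len(hozc2))  # number of event-triggered samples in each set
```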
In Part 2, analyzing an AM-FM signal, it is shown that extrema samples (HoZC-1) are better than ZCs or level crossings (LCs) for estimating the AM and FM components through local polynomial regression. Similarly, HoZC-1 provides better AM-FM estimation of sub-band speech, moving-source Doppler signals, etc., compared to DESA-1 and the analytic signal approach, with the additional benefit of sub-sampling. Extending the analysis to arbitrary multi-component AM-FM signals, it is shown that successive differentiation helps separate the component with the highest FM as the dominant AM-FM component of the mixture. This is referred to as the 'dominant instantaneous frequency principle' and is used for sequential estimation of the individual mono-component AM-FM signals in the multi-component mixture.
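A minimal sketch of the intuition behind the dominant instantaneous frequency principle (again not from the thesis; the two-tone test signal, its amplitudes, and its frequencies are arbitrary): differentiation scales each sinusoidal component by its frequency, so repeated differentiation lets the highest-frequency component dominate the spectrum.

```python
import numpy as np

fs = 16000                         # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
# Two-component mixture: a strong 500 Hz tone and a weak 2000 Hz tone.
x = 1.0 * np.sin(2 * np.pi * 500 * t) + 0.2 * np.sin(2 * np.pi * 2000 * t)

def dominant_freq(y, fs):
    """Frequency (Hz) of the largest magnitude peak of the windowed FFT."""
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    return np.fft.rfftfreq(len(y), 1.0 / fs)[np.argmax(spec)]

y = x.copy()
for d in range(4):
    # Each derivative scales a component's amplitude by its frequency, so the
    # 2000 Hz component overtakes the 500 Hz one after a couple of derivatives.
    print(d, round(dominant_freq(y, fs), 1))
    y = np.diff(y) * fs            # crude numerical derivative
```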
Part 3, focusing on speech signals, revisits time-varying sinusoidal modeling of speech and proposes an alternative model estimation approach. The estimation operates on the whole signal, without any short-time analysis. The approach proceeds by extracting the fundamental frequency sinusoid (FFS) from the speech signal. The instantaneous amplitude (IA) of the FFS is used for voiced/unvoiced stream segregation. The voiced stream is then demodulated using a variant of in-phase and quadrature-phase demodulation carried out at the harmonics of the FFS. The result is a non-parametric time-varying sinusoidal representation, specifically an additive mixture of quasi-harmonic sinusoids for the voiced stream and a wideband mono-component sinusoid for the unvoiced stream. The representation is evaluated for analysis-synthesis, and the bandwidths of the IA and instantaneous frequency (IF) signals are found to be crucial in preserving quality. The obtained IA and IF signals are also found to be carriers of perceived speech attributes, such as speaker characteristics and intelligibility. Compared with existing approaches, which operate on short-time segments, the proposed modeling framework offers improvements in simplicity of implementation, objective scores, and computation time. Listening test scores suggest that the quality preserves naturalness but does not yet beat state-of-the-art short-time analysis methods. In summary, the proposed representation lends itself to high-resolution temporal analysis of non-stationary speech signals, and also allows quality-preserving modification and synthesis.
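A minimal sketch of the in-phase/quadrature demodulation idea for a single harmonic (not from the thesis; the constant fundamental frequency, the harmonic index, and the 50 Hz low-pass bandwidth are simplifying assumptions, whereas the thesis tracks a time-varying FFS and demodulates all harmonics): the I and Q mixer outputs are low-pass filtered and combined to recover the harmonic's instantaneous amplitude.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000                                        # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
f0 = 120.0                                        # assumed constant fundamental (Hz)
k = 2                                             # harmonic index to demodulate
ia_true = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t)   # slowly varying true IA
x = ia_true * np.cos(2 * np.pi * k * f0 * t)      # isolated k-th harmonic

# Mix down with in-phase and quadrature carriers at the k-th harmonic,
# then low-pass filter (assumed 50 Hz IA bandwidth) to keep only the envelope.
i_mix = x * np.cos(2 * np.pi * k * f0 * t)
q_mix = -x * np.sin(2 * np.pi * k * f0 * t)
sos = butter(4, 50.0, btype='low', fs=fs, output='sos')
ia_est = 2.0 * np.abs(sosfiltfilt(sos, i_mix) + 1j * sosfiltfilt(sos, q_mix))

# Away from the signal edges the recovered IA should closely match the true IA.
print(np.max(np.abs(ia_est[2000:-2000] - ia_true[2000:-2000])))
```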
URL: https://drive.google.com/open?id=17Olne0RBkVHRd2HcmJc0f44e17m47NZB