ISCApad #261
Saturday, March 14, 2020, by Chris Wellekens
8-1 | George P. Kafentzis, 'Adaptive sinusoidal models for speech with applications in speech modifications and audio analysis' Email address: gkafen@gmail.com
8-2 | Xingyu Na, 'Personalization of HMM-based Speech Synthesis'. Email address: asr.naxingyu@gmail.com
8-3 | Tuomo Raitio, 'Voice source modelling techniques for statistical parametric speech synthesis'
8-4 | Bart Penning de Vries, 'Computerised Speaking Practice: The Role of Automatic Corrective Feedback in Learning L2 Grammar'. Email address: bardtpdv@gmail.com
8-5 | Prasanna Kumar Muthukumar, 'Towards Integrated Acoustic Models for Speech Synthesis', CMU-LTI-15-019. Advisor: Alan W. Black, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Thesis text can be found at http://www.cs.cmu.edu/~pmuthuku/publications/thesis/Prasanna_thesis.pdf
8-6 | Stephen Bodnar, 'Affective L2 learning experiences and ideal L2 selves in spoken CALL practice'
8-7 | Andreas Windmann, 'Optimization-based modeling of suprasegmental speech timing'
8-8 | Catharine Oertel Genannt Bierbach, 'Modeling Engagement in Multi-Party Conversations' (2017)
Gender: female
University: KTH
Supervisor: Prof. Joakim Gustafson
8-9 | Hardik B. Sailor, 'Auditory Representation Learning'
8-10 | Philipp Aichinger, 'Diplophonic Voice - Definitions, models, and detection'. Email address: philipp.aichinger@meduniwien.ac.at
8-11 | Yun Wang, 'Polyphonic Sound Event Detection with Weak Labeling'. Yun Wang completed a PhD at Carnegie Mellon University (CMU) in October 2018. Thesis advisor: Prof. Florian Metze (CMU).
8-12 | Omid Ghahabi, 'Deep Learning for i-Vector Speaker and Language Recognition'. Email address: omid.ghahabi@eml.org
Omid Ghahabi completed his PhD thesis entitled 'Deep Learning for i-Vector Speaker and Language Recognition' at Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. It was supervised by Prof. Javier Hernando at TALP Research Center, Department of Signal Theory and Communications.
Link to the document: https://theses.eurasip.org/theses/798/deep-learning-for-i-vector-speaker-and-language/
8-13 | Neeraj Kumar Sharma, 'Information-rich Sampling of Time-varying Signals'
Thesis Author: Neeraj Kumar Sharma
Current Affiliation:
Post-Doctoral Fellow
Carnegie Mellon University
Pittsburgh 15213, USA
E-mail: neerajww@gmail.com
URL: neerajww.github.in
PhD Granting Institution:
Dept. of Electrical Communication Engineering (ECE)
Indian Institute of Science
Bangalore 560012, India
Thesis Advisor:
Dr. Thippur V. Sreenivas
Professor, Dept. ECE
Indian Institute of Science
Bangalore 560012, India
E-mail: tvsree@iisc.ac.in
Thesis title: Information-rich Sampling of Time-varying Signals
Abstract: This thesis investigates three fundamental concepts of interest in time-varying signal analysis: sampling, modulation, and modelling. The underlying goal is speech/audio signal processing, and the motivation is drawn from how these information-rich signals are represented in the human auditory system. The rich information content of speech naturally requires the signals to be highly time-varying, as is evident in joint time-frequency representations such as the short-time Fourier transform. Although the theoretical bandwidth of such time-varying signals is infinite, the auditory nerves are known to carry only low-rate sampled information of these signals to the cortex, which nevertheless obtains their rich information content. Thus, it may be unnecessary to sample the signals at a uniform Nyquist rate, as is done in all present-day technology applications. Further, the present-day quasi-stationary models of speech/audio, based on a linear time-invariant system, may be inadequate. Instead of these models, the thesis explores signal decomposition using time-varying signal components, namely amplitude and frequency modulations (AM-FM). The contributions are presented in three parts, which together suggest alternative techniques for fine spectro-temporal analysis of time-varying signals.
In Part 1, the thesis analyzes non-uniform, event-triggered samples, namely zero-crossings (ZCs) and extrema of the signal. The extrema are the ZCs of the signal's first derivative; similarly, the ZCs of the d-th derivative of the signal are denoted HoZC-d. Using a sparse signal reconstruction approach, HoZCs of different orders d are compared for their efficiency in reconstructing the signal under different signal models. It is found that HoZC-1 outperforms the others, and that a combination of HoZC-1 and HoZC-2 provides acceptable reconstruction.
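As a rough illustration of this kind of event-triggered sampling, the minimal Python sketch below (not taken from the thesis; the chirp test signal and the 8 kHz sampling rate are arbitrary choices for the example) locates the zero-crossings of a signal and of its first and second derivatives, i.e. HoZC-1 and HoZC-2:

```python
import numpy as np

def zero_crossings(x):
    """Indices of samples just before a sign change of x."""
    s = np.signbit(x).astype(int)
    return np.where(np.diff(s) != 0)[0]

# Hypothetical test signal: a linear chirp sampled at an assumed 8 kHz rate.
fs = 8000
t = np.arange(0, 0.05, 1.0 / fs)
x = np.sin(2 * np.pi * (200 * t + 4000 * t ** 2))

zc = zero_crossings(x)                  # ZCs of the signal itself
hozc1 = zero_crossings(np.diff(x))      # HoZC-1: extrema (ZCs of the 1st derivative)
hozc2 = zero_crossings(np.diff(x, 2))   # HoZC-2: ZCs of the 2nd derivative

print(len(zc), len(hozc1), len(hozc2))  # number of event-triggered samples in each set
```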
In Part 2, analyzing an AM-FM signal, it is shown that extrema samples (HoZC-1) are better than ZCs or level crossings (LCs) for estimating the AM and FM components through local polynomial regression. Similarly, HoZC-1 provides better AM-FM estimation of sub-band speech, moving-source Doppler signals, etc., compared to DESA-1 and the analytic signal approach, with the additional benefit of sub-sampling. Extending the analysis to arbitrary multi-component AM-FM signals, it is shown that successive differentiation helps separate the component with the highest FM as the dominant AM-FM component of the mixture. This is referred to as the 'dominant instantaneous frequency principle' and is used for sequential estimation of the individual mono-component AM-FM signals in the multi-component mixture.
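A minimal sketch of the intuition behind the dominant instantaneous frequency principle (again not from the thesis; the two-tone test signal, its amplitudes, and its frequencies are arbitrary): differentiation scales each sinusoidal component by its frequency, so repeated differentiation lets the highest-frequency component dominate the spectrum.

```python
import numpy as np

fs = 16000                         # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
# Two-component mixture: a strong 500 Hz tone and a weak 2000 Hz tone.
x = 1.0 * np.sin(2 * np.pi * 500 * t) + 0.2 * np.sin(2 * np.pi * 2000 * t)

def dominant_freq(y, fs):
    """Frequency (Hz) of the largest magnitude peak of the windowed FFT."""
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    return np.fft.rfftfreq(len(y), 1.0 / fs)[np.argmax(spec)]

y = x.copy()
for d in range(4):
    # Each derivative scales a component's amplitude by its frequency, so the
    # 2000 Hz component overtakes the 500 Hz one after a couple of derivatives.
    print(d, round(dominant_freq(y, fs), 1))
    y = np.diff(y) * fs            # crude numerical derivative
```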
Part 3, focusing on speech signals, revisits time-varying sinusoidal modeling of speech and proposes an alternative model estimation approach. The estimation operates on the whole signal, without any short-time analysis. The approach proceeds by extracting the fundamental frequency sinusoid (FFS) from the speech signal. The instantaneous amplitude (IA) of the FFS is used for voiced/unvoiced stream segregation. The voiced stream is then demodulated using a variant of in-phase and quadrature-phase demodulation carried out at the harmonics of the FFS. The result is a non-parametric time-varying sinusoidal representation, specifically an additive mixture of quasi-harmonic sinusoids for the voiced stream and a wideband mono-component sinusoid for the unvoiced stream. The representation is evaluated for analysis-synthesis, and the bandwidths of the IA and instantaneous frequency (IF) signals are found to be crucial in preserving quality. The obtained IA and IF signals are also found to be carriers of perceived speech attributes, such as speaker characteristics and intelligibility. Compared with existing approaches, which operate on short-time segments, the proposed modeling framework offers improvements in simplicity of implementation, objective scores, and computation time. Listening test scores suggest that the quality preserves naturalness but does not yet beat state-of-the-art short-time analysis methods. In summary, the proposed representation lends itself to high-resolution temporal analysis of non-stationary speech signals, and also allows quality-preserving modification and synthesis.
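A minimal sketch of the in-phase/quadrature demodulation idea for a single harmonic (not from the thesis; the constant fundamental frequency, the harmonic index, and the 50 Hz low-pass bandwidth are simplifying assumptions, whereas the thesis tracks a time-varying FFS and demodulates all harmonics): the I and Q mixer outputs are low-pass filtered and combined to recover the harmonic's instantaneous amplitude.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000                                        # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1.0 / fs)
f0 = 120.0                                        # assumed constant fundamental (Hz)
k = 2                                             # harmonic index to demodulate
ia_true = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t)   # slowly varying true IA
x = ia_true * np.cos(2 * np.pi * k * f0 * t)      # isolated k-th harmonic

# Mix down with in-phase and quadrature carriers at the k-th harmonic,
# then low-pass filter (assumed 50 Hz IA bandwidth) to keep only the envelope.
i_mix = x * np.cos(2 * np.pi * k * f0 * t)
q_mix = -x * np.sin(2 * np.pi * k * f0 * t)
sos = butter(4, 50.0, btype='low', fs=fs, output='sos')
ia_est = 2.0 * np.abs(sosfiltfilt(sos, i_mix) + 1j * sosfiltfilt(sos, q_mix))

# Away from the signal edges the recovered IA should closely match the true IA.
print(np.max(np.abs(ia_est[2000:-2000] - ia_true[2000:-2000])))
```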
URL: https://drive.google.com/open?id=17Olne0RBkVHRd2HcmJc0f44e17m47NZB