ISCA - International Speech
Communication Association


ISCApad #178

Wednesday, April 10, 2013 by Chris Wellekens

6-18 (2012-12-16) Master project 2 IRISA Rennes France

Computer Science Internship


Title :

Unit-selection speech synthesis guided by a stochastic model of spectral and prosodic


A Text-To-Speech (TTS) system produces a speech signal corresponding to the vocalization of a given text. Such a system is composed of a linguistic processing stage followed by an acoustic stage, which complies as closely as possible with the linguistic directives. For the acoustic stage, the most widely used approaches are:

- the corpus-based synthesis approach, which relies on the selection and concatenation of unit sequences extracted from a large continuous speech corpus. It has been popular for 20 years, yielding unmatched sound quality but still exhibiting some artefacts due to spectral discontinuities;


- the statistical approach. This new generation of TTS systems has emerged in recent years, reviving rule-based systems. The rules are no longer deterministic, as in the first systems of the 1950s, but are replaced by stochastic models. HTS, an HMM-based speech synthesis system, is currently the most widely used statistical system. HTS-type systems yield good acoustic continuity, but with a sound quality that depends strongly on the underlying acoustic model.

Recently, some hybrid synthesis systems have been proposed, combining the statistical approach with the unit selection method. These systems either use the acoustic descriptions and melodic contours generated by a statistical system to drive the cost function during the selection of natural speech units, or substitute poor-quality natural speech units with units derived from a statistical system.
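As a rough illustration of the first of these strategies, in which statistically generated parameters drive the cost function, the sketch below scores natural candidate units against a predicted target. The feature layout (a spectral vector plus a mean F0 value), the weights and the function names are all assumptions made for this example; they do not come from HTS or from any existing system.

```python
import math


def target_cost(candidate, predicted, w_spec=1.0, w_f0=0.5):
    """Weighted distance between a natural unit and a predicted target.

    Both arguments are dicts with a 'spec' list (spectral parameters) and
    an 'f0' float (mean pitch in Hz). This layout is hypothetical.
    """
    spec_dist = math.dist(candidate["spec"], predicted["spec"])
    # Compare pitch on a log scale so that equal ratios give equal distances.
    f0_dist = abs(math.log(candidate["f0"]) - math.log(predicted["f0"]))
    return w_spec * spec_dist + w_f0 * f0_dist


def best_candidate(candidates, predicted):
    """Return the candidate unit closest to the predicted target."""
    return min(candidates, key=lambda c: target_cost(c, predicted))
```

In a complete system this target cost would be combined with a concatenation cost between consecutive units; the weights shown here are placeholders that would have to be tuned.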

The framework of this internship is corpus-based TTS. Given the combinatorial problem posed by a blind search for an optimal unit sequence, the work consists in determining heuristics that reduce the search space and meet a real-time objective. These heuristics, based on spectral and prosodic parameters generated by HTS, will make it possible to implement pre-selection filters or to propose new cost functions within the corpus-based system developed by the Cordial group. The output of the hybrid system will be evaluated through listening tests and compared with standard systems such as HTS and a corpus-based system.
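To make the search-space reduction concrete, here is a minimal sketch of a pre-selection filter followed by a Viterbi search over the remaining candidates. It is not the Cordial group's system: the function names, the top-k pruning rule and the way costs are passed in as plain callables are assumptions chosen for illustration.

```python
def preselect(candidates, target, target_cost, k):
    """Pre-selection filter: keep the k candidates with the lowest target cost."""
    return sorted(candidates, key=lambda c: target_cost(c, target))[:k]


def select_units(targets, candidates_per_target, target_cost, join_cost, k=5):
    """Viterbi search for the cheapest unit sequence after pre-selection.

    Assumes every target has at least one candidate. Returns (cost, path).
    """
    if not targets:
        return 0.0, []
    lattice = [preselect(cands, t, target_cost, k)
               for t, cands in zip(targets, candidates_per_target)]
    # best[j] holds (cumulative cost, path) for the best sequence
    # ending in the j-th candidate of the current lattice column.
    best = [(target_cost(u, targets[0]), [u]) for u in lattice[0]]
    for i in range(1, len(lattice)):
        new_best = []
        for u in lattice[i]:
            # Cheapest way to reach u from any unit kept in the previous column.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda cp: cp[0],
            )
            new_best.append((cost + target_cost(u, targets[i]), path + [u]))
        best = new_best
    return min(best, key=lambda cp: cp[0])
```

With n targets and at most k candidates kept per target, the search evaluates on the order of n·k² joins instead of a number that grows with the square of the full candidate lists, which is the kind of reduction a real-time objective calls for. The price is that an over-aggressive filter may discard the unit that the unpruned search would have chosen.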

Keywords :

TTS, Corpus-based speech synthesis, Statistical learning, Experiments.

Contacts :

Olivier Boëffard, Nelly Barbot, Damien Lolive (


