ISCA - International Speech Communication Association

ISCApad #176

Saturday, February 09, 2013 by Chris Wellekens

6-26 (2012-12-16) Master project 3 IRISA Rennes France

Computer Science Internship

CORDIAL group

Title: Grapheme-to-phoneme conversion adaptation using conditional random fields

Description:

Grapheme-to-phoneme conversion consists in generating possible pronunciations for an isolated word or for a sequence of words. More formally, this conversion is a transliteration of a sequence of graphemes, i.e., letters, into a sequence of phonemes, the symbolic units that represent the elementary sounds of a language. Grapheme-to-phoneme converters are used in speech processing either to help automatic speech recognition systems decode words from a speech signal or as a means of telling speech synthesizers how a written input should be acoustically produced.
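As a minimal illustration of this transliteration view, the sketch below treats conversion as attaching a phoneme label to each grapheme. The toy alignment for the French word "bateau", the SAMPA-like phoneme symbols, the empty-label convention, and both helper names are assumptions made for this example; they are not taken from any converter cited here.

```python
# Toy one-to-one alignment (assumed for illustration): each grapheme carries
# one phoneme label, and "" marks a grapheme that adds no sound of its own.
# Here the trigram "eau" is pronounced as the single phoneme "o".
ALIGNED_BATEAU = [
    ("b", "b"),
    ("a", "a"),
    ("t", "t"),
    ("e", "o"),
    ("a", ""),
    ("u", ""),
]


def graphemes_from_alignment(aligned):
    """Rebuild the written word from the grapheme side of the alignment."""
    return "".join(grapheme for grapheme, _ in aligned)


def phonemes_from_alignment(aligned):
    """Return the phoneme sequence, skipping empty labels."""
    return [phoneme for _, phoneme in aligned if phoneme]


word = graphemes_from_alignment(ALIGNED_BATEAU)
pronunciation = phonemes_from_alignment(ALIGNED_BATEAU)
```

Under this representation, a converter only has to predict the label column from the grapheme column, which is what makes the task a sequence labelling problem.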

A problem with such tools is that they are trained on large and varied sets of aligned grapheme and phoneme sequences, which leads to generic ways of pronouncing words in a given language. As a consequence, they are inadequate as soon as one wants to recognize or synthesize specific voices, for instance accented speech, stressed speech, or dictating voices versus chatting voices [1].

While multiple methods have been proposed for grapheme-to-phoneme conversion [2, 3], the primary goal of this internship is to propose grapheme-to-phoneme models that can easily be adapted to conditions specified by the user. More precisely, the use of conditional random fields (CRFs) will be studied to model the generic French pronunciation and its variants [4]. CRFs are state-of-the-art statistical tools widely used for labelling problems in natural language processing [5]. A further important goal is to automatically characterize the distinctive pronunciation features of a given specific voice as compared to a generic voice. This means highlighting and generalizing the differences that can be observed between two sequences of phonemes derived from the same sequence of graphemes.
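The "highlighting" step can be pictured as a sequence diff. The sketch below is only one possible starting point, using Python's standard `difflib` rather than any method prescribed by the internship. The function name `pronunciation_differences` and the two pronunciations of the French word "petite" (with and without the schwa, written "@" in a SAMPA-like notation) are hypothetical examples.

```python
from difflib import SequenceMatcher


def pronunciation_differences(generic, specific):
    """List (operation, generic_span, specific_span) for each differing region.

    Both arguments are lists of phoneme symbols derived from the same
    sequence of graphemes.
    """
    matcher = SequenceMatcher(a=generic, b=specific, autojunk=False)
    return [
        (tag, generic[i1:i2], specific[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical pronunciations of "petite": a generic voice that keeps the
# schwa ("@") and a casual voice that drops it.
generic_voice = ["p", "@", "t", "i", "t"]
casual_voice = ["p", "t", "i", "t"]

differences = pronunciation_differences(generic_voice, casual_voice)
```

Here `differences` contains a single deletion of the schwa. Generalizing such edits across many words, which is the harder part of the stated goal, is not covered by this sketch.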

The results of this internship would be integrated into the team's speech synthesis platform in order to easily and automatically simulate and imitate specific voices.

Technical skills:

C/C++ and a scripting language (e.g., Perl or Python)

Keywords:

Natural language processing, speech processing, machine learning, statistical learning

Contact:

Gwenole Lecorve (gwenole.lecorve@irisa.fr)

References:

[1] B. Hutchinson and J. Droppo. Learning non-parametric models of pronunciation. In Proceedings of ICASSP, 2011.

[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 2008.

[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.

[4] I. Illina, D. Fohr, and D. Jouvet. Multiple pronunciation generation using grapheme-to-phoneme conversion based on conditional random fields. In Proceedings of SPECOM, 2011.

[5] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, 2001.
