ISCApad #176
Saturday, February 09, 2013 by Chris Wellekens
Computer Science Internship, CORDIAL group

Title: Grapheme-to-phoneme conversion adaptation using conditional random fields

Description: Grapheme-to-phoneme conversion consists in generating possible pronunciations for an isolated word or for a sequence of words. More formally, this conversion is a transliteration of a sequence of graphemes, i.e., letters, into a sequence of phonemes, the symbolic units that represent the elementary sounds of a language. Grapheme-to-phoneme converters are used in speech processing either to help automatic speech recognition systems decode words from a speech signal or as a means of telling speech synthesizers how a written input should be acoustically produced.

A problem with such tools is that they are trained on large and varied amounts of aligned sequences of graphemes and phonemes, leading to generic ways of pronouncing words in a given language. As a consequence, they are not adequate as soon as one wants to recognize or synthesize specific voices, for instance, accentuated speech, stressed speech, dictating voices versus chatting voices, etc. [1].

While multiple methods have been proposed for grapheme-to-phoneme conversion [2, 3], the primary goal of this internship is to propose grapheme-to-phoneme models that can easily be adapted to conditions specified by the user. More precisely, the use of conditional random fields (CRFs) will be studied to model the generic French pronunciation and variants of it [4]. CRFs are state-of-the-art statistical tools widely used for labelling problems in natural language processing [5].

A further important goal is to automatically characterize the distinctive pronunciation features of a given specific voice as compared to a generic voice. This means highlighting and generalizing differences between sequences of phonemes derived from the same sequence of graphemes.

The results of this internship would be integrated into the team's speech synthesis platform in order to easily and automatically simulate and imitate specific voices.

Technical skills: C/C++ and a scripting language (e.g., Perl or Python)

Keywords: Natural language processing, speech processing, machine learning, statistical learning

Contact: Gwenole Lecorve (gwenole.lecorve@irisa.fr)

References:
[1] B. Hutchinson and J. Droppo. Learning non-parametric models of pronunciation. In Proceedings of ICASSP, 2011.
[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 2008.
[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.
[4] I. Illina, D. Fohr, and D. Jouvet. Multiple pronunciation generation using grapheme-to-phoneme conversion based on conditional random fields. In Proceedings of SPECOM, 2011.
[5] J. D. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, 2001.