
ISCA - International Speech Communication Association


ISCApad #299

Monday, May 08, 2023 by Chris Wellekens

6-23 (2022-12-18) Master internship at LISN Lab, Orsay, France

Study and development of a vocal force model

Keywords:

Machine learning, voice strength, speech processing, expressive speech

Context

The project aims to model vocal force (VF) estimation from speech recordings. VF is defined as the sound pressure level (SPL, in C-weighted decibels, dBC) measured in free field, one meter in front of the speaker’s mouth (Liénard, 2019). Unfortunately, this SPL is lost in the vast majority of recordings; listeners can nevertheless estimate it from the spectral differences that accompany the changes in vocal effort associated with different VF levels. A corpus of paired calibrated/uncalibrated signals will be used to build a model capable of estimating the original VF value (in dBC). Collaborations currently under development will build on and extend this effort by expanding the collected corpus and applying the resulting model to other tasks (e.g., expressive synthesis; Evrard et al., 2015).
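
As a rough illustration of the VF definition, the following Python sketch computes a C-weighted level, SPL = 20 log10(p_rms / p0) with p0 = 20 µPa, from a calibrated recording. It assumes the signal is already expressed in pascals as measured 1 m in front of the mouth in free field, applies the analytic C-weighting curve of IEC 61672-1 in the frequency domain, and averages over the whole signal (an equivalent continuous level). The function names are hypothetical, and the measurement protocol actually used for the corpus (time weighting, calibration, integration window) may differ.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure: 20 micropascals


def c_weight_gain(freqs_hz):
    """Linear C-weighting gain from the analytic IEC 61672-1 expression."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    r_c = (12194.0 ** 2 * f2) / ((f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2))
    return r_c * 10.0 ** (0.06 / 20.0)  # +0.06 dB brings the gain to ~1.0 at 1 kHz


def c_weight_db(freqs_hz):
    """C-weighting in dB (only meaningful for f > 0, since the gain is 0 at DC)."""
    return 20.0 * np.log10(c_weight_gain(freqs_hz))


def spl_dbc(pressure_pa, fs):
    """Equivalent C-weighted SPL (dBC) of a calibrated pressure signal in Pa.

    The weighting is applied in the frequency domain over the whole signal,
    so the result is an average level, not a Fast/Slow sound-level-meter reading.
    """
    x = np.asarray(pressure_pa, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    weighted = np.fft.irfft(spectrum * c_weight_gain(freqs), n=len(x))
    rms = np.sqrt(np.mean(weighted ** 2))
    return 20.0 * np.log10(rms / P_REF)
```

For example, a 1 kHz sine with a 1 Pa amplitude (RMS pressure of about 0.707 Pa) comes out at roughly 91.0 dBC, since C-weighting is close to 0 dB at 1 kHz.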

Objectives

The first goal is to increase the variability of the uncalibrated signal in each pair of the corpus. In practice, this means applying a series of degradations that simulate changes in the speaker’s distance and position relative to the microphone. Other processing typically used in post-production (e.g., compression, noise gating) will also be applied. A model will then be trained on these calibrated/uncalibrated pairs to reliably estimate the original VF from any recording. Several neural architectures will be evaluated, from simple feedforward networks to more complex ones (e.g., CNNs, LSTMs). Several feature extraction methods will also be considered: raw and perceptually filtered (e.g., Mel) spectra, as well as representations from self-supervised models (e.g., wav2vec 2.0; Baevski et al., 2020).
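
The degradations described above could be prototyped along the lines of the following Python sketch. All function names and parameter ranges are hypothetical. Distance is modelled only through the free-field inverse-distance law (pressure proportional to 1/r, about -6 dB per doubling of distance), ignoring reverberation, directivity and proximity effects; the gate and compressor are simplified frame-wise versions without attack, release or hold.

```python
import numpy as np


def frame_levels_db(signal, frame_len):
    """RMS level (dB re full scale 1.0) of consecutive non-overlapping frames."""
    n_frames = int(np.ceil(len(signal) / frame_len))
    levels = np.empty(n_frames)
    for i in range(n_frames):
        seg = signal[i * frame_len:(i + 1) * frame_len]
        levels[i] = 20.0 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)
    return levels


def apply_frame_gains(signal, gains, frame_len):
    """Multiply each frame by its linear gain."""
    out = np.asarray(signal, dtype=float).copy()
    for i, g in enumerate(gains):
        out[i * frame_len:(i + 1) * frame_len] *= g
    return out


def distance_change(signal, ref_distance_m, new_distance_m):
    """Inverse-distance law only: level change, no spectral or room effects."""
    return np.asarray(signal, dtype=float) * (ref_distance_m / new_distance_m)


def noise_gate(signal, frame_len, threshold_db):
    """Mute frames whose RMS level falls below threshold_db."""
    levels = frame_levels_db(signal, frame_len)
    return apply_frame_gains(signal, np.where(levels < threshold_db, 0.0, 1.0), frame_len)


def compressor(signal, frame_len, threshold_db, ratio):
    """Frame-wise hard-knee downward compressor (no attack/release smoothing)."""
    levels = frame_levels_db(signal, frame_len)
    over_db = np.maximum(levels - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return apply_frame_gains(signal, 10.0 ** (gain_db / 20.0), frame_len)


def random_degradation(signal, fs, rng):
    """Hypothetical augmentation chain; parameter ranges are illustrative guesses."""
    frame_len = int(0.01 * fs)  # 10 ms frames
    x = distance_change(signal, 1.0, rng.uniform(0.25, 4.0))
    if rng.random() < 0.5:
        x = noise_gate(x, frame_len, threshold_db=rng.uniform(-70.0, -50.0))
    if rng.random() < 0.5:
        x = compressor(x, frame_len, threshold_db=rng.uniform(-30.0, -10.0),
                       ratio=rng.uniform(2.0, 8.0))
    return x
```

In this sketch only the uncalibrated member of a pair is degraded, so every augmented copy keeps the VF target (in dBC) measured on its calibrated counterpart.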

Tasks

• Reviewing speech corpus augmentation techniques

• Surveying neural and self-supervised learning architectures for processing audio pairs

• Augmenting the corpus by applying acoustic degradations

• Building a model that restores the original vocal force from the signal pairs

• Presenting an objective evaluation of the model’s performance, as well as a subjective evaluation via perceptual experiments
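
For the objective evaluation, one plausible choice (an assumption; the project description does not specify the metrics) is to report the error between predicted and reference VF values in dB, together with their correlation, as in this Python sketch with a hypothetical `vf_metrics` helper:

```python
import numpy as np


def vf_metrics(reference_dbc, predicted_dbc):
    """Objective error measures between reference and estimated VF, in dB."""
    ref = np.asarray(reference_dbc, dtype=float)
    pred = np.asarray(predicted_dbc, dtype=float)
    err = pred - ref
    return {
        "mae_db": float(np.mean(np.abs(err))),             # mean absolute error
        "rmse_db": float(np.sqrt(np.mean(err ** 2))),      # penalises large errors more
        "bias_db": float(np.mean(err)),                    # systematic over/under-estimation
        "pearson_r": float(np.corrcoef(ref, pred)[0, 1]),  # linear association
    }
```

For instance, `vf_metrics([60, 70, 80], [62, 70, 78])` gives an MAE of about 1.33 dB, an RMSE of about 1.63 dB, zero bias and a Pearson correlation of 1 (up to rounding).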

Profile

A second-year master’s student with:

• A solid background in machine learning

• Good academic writing skills in English

• A strong interest in expressive speech

Website: www.lisn.upsaclay.fr | Twitter @LisnLab | LinkedIn LisnLab
Belvédère site: University campus, building 507, rue du Belvédère, 91405 Orsay Cedex
Plaine site: University campus, building 650, rue Raimond Castaing, 91190 Gif-sur-Yvette
M2-CS-Internship 2022-2023

Practical details

The internship will start in March 2023 and last 5 to 6 months, in the Department of Language Sciences and Technologies at the LISN laboratory. The LISN Belvédère site is located on the Plateau de Saclay: University campus, building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard and Albert Rilliard. An allowance will be paid according to the official rates (service-public.fr).

How to apply

Please send a CV and a brief cover letter highlighting your interest in the project to Marc Evrard (marc.evrard@lisn.upsaclay.fr).

References

1. Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33, 12449-12460.

2. Evrard, M., Delalez, S., d’Alessandro, C., & Rilliard, A. (2015). Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis. In Sixteenth Annual Conference of the International Speech Communication Association.

3. Liénard, J. S. (2019). Quantifying vocal effort from the shape of the one-third octave long-term-average spectrum of speech. The Journal of the Acoustical Society of America, 146(4), EL369-EL375.
