ISCA - International Speech Communication Association



ISCApad #286

Sunday, April 10, 2022 by Chris Wellekens

6-21 (2022-04-04) PhD position at INRIA-LORIA, Nancy, France
  

2022-04676 - PhD Position F/M Nongaussian models for deep learning based audio signal processing

Level of qualifications required : Graduate degree or equivalent
Function : PhD Position

Context

The PhD student will join the Multispeech team of Inria, which is the largest French research group in the field of speech processing. He/she will benefit from the research environment and the team's expertise in audio signal processing and machine learning. The team includes many researchers, PhD students, post-docs, and software engineers working in this field. He/she will be supervised by Emmanuel Vincent (Senior Researcher, Inria) and Paul Magron (Researcher, Inria).

Assignment

Audio signal processing and machine listening systems have achieved considerable progress over the past years, notably thanks to the advent of deep learning. Such systems usually process a time-frequency representation of the data, such as a magnitude spectrogram, and model its structure using a deep neural network (DNN). Generally speaking, these systems implicitly rely on the local Gaussian model [1], which is an elementary statistical model for the data. Even though it is convenient to manipulate, this model builds upon several hypotheses which are limiting in practice: (i) circular symmetry, which boils down to discarding the phase information (i.e., the argument of the complex-valued time-frequency coefficients); (ii) independence of the coefficients, which ignores the inherent structure of audio signals (temporal dynamics, frequency dependencies); and (iii) Gaussian density, which is not observed in practice.

Statistical audio signal modeling is an active research field. However, recent advances in this field are usually not leveraged in deep learning-based approaches, so their potential is currently underexploited. Besides, some of these advances are not yet mature enough to be fully deployed.
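As an illustration of hypothesis (i), the following minimal NumPy sketch draws sources from a local Gaussian model and computes their posterior means given the mixture (the classical Wiener filter). It is only a toy example under stated assumptions: the function name wiener_estimates, the array shapes, and the random variances are hypothetical and not taken from the posting. In an actual system the variances would typically be predicted by a DNN.

```python
import numpy as np


def wiener_estimates(mixture_stft, source_variances, eps=1e-12):
    """Posterior means of the sources under the local Gaussian model.

    mixture_stft:     complex array of shape (F, T), the mixture STFT.
    source_variances: nonnegative array of shape (J, F, T), one variance
                      per source and time-frequency bin (hypothetical inputs).
    """
    total = source_variances.sum(axis=0) + eps
    masks = source_variances / total          # real-valued, between 0 and 1
    return masks * mixture_stft[None]         # shape (J, F, T)


rng = np.random.default_rng(0)
F, T, J = 4, 5, 2                              # toy dimensions (assumption)
variances = rng.uniform(0.1, 1.0, size=(J, F, T))

# Independent, circularly symmetric complex Gaussian coefficients.
noise = rng.standard_normal((J, F, T)) + 1j * rng.standard_normal((J, F, T))
sources = np.sqrt(variances / 2) * noise
mixture = sources.sum(axis=0)

estimates = wiener_estimates(mixture, variances)

# Every estimate is a positive real multiple of the mixture coefficient,
# so it inherits the mixture phase in each time-frequency bin.
same_phase = np.allclose((estimates * np.conj(mixture)[None]).imag, 0.0)
print("estimates share the mixture phase:", same_phase)
```

Because the masks are real-valued and nonnegative, the phase of each estimated source is simply the mixture phase, which is one way to see why the circular symmetry assumption amounts to discarding phase information.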
The objective of this PhD is thus to design advanced statistical signal models for audio which overcome the limitations of the local Gaussian model, while combining them with DNN-based spectrogram modeling. The developed approaches will be applied to audio source separation and speech enhancement.

Main activities

The main objectives of the PhD student will be:

1. To develop structured statistical models for audio signals, which alleviate the limitations of the local Gaussian model. In particular, the PhD student will focus on designing models by leveraging properties that originate from signal analysis, such as temporal continuity [2] or the consistency of the representation [3], in order to favor interpretability and meaningfulness of the models. For instance, alpha-stable distributions have been exploited in audio for their robustness [4]. Anisotropic models are an interesting research direction since they overcome the circular symmetry assumption, while enabling an interpretable parametrization of the statistical moments [5]. Finally, a careful design of the covariance matrix allows for explicitly incorporating time and frequency dependencies [6].

2. To combine these statistical models with DNNs. This raises several technical difficulties regarding the design of, e.g., the neural architecture, the loss function, and the inference algorithm. The student will exploit and adapt the formalism developed in Bayesian deep learning, notably the variational autoencoding framework [7], as well as the inference procedures developed in DNN-free nongaussian models [8].

3. To experimentally validate these methods on realistic sound datasets. To that end, the PhD student will use public datasets such as LibriMix (speech) and MUSDB (music), which are reference datasets for source separation and speech enhancement.

The PhD student will disseminate his/her research results in international peer-reviewed journals and conferences.
In order to promote reproducible research, these publications will be self-archived at each step of the publication lifecycle, and accessible through open access repositories (e.g., arXiv, HAL). The code will be integrated into Asteroid, which is the reference software for source separation and speech enhancement developed by Multispeech.

Bibliography

[1] E. Vincent, M. Jafari, S. Abdallah, M. Plumbley, M. Davies, Probabilistic modeling paradigms for audio source separation, Machine Audition: Principles, Algorithms and Systems, pp. 162-185, 2010.
[2] T. Virtanen, Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1066-1074, 2007.
[3] J. Le Roux, N. Ono, S. Sagayama, Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction, Proc. SAPA, 2008.
[4] S. Leglaive, U. Şimşekli, A. Liutkus, R. Badeau, G. Richard, Alpha-stable multichannel audio source separation, Proc. IEEE ICASSP, 2017.
[5] P. Magron, R. Badeau, B. David, Phase-dependent anisotropic Gaussian model for audio source separation, Proc. IEEE ICASSP, 2017.
[6] M. Pariente, Implicit and explicit phase modeling in deep learning-based source separation, PhD thesis, Université de Lorraine, 2021.
[7] L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, X. Alameda-Pineda, Dynamical variational autoencoders: A comprehensive review, Foundations and Trends in Machine Learning, vol. 15, no. 1-2, 2021.
[8] P. Magron, T. Virtanen, Complex ISNMF: a phase-aware model for monaural audio source separation, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 27, no. 1, pp. 20-31, 2019.

General Information

Theme/Domain : Language, Speech and Audio
Town/city : Villers-lès-Nancy
Inria Center : CRI Nancy - Grand Est
Starting date : 2022-10-01
Duration of contract : 3 years
Deadline to apply : 2022-05-02

Contacts

Inria Team : MULTISPEECH
PhD Supervisor : Magron Paul / paul.magron@inria.fr

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.

The keys to success

Upload your complete application data. Applications will be assessed on a rolling basis, so it is advised to apply as soon as possible.

Instruction to apply

Defence Security : This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.

Recruitment Policy : As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.

Skills

Master or engineering degree in computer science, data science, signal processing, or machine learning.
Professional capacity in English (spoken, read, and written).
Some programming experience in Python and in a deep learning framework (e.g., PyTorch).
Previous experience and/or interest in speech and audio processing is a plus.

Benefits package

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 days of RTT (reduction in working hours, full-time basis) + possibility of exceptional leave (e.g., sick children, moving home)
Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports benefits (Association de gestion des œuvres sociales d'Inria)
Access to vocational training
Social security coverage

Remuneration

Salary: 1982€ gross/month for the 1st and 2nd years, 2085€ gross/month for the 3rd year.
Monthly salary after taxes: around 1594€ for the 1st and 2nd years, 1677€ for the 3rd year.


