ISCA - International Speech
Communication Association


ISCApad Archive  »  2022  »  ISCApad #295  »  Jobs  »  (2022-12-05) Real time speaker separation Master internship, Lille (France), 2022@SteelSeries France R&D team (former Nahimic R&D team), France

ISCApad #295

Monday, January 09, 2023 by Chris Wellekens

6-45 (2022-12-05) Real time speaker separation Master internship, Lille (France), 2022@SteelSeries France R&D team (former Nahimic R&D team), France
  

Real time speaker separation

Master internship, Lille (France), 2022

Advisors — Nathan Souviraà-Labastie, R&D Engineer, PhD, nathan.souviraa-labastie@steelseries.com — Damien Granger, R&D Engineer, damien.granger@steelseries.com Company description About GN Group GN was founded 150 years ago with a truly innovative and global mindset. Today, we honour that legacy with world-leading expertise in the human ear, sound and video processing, wireless technology, miniaturization and collaborations with leading technology partners. GN’s solutions are marketed by the brands ReSound, Beltone, Interton, Jabra, BlueParrott, SteelSeries and FalCom in 100 countries. The GN Group employs 6,500 people and is listed on Nasdaq Copenhagen (GN.CO). About SteelSeries SteelSeries is the worldwide leader in gaming and esports peripherals focused on premium quality, innovation, and functionality. SteelSeries’ family of professional and gaming enthusiasts are the driving force behind the company and help influence, design, and craft every single accessory and the brand’s software ecosystem, SteelSeries GG. In 2020, SteelSeries acquired Nahimic, the leader in 3D sound solutions for gaming. We are currently looking for a machine learning / audio signal processing intern to join the R&D team of SteelSeries’ Software & Services Business Unit in our French office (former Nahimic R&D team). Internship subject Audio source separation consists in extracting the different sound sources present in an audio signal, in particular by estimating their frequency distributions and/or spatial positions. Many applications are possible from karaoke generation to speech denoising. In 2020, our separation approaches [1, 2] were equaling the state of the art [3, 4] on a music separation task. Since then our speech denoising product has hit the market [5] and the team continue to explore many tracks of improvements (see for instance the following project [6, 7]). Real time speaker separation This internship targets speaker separation which is formalized in the scientific community as the task of separately retrieving a given number of speech/speaker signals from a monaural mixture signal. Most of the scientific challenges [8] compare offline (not real-time) approaches. The objective of the internship is to address the following targets (more or less ordered) : — Based on our current speech denoising trainsets, the candidate will create a trainset for the speaker separation task that match the same in-house requirement. Indeed, most of the available datasets in the scientific community lack quantity, audio quality of the groundtruths, high sampling rate, diversity of speakers/noise type. In addition, for the SteelSeries use cases, the overlap in time of the different speech sources might be lower than in the scenarii used by the scientific community and it statistical distribution will need to be well identified/defined. — Once our offline and online baseline algorithm have been trained on such a trainset, the candidate could benchmark on different scenarii (number of speaker, signal ratio between speakers, effect of additional noise, various and mixed languages) to potentially fulfill the weakness of the trainset. — The first subjective listening could bring the candidate to design complementary metrics, for instance representing false positive in speaker attribution or representating the statistics about the time needed by real-time DNN to correctly attribute a signal to the correct speaker after some silence. 1 — While all the above could be done using state-of-the-art loss functions, the candidate could also adapt our internal loss to be permutation invariant [9]. — The scientific community is very active in proposing new DNN architectures (offline [10, 8] and online [11, 12]. The candidate could also re-implement or propose her/his own architecture. In particular, a multi-task approach where the DNN also outputs the number of active speakers would be of great interest. Skills Who are we looking for ? Preparing an engineering degree or a master’s degree, you preferably have knowledge in the development and implementation of advanced machine learning algorithms. Digital audio signal processing skills is a plus. Whereas not mandatory, notions in the following additional various fields would be appreciated : Audio effects in general : compression, equalization, etc. - Statistics, probabilist approaches, optimization. - Programming language : Python, Pytorch, Keras, Tensorflow, Matlab. - Voice recognition, voice command. - Computer programming and development : Max/MSP, C/C++/C#. - Audio editing software : Audacity, Adobe Audition, etc. - Scientific publications and patent applications. - Fluent in English and French. - Demonstrate intellectual curiosity. Références [1] I. Alaoui Abdellaoui et N. Souviraà-Labastie. « Blending the attention mechanism in TasNet ». working paper or preprint. Nov. 2020. [2] E. Pierson Lancaster et N. Souviraà-Labastie. « A frugal approach to music source separation ». working paper or preprint. Nov. 2020. [3] F.-R. Stöter, A. Liutkus et N. Ito. « The 2018 signal separation evaluation campaign ». In : International Conference on Latent Variable Analysis and Signal Separation. Springer. 2018, p. 293-305. [4] N. Takahashi et Y. Mitsufuji. « Multi-scale Multi-band DenseNets for Audio Source Separation ». In : 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 29 juin 2017. arXiv : 1706.09588. [5] ClearCast AI Noise Canceling - Promotion video. https : / / www . youtube . com / watch ? v = RD4eXKEw4Lg. [6] M. Vial et N. Souviraà-Labastie. Learning rate scheduling and gradient clipping for audio source separation. Rapp. tech. SteelSeries France, déc. 2022. [7] The torchcustoml rschedulersGitHubrepository. https : / / github . com / SteelSeries / torch _ custom_lr_schedulers. [8] Speech separation task referenced on the paperswithcode website. https://paperswithcode.com/ task/speech-separation. [9] X. Liu et J. Pons. « On permutation invariant training for speech source separation ». In : ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2021, p. 6-10. [10] Music separation task referenced on the paperswithcode website. https://paperswithcode.com/ sota/music-source-separation-on-musdb18. [11] DNS challenge on the paperswithcode website. https://paperswithcode.com/sota/speechenhancement-on-deep-noise-suppression. [12] H. Dubey et al. « Icassp 2022 deep noise suppression challenge ». In : ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2022, p. 9271-9275


Back  Top


 Organisation  Events   Membership   Help 
 > Board  > Interspeech  > Join - renew  > Sitemap
 > Legal documents  > Workshops  > Membership directory  > Contact
 > Logos      > FAQ
       > Privacy policy

© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.

Powered by ISCA