ISCA - International Speech
Communication Association



ISCApad #296

Tuesday, February 07, 2023 by Chris Wellekens

6-39 (2023-12-05)Master internship- Advanced Selective Mutual Learning for audio source separation @SteelSeries France R&D team (former Nahimic R&D team), France
  

Advanced Selective Mutual Learning for audio source separation

Master internship, Lille (France), 2022

Advisors
- Nathan Souviraà-Labastie, R&D Engineer, PhD, nathan.souviraa-labastie@steelseries.com
- Pierre Biret, R&D Engineer, pierre.biret@steelseries.com

Company description

About GN Group
GN was founded 150 years ago with a truly innovative and global mindset. Today, we honour that legacy with world-leading expertise in the human ear, sound and video processing, wireless technology, miniaturization and collaborations with leading technology partners. GN’s solutions are marketed by the brands ReSound, Beltone, Interton, Jabra, BlueParrott, SteelSeries and FalCom in 100 countries. The GN Group employs 6,500 people and is listed on Nasdaq Copenhagen (GN.CO).

About SteelSeries
SteelSeries is the worldwide leader in gaming and esports peripherals focused on premium quality, innovation, and functionality. SteelSeries’ family of professional and gaming enthusiasts are the driving force behind the company and help influence, design, and craft every single accessory and the brand’s software ecosystem, SteelSeries GG. In 2020, SteelSeries acquired Nahimic, the leader in 3D sound solutions for gaming.

We are currently looking for a machine learning / audio signal processing intern to join the R&D team of SteelSeries’ Software & Services Business Unit in our French office (former Nahimic R&D team).

Internship subject
Audio source separation consists of extracting the different sound sources present in an audio signal, in particular by estimating their frequency distributions and/or spatial positions. Applications range from karaoke generation to speech denoising. In 2020, our separation approaches [1, 2] matched the state of the art [3, 4] on a music separation task. Since then, our speech denoising product has reached the market [5], and the team continues to explore many avenues for improvement (see for instance the following project [6, 7]).

Selective Mutual Learning (SML)
Mutual learning (ML) [8] is a general idea related to knowledge distillation (KD) [9], in which a group of untrained lightweight networks learn simultaneously and share knowledge during training to perform tasks together. What distinguishes Selective Mutual Learning [10] is that high-confidence predictions are used to guide the other network, while low-confidence predictions are ignored. This helps discard the networks’ poor predictions during knowledge sharing. Note that knowledge sharing operates through loss functions that take the predictions of the other networks into account. The approach is simple and already shows benefits over KD and ML in boosting the performance of networks for speech separation.

Research axes
The intern will be able to use existing internal training sets and already implemented network architectures, which will make it easier to draw unbiased comparisons with our baseline approach. After re-implementing the SML approach, possible axes of improvement include:
- tune the confidence factor (hyper-parameter c in [10]) to fit our speech denoising baseline (DNN and training set)
- extend the SML approach to more than 2 networks and test it
- adapt the SML loss formula to incorporate our internal loss (description upon request)
- run additional tests with
  - new or already implemented networks: TasNet [11], E3Net [12], DPRNN [13], transformer [1]
  - various training sets (music separation, speech separation, ...)

Skills
Who are we looking for?
You are preparing an engineering degree or a master’s degree, and preferably have knowledge of the development and implementation of advanced machine learning algorithms. Digital audio signal processing skills are a plus. While not mandatory, notions in the following additional fields would be appreciated:
- Audio effects in general: compression, equalization, etc.
- Statistics, probabilistic approaches, optimization.
- Programming languages: Python, PyTorch, Keras, TensorFlow, MATLAB.
- Voice recognition, voice command.
- Computer programming and development: Max/MSP, C/C++/C#.
- Audio editing software: Audacity, Adobe Audition, etc.
- Scientific publications and patent applications.
- Fluent in English and French.
- Demonstrate intellectual curiosity.

References
[1] I. Alaoui Abdellaoui and N. Souviraà-Labastie. "Blending the attention mechanism in TasNet". Working paper or preprint. Nov. 2020.
[2] E. Pierson Lancaster and N. Souviraà-Labastie. "A frugal approach to music source separation". Working paper or preprint. Nov. 2020.
[3] F.-R. Stöter, A. Liutkus and N. Ito. "The 2018 signal separation evaluation campaign". In: International Conference on Latent Variable Analysis and Signal Separation. Springer. 2018, pp. 293-305.
[4] N. Takahashi and Y. Mitsufuji. "Multi-scale Multi-band DenseNets for Audio Source Separation". In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). June 29, 2017. arXiv: 1706.09588.
[5] ClearCast AI Noise Canceling - Promotion video. https://www.youtube.com/watch?v=RD4eXKEw4Lg
[6] M. Vial and N. Souviraà-Labastie. Learning rate scheduling and gradient clipping for audio source separation. Technical report. SteelSeries France, Dec. 2022.
[7] The torch_custom_lr_schedulers GitHub repository. https://github.com/SteelSeries/torch_custom_lr_schedulers
[8] Y. Zhang et al. "Deep mutual learning". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 4320-4328.
[9] G. Hinton, O. Vinyals, J. Dean et al. "Distilling the knowledge in a neural network". In: arXiv preprint arXiv:1503.02531 (2015).
[10] H. M. Tan et al. "Selective Mutual Learning: An Efficient Approach for Single Channel Speech Separation". In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2022, pp. 3678-3682.
[11] Y. Luo and N. Mesgarani. "TasNet: time-domain audio separation network for real-time, single-channel speech separation". In: arXiv:1711.00541 [cs, eess] (Nov. 1, 2017).
[12] M. Thakker et al. "Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation". In: arXiv preprint arXiv:2204.00771 (2022).
[13] Y. Luo, Z. Chen and T. Yoshioka. "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation". In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2020, pp. 46-50.
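
Illustrative sketch
The selective knowledge sharing described under "Selective Mutual Learning (SML)" can be illustrated with a toy example in plain Python. This is only a sketch under stated assumptions, not the loss formula of [10] and not the internal SteelSeries loss: the function names, the mean-squared-error criterion, the reading of `c` as a maximum allowed peer error, and the `weight` parameter are all hypothetical choices made for illustration. Each network's loss combines a supervised term with a mimicry term that only counts the examples where the peer's prediction is considered confident.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def selective_mutual_loss(own, peer, target, c=0.1, weight=0.5):
    """Toy loss for one network in a group of two (illustrative only).

    own, peer, target: batches (lists) of signals (lists of floats).
    c: hypothetical confidence threshold; a peer prediction is trusted
       only if its error against the target is below c.
    weight: hypothetical weight of the mimicry term.
    """
    # Supervised term: how far this network is from the ground truth.
    supervised = sum(mse(o, t) for o, t in zip(own, target)) / len(own)

    # Mimicry term: follow the peer only where the peer is confident.
    # In a real training loop the peer prediction would typically be
    # treated as a constant (no gradient flowing into the peer).
    mimicry_terms = [
        mse(o, p)
        for o, p, t in zip(own, peer, target)
        if mse(p, t) < c  # low-confidence peer predictions are ignored
    ]
    mimicry = sum(mimicry_terms) / len(mimicry_terms) if mimicry_terms else 0.0
    return supervised + weight * mimicry


# Two examples; the peer's second prediction is poor and gets ignored.
target = [[1.0, 0.0], [0.0, 1.0]]
net_a = [[0.8, 0.1], [0.2, 0.7]]
net_b = [[0.9, 0.0], [1.0, 0.0]]
loss_a = selective_mutual_loss(net_a, net_b, target)
loss_b = selective_mutual_loss(net_b, net_a, target)
```

In this sketch both networks are trained with the same function, each one taking the other as `peer`. Extending it to more than 2 networks, one of the research axes listed in the posting, would amount to averaging the mimicry term over several peers.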



© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.
