ISCApad #295
Monday, January 09, 2023 by Chris Wellekens
The Bordeaux Computer Science Laboratory (LaBRI) is currently looking to fill a one-year post-doctoral position in the framework of the FVLLMONTI European project (http://www.fvllmonti.eu).

The University of Bordeaux invites applications for a one-year full-time postdoctoral researcher in Automatic Speech Recognition. The position is part of the FVLLMONTI project on efficient speech-to-text translation on embedded autonomous devices, funded by the European Community.

To apply, please send by email a single PDF file containing: a full CV (including publication list); a cover letter describing your personal qualifications, research interests and motivation for applying; evidence of software development experience (active GitHub/GitLab profile or similar); two of your key publications; contact information for two referees; and academic certificates (PhD, Diploma/Master, Bachelor).

Position details:
- Title: Post-doctoral position in Automatic Speech Recognition
- Duration: 12 months
- Starting date: tentatively 03/01/2023
- Project: European FETPROACT project FVLLMONTI (started January 2021)
- Location: Bordeaux Computer Science Lab. (LaBRI CNRS UMR 5800), Bordeaux, France (Image and Sound team)
- Salary: from €2,086.45 to €2,304.88/month (estimated net salary after taxes, according to experience)
- Contact: jean-luc.rouas@labri.fr

Job description: The successful applicant will be in charge of optimizing our state-of-the-art Automatic Speech Recognition and Machine Translation systems for English and French, built with the ESPnet framework (https://espnet.github.io/espnet/) for end-to-end deep neural networks. The objective is to match the specifications and constraints of the designed systems to the requirements of the project partners specialized in hardware (in close collaboration with EPFL, https://www.epfl.ch/labs/esl/).
In particular, the applicant will carry on the work of previous post-doctoral researchers on compression techniques (e.g. pruning and quantization) applied to Transformer- and Conformer-based networks, reducing memory and energy consumption while monitoring performance (WER and BLEU scores). Once a satisfactory trade-off is reached, more exploratory work will be carried out on using emotion/attitude/affect recognition on the speech samples to supply additional information to the translation system.

Context of the project: The aim of the FVLLMONTI project is to build a lightweight autonomous in-ear device for speech-to-speech translation. Today's pocket translation devices are IoT products requiring internet connectivity, which is generally energy inefficient. While machine translation (MT) and Natural Language Processing (NLP) performance has greatly improved, embedded lightweight energy-efficient hardware remains elusive. Existing solutions based on artificial neural networks (NNs) are computation-intensive and energy-hungry, requiring server-based implementations, which also raises data protection and privacy concerns. Today's 2D electronic architectures suffer from 'unscalable' interconnect and are thus still far from competing with biological neural systems in terms of real-time information-processing capabilities at comparable energy consumption. Recent advances in materials science, device technology and synaptic architectures have the potential to fill this gap with novel disruptive technologies that go beyond conventional CMOS. A promising solution comes from vertical nanowire field-effect transistors (VNWFETs), which unlock the full potential of truly unconventional 3D circuit density and performance.
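To illustrate the two compression techniques named above, here is a minimal, hypothetical sketch of magnitude pruning and symmetric int8 quantization in plain Python. In practice these would be applied to the weight tensors of a Transformer/Conformer model in PyTorch or ESPnet; the flat-list version below only shows the core idea, and all function names are illustrative, not part of any project code.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Threshold = magnitude of the k-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric linear quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

# Toy example: prune half the weights, then quantize the survivors.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
```

Storing int8 codes plus one float scale per tensor cuts memory roughly 4x versus float32, and zeroed weights can be skipped at inference time; the cost is a small reconstruction error, which is why the posting stresses tracking WER and BLEU throughout.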
Required skills:
- PhD in Automatic Speech Recognition (preferred) or Machine Translation using deep neural networks
- Knowledge of the most widely used toolboxes/frameworks (PyTorch, ESPnet)
- Good programming skills in Python
- Good communication skills (frequent interactions with hardware specialists)
- Interest in hardware design will be a plus

Selected references:
- S. Karita et al., "A Comparative Study on Transformer vs RNN in Speech Applications," 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 2019, pp. 449-456, doi: 10.1109/ASRU46091.2019.9003750.
- Leila Ben Letaifa, Jean-Luc Rouas, "Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices," 2022 30th European Signal Processing Conference (EUSIPCO), Aug 2022, Belgrade, Serbia.
- Leila Ben Letaifa, Jean-Luc Rouas, "Fine-grained Analysis of the Transformer Model for Efficient Pruning," 2022 International Conference on Machine Learning and Applications (ICMLA), Dec 2022, Nassau, Bahamas.