ISCA - International Speech
Communication Association

ISCApad Archive  »  2022  »  ISCApad #283  »  Jobs  »  (2021-11-27) Internship position at Telecom-Paris on Deep learning approaches for social computing, Telecom Paris, Palaiseau, France

ISCApad #283

Monday, January 10, 2022 by Chris Wellekens

6-25 (2021-11-27) Internship position at Telecom-Paris on Deep learning approaches for social computing, Telecom Paris, Palaiseau, France

Internship position at Telecom-Paris on Deep learning approaches for social computing


*Place of work* Telecom Paris, Palaiseau (Paris outskirt)


*Starting date* From February 2021(but can start later)


*Duration* 4-6 months



The intern will take part in the REVITALISE projectfunded by ANR.

The research activity of the internship will bring together the research topics of Prof. Chloé Clavel [Clavel] of the S2a [SSA] team at Telecom-Paris– social computing [SocComp] - and Dr. Mathieu Chollet [Chollet] from University of Glasgow – multimodal systems for social skills training, and Dr Beatrice Biancardi [Biancardi] – Social Behaviour Modelling from CESI Engineering School, Nanterre.


Candidate profile*

As a minimum requirement, the successful candidate should have:

• A master degree in one or more of the following areas: human-agent interaction, deep learning, computational linguistics, affective computing, reinforcement learning, natural language processing, speech processing

 Excellent programming skills (preferably in Python)

 Excellent command of English

 The desire to do an academic thesis at Telecom-Paris after the internship


*How to apply*

The application should be formatted as **a single pdf file** and should include:

 A complete and detailed curriculum vitae

 A cover letter

 The contact of two referees

The pdf file should be sent to the two supervisors: Chloé Clavel, Beatrice Biancardi and Mathieu Chollet:



Multimodal attention models for assessing and providing feedback on users’ public speaking ability


*Keywords* human-machine interaction, attention models, recurrent neural networks, Social Computing, natural language processing, speech processing, non-verbal behavior processing, multimodality, soft skills, public speaking


*Supervision* Chloé Clavel, Mathieu Chollet, Beatrice Biancardi


*Description* Oral communication skills are essential in many situations and have been identified as core skills of the 21st century. Technological innovations have enabled social skills training applications which hold great training potential: speakers’ behaviors can be automatically measured, and machine learning models can be trained to predict public speaking performance from these measurements and subsequently generate personalized feedback to the trainees.

The REVITALISE project proposes to study explainable machine learning models for the automatic assessment of public speaking and for automatic feedback production to public speaking trainees. In particular, the recruited intern will address the following points:

-   identify relevant datasets for training public speaking and prepare them for model training

-   propose and implement multimodal machine learning models for public speaking assessment and compare them to existing approaches in terms of predictive performance.

-   integrate the public assessment models to produce feedback a public speaking training interface, and evaluate the usefulness and acceptability of the produced feedback in a user study

The results of the project will help to advance the state of the art in social signal processing, and will further our understanding of the performance/explainability trade-off of these models.


The compared models will include traditional machine learning models proposed in previous work [Wortwein] and sequential neural approaches (recurrent networks) that integrate attention models as a continuation of the work done in [Hemamou], [BenYoussef]. The feedback production interface will extend a system developed in previous work [Chollet21].


Selected references of the team:

[Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel, HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews.  in AAAI 2019, to appear

[Ben-Youssef]  Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim.  Ue-hri: a new dataset for the study of user engagement in spontaneous human-robot interactions.  In  Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464–472. ACM, 2017.

[Wortwein] Torsten Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, and Stefan Scherer. 2015. Multimodal Public Speaking Performance Assessment. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI '15). Association for Computing Machinery, New York, NY, USA, 43–50.

[Chollet21] Chollet, M., Marsella, S., & Scherer, S. (2021). Training public speaking with virtual social interactions: effectiveness of real-time feedback and delayed feedback. Journal on Multimodal User Interfaces, 1-13.


Other references:









-Rasipuram, Sowmya, and Dinesh Babu Jayagopi. 'Automatic multimodal assessment of soft skills in social interactions: a review.' Multimedia Tools and Applications (2020): 1-24.

-Sharma, Rahul, Tanaya Guha, and Gaurav Sharma. 'Multichannel attention network for analyzing visual behavior in public speaking.' 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018.

-Acharyya, R., Das, S., Chattoraj, A., & Tanveer, M. I. (2020, April). FairyTED: A Fair Rating Predictor for TED Talk Data. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 338-345).

Back  Top

 Organisation  Events   Membership   Help 
 > Board  > Interspeech  > Join - renew  > Sitemap
 > Legal documents  > Workshops  > Membership directory  > Contact
 > Logos      > FAQ
       > Privacy policy

© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.

Powered by ISCA