(2021-01-07) Speech-NLP Master 2 Internship Year 2020-2021 at LISN (ex LIMSI), University Paris-Saclay, France
ISCApad #274
Sunday, April 11, 2021 by Chris Wellekens
Speech-NLP Master 2 Internship Year 2020-2021
Speech Segmentation and Automatic Detection of Conflicts in Political Interviews
LISN – Université Paris-Saclay
Internship for Last Year Engineer or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech

Context
This internship is part of the Ontology and Tools for the Annotation of Political Speech project (OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge) by the Regional Council of Ile de France. The project is carried out by the European Center for Sociology and Political Science (CESSP) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA), and the LISN. Its objective is to design new approaches for detailed qualitative and quantitative analyses of political speech in the French media. Part of the project concerns the dynamics of conflictual interactions in interviews and political debates, which requires a detailed description and a large corpus so that the models can generalize. Some of the main challenges concern the performance of speaker and speech style segmentation, e.g., improving the segmentation accuracy, detecting superimposed speech, and measuring vocal effort and other expressive elements.

Objectives
The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we will be particularly interested in the detection of hubbub (strong and prolonged overlapped speech). More precisely, we would like to extract features from the speech signal (Eyben et al. 2015) that correlate with the level of conflictual content in the exchanges, based, for example, on the arousal level in the speaker’s voice, an intermediate level between speech signal analysis and the description of expressivity (Rilliard, d’Alessandro, and Evrard 2018), or on vocal effort (Liénard 2019).
The internship will initially be based on two corpora of 30 political interviews manually annotated with speech turns and speech acts within the framework of the OOPAIP project. It will begin with a state-of-the-art review of speech diarization and overlapped speech detection (Chowdhury et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin et al. 2020) to improve the precise localization of speaking segments, in particular when the frequency of speaker changes is high.

In the second part of the internship, we will look at a more detailed measurement and prediction of the conflict level of exchanges. We will search for the most relevant features to describe the conflict level, and adapt or develop a neural network architecture to model it. The programming language used for this internship will be Python. The candidate will have access to the LISN computing resources (servers and clusters with recent generation GPUs).
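To make the notion of hubbub concrete, here is a minimal sketch of how prolonged overlapped speech could be located from manually annotated speech turns. It is an illustration only, not the project's method: the `(speaker, start, end)` turn format, the function names, and the 2-second `min_duration` threshold are all assumptions introduced for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical turn format (an assumption): (speaker, start_s, end_s)
Turn = Tuple[str, float, float]
Region = Tuple[float, float]


def overlap_regions(turns: List[Turn]) -> List[Region]:
    """Return time regions where at least two distinct speakers are active."""
    events = []
    for speaker, start, end in turns:
        if end > start:  # ignore empty or inverted turns
            events.append((start, 1, speaker))  # 1 = turn starts
            events.append((end, 0, speaker))    # 0 = turn ends
    # At equal times, ends are processed before starts, so turns that
    # merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active: Dict[str, int] = {}  # speaker -> number of open turns
    regions: List[Region] = []
    region_start = None
    for time, kind, speaker in events:
        if kind == 1:
            active[speaker] = active.get(speaker, 0) + 1
        else:
            active[speaker] -= 1
            if active[speaker] == 0:
                del active[speaker]
        if len(active) >= 2 and region_start is None:
            region_start = time
        elif len(active) < 2 and region_start is not None:
            if regions and regions[-1][1] == region_start:
                # Contiguous with the previous region: extend it.
                regions[-1] = (regions[-1][0], time)
            else:
                regions.append((region_start, time))
            region_start = None
    return regions


def hubbub_candidates(turns: List[Turn], min_duration: float = 2.0) -> List[Region]:
    """Keep only overlaps lasting at least min_duration seconds (assumed threshold)."""
    return [(s, e) for s, e in overlap_regions(turns) if e - s >= min_duration]


example_turns = [
    ("journalist", 0.0, 5.0),
    ("guest", 4.0, 9.0),
    ("journalist", 6.0, 8.5),
    ("guest", 9.0, 12.0),
]
print(overlap_regions(example_turns))    # [(4.0, 5.0), (6.0, 8.5)]
print(hubbub_candidates(example_turns))  # [(6.0, 8.5)]
```

Regions derived this way from the reference annotation could serve as targets when evaluating an automatic overlapped speech detector. Duration alone does not capture the "strong" aspect of hubbub, which would require acoustic features such as those cited above.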
Publications
Depending on the maturity of the work carried out, we expect the applicant to:
• Distribute the tools produced under an open-source license
• Write a scientific publication

Conditions
The internship will last 4 to 6 months and take place at the LISN (formerly LIMSI) in the TLP group (spoken language processing). The laboratory is located near the plateau de Saclay, university campus building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard (evrard@limsi.fr). The allowance follows the official standards (service-public.fr).

Applicant profile
• Student in the last year of a 5-year diploma in the field of computer science (AI is a plus)
• Proficiency in Python and experience in using ML libraries (Scikit-Learn, TensorFlow, PyTorch)
• Strong interest in digital humanities, and political science in particular
• Experience in automatic speech processing is preferred
• Ability to carry out a bibliographic study from scientific articles written in English

To apply: Send an email to evrard@limsi.fr including a résumé and a cover letter.

Bibliography
Bredin, Hervé, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. “Pyannote.audio: Neural Building Blocks for Speaker Diarization.” In ICASSP. IEEE.
Chowdhury, Shammur Absar, Evgeny A Stepanov, Morena Danieli, and Giuseppe Riccardi. 2019. “Automatic Classification of Speech Overlaps: Feature Representation and Algorithms.” Computer Speech & Language 55: 145–67.
Eyben, Florian, Klaus R Scherer, Björn W Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y Devillers, et al. 2015. “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.” IEEE Transactions on Affective Computing 7 (2): 190–202.
Liénard, Jean-Sylvain. 2019. “Quantifying Vocal Effort from the Shape of the One-Third Octave Long-Term-Average Spectrum of Speech.” The Journal of the Acoustical Society of America 146 (4): EL369–75.
OOPAIP. 2018. “(Ontologie Et Outil Pour l’annotation Des Interventions Politiques).” DIM STCN (Sciences du Texte et connaissances nouvelles) Conseil régional d’Ile de France. http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outilspour-lannotation-des-interventions-politiques/.
Rilliard, Albert, Christophe d’Alessandro, and Marc Evrard. 2018. “Paradigmatic Variation of Vowels in Expressive Speech: Acoustic Description and Dimensional Analysis.” The Journal of the Acoustical Society of America 143 (1): 109–22.