ISCA - International Speech Communication Association


ISCApad #277

Saturday, July 10, 2021 by Chris Wellekens

6-3 (2021-01-07) Speech-NLP Master 2 Internship Year 2020-2021 at LISN (ex LIMSI), University Paris-Saclay, France

Speech-NLP Master 2 Internship Year 2020-2021

Speech Segmentation and Automatic Detection of Conflicts in Political Interviews

LISN – Université Paris-Saclay

Internship for Final-Year Engineering or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech


This internship is part of Ontology and Tools for the Annotation of Political Speech (OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge) by the Regional Council of Île-de-France. The project is carried out by the European Center for Sociology and Political Science (CESSP) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA), and LISN. Its objective is to design new approaches for detailed qualitative and quantitative analyses of political speech in the French media. Part of the project concerns the dynamics of conflictual interactions in political interviews and debates, whose study requires both a fine-grained description and a corpus large enough for the models to generalize. The main challenges concern speaker and speaking-style segmentation: improving segmentation accuracy, detecting overlapped speech, and measuring vocal effort and other expressive cues.


The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we are particularly interested in detecting hubbub (strong and prolonged overlapped speech). More precisely, we would like to extract features from the speech signal (Eyben et al. 2015) that correlate with the level of conflict in the exchanges, based, for example, on the arousal level in the speaker’s voice (an intermediate level between speech signal analysis and the description of expressivity; Rilliard, d’Alessandro, and Evrard 2018) or on vocal effort (Liénard 2019).
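To make "strong and prolonged overlapped speech" concrete, the sketch below flags regions where at least two annotated speech turns overlap for longer than a minimum duration, using a sweep over turn boundaries. It is only an illustration under stated assumptions: turns are taken to be available as (speaker, start, end) in seconds, and the `Turn` type, the `find_hubbub` function, and the thresholds `min_speakers=2` and `min_duration=1.0` are hypothetical choices, not part of the OOPAIP annotation scheme. "Strength" is approximated here by the peak number of simultaneous speakers only; acoustic cues such as vocal effort are not taken into account.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One annotated speech turn (hypothetical format: speaker label, start/end in seconds)."""
    speaker: str
    start: float
    end: float


def find_hubbub(
    turns: List[Turn],
    min_speakers: int = 2,      # hypothetical: minimum number of simultaneous speakers
    min_duration: float = 1.0,  # hypothetical: minimum length (s) of a "prolonged" overlap
) -> List[Tuple[float, float, int]]:
    """Return (start, end, peak_speakers) for overlap regions lasting at least min_duration."""
    # Sweep line over turn boundaries: +1 when a turn starts, -1 when it ends.
    events = []
    for t in turns:
        events.append((t.start, 1))
        events.append((t.end, -1))
    # At equal times, ends (-1) sort before starts (+1), so touching turns are not an overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    active = 0
    region_start = None
    peak = 0
    for time, delta in events:
        active += delta
        if active >= min_speakers:
            if region_start is None:
                region_start = time  # an overlap region opens here
                peak = active
            else:
                peak = max(peak, active)
        elif region_start is not None:
            # The overlap region just closed; keep it only if it lasted long enough.
            if time - region_start >= min_duration:
                regions.append((region_start, time, peak))
            region_start = None
            peak = 0
    return regions


if __name__ == "__main__":
    turns = [
        Turn("journalist", 0.0, 6.0),
        Turn("politician", 4.0, 12.0),
        Turn("journalist", 10.5, 11.0),  # short interjection, below min_duration
    ]
    print(find_hubbub(turns))  # [(4.0, 6.0, 2)]
```

The sweep-line formulation runs in O(n log n) for n turns. The same function could be applied to the output of an automatic diarization system instead of manual annotations.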

The internship will initially rely on two corpora of 30 political interviews, manually annotated into speech turns and speech acts within the framework of the OOPAIP project. It will begin with a review of the state of the art in speaker diarization and overlapped speech detection (Chowdhury et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin et al. 2020) that localize speech segments more precisely, in particular when speaker changes are frequent.
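One way to measure how precisely segments are localized, especially in passages with frequent speaker changes, is to score hypothesized speaker-change times against the manually annotated ones within a tolerance window. The sketch below is a minimal, hypothetical evaluation helper. The function name, the greedy one-to-one matching (simpler than an optimal assignment), and the 0.25 s tolerance are assumptions for illustration. Established toolkits such as pyannote.metrics provide more complete segmentation metrics.

```python
from typing import List, Tuple


def boundary_precision_recall(
    reference: List[float],
    hypothesis: List[float],
    tolerance: float = 0.25,  # hypothetical tolerance window in seconds
) -> Tuple[float, float]:
    """Precision and recall of hypothesized speaker-change times against reference times.

    A hypothesized boundary is a hit if it lies within `tolerance` seconds of a
    reference boundary that has not already been matched (greedy one-to-one matching).
    """
    ref = sorted(reference)
    hyp = sorted(hypothesis)
    matched_ref = set()
    hits = 0
    for h in hyp:
        # Look for the closest unmatched reference boundary within the tolerance.
        best = None
        best_dist = tolerance
        for i, r in enumerate(ref):
            if i in matched_ref:
                continue
            d = abs(h - r)
            if d <= best_dist:
                best, best_dist = i, d
        if best is not None:
            matched_ref.add(best)
            hits += 1
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    reference = [4.0, 6.0, 10.5, 11.0]  # annotated speaker-change times (s)
    hypothesis = [4.1, 6.5, 10.6]       # system output (s); 6.5 is 0.5 s off
    print(boundary_precision_recall(reference, hypothesis))  # (0.6666666666666666, 0.5)
```

Shrinking `tolerance` makes the score stricter about boundary placement. This is the regime where rapid speaker changes, as in heated interviews, are expected to be hardest.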

In the second part of the internship, we will focus on a finer-grained measurement and prediction of the level of conflict in the exchanges, by searching for the features that best describe it and by adapting or developing a neural network architecture to model it.

The programming language for this internship will be Python. The candidate will have access to LISN computing resources (servers and clusters with recent-generation GPUs).



Depending on how mature the work becomes, we expect the applicant to:
- distribute the tools produced under an open-source license;
- write a scientific publication.


The internship will last 4 to 6 months and will take place at LISN (formerly LIMSI), in the TLP (spoken language processing) group. The laboratory is located near the Plateau de Saclay: University Campus, Building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard. The internship allowance follows the official standards.

Applicant profile

- Student in the final year of a 5-year degree in computer science (AI is a plus)
- Proficiency in Python and experience with ML libraries (Scikit-Learn, TensorFlow, PyTorch)
- Strong interest in digital humanities, and in political science in particular
- Experience in automatic speech processing is preferred
- Ability to conduct a literature review of scientific articles written in English

To apply: Send an email to including a résumé and a cover letter.


Bredin, Hervé, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. “pyannote.audio: Neural Building Blocks for Speaker Diarization.” In ICASSP. IEEE.

Chowdhury, Shammur Absar, Evgeny A. Stepanov, Morena Danieli, and Giuseppe Riccardi. 2019. “Automatic Classification of Speech Overlaps: Feature Representation and Algorithms.” Computer Speech & Language 55: 145–67.

Eyben, Florian, Klaus R. Scherer, Björn W. Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y. Devillers, et al. 2015. “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.” IEEE Transactions on Affective Computing 7 (2): 190–202.

Liénard, Jean-Sylvain. 2019. “Quantifying Vocal Effort from the Shape of the One-Third Octave Long-Term-Average Spectrum of Speech.” The Journal of the Acoustical Society of America 146 (4): EL369–75.

OOPAIP. 2018. “Ontologie et outil pour l’annotation des interventions politiques.” DIM STCN (Sciences du texte et connaissances nouvelles), Conseil régional d’Île-de-France.



Rilliard, Albert, Christophe d’Alessandro, and Marc Evrard. 2018. “Paradigmatic Variation of Vowels in Expressive Speech: Acoustic Description and Dimensional Analysis.” The Journal of the Acoustical Society of America 143 (1): 109–22.

© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.