Speech-NLP Master 2 Internship Year 2020-2021

Speech Segmentation and Automatic Detection of Conflicts in Political Interviews

LISN – Université Paris-Saclay

Internship for Last Year Engineer or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody,

Expressive Speech


This internship is part of the Ontology and Tools for the Annotation of Political Speech

(OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and

New Knowledge) by the Regional Council of Ile de France. The project is carried out by the

European Center for Sociology and Political Science (CESSP) of the University of Paris 1

Panthéon-Sorbonne, the National Audiovisual Institute (INA), and the LISN. Its objective is to

design new approaches to develop detailed, qualitative, and quantitative analyses of political

speech in the French media. Part of the project concerns the study of the dynamics of conflictual

interactions in interviews and political debates, which requires a detailed description and a

large corpus to allow for the models’ generalization. Some of the main challenges concern the

performance of speaker and speech style segmentation, e.g., improving the segmentation accuracy,

detecting superimposed speech, measuring vocal effort and other expressive elements.


The main objective of the internship is to improve the automatic segmentation of political

interviews. In this context, we will be particularly interested in the detection of hubbub (strong

and prolonged overlapped speech). More precisely, we would like to extract features from the

speech signal (Eyben et al. 2015) correlated with the level of conflictual content in the exchanges,

based, for example, on the arousal level in the speaker’s voice—intermediate level between

the speech signal analysis and the expressivity description (Rilliard, d’Alessandro, and Evrard

2018)—or vocal effort (Liénard 2019).

The internship will initially be based on two corpora of 30 political interviews manually annotated

in speech turns and speech acts—within the framework of the OOPAIP project. It will begin

with a state-of-the-art review of speech diarization and overlapped speech detection (Chowdhury

et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin

et al. 2020) to improve the precise localization of speaking segments, in particular when the

frequency of speaker changes is high.

In the second part of the internship, we will look at a more detailed measurement and prediction

of the level of conflict in the exchanges. We will search for the most relevant features to describe

this level and adapt or develop a neural network architecture to model it.

The programming language used for this internship will be Python. The candidate will have

access to the LISN computing resources (servers and clusters with recent generation GPUs).



Depending on the degree of maturity of the work carried out, we expect the applicant to:

Distribute the tools produced under an open-source license

Write a scientific publication


The internship will take place over a period of 4 to 6 months at the LISN (formerly LIMSI) in the

TLP group (spoken language processing). The laboratory is located near the plateau de Saclay,

university campus building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised

by Marc Evrard. Allowance under the official standards.

Applicant profile

Student in the last year of a 5-year diploma in the field of computer science (AI is a plus)

Proficiency in Python and experience in using ML libraries (Scikit-Learn, TensorFlow, PyTorch)

Strong interest in digital humanities and political science in particular

Experience in automatic speech processing is preferred

Ability to carry out a bibliographic study from scientific articles written in English

To apply: Send an email to including a résumé and a cover letter.


