Creation of a speech synthesis model from spontaneous speech


Speech synthesis, spontaneous speech, low-resource languages, Nigerian Pidgin

Context

Nigerian Pidgin is a large but under-resourced language that increasingly serves as the primary vernacular language of Africa’s most populous country. Once stigmatized as a “broken” variety of English spoken only by the uneducated, Nigerian Pidgin is now a source of pride for many speakers who view it as a home-grown vehicle for communication. It transcends class and ethnicity, lacking the tribal associations of indigenous languages and the colonial baggage associated with English. The language can now be seen and heard on college campuses, in houses of worship, in advertisements, in Nigerian expat communities, and even on a local branch of the British Broadcasting Corporation.


Despite Nigerian Pidgin’s growing prestige and a pool of speakers rivaling those of major languages like Turkish or Korean, the grammatical and intonational properties of the language are comparatively understudied. This internship is an extension of an ongoing research project aimed at better understanding its linguistic properties through the development and adaptation of NLP technologies. The project’s principal aim is to produce a natural-sounding text-to-speech (TTS) model that will allow researchers to conduct perception tests to determine how intonation influences the interpretation of meaning. Thanks partly to the recent explosion of neural network-based speech technologies, researchers can now produce high-quality synthesis from relatively simple datasets using models like Tacotron 2, complementing classical approaches such as those based on Hidden Markov Models.

Specifically, the intern will assist in developing a text-to-speech platform trained on an existing database of Nigerian Pidgin recordings. In addition to producing natural-sounding speech, a central goal of this project will be to build a TTS model that allows for the direct modification of intonational patterns via explicit parameters provided by researchers. The intern’s work will contribute to the exploration of the language’s melodic and tonal properties by allowing researchers to produce variations of novel utterances differing only in their intonational patterns.

Primary tasks

• Surveying existing TTS models and selecting the most suitable approach

• Training a model on a corpus of Nigerian Pidgin

• Optimizing and evaluating the model


A second-year master’s student with:

• A solid background in machine learning (speech synthesis is a plus)

• Good academic writing skills in English

• A strong interest in language and linguistics

Twitter @LisnLab | LinkedIn LisnLab
Belvédère site: Campus Universitaire, Bâtiment 507, Rue du Belvédère, 91405 Orsay Cedex
Plaine site: Campus Universitaire, Bâtiment 650, Rue Raimond Castaing, 91190 Gif-sur-Yvette
M2-CS-Internship 2022-2023


The internship will start in March 2023 and last 5 to 6 months. It will take place at the LISN lab, in the Sciences and Language Technologies department, and at the MoDyCo lab at Paris Nanterre University (primarily at whichever site offers the shorter commute).

• The LISN’s Belvédère site is located on the Plateau de Saclay: University campus building 507, rue du Belvédère, 91400 Orsay.

• The MoDyCo lab is located at the Université Paris Ouest Nanterre La Défense: Bâtiment A, 200, avenue de la République – 92001 Nanterre.

The candidate will be supervised by Emmett Strickland (MoDyCo) and Marc Evrard (LISN). Allowance under the official standards (

To apply

Please send a CV and brief cover letter highlighting your interest in the project to the following:

• Emmett Strickland (

• Marc Evrard (

Further reading

1. Tan, X., Qin, T., Soong, F., & Liu, T. Y. (2021). A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561.

2. Ning, Y., He, S., Wu, Z., Xing, C., & Zhang, L. J. (2019). A review of deep learning based speech synthesis. Applied Sciences, 9(19), 4050.

3. Bigi, B., Caron, B., & Abiola, O. S. (2017). Developing resources for automated speech processing of the African language Naija (Nigerian Pidgin). In 8th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (pp. 441-445).





