ISCA - International Speech
Communication Association

ISCApad #314

Friday, August 09, 2024 by Chris Wellekens

4-10 The Speech Prosody Conference program 2024

The Speech Prosody Conference Program is now available, at https://www.universiteitleiden.nl/sp2024/program .

The online lecture series resumes next month with a talk on speech synthesis needs by Zofia Malisz; details are below and at https://sprosig.org/index.html . After that, the tentative schedule is:

Gabriel Skantze, KTH, May 15.
Simon Roessig, York, September.
Sam Tilsen, Cornell, October.
Sasha Calhoun, Victoria University of Wellington, November.
Robert Xu, Stanford, December.

The speech synthesis phoneticians need is both realistic and controllable: A survey and a roadmap towards modern synthesis tools for phonetics.
Zofia Malisz, KTH Royal Institute of Technology.
April 17th, 2 pm (Brasilia time). viewing link

ABSTRACT
In the last decade, data-driven and machine learning methods for speech synthesis have greatly improved its quality, so much so that the realism achievable by current neural synthesisers can rival natural speech. However, modern neural synthesis methods have not yet been adopted as tools for experimentation in the speech and language sciences. This is because modern systems still lack the ability to manipulate low-level acoustic characteristics of the signal, such as formant frequencies.
In this talk, I survey recent advances in speech synthesis and discuss their potential as experimental tools for phonetic research. I argue that speech scientists and speech engineers would benefit from working more with each other again, in particular in the pursuit of prosodic and acoustic parameter control in neural speech synthesis. I showcase several approaches to fine synthesis control that I have implemented with colleagues: the WavebenderGAN and a system that mimics the source-filter model of speech production. These systems allow formant frequencies and other acoustic parameters to be manipulated with the same (or better) accuracy as, e.g., Praat, but with far superior signal quality.
Finally, I discuss ways to improve synthesis evaluation paradigms, so that not only industry but also speech science experimentation benchmarks are met. My hope is to inspire more students and researchers to take up these research challenges and explore the potential of working at the intersection of speech technology and speech science.

Outline:
1. I briefly discuss the history of advances in speech synthesis, starting in the formant synthesis era, and explain where the improvements came from.
2. I show experiments I have done indicating that, in a lexical decision task, humans process modern synthesis no differently from natural speech, as evidence that the realism (“naturalness”) goal has been largely achieved.
3. I explain how realism came at the expense of controllability. I show how controllability is an indispensable feature for speech synthesis to be adopted in phonetic experimentation. I survey the current state of research on controllability in speech engineering, concentrating on prosodic and formant control.
4. I propose how we can fix this by explaining the work I have done with colleagues on several systems that feature both realism and control.
5. I sketch a roadmap to improve synthesis tools for phonetics by focusing on benchmarking systems according to scientific criteria.

TBD. Gabriel Skantze, KTH, May 15.
TBD. Simon Roessig, York, September.
TBD. Sam Tilsen. October.
TBD. Sasha Calhoun, November.
TBD. Robert Xu, December.

Nigel Ward, SProSIG Chair, Professor of Computer Science, University of Texas at El Paso

nigel@utep.edu https://www.cs.utep.edu/nigel/
