
ISCApad #206

Thursday, August 20, 2015 by Chris Wellekens

3-1 ISCA Events
3-1-1(2015-09-06) Doctoral Consortium at Interspeech 2015

Call for Applications: Doctoral Consortium at Interspeech 2015

Prospective and current doctoral students from all speech-related disciplines are invited to apply for admission to the Doctoral Consortium to be held at Interspeech 2015 in Dresden, Germany. The Doctoral Consortium aims to provide students working on speech-related topics with an opportunity to discuss their doctoral research with experts from their fields, and to receive feedback from experts and peers on their PhD projects. It will take the form of a one-day workshop prior to the main conference (6 September), involving short presentations by participants summarizing their projects, followed by intensive discussion of these presentations. The Doctoral Consortium is a new format at Interspeech, replacing the 'Students Meet Experts' lunch event held at previous Interspeech conferences. It is organized by the Student Advisory Committee of the International Speech Communication Association (ISCA-SAC).

How to apply:
Students who are interested in participating are asked to submit an extended abstract of their thesis plan (2-4 pages, following the Interspeech template) to:
doctoral.workshop@interspeech2015.org

Who should apply:
While we encourage applications from students at any stage of doctoral training, the Doctoral Consortium will be most beneficial for students in the middle of their PhD, i.e. those who have already obtained some initial results but would still benefit from feedback.

Important dates:
- Application deadline: June 20, 2015
- Notification of acceptance: July 20, 2015
- Doctoral Consortium: Sunday, September 6, 2015

For further questions please contact: doctoral.workshop@interspeech2015.org


3-1-2(2015-09-06) INTERSPEECH 2015 Special Session on Synergies of Speech and Multimedia Technologies

INTERSPEECH 2015
Call for papers: INTERSPEECH 2015 Special Session on
Synergies of Speech and Multimedia Technologies

Paper submission deadline: March 20, 2015
Special Session page:
http://multimediaeval.org/files/Interspeech2015_specialSession_SynergiesOfSpeechAndMultimediaTechnologies.html
Motivation:

Growing amounts of multimedia content are being shared and stored in online archives. Separate research directions in the speech processing and multimedia analysis communities have been developing and improving speech and multimedia processing technologies in parallel, often using each other's work as 'black boxes'. However, genuine combination would appear to be a better strategy for exploiting the synergies between the modalities of content containing multiple potential sources of information.

This session seeks to bring together the speech and multimedia research communities to report on current work and to explore potential synergies and opportunities for creative research collaborations between speech and multimedia technologies. From the speech perspective, the session aims to explore how the fundamentals of speech technology can benefit multimedia applications; from the multimedia perspective, it aims to explore the crucial role that speech can play in multimedia analysis.

The list of topics of interest includes (but is not limited to):

- Navigation in multimedia content using advanced speech analysis features;
- Large-scale speech and video analysis;
- Multimedia content segmentation and structuring using audio and visual features;
- Multimedia content hyperlinking and summarization;
- Natural language processing for multimedia;
- Multimodality-enhanced metadata extraction, e.g. entity extraction, keyword extraction, etc.;
- Generation of descriptive text for multimedia;
- Multimedia applications and services using speech analysis features;
- Affective and behavioural analytics based on multimodal cues;
- Audio event detection and video classification;
- Multimodal speaker identification and clustering.

Important dates:

20 Mar 2015 paper submission deadline
01 Jun 2015 paper notification of acceptance/rejection
10 Jun 2015 paper camera-ready
20 Jun 2015 early registration deadline
6-10 Sept 2015 Interspeech 2015, Dresden, Germany

Submission takes place via the general Interspeech submission system. Paper contributions must comply with the INTERSPEECH paper submission guidelines, cf. http://interspeech2015.org/papers. There will be no extension to the full paper submission deadline. We look forward to receiving your contribution!

Organizers:

- Maria Eskevich, Communications Multimedia Group, EURECOM, France (maria.eskevich@eurecom.fr)
- Robin Aly, Database Management Group, University of Twente, The Netherlands (r.aly@utwente.nl)
- Roeland Ordelman, Human Media Interaction Group, University of Twente, The Netherlands (roeland.ordelman@utwente.nl)
- Gareth J.F. Jones, CNGL Centre for Global Intelligent Content, Dublin City University, Ireland (gjones@computing.dcu.ie)


3-1-3(2015-09-06) Announcement of the Plenary sessions at Interspeech 2015

  Dear Colleagues,

 

Interspeech-2015 will start in about three months, and it is time to announce our plenary speakers.

 

Following the tradition of most past Interspeech conferences, the organizing committee of Interspeech-2015 has decided to present four keynote talks in Dresden, one on each day of the conference. It is also a tradition that the first keynote talk is presented by the ISCA Medallist, right at the end of the opening ceremony on Monday morning, Sept. 7, 2015. Information on this year's ISCA Medallist will be published later in June.

 

Information on the other three plenary speakers will be available on the Interspeech-2015 website very soon; here we already provide a brief introduction of the speakers:

 

Prof. Klaus Scherer from the University of Geneva will present a keynote talk about vocal communication as a major carrier of information about a person's physique, enduring dispositions, strategic intentions and current emotional state. He will also discuss the importance of voice quality in comparison to other modalities, e.g. facial expressions, in this context. Prof. Scherer is one of the most prominent researchers in the area of emotion psychology and held an ERC Advanced Grant covering these research topics.

 

Prof. Katrin Amunts from the Institute of Neuroscience and Medicine at the Research Centre Juelich, Germany, will deliver a talk about her research within the European FET Flagship 'The Human Brain Project'. The expectations for this presentation are twofold. Firstly, many Interspeech attendees may have heard about the huge EU Flagship projects, which are funded with more than $1 billion each, but do not know what exactly goes on in such a project and how it is organized and structured; this talk will introduce the FET Flagship project most relevant to the Interspeech community, the 'Human Brain Project'. Secondly, Prof. Amunts is not only the leader of Subproject 2, 'Strategic Human Brain Data', but is also very well known for her work on how language is mapped onto regions of the human brain and on creating a 3D atlas of the human brain. From her talk we can expect the perfect combination of speech and language research with neural brain research within probably the most prominent project in this area.

 

Everybody in the speech community knows the popular 'personal digital assistants' such as Siri, Cortana, or Google Now. However, many people may not know exactly what technology lies behind these highly commercially successful systems. The answer to this question will be given by Dr. Ruhi Sarikaya from Microsoft in his keynote address. His group has been building the language understanding and dialog management capabilities of both Cortana and Xbox One. In his talk, he will give an overview of personal digital assistants and describe the system design, architecture and key components behind them. He will highlight challenges and describe best practices related to bringing personal assistants from the laboratory to the real world.

 

I hope you agree that we will have truly exciting plenary talks at this year's Interspeech. The Interspeech-2015 team looks forward to sharing this experience with you in September in Dresden.

 

 

Gerhard Rigoll

Plenary Chair, Interspeech-2015



3-1-5(2016-09-08) INTERSPEECH 2016, San Francisco, CA, USA

Interspeech 2016 will take place

from September 8-12 2016 in San Francisco, CA, USA

General Chair is Nelson Morgan.

You can already enjoy the nice pictures on the cover page of its tentative website: http://www.interspeech2016.org

 


3-1-6 INTERSPEECH 2015 Update 1 (December 2014)

Updates from INTERSPEECH 2015

Dear colleague,

Interspeech 2015 in Dresden is approaching at an increasing pace, and the entire team of organizers is working to ensure that you will get a conference which meets all, and hopefully surpasses some, of your expectations. Regarding the usual program of oral and poster sessions, special sessions and challenges, keynotes, tutorials and satellite workshops, the responsible team is working hard to put together a program which is not only of respectable breadth and depth, but which also tackles a couple of innovative topics: some centered around the special topic of the conference, “Speech beyond speech: Towards a better understanding of our most important biosignal”, and some addressing other emergent topics.

We would particularly like to draw your attention to the approaching deadlines:
- 30 Nov 2014: Special sessions submission deadline (passed)
- 15 Dec 2014: Notification of pre-selected special sessions
- 20 Mar 2015: Tutorial submission deadline
- 20 Mar 2015: Paper submission deadline (not extensible)
- 17 Apr 2015: Show and tell paper submission deadline

In addition to regular papers, we will also experiment with a virtual attendance format for persons who, mainly for visa or health reasons, are not able to come to Dresden to present their paper. For these persons, a limited number of technology-equipped poster boards will be available at which online presentations can be held. The number of virtual attendance slots is strictly limited (thus potentially leading to a lower acceptance rate). The corresponding papers have to pass the normal review process, but their deadline will most probably be around 14 days before the normal paper submission deadline. More details on this format will be announced soon.

In the upcoming months, we will keep you updated via this thread, and we will present some historical instruments and techniques related to speech technology which nicely illustrate that Dresden has a rich history in speech science and technology. Interspeech 2015 will hopefully contribute to this history with the latest scientific and technological advances. The entire organizing team is looking forward to welcoming you in Dresden.

On behalf of the organizing team,

Sebastian Möller (General Chair)


3-1-7 INTERSPEECH 2015 Update 2 (January 2015)

 

+++ INTERSPEECH 2015 Update – and a look back! +++


A View from Dresden onto the History of Speech Communication

Part 1: The historic acoustic-phonetic collection

Information technology at the TU Dresden goes back to Heinrich Barkhausen (1881–1956), the 'father of the electron valve', who taught there from 1911 to 1953. Speech research in a narrower sense started with the development of a vocoder in the 1950s; Walter Tscheschner (1927–2004) performed his extensive investigations on the speech signal using components of this vocoder. In 1969, a scientific unit for Communication and Measurement was founded in Dresden, which is the main root of the present Institute of Acoustics and Speech Communication. W. Tscheschner was appointed Professor of Speech Communication and started research in speech synthesis and recognition, which continues today.

Numerous objects from the history of speech communication in Dresden, but also from other parts of Germany, are preserved in the historic acoustic-phonetic collection of the TU Dresden. Until the opening of Interspeech 2015, we will present interesting exhibits from the collection in this newsletter each month. Today, we give an introduction.

The historic acoustic-phonetic collection of the TU Dresden consists of three parts:

• Objects that illustrate the development of acoustics and speech technology at the TU Dresden. The most interesting devices are speech synthesizers of various technologies.

• Objects illustrating the development of experimental phonetics from 1900 until the introduction of the computer. The items of this part were collected by D. Mehnert from different phonetics laboratories and rehabilitation units throughout Germany.

• Objects formerly collected at the Phonetics Institute of Hamburg University. This important collection, founded by Giulio Panconcelli-Calzia, was transferred to Dresden in 2005 under a contract following the closing of the Hamburg institute.

The collection is presented in the Barkhausenbau on the main campus of the TU Dresden. It is currently being moved to new rooms which are more convenient for presentation. The newly installed collection will be re-opened on the occasion of Interspeech 2015.

For this purpose, we cordially invite you to a workshop on the history of speech communication, HSCR 2015, which will be held as a satellite event of Interspeech 2015 on September 4-5, 2015, in the Technical Museum of the City of Dresden. It is organized by the special interest group (SIG) on 'The History of Speech Communication Sciences', which is supported by the International Speech Communication Association (ISCA) and the International Phonetic Association (IPA). More information on the workshop is available at http://www.sig-hist.org/.

Rüdiger Hoffmann (Local Chair)


 






3-1-8 INTERSPEECH 2015 Update 3 (February 2015)

+++ INTERSPEECH 2015 – February Update +++

A View from Dresden onto the History of Speech Communication

Part 2: Von Kempelen's 'Sprachmaschine' and the beginning of speech synthesis

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/

 

The speaking machine of Wolfgang von Kempelen (1734-1804) can be considered the first successful mechanical speech synthesiser. The Austrian-Hungarian engineer is still famous for his 'chess Turk', but it is his 'Sprachmaschine' that counts as a milestone in (speech) technology. In his book 'Mechanismus der menschlichen Sprache nebst der Beschreibung einer sprechenden Maschine' (published 1791; no English translation yet), he described the function of the machine, which was intended to give a voice to deaf people. Contemporaries like Goethe confirmed the authenticity of the child's voice heard when the speaking machine was played.

 

How does the machine work?

The machine consists of a bellows connected by a tube to a wooden wind chest. On the other side of the wind chest, a round wooden block forms the interface to an open rubber funnel (serving as the vocal tract). In the wind chest there are two modified recorders to produce the fricatives [s] and [S]. The voice generator is located inside the wooden block. The artificial voice is generated with the help of a reed pipe borrowed from the pipe organ: it has an ivory reed vibrating against a hollow wooden shallot (as in a clarinet). A trained human operator plays the machine like a musical instrument. The right elbow controls the air pressure by pressing on the bellows; two fingers of the right hand close or open the passages for stops and nasals, and two other fingers of the right hand control the fricatives. Vowels are produced with the palm of the left hand in different ways.

 

Replicas

Apart from parts of one of the originals, which are hosted at the Deutsches Museum in Munich, there are several reconstructions based on Kempelen's quite detailed descriptions. The replicas built in Budapest, Vienna, York and Saarbrücken allow a lively demonstration of the mechanical generation of speech and its acoustic analysis, as well as perception tests with today's listeners. Interestingly, the art of constructing artificial voices led to the profession of 'voice makers' in the Eastern-German region of Thuringia (more information in one of the next newsletters). Original products of the Thuringian 'Stimmenmacher', as well as one of the replicas located at TU Dresden, will be on display, available for ears, eyes (and hands), at the re-opening of the HAPS (Historische Akustisch-Phonetische Sammlung) on 4 September, which is also the start of the Interspeech satellite Workshop on the History of Speech Communication Research (HSCR 2015).

 

Jürgen Trouvain and Fabian Brackhane


3-1-9 INTERSPEECH 2015 Update 4 (March 2015)

A View from Dresden onto the History of Speech Communication

Part 3: Voices for toys - First commercial spin-offs in speech synthesis

 

Complete article including figures available at:
http://interspeech2015.org/conference/historical-review/

 

When Wolfgang von Kempelen died in 1804, his automata (including the speaking machine) came into the ownership of Johann Nepomuk Maelzel (1772-1838), who demonstrated them on many tours through Europe and America. He was a clever mechanic and applied Kempelen's ideas in a mechanical voice for puppets, which could pronounce “Mama” and “Papa”. He received a patent for it in 1824 (Figure 1).

 

The idea of speaking puppets and toys was continued mainly in the area of Sonneberg in Thuringia, Germany. This small town was the world capital of puppet and toy manufacturing in the 19th century. The voices consist of a bellows, a metal tongue for voicing, and a resonator. There are three reasons why we regard the mechanical voices as a milestone in the development of speech technology:

 

1. The mechanical voices established the first commercial spin-off of speech research. The toy manufacturers in Sonneberg recognized the importance of Maelzel's invention and produced speaking puppets from 1852. The “Stimmenmacher” (voice maker) was a specific profession, and in 1911 there were eight manufacturers of human and animal voices in Sonneberg alone. The most important of them was Hugo Hölbe (1844-1931), who developed mechanisms able to speak not only Mama/Papa (Figure 2), but also words like Emma, Hurrah, etc.

 

2. The mechanical voices were applied in the first book with multimodal properties. The bookseller Theodor Brand from Sonneberg received a patent for his “speaking picture book” in 1878. The book shows different animals; pulling a knob corresponding to a picture activates the voice of that animal (Figure 3). The picture book was published in several languages and was a huge commercial success all over the world.

 

3. The mechanical voices were the first attempt to support the rehabilitation of hard-of-hearing people by means of speech technology. The German otologist Johannes Kessel (1839-1907) demonstrated Hölbe's voices as a training tool for speech therapy at a conference in 1899. The quality of this kind of synthetic speech proved to be insufficient for this purpose, however.

 

The samples from Kessel came to the phonetic laboratory of Panconcelli-Calzia in Hamburg, who mentioned them in his historical essays. Thanks to the transfer of the phonetic exhibits from Hamburg to Dresden in 2005, you can now see the mechanical voices in the HAPS of the TU Dresden.

 

 

Rüdiger Hoffmann

Photographs Copyright TU Dresden / HAPS

 


3-1-10 INTERSPEECH 2015 Update 5 (April 2015)

 

A View from Dresden onto the History of Speech Communication

Part 4: Helmholtz Resonators

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/

Hermann von Helmholtz (1821-1894) was a German physician and physicist who made important contributions in many areas of science. One of these areas was acoustics, where he published the famous book 'On the Sensations of Tone as a Physiological Basis for the Theory of Music' in 1863. There he described his invention of a special type of resonator, now known as the Helmholtz resonator. These resonators were devised as highly sensitive devices to identify the harmonic components (partial tones) of sounds, and they enabled significant advances in the acoustic analysis of vowel sounds and musical instruments.

Before the invention of Helmholtz resonators, strong partial tones in a sound wave were typically identified with very thin, elastic membranes spanned over circular rings, similar to drums. Such a membrane has certain resonance frequencies that depend on its material, tension, and radius. If the sound field around the membrane contains energy at one of these frequencies, the membrane is excited and starts to oscillate. The tiny amplitudes of this oscillation can be detected visually when fine-grained sand is distributed over its surface: when the membrane is excited at its lowest resonance frequency, the sand accumulates at the rim of the membrane, and along specific lines on its surface when higher-order modes are excited. With a set of membranes tuned to different frequencies, a rough spectral analysis can be conducted.
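A brief aside, added here for clarity (not part of the original article): for an ideal circular membrane of radius $a$, tension per unit length $T$, and surface density $\sigma$, the mode frequencies follow from the zeros $\alpha_{mn}$ of the Bessel functions $J_m$:

\[
f_{mn} = \frac{\alpha_{mn}}{2\pi a}\sqrt{\frac{T}{\sigma}}, \qquad \alpha_{01} \approx 2.405,\ \alpha_{11} \approx 3.832, \ \ldots
\]

This makes explicit why the resonance frequencies depend on material ($\sigma$), tension ($T$) and radius ($a$), and why the higher modes, with their line-shaped sand patterns, are not integer multiples of the fundamental.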

It was also known that the sensitivity of this method could be improved when the membrane was spanned over the (removed) bottom of a bottle with an open neck end. The key idea of Helmholtz was to replace this bottle by a hollow sphere with an open neck at one 'end' and another small spiky opening at the opposite 'end'. The spiky opening had to be inserted into one ear canal. In this way, the eardrum was excited similarly to the membrane with the sand of the previous technique. However, due to the high sensitivity of the ear, partial tones could be detected much more easily. A further advantage of these resonators was that their resonance frequencies can be expressed analytically in terms of the volume of the sphere and the diameter and the length of the neck. Hence these resonators became important experimental tools for the subjective sound analysis in the late 19th century and the early 20th century.
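The analytic expression alluded to above is the classic Helmholtz resonance formula, reproduced here for clarity (our addition): with $c$ the speed of sound, $A$ the cross-sectional area of the neck, $V$ the cavity volume, and $L_{\mathrm{eff}}$ the neck length including end corrections,

\[
f_0 = \frac{c}{2\pi}\sqrt{\frac{A}{V\,L_{\mathrm{eff}}}}.
\]

For example, a one-litre sphere ($V = 10^{-3}\,\mathrm{m}^3$) with a neck of area $1\,\mathrm{cm}^2$ and effective length $2\,\mathrm{cm}$ resonates near $\frac{343}{2\pi}\sqrt{10^{-4}/(10^{-3}\cdot 0.02)} \approx 122\,\mathrm{Hz}$.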

The HAPS at the TU Dresden contains three sets of Helmholtz resonators. The biggest of these sets contains 11 resonators, tuned to frequencies between 128 Hz and 768 Hz. The HAPS also contains a related kind of resonator invented by Schaefer (1902). These resonators are tubes with one open end and one closed end; the closed end has a small, spiky opening to be inserted into the ear canal. They respond maximally to frequencies whose wavelength is four times the length of the tube.
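For the Schaefer tubes, the stated quarter-wavelength relation amounts to (again our addition)

\[
f = \frac{c}{4L},
\]

so a tube tuned to 128 Hz would need to be about $343/(4 \cdot 128) \approx 0.67\,\mathrm{m}$ long, whereas one for 768 Hz needs only about 11 cm.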

Helmholtz used his resonators not only for sound analysis, but also for the synthesis of vowels. For this, he first had to analyze the resonances of the vocal tract for different vowels. He did this with a set of tuning forks, which he placed and excited directly in front of his open mouth while he silently articulated the different vowels. When the frequency of a tuning fork was close to a resonance of the vocal tract, the resulting sound became much louder than at the other frequencies. For each of the vowels /u/, /o/, and /a/, he was only able to detect a single resonance of the vocal tract, at the frequencies 175 Hz (note f), 494 Hz (note b') and 988 Hz (note b''), respectively. For each of the other investigated German vowels, he detected two resonances. The single resonances detected for /u/, /o/ and /a/ probably correspond to the clusters of the nearby first and second resonances of these vowels; evidently, his method of analysis was not sensitive enough to separate the two individual resonances of each vowel.
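As a check on the quoted note names (our addition, assuming modern equal temperament with A4 = 440 Hz, which Helmholtz's own pitch standard only approximated): f is F3, 16 semitones below A4; b' is B4, two semitones above A4; and b'' is B5, fourteen semitones above:

\[
440 \cdot 2^{-16/12} \approx 174.6\ \mathrm{Hz}, \qquad
440 \cdot 2^{2/12} \approx 493.9\ \mathrm{Hz}, \qquad
440 \cdot 2^{14/12} \approx 987.8\ \mathrm{Hz}.
\]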

To synthesize the vowels /u/, /o/, and /a/ with a single resonance, he simply connected a reed pipe to Helmholtz resonators tuned to the corresponding frequencies. For the vowels with two resonances, he selected a Helmholtz resonator for one of the resonances and attached a 6-10 cm long glass tube to the outer opening of the resonator to create the second resonance. These experiments show that Helmholtz had surprising insight into the source-filter principle of speech production, which was fully elaborated by Gunnar Fant and others 100 years later.
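To make the source-filter principle concrete, here is a minimal digital sketch of Helmholtz's experiment (our illustration, not a reconstruction of his apparatus): a harmonic-rich 'reed' source, modelled as an impulse train, is passed through two second-order resonators standing in for the Helmholtz resonator and the attached glass tube. The formant values 300 Hz and 870 Hz are merely plausible example settings.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

fs = 16000   # sampling rate in Hz
f0 = 110     # fundamental frequency of the 'reed' source in Hz

# Source: impulse train approximating the harmonic-rich reed excitation.
source = np.zeros(fs)              # one second of signal
source[::fs // f0] = 1.0

def resonator(x, freq, bw, fs):
    """Second-order IIR resonator (one 'formant') with centre
    frequency `freq` and bandwidth `bw`, both in Hz."""
    r = np.exp(-np.pi * bw / fs)   # pole radius from the bandwidth
    theta = 2 * np.pi * freq / fs  # pole angle from the centre frequency
    return lfilter([1.0 - r], [1.0, -2 * r * np.cos(theta), r * r], x)

# Filter: a cascade of two resonances, analogous to the Helmholtz
# resonator plus the attached glass tube.
vowel = resonator(source, 300.0, 60.0, fs)   # first resonance
vowel = resonator(vowel, 870.0, 90.0, fs)    # second resonance
vowel /= np.max(np.abs(vowel))               # normalize amplitude

wavfile.write("vowel.wav", fs, (vowel * 32767).astype(np.int16))
```

Removing the second resonator call leaves a single-resonance sound, analogous to Helmholtz's syntheses of /u/, /o/ and /a/.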

Peter Birkholz


3-1-11 INTERSPEECH 2015 Update 6 (May 2015)

A View from Dresden onto the History of Speech Communication

 

Part 5: Artificial vocal fold models – The investigation of phonation

Complete article including figures available at:
http://interspeech2015.org/conference/historical-review/


The investigation of the larynx was (and is) one of the predominant topics of phonetic research. In the early days of experimental phonetics, mechanical models of the larynx or, at least, of the vocal folds were utilized according to the paradigm of analysis-by-synthesis.

 

The first models used flat, parallel elastic membranes or other simple elements to simulate the function of the vocal folds (Fig. 1). However, the geometry of these models was rather different from that of real human vocal folds. Substantial progress was made by Franz Wethlo (1877-1960), who worked at the university in Berlin as an educationalist and special-needs pedagogue. He realized that the vocal folds should not be modelled by flat, parallel membranes, but that their three-dimensional shape should be taken into account. Hence, he proposed a three-dimensional model formed by two elastic cushions (Fig. 2). The cushions were filled with pressurized air, and the pressure could be varied for experimental purposes; in particular, the air pressure in the cushions was varied to adjust the tension of the model vocal folds. The whole model became known as the “Polsterpfeife” (cushion pipe). Wethlo described it in 1913.

 

The historical collection (HAPS) at the TU Dresden owns several of Wethlo's cushion pipes in different sizes, modelling male, female, and children's voices. A team from the TU Dresden repeated Wethlo's experiments with his original equipment in 2004. For this purpose, the cushion pipes were connected to a historical “vocal tract model”. This vocal tract model was actually a stack of wooden plates with holes of different diameters to model the varying cross-sectional area of the vocal tract between the glottis and the lips (Fig. 3). This “configurable” vocal tract model came to the HAPS collections from the Institute of Phonetics in Cologne. The artificial vocal folds were used to excite vocal tract configurations for the vowels /a/, /i/ and /u/, but listening experiments showed that these artificial vowels were rather difficult to discriminate.
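The plate stack is a physical analogue of the classic concatenated-tube model of the vocal tract. For the simplest case, a uniform tube closed at the glottis and open at the lips (our illustrative aside, not part of the article), the resonances lie at odd multiples of the quarter-wavelength frequency:

\[
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots
\]

For a typical vocal tract length of $L \approx 17\,\mathrm{cm}$ this gives formants near 500, 1500 and 2500 Hz; varying the hole diameters from plate to plate shifts these resonances towards vowel-specific formant patterns.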

 

Today, there is renewed interest in mechanical models of the vocal folds. Such models can be used in physical 3D robotic models of the speech apparatus (e.g., the Waseda Talker series of talking robots: http://www.takanishi.mech.waseda.ac.jp/top/research/voice/), to evaluate the accuracy of low-dimensional digital vocal fold models (e.g., http://scitation.aip.org/content/asa/journal/jasa/121/1/10.1121/1.2384846), or to examine pathological voice production.

 

 

Rüdiger Hoffmann & Peter Birkholz

 


3-1-12 INTERSPEECH 2015 Update 7 (June 2015)

 

A View from Dresden onto the History of Speech Communication

Part 6: Measuring speech respiration

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/

The respiratory behaviour of humans provides important bio-signals; in speech communication, we are interested in respiration while speaking. The investigation of speech respiration mainly entails the observation of (i) the activity of muscles relevant for inhalation and exhalation, (ii) lung volume, (iii) airflow, (iv) sub-glottal air pressure, and (v) the kinematic movements of the thorax and the abdomen.

In the 'early days' of experimental phonetics, the measurements mainly focused on lung volume and the kinematic behaviour of the rib cage and the belly. We present here three devices from the historic acoustic-phonetic collection (HAPS), which will be re-opened during the International Workshop on the History of Speech Communication Research.

The Atemvolumenmesser (Figure 1) is an instrument to measure the vital capacity of the lungs and the phonatory airflow. The subject inhales maximally with the help of a mask placed over mouth and nose. The subsequently expelled air enters the bellows via a mouthpiece and a rubber tube, and the resulting volume can be read off a vertical scale. A stylus mounted on a small tube at the scale makes it possible to register the temporal dynamics of speech breathing with the help of a kymograph.

Figure 2 shows a Gürtel-Pneumograph ('belt pneumograph'), which serves to investigate respiratory movements. Rubber tubes with a wave-like surface are fixed around the upper part of the subject's body in order to measure thoracic and abdominal respiration. Changes in the kinematics result in changes of air pressure, which are transmitted via tubes to so-called Marey capsules and onto a stylus, and registered with a kymograph.

The kymograph was the core instrument of experimental phonetic research until the 1930s. This 'wave-writer' graphically represents changes over time: a revolving drum is wrapped with a sheet of paper coated with soot (impure carbon), on which a fine-grained stylus easily traces the measured changes.

A clockwork motor was responsible for the constant revolution of the drum. Time-varying parameters like speech waveforms, air pressure changes from the pneumograph, or air volume changes from the Atemvolumenmesser were transduced into kinematic parameters via the Marey capsules and registered along the time axis.

The drum in Figure 3 has a height of 180 mm and a circumference of 500 mm. At the beginning of the registration process the drum is at its top position, and it sinks continuously downwards while recording; this spiral movement allows longer curves to be recorded graphically on the paper. The speed of revolution of the drum could be set between 0.1 and 250 mm per second.
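These numbers directly determine the recording capacity (a worked example we add here): one revolution covers 500 mm of paper, so

\[
t_{\mathrm{rev}} = \frac{500\ \mathrm{mm}}{250\ \mathrm{mm/s}} = 2\ \mathrm{s} \quad \text{at top speed}, \qquad
t_{\mathrm{rev}} = \frac{500\ \mathrm{mm}}{0.1\ \mathrm{mm/s}} = 5000\ \mathrm{s} \approx 83\ \mathrm{min} \quad \text{at the slowest setting.}
\]

At top speed, 1 mm of trace corresponds to 4 ms, ample resolution for respiratory dynamics.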

Jürgen Trouvain and Dieter Mehnert


3-1-13 INTERSPEECH 2015 Update 8 (July 2015)

 

 

A View from Dresden onto the History of Speech Communication

 

Part 7: Early electronic demonstrators for speech synthesis

 

Complete article including figures available at:
http://interspeech2015.org/conference/historical-review/

The development of early vocoders had a large impact on speech research, starting with the patents of Karl Otto Schmidt in Berlin and with Homer Dudley's vocoder at Bell Labs in the 1930s. In several other places, vocoder prototypes were designed during and after World War II. In Dresden, a prototype was developed in the framework of the Dr.-Ing. thesis of Eberhard Krocker in the 1950s.

The first channel vocoders were large and expensive due to their use of electronic valves, and there was some doubt whether they could be widely used in commercial applications. Krocker summarized: “The importance of the vocoder is less the frequency band compression than the potential for essential investigations on speech. The analyzer can be combined with registration equipment for the analysis of sounds, whereas the synthesizer can be combined with a control mechanism for the synthetic production of speech.”

This was an exact prognosis: the analysis-synthesis technology proved to be a very powerful tool in speech research. The synthesizer part of a channel vocoder could be used for early attempts at electronic speech synthesis; the analyzer part of the vocoder then had to be replaced by a control unit. This was a manual/pedal control in the case of Dudley's famous Voder. Another way of controlling the synthesizer was the optoelectronic reading of a spectrogram; the first of these so-called pattern-playback devices were developed at the Bell Labs and the Haskins Labs.
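For readers unfamiliar with the principle, the analysis-synthesis idea can be sketched in a few lines of Python (a minimal illustration under our own assumptions, not Krocker's or Dudley's actual design): the analyzer splits speech into band-pass channels and measures each band's slowly varying envelope; the synthesizer re-imposes those envelopes on an artificial excitation.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def channel_vocoder(speech, fs, n_channels=19, f_lo=100.0, f_hi=4000.0):
    """Minimal channel vocoder: analyze the band envelopes of `speech`
    (a float array sampled at `fs` Hz) and re-synthesize them on a
    monotone buzz. The 19 channels mirror the Dresden vocoder above."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced bands

    # Excitation: 100 Hz impulse train (a real vocoder would switch
    # between buzz and noise and track the speaker's pitch).
    excitation = np.zeros_like(speech)
    excitation[::fs // 100] = 1.0

    env_lp = butter(2, 25.0, btype='lowpass', fs=fs, output='sos')
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_bp = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        # Analyzer channel: band-pass, rectify, low-pass -> envelope.
        envelope = sosfilt(env_lp, np.abs(sosfilt(band_bp, speech)))
        # Synthesizer channel: the same band-pass applied to the
        # excitation, modulated by the analyzed envelope.
        out += sosfilt(band_bp, excitation) * np.clip(envelope, 0.0, None)
    return out / (np.max(np.abs(out)) + 1e-12)
```

Fed with any 16 kHz recording, this produces the characteristic robotic vocoder monotone; replacing the impulse train with white noise yields whisper-like speech.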

It became clear that there were more effective parameterizations of the speech signal, and vocoder types other than the channel vocoder arose. Formant coding proved to be a very effective approach, and consequently the early speech synthesis terminals followed the principle of formant synthesis. This development was strongly influenced by the work of Gunnar Fant, who developed the vowel synthesizer OVE, which was controlled by moving a pointer across the formant plane.

Both devices, the vowel synthesizer and the playback device, had huge didactic value. When Walter Tscheschner (1927–2004) was appointed to the chair for Electronic Speech Communication at the former Technische Hochschule (later Technische Universität) Dresden, he wanted to have his own versions of these devices as demonstrators for his lectures. The vowel synthesizer (Figure 1) was built in 1962; it uses electron-valve circuitry to establish three formants, of which the two lower ones can be adjusted with a pointer (Figure 2).

The playback device was constructed in 1973 as an optical spectrogram reader which controlled the 19 channels of the above-mentioned vocoder. Because the device was very attractive as a demonstrator, a portable version (PBG 2, Figure 3) was implemented in 1982.

Both devices are now exhibits of the historic acoustic-phonetic collection (HAPS) of the TU Dresden. The most remarkable fact is that they are still working.

 

Rüdiger Hoffmann and Ulrich Kordon
