ISCApad #204
Tuesday, June 16, 2015, by Chris Wellekens
3-1-1 | (2015-06-20) Call for Participation at INTERSPEECH 2015 - Early registration

The early registration deadline at a discounted fee is rapidly approaching: 20 June 2015!

INTERSPEECH is the world’s largest and most comprehensive conference on the science and technology of spoken language processing. INTERSPEECH conferences emphasize interdisciplinary approaches addressing all aspects of speech science and technology, ranging from basic theories to applications.

Tutorials
The Organizing Committee is announcing eight tutorials to be held on September 6.
Special Sessions
The following ten Special Sessions & Challenges (September 7-10) cover interdisciplinary topics and/or important new emerging areas of interest related to the main conference topics:
- Active Perception in Human and Machine Speech Communication
- Biosignal-based Spoken Communication
- Interspeech 2015 Computational Paralinguistics Challenge (ComParE): Degree of Nativeness, Parkinson’s & Eating Condition
- Automatic Speaker Verification Spoofing and Countermeasures
- Zero Resource Speech Technologies: Unsupervised Discovery of Linguistic Units
- Robust Speech Processing using Observation Uncertainty and Uncertainty Propagation
- Speech Science in End User Applications
- Synergies of Speech and Multimedia Technologies
- Speech and Language Processing of Children’s Speech
- Advanced Crowdsourcing for Speech and Beyond

For further information about the Keynotes and Regular Sessions, the Doctoral Workshop, nine Satellite Workshops and two related conferences, as well as on post-conference tours, please refer to the program website: http://interspeech2015.org/program/

Sebastian Möller (General Chair), Oliver Jokisch (Publicity Chair)
3-1-2 | (2015-09-06) Call for Satellite Workshops of INTERSPEECH 2015, Dresden, Germany

**** Call for Satellite Workshops ****
INTERSPEECH 2015 will be held in the beautiful city of Dresden, Germany, on September 6-10, 2015.
The theme is 'Speech beyond Speech - Towards a Better Understanding of the Most Important
Biosignal'. The Organizing Committee of INTERSPEECH 2015 is now inviting proposals for
satellite workshops, which will be held in proximity to the main conference.
The Organizing Committee will work to facilitate the organization of such satellite workshops,
to stimulate discussion in research areas related to speech and language, at locations in Central
Europe, and around the same time as INTERSPEECH. We are particularly looking forward to
proposals from neighboring countries. If you are interested in organizing a satellite workshop,
or would like a planned event to be listed as an official satellite event, please contact the organizers
or the Satellite Workshop Chair at fmetze@cs.cmu.edu. The Satellite Workshop Coordinator, along
with the INTERSPEECH team, will help to connect (potential) workshop organizers with local
contacts in Germany, if needed, and will try to be helpful with logistics such as payment, publicity,
and coordination with ISCA or other events. Proposals should include:
* workshop name and acronym
* organizers' names and contact info
* website (if already known)
* date and proposed location of the workshop
* estimated number of participants
* a short description of the motivation for the workshop
* an outline of the program and invited speakers
* a description of the submission process (e.g. deadlines, target acceptance rate)
* a list of the scientific committee members
Proposals for satellite workshops should be submitted by email to workshops@interspeech2015.org
by August 31st, 2014. We strongly recommend that organizers also apply for ISCA approval/sponsorship,
which will greatly facilitate acceptance as an INTERSPEECH satellite event. We plan to notify
proposers no later than October 30, 2014. If you have any questions about whether a potential event
would be a good candidate for an INTERSPEECH 2015 satellite workshop, feel free to contact the
INTERSPEECH 2015 Satellite Workshops Chair.
Sincerely,
Florian Metze
Satellite Workshops Chair fmetze@cs.cmu.edu
3-1-3 | (2015-09-06) 3rd and Final Call for INTERSPEECH 2015, Sep 6-10, Dresden, Germany
INTERSPEECH is the world’s largest and most comprehensive conference on the science and technology of spoken language processing. INTERSPEECH conferences emphasize interdisciplinary approaches addressing all aspects of speech science and technology, ranging from basic theories to applications.
INTERSPEECH 2015 in Dresden (Germany) will be organized around the theme Speech beyond Speech: Towards a Better Understanding of the Most Important Biosignal, which acknowledges the fact that speech is the most important biosignal humans can produce and perceive. It is also evident that not all characteristics of speech are fully understood yet. We therefore encourage contributions that analyze and model speech as a biosignal in a broad sense, e.g. for extracting information about the speaker, for identifying processes leading to speech production, or for generating speech signals with specific bio-characteristics. Contributions to all other areas of speech science and technology are also welcome.
Important Dates
---------------------
20 March 2015   Paper submission deadline
20 March 2015   Tutorial submission deadline
17 April 2015   Show and Tell submission deadline
10 June 2015    Paper camera-ready deadline
10 June 2015    Show and Tell camera-ready deadline
20 June 2015    Early registration deadline
6-10 Sep 2015   Conference in Dresden, Germany
INTERSPEECH 2015 hosts a wide range of events, e.g. Special Sessions and Workshops
-------------------------------------------------------------------------------------------

10 Special Sessions
- Active Perception in Human and Machine Speech Communication
- Biosignal-based Spoken Communication
- Interspeech 2015 Computational Paralinguistics Challenge (ComParE): Degree of Nativeness, Parkinson’s & Eating Condition
- Automatic Speaker Verification Spoofing and Countermeasures
- Zero Resource Speech Technologies: Unsupervised Discovery of Linguistic Units
- Robust Speech Processing using Observation Uncertainty and Uncertainty Propagation
- Speech Science in End User Applications
- Synergies of Speech and Multimedia Technologies
- Speech and Language Processing of Children’s Speech
- Advanced Crowdsourcing for Speech and Beyond

10 Satellite Workshops
- Errors by Humans and Machines in multimedia, multimodal and multilingual data processing (ERRARE)
- Speech and Language Processing for Assistive Technologies (SLPAT)
- Workshop on Speech and Language Technology for Education (SLaTE)
- International Workshop on the History of Speech Communication Research (HSCR)
- Workshop on Speech and Audio Technologies for the Digital Humanities (SAT4DH)
- Blizzard Challenge Workshop
- Special Interest Group on Discourse and Dialogue (SIGDIAL)
- International Workshop on Speech Robotics (IWSR)
- The 1st Joint Conference on Facial Analysis, Animation and Audio-Visual Speech Processing (FAAVSP)
- MediaEval Benchmarking Initiative for Multimedia Evaluation (MediaEval)

3 Related Events
- Speech Technology and Human-Computer Dialogue (SpeD)
- The 1st Joint Conference on Facial Analysis, Animation and Audio-Visual Speech Processing (FAAVSP)
- International Conference on Text, Speech and Dialogue (TSD)
Visit www.interspeech2015.org
********************************************************
Dr. Tim Polzehl
Quality and Usability Lab, Telekom Innovation Laboratories, SoftwareCampus
Technische Universität Berlin
E-mail: tim.polzehl@telekom.de
### visit INTERSPEECH 2015 in Dresden, Germany - http://www.interspeech2015.org ###
3-1-4 | (2015-09-06) Announcement of the Plenary Sessions at Interspeech 2015

Dear Colleagues,
Interspeech-2015 will start in about 3 months and it is time to announce our plenary speakers.
Following the tradition of most past Interspeech conferences, the organizing committee of Interspeech-2015 has decided to present four keynote talks in Dresden, one on each day of the conference. It is also a tradition that the first keynote talk is presented by the ISCA Medallist right at the end of the opening ceremony on Monday morning, Sept. 7, 2015. Information on this year’s ISCA Medallist will be published later in June.
Information on the other three plenary speakers will, however, be available very soon on the Interspeech-2015 website, and here we already provide a brief introduction of the speakers:
Prof. Klaus Scherer from the University of Geneva will present a keynote talk about vocal communication as a major carrier of information about a person’s physique, enduring dispositions, strategic intentions and current emotional state. He will also discuss the importance of voice quality in comparison to other modalities, e.g. facial expressions, in this context. Prof. Scherer is one of the most prominent researchers in the area of emotion psychology and was the holder of an ERC Advanced Grant covering these research topics.
Prof. Katrin Amunts from the Institute of Neuroscience and Medicine at the Research Centre Juelich/Germany will deliver a talk about her research within the European FET Flagship 'The Human Brain Project'. The expectations for this presentation are twofold: Firstly, many Interspeech attendees may have heard about the huge 'EU Flagship Projects', which are funded with more than $1 billion each, but do not know what exactly is going on in such a project and how it is organized and structured. This talk will introduce the FET Flagship project that is most relevant to the Interspeech community, the 'Human Brain Project'. Secondly, Prof. Amunts is not only the leader of Subproject 2, 'Strategic Human Brain Data', but is also very well known for her work on how language is mapped onto regions of the human brain and on creating a 3D atlas of the human brain. From her talk we can expect the perfect combination of speech and language research with neural brain research within probably the most prominent project in this area.
Everybody in the speech community knows the popular 'Personal Digital Assistants', such as Siri, Cortana, or Google Now. However, many people may not know exactly what technology lies behind these commercially highly successful systems. The answer to this question will be given by Dr. Ruhi Sarikaya from Microsoft in his keynote address. His group has been building the language understanding and dialog management capabilities of both Cortana and Xbox One. In his talk, he will give an overview of personal digital assistants and describe the system design, architecture and the key components behind them. He will highlight challenges and describe best practices related to bringing personal assistants from laboratories to the real world.
I hope you agree that we will have truly exciting plenary talks at this year’s Interspeech, and the Interspeech-2015 team is looking forward to sharing this experience with you in September in Dresden.
Gerhard Rigoll
Plenary Chair, Interspeech-2015
3-1-5 | (2015-09-06) Call for Applications: Doctoral Consortium at Interspeech 2015

Prospective and current doctoral students from all speech-related disciplines are invited to apply for admission to the Doctoral Consortium to be held at Interspeech 2015 in Dresden, Germany. The Doctoral Consortium is aimed at providing students working on speech-related topics with an opportunity to discuss their doctoral research with experts from their fields, and to receive feedback from experts and peers on their PhD projects. The format of the Doctoral Consortium will be a one-day workshop prior to the main conference (6th September). It will involve short presentations by participants summarizing their projects, followed by an intensive discussion of these presentations. The Doctoral Consortium is a new format at Interspeech and is held this year in place of the 'Students Meet Experts' lunch event that was held at previous Interspeech conferences. It is organized by the Student Advisory Committee of the International Speech Communication Association (ISCA-SAC).

For details on how to apply, who should apply, and the important dates, as well as for any further questions, please contact: doctoral.workshop@interspeech2015.org
3-1-7 | (2015-09-06) CfP INTERSPEECH 2015 Special Session on Synergies of Speech and Multimedia Technologies
3-1-8 | (2016) INTERSPEECH 2016, San Francisco, CA, USA

Interspeech 2016 will take place from September 8-12, 2016, in San Francisco, CA, USA. The General Chair is Nelson Morgan. You may already be tempted by the nice pictures on the cover page of its tentative website: http://www.interspeech2016.org
3-1-9 | INTERSPEECH 2015 Update (June 2015)
A View from Dresden onto the History of Speech Communication
Part 6: Measuring speech respiration

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/

The respiratory behaviour of humans provides important bio-signals; in speech communication we are interested in respiration while speaking. The investigation of speech respiration mainly entails the observation of i) the activity of muscles relevant for in- and exhalation, ii) lung volume, iii) airflow, iv) sub-glottal air pressure, and v) the kinematic movements of the thorax and the abdomen. In the 'early days' of experimental phonetics the measurements were mainly focused on lung volume and the kinematic behaviour of the rib cage and the belly. We present here three devices that are also part of the historic acoustic-phonetic collection (HAPS), which will be re-opened during the International Workshop on the History of Speech Communication Research.

The Atemvolumenmesser (Figure 1) is an instrument to measure the vital capacity of the lung and the phonatory flow, respectively. The human subject maximally inhales with the help of a mask placed over mouth and nose. The subsequently exhaled air enters the bellows via a mouthpiece and a rubber tube. The resulting volume can be read from a vertical scale. A stylus mounted on a small tube at the scale allows the temporal dynamics of speech breathing to be registered with the help of a kymograph.

Figure 2 shows a Gürtel-Pneumograph ('belt pneumograph'), which serves to investigate the respiratory movements. The rubber tubes with their wave-like surface are fixed around the upper part of the body of the human subject in order to measure the thoracic and abdominal respiration. Changes in the kinematics result in changes in air pressure, which are transmitted via tubes to so-called Marey capsules and onto a stylus, to be registered with a kymograph.

The kymograph was the core instrument of experimental phonetic research until the 1930s. The 'wave-writer' graphically represents changes over time. A revolving drum was wrapped with a sheet of paper covered with soot (impure carbon), and a fine-tipped stylus easily traces the measured changes into this surface. A clockwork motor was responsible for the constant revolution of the drum. Time-varying parameters like speech wave forms, air pressure changes of the pneumograph or air volume changes of the Atemvolumenmesser were transduced into kinematic parameters via the Marey capsules and registered on the time axis. The drum in Figure 3 has a height of 180 mm and a circumference of 500 mm. At the beginning the drum is at the top and sinks continuously downwards during the registration process. This spiral movement allows the graphical recording of longer curves on the paper. The speed of revolution of the drum could be set between 0.1 and 250 mm per second.

Jürgen Trouvain and Dieter Mehnert
3-1-10 | INTERSPEECH 2015 Update (April 2015)
A View from Dresden onto the History of Speech Communication
Part 4: Helmholtz Resonators

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/

Hermann von Helmholtz (1821-1894) was a German physician and physicist who made important contributions in many areas of science. One of these areas was acoustics, where he published the famous book 'On the sensations of tone as a physiological basis for the theory of music' in 1863. There he described his invention of a special type of resonator, which is now known as the Helmholtz resonator. These resonators were devised as highly sensitive devices to identify the harmonic components (partial tones) of sounds and allowed significant advances in the acoustic analysis of vowel sounds and musical instruments.

Before the invention of Helmholtz resonators, strong partial tones in a sound wave were typically identified by very thin, elastic membranes that were spanned on circular rings similar to drums. Such a membrane has a certain resonance frequency (in fact multiple frequencies) that depends on its material, tension, and radius. If the sound field around the membrane contains energy at this frequency, the membrane is excited and starts to oscillate. The tiny amplitudes of this oscillation can be visually detected when fine-grained sand is distributed over its surface. When the membrane is excited at its lowest resonance frequency, the sand accumulates at the rim of the membrane; when higher-order modes are excited, it accumulates along specific lines on its surface. With a set of membranes tuned to different frequencies, a rough spectral analysis can be conducted. It was also known that the sensitivity of this method could be improved when the membrane was spanned over the (removed) bottom of a bottle with an open neck.

The key idea of Helmholtz was to replace this bottle by a hollow sphere with an open neck at one 'end' and another small spiky opening at the opposite 'end'. The spiky opening had to be inserted into one ear canal. In this way, the eardrum was excited similarly to the membrane with the sand of the previous technique. However, due to the high sensitivity of the ear, partial tones could be detected much more easily. A further advantage of these resonators was that their resonance frequencies can be expressed analytically in terms of the volume of the sphere and the diameter and the length of the neck. Hence these resonators became important experimental tools for subjective sound analysis in the late 19th and early 20th century.
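The analytical expression mentioned above takes, in its standard textbook form (assuming a neck that is short compared with the wavelength), the following shape:

\[
  f_0 \;=\; \frac{c}{2\pi}\sqrt{\frac{A}{V\,L_{\mathrm{eff}}}},
  \qquad A = \frac{\pi d^{2}}{4},
\]

where c is the speed of sound, V the volume of the sphere, d the diameter of the neck and L_eff its effective, end-corrected length. Larger spheres and narrower necks therefore give lower resonance frequencies.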
The HAPS at the TU Dresden contains three sets of Helmholtz resonators. The biggest of these sets contains 11 resonators, which are tuned to frequencies between 128 Hz and 768 Hz. The HAPS also contains a related kind of resonators that were invented by Schaefer (1902). These resonators are tubes with one open end and one closed end. The closed end also has a small spiky opening that has to be inserted into the ear canal. These resonators respond maximally to frequencies whose wavelength is four times the length of the tube.

Helmholtz used his resonators not only for sound analysis, but also for the synthesis of vowels. To this end, he first had to analyze the resonances of the vocal tract for different vowels. He did this by means of a set of tuning forks, which he placed and excited directly in front of his open mouth while he silently articulated the different vowels. When the frequency of a tuning fork was close to a resonance of the vocal tract, the resulting sound became much louder than for the other frequencies. For each of the vowels /u/, /o/, and /a/, he was only able to detect a single resonance of the vocal tract, at the frequencies 175 Hz (note f), 494 Hz (note b’) and 988 Hz (note b’’), respectively. For each of the other investigated German vowels, he even detected two resonances. The single resonances detected for /u/, /o/ and /a/ probably correspond to the clusters of the nearby first and second resonances of the corresponding vowels. Obviously, his method of analysis was not sensitive enough to separate the two individual resonances of each of these vowels.

To synthesize the vowels /u/, /o/, and /a/ with a single resonance, he simply connected a reed pipe to Helmholtz resonators tuned to the corresponding frequencies. For the vowels with two resonances, he selected a Helmholtz resonator for one of the resonances and attached a 6-10 cm long glass tube to the outer opening of the resonator to create the second resonance. These experiments show that Helmholtz had surprising insight into the source-filter principle of speech production, which was fully elaborated by Gunnar Fant and others 100 years later.

Peter Birkholz
3-1-11 | INTERSPEECH 2015 Update (December 2014)

Updates from INTERSPEECH 2015

Dear colleague,

Interspeech 2015 in Dresden is approaching at an increasing pace, and the entire team of organizers is trying to ensure that you will get a conference which meets all, and hopefully surpasses some, of your expectations. Regarding the usual program of oral and poster sessions, special sessions and challenges, keynotes, tutorials and satellite workshops, the responsible team is working hard to ensure that you will get a program which is not only of respectable breadth and depth, but which also tackles a couple of innovative topics, some of them centered around the special topic of the conference, “Speech beyond speech: Towards a better understanding of our most important biosignal”, and some of them addressing other emergent topics. We would particularly like to draw your attention to the approaching deadlines.

In addition to regular papers, we will also experiment with a virtual attendance format for persons who are – mainly for visa or health reasons – not able to come to Dresden to present their paper. For these persons, a limited number of technology-equipped poster boards will be available where online presentations can be held. The number of virtual attendance slots is strictly limited (thus potentially leading to a lower acceptance rate). The corresponding papers have to pass the normal review process, but the deadline will most probably be around 14 days before the normal paper submission deadline. More details on this format will be announced soon.

In the upcoming months, we will keep you updated via this thread, and we will present some historical instruments and techniques related to speech technology which nicely illustrate that Dresden has a rich history in speech science and technology. Interspeech 2015 will hopefully contribute to this history with the latest scientific and technological advances. The entire organizing team is looking forward to welcoming you in Dresden.

On behalf of the organizing team,
3-1-12 | INTERSPEECH 2015 Update (February 2015)

+++ INTERSPEECH 2015 – February Update +++

Dear colleagues,

The preparations for Interspeech 2015 in Dresden are running at an increasing pace, and I got the impression that we have a very active contribution from the community this year. Bernd Möbius and Elmar Nöth, our TPC Chairs, have set up a comprehensive and balanced group of Area Chairs for the new areas we have agreed upon with ISCA, which will soon be published on our website. The ten preliminarily accepted Special Sessions and Challenges are active in collecting contributions; as an example, the session “Advanced Crowdsourcing for Speech and Beyond” has received 17 requests for research funds, which will now be evaluated according to their fit to the special session topic. And our sponsorship, industry and exhibition chairs, Tim Fingscheidt, Claudia Pohlink, Jimmy Kunzmann and Reinhold Häb-Umbach, are actively soliciting sponsoring money to make the event as affordable as possible for you.

The 2nd Call for Papers is out (deadline March 20). In addition, there is a special Call for Papers with Virtual Presentation, which solicits contributions to this special format we will experiment with at this year’s Interspeech for the first time, and which will be limited to exceptional cases of authors who otherwise would not be able to participate. There is also still the option to submit proposals for Tutorials (deadline March 20) and Show and Tell contributions (deadline April 17).

All further information can be found on our website, which Tim Polzehl is eager to keep updated. For automatically receiving continuous updates, we recommend that you follow us on Twitter (@interspeech2015), or that you use social channels such as LinkedIn or Facebook. And: please do not delete your Interspeech 2014 app; it will automatically receive an update for Interspeech 2015.

Finally, Dresden is also polishing her historical charm, and for Interspeech attendants the most important aspect of this might be the second contribution to our historical series, which this time is dedicated to the world’s first successful attempt at a mechanical speech synthesiser.

On behalf of the organizing team,

A View from Dresden onto the History of Speech Communication
Part 2: The speaking machine of Wolfgang von Kempelen

Complete article including figures available at: http://interspeech2015.org/conference/historical-review/
The speaking machine of Wolfgang von Kempelen (1734-1804) can be considered the first successful attempt at a mechanical speech synthesiser. The Austrian-Hungarian engineer is still famous for his 'chess Turk', but it was his 'Sprachmaschine' that counts as a milestone in (speech) technology. In his book 'Mechanismus der menschlichen Sprache nebst der Beschreibung einer sprechenden Maschine' (published 1791, no English translation yet) he described the function of the machine, which was intended to give a voice to deaf people. Contemporary personalities like Goethe confirmed the authenticity of a child's voice when the speaking machine was played.
How does the machine work?
The machine consists of a bellows that is connected by a tube to a wooden wind chest. On the other side of the wind chest, a round wooden block forms the interface to an open rubber funnel (acting as the vocal tract). In the wind chest there are two modified recorders to produce the fricatives [s] and [S]. The voice generator is located inside the wooden block. The artificial voice is generated with the help of a reed pipe borrowed from the pipe organ. It has an ivory reed vibrating against a hollow wooden shallot (as in a clarinet). The trained human operator plays the machine like a musical instrument. The right elbow controls the air pressure by pressing on the bellows; two fingers of the right hand close or open the passages for stops and nasals, and two other fingers of the right hand control the fricatives. Vowels are produced with the palm of the left hand in different ways.
Replicas
Apart from parts of one of the originals, which are hosted at the Deutsches Museum in Munich, there are several reconstructions based on Kempelen's quite detailed descriptions. The replicas built in Budapest, Vienna, York and Saarbrücken allow a lively demonstration of the mechanical generation of speech as well as its acoustic analysis, but also perception tests with today's listeners. Interestingly, the art of constructing artificial voices led to the profession of 'voice makers' in Eastern-German Thuringia (more information in one of the next newsletters). Original products of the Thuringian 'Stimmenmacher' as well as one of the replicas located at TU Dresden will be on display, available for ears, eyes (and hands), at the re-opening of the HAPS (Historische Akustisch-Phonetische Sammlung) on 4 September, which is also the start of the Interspeech satellite Workshop on the History of Speech Communication Research (HSCR 2015).
Jürgen Trouvain and Fabian Brackhane
3-1-13 | INTERSPEECH 2015 Update (January 2015)
+++ INTERSPEECH 2015 Update – and a look back! +++

Dear colleagues,

The regular paper deadline for Interspeech 2015 in Dresden is only 2 months away, so we hope that you are preparing your submissions. We have received an impressive number of Special Session and Challenge proposals. The list of preliminarily accepted proposals, together with more information on each session and its organizers, can be found under http://interspeech2015.org/events/special-sessions/. Thus, in case your interests fall within the area of one of these Special Sessions or Challenges, consider submitting there. Please note that March 20 is – apart from the general paper deadline – also the deadline for tutorial proposals. More details on tutorial proposal submissions can be found under http://interspeech2015.org/calls/call-for-tutorials/. The deadline for Show & Tell papers is then April 17. The current list of Satellite Workshops will be updated successively and can be found under http://interspeech2015.org/events/workshops/.

From now on, we will take a monthly look back at the history of speech communication and technology in Dresden.

On behalf of the organizing team,
Sebastian Möller (General Chair)

A View from Dresden onto the History of Speech Communication
Part 1: The historic acoustic-phonetic collection

Information Technology at the TU Dresden goes back to Heinrich Barkhausen (1881–1956), the 'father of the electron valve', who taught there from 1911 to 1953. Speech research in a narrower sense started with the development of a vocoder in the 1950s. Walter Tscheschner (1927–2004) performed his extensive investigations of the speech signal using components of the vocoder. In 1969, a scientific unit for Communication and Measurement was founded in Dresden. It is the main root of the present Institute of Acoustics and Speech Communication. W. Tscheschner was appointed Professor of Speech Communication and started research in speech synthesis and recognition, which continues today.

Numerous objects from the history of speech communication in Dresden, but also from other parts of Germany, are preserved in the historic acoustic-phonetic collection of the TU Dresden. Until the opening of Interspeech 2015, we will present interesting exhibits from the collection in this newsletter every month. Today, we give an introduction.

The historic acoustic-phonetic collection of the TU Dresden consists of three parts:
• Objects that illustrate the development of acoustics and speech technology at the TU Dresden. The most interesting devices are speech synthesizers of various technologies.
• Objects illustrating the development of experimental phonetics from 1900 until the introduction of the computer. The items of this part were collected by D. Mehnert from different phonetics laboratories and rehabilitation units throughout Germany.
• Objects which were formerly collected at the Phonetics Institute of Hamburg University. This important collection, which was founded by Giulio Panconcelli-Calzia, was transferred to Dresden in 2005 in accordance with a contract due to the closing of the Hamburg institute.

The collection is presented in the Barkhausenbau on the main campus of the TU Dresden. Currently, it is moving to new rooms which are better suited for presentation. The newly installed collection will be re-opened on the occasion of Interspeech 2015.
For this purpose, we cordially invite you to a workshop on the history of speech communication, called HSCR 2015, which will be held as a satellite event of Interspeech 2015 on September 4/5, 2015, in the Technical Museum of the City of Dresden. It is organized by the special interest group (SIG) on 'The History of Speech Communication Sciences', which is supported by the International Speech Communication Association (ISCA) and the International Phonetic Association (IPA). More information on the workshop is available at http://www.sig-hist.org/.

Rüdiger Hoffmann (Local Chair)
3-1-14 | INTERSPEECH 2015 Update (March 2015)

A View from Dresden onto the History of Speech Communication
Part 3: Voices for toys - First commercial spin-offs in speech synthesis
Complete article including figures available at: http://interspeech2015.org/conference/historical-review/
When Wolfgang von Kempelen died in 1804, his automata (including the speaking machine) came into the ownership of Johann Nepomuk Maelzel (1772–1838), who demonstrated them on many tours through Europe and America. He was a clever mechanic and applied Kempelen’s ideas in a mechanical voice for puppets, which could pronounce “Mama” and “Papa”. He received a patent on it in 1824 (Figure 1).
The idea of speaking puppets and toys was continued mainly in the area of Sonneberg in Thuringia, Germany. This small town was the world capital of puppet and toy manufacturing in the 19th century. The voices consist of a bellows, a metal tongue for voicing, and a resonator. There are three reasons why we regard the mechanical voices as a milestone in the development of speech technology:
1. The mechanical voices established the first commercial spin-off of speech research. The toy manufacturers in Sonneberg recognized the importance of Mälzel’s invention and produced speaking puppets from 1852 onwards. The “Stimmenmacher” (voice maker) was a specific profession, and in 1911 we find eight manufacturers of human and animal voices in Sonneberg alone. The most important of them was Hugo Hölbe (1844–1931), who developed mechanisms which were able to speak not only Mama/Papa (Figure 2), but also words like Emma, Hurrah, etc.
2. The mechanical voices were applied in the first book with multimodal properties. The bookseller Theodor Brand from Sonneberg received a patent for his “speaking picture book” in 1878. This book shows different animals. Pulling a knob, which corresponds to a picture, activates the voice of the animal (Figure 3). The picture book was published in several languages and was a huge commercial success all over the world.
3. The mechanical voices are the first attempt to support the rehabilitation of hard-of-hearing people by means of speech technology. The German otologist Johannes Kessel (1839–1907) demonstrated Hölbe’s voices as a training tool in speech therapy at a conference in 1899. However, the quality of this kind of synthetic speech proved insufficient for this purpose.
The samples from Kessel came to the Phonetic Laboratory of Panconcelli-Calzia in Hamburg, who mentioned them in his historical essays. Thanks to the transfer of the phonetic exhibits from Hamburg to Dresden in 2005, you can now see the mechanical voices in the HAPS of the TU Dresden.
Rüdiger Hoffmann

Photographs: Copyright TU Dresden / HAPS
3-1-15 | INTERSPEECH 2015 Update (May 2015)

A View from Dresden onto the History of Speech Communication
Part 5: Artificial vocal fold models – The investigation of phonation

The investigation of the larynx was (and is) one of the predominant topics in phonetic research. In the early times of experimental phonetics, mechanical models of the larynx or, at least, of the vocal folds were utilized according to the paradigm of analysis-by-synthesis.
The first models used flat parallel elastic membranes or other simple elements to simulate the function of the vocal folds (Fig. 1). However, the geometry of these models was rather different from that of real human vocal folds. Substantial progress was made by Franz Wethlo (1877–1960), who worked at the Berlin university as an educationalist and special-needs educator. He realized that the vocal folds should not be modelled by flat parallel membranes, but that their three-dimensional shape should be taken into account. Hence, he proposed a three-dimensional model, which was formed by two elastic cushions (Fig. 2). The cushions were filled with pressurized air, the pressure of which could be varied for experimental purposes. In particular, the air pressure in the cushion pipes was varied to adjust the tension of the vocal folds. The whole model became known as the “Polsterpfeife” (cushion pipe). Wethlo described it in 1913.
The historical collection (HAPS) at the TU Dresden owns several of Wethlo’s cushion pipes in different sizes, modelling male, female, and children’s voices. A team from the TU Dresden repeated Wethlo’s experiments with his original equipment in 2004. For this purpose, the cushion pipes were connected to a historical “vocal tract model”. This vocal tract model was actually a stack of wooden plates with holes of different diameters to model the varying cross-sectional area of the vocal tract between the glottis and the lips (Fig. 3). This “configurable” vocal tract model came to the HAPS collection from the Institute of Phonetics in Cologne. The artificial vocal folds were used to excite vocal tract configurations for the vowels /a/, /i/ and /u/, but listening experiments showed that these artificial vowels were rather difficult to discriminate.
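In modern terms, the stack of plates with holes of varying diameter is an area-function model of the vocal tract. As a minimal illustration of the acoustics such a model approximates, the short Python sketch below computes the resonances of the simplest special case, a uniform tube that is closed at the glottis and open at the lips; the 170 mm tube length and the speed of sound are assumed, typical values and are not taken from the article.

# Illustrative sketch only (not from the article): resonance frequencies of a
# uniform, hard-walled tube that is closed at the glottis end and open at the
# lips, the simplest idealisation of an area-function vocal tract model.

SPEED_OF_SOUND_MM_PER_S = 350_000.0  # approx. speed of sound in warm air, in mm/s


def quarter_wave_resonances(tube_length_mm, n_resonances=3):
    """Return the first n resonance frequencies (Hz) of a closed-open tube."""
    return [(2 * k - 1) * SPEED_OF_SOUND_MM_PER_S / (4.0 * tube_length_mm)
            for k in range(1, n_resonances + 1)]


if __name__ == "__main__":
    # Roughly 515, 1545 and 2575 Hz for a 170 mm tube, close to the formants
    # of a neutral, schwa-like vowel.
    print(quarter_wave_resonances(170.0))

Non-uniform area functions, such as those set up for /a/, /i/ and /u/ with the wooden plates, shift these resonances away from the uniform-tube values.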
Today, there is renewed interest in mechanical models of the vocal folds. Such models can be used in physical 3D robotic models of the speech apparatus (e.g., the Waseda Talker series of talking robots: http://www.takanishi.mech.waseda.ac.jp/top/research/voice/), to evaluate the accuracy of low-dimensional digital vocal fold models (e.g., http://scitation.aip.org/content/asa/journal/jasa/121/1/10.1121/1.2384846), or to examine pathological voice production.
Rüdiger Hoffmann & Peter Birkholz