ISCA - International Speech
Communication Association



ISCApad #160

Saturday, October 08, 2011 by Chris Wellekens

5 Resources
5-1 Books
5-1-1 Alain Marchal, Christian Cave, L'imagerie medicale pour l'etude de la parole

Alain Marchal, Christian Cave

Eds Hermes Lavoisier

99 euros • 304 pages • 16 x 24 • 2009 • ISBN : 978-2-7462-2235-9

From the laryngeal mirror to present-day videofibroscopy, from static impression-taking to dynamic palatography, from the beginnings of radiography to magnetic resonance imaging and magnetoencephalography, this book reviews the various imaging techniques used to study speech, from the standpoint of both production and perception. The advantages, drawbacks and limitations of each technique are discussed, together with the main results obtained with each of them and their prospects for further development. Written by specialists concerned with remaining accessible to a broad readership, the book is addressed to everyone who studies or deals with speech in their professional activities, such as phoniatricians, ENT specialists, speech therapists and, of course, phoneticians and linguists.


5-1-2 Christoph Draxler, Korpusbasierte Sprachverarbeitung

Author: Christoph Draxler
Title: Korpusbasierte Sprachverarbeitung
Publisher: Narr Francke Attempto Verlag Tübingen
Year: 2008
Link: http://www.narr.de/details.php?catp=&p_id=16394

Summary: Spoken language is a major area of linguistic research and speech technology development. This handbook presents an introduction to the technical foundations and shows how speech data is collected, annotated, analysed, and made accessible in the form of speech databases. The book focuses on web-based procedures for the recording and processing of high quality speech data, and it is intended as a desktop reference for practical recording and annotation work. A chapter is devoted to the Ph@ttSessionz database, the first large-scale speech data collection (860+ speakers, 40 locations in Germany) performed via the Internet. The companion web site (http://www.narr-studienbuecher.de/Draxler/index.html) contains audio examples, software tools, solutions to the exercises, important links, and checklists. 


5-1-3 Robert M. Gray, Linear Predictive Coding and the Internet Protocol

Linear Predictive Coding and the Internet Protocol, by Robert M. Gray, is a special-edition hardback book from Foundations and Trends in Signal Processing (FnT SP). The book brings together two forthcoming issues of FnT SP: the first is a survey of LPC, the second a unique history of realtime digital speech on packet networks.
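Linear prediction, the subject of the first issue, models each speech sample as a linear combination of the previous p samples. As an illustrative sketch of the basic technique (not code from the book), the coefficients can be estimated from a frame's autocorrelation with the Levinson-Durbin recursion:

```python
import math

def autocorr(x, lag):
    """Autocorrelation of a frame at a given lag (rectangular window)."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def lpc(frame, order):
    """Levinson-Durbin recursion: returns [1, a1, ..., ap] such that
    s[n] is predicted by -(a1*s[n-1] + ... + ap*s[n-p])."""
    r = [autocorr(frame, k) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k      # prediction error shrinks at each order
    return a

# A pure sinusoid obeys s[n] = 2*cos(w)*s[n-1] - s[n-2], so a 2nd-order
# predictor should recover a1 close to -2*cos(w) and a2 close to 1.
signal = [math.sin(0.3 * n) for n in range(400)]
coeffs = lpc(signal, 2)
```

For the sinusoidal frame above, coeffs comes out close to [1, -2cos(0.3), 1], matching the difference equation of the sinusoid.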

 

Volume 3, Issue 3
A Survey of Linear Predictive Coding: Part 1 of LPC and the IP
By Robert M. Gray (Stanford University)
http://www.nowpublishers.com/product.aspx?product=SIG&doi=2000000029

Volume 3, Issue 4
A History of Realtime Digital Speech on Packet Networks: Part 2 of LPC and the IP
By Robert M. Gray (Stanford University)
http://www.nowpublishers.com/product.aspx?product=SIG&doi=2000000036

 

The links above will take you to the article abstracts.


5-1-4 M. Embarki and M. Ennaji, Modern Trends in Arabic Dialectology

Modern Trends in Arabic Dialectology,
M. Embarki & M. Ennaji (eds.), Trenton (USA): The Red Sea Press.

Contents

Introduction
Mohamed Embarki and Moha Ennaji, p. vii

Part I: Theoretical and Historical Perspectives and Methods in Arabic Dialectology
Chapter 1: Arabic Dialects: A Discussion
Janet C. E. Watson, p. 3
Chapter 2: The Emergence of Western Arabic: A Likely Consequence of Creolization
Federico Corriente, p. 39
Chapter 3: Acoustic Cues for the Classification of Arabic Dialects
Mohamed Embarki, p. 47
Chapter 4: Variation and Attitudes: A Sociolinguistic Analysis of the Qaaf
Maher Bahloul, p. 69

Part II: Eastern Arabic Dialects
Chapter 5: Arabic Bedouin Dialects and their Classification
Judith Rosenhouse, p. 97
Chapter 6: Evolution of Expressive Structures in Egyptian Arabic
Amr Helmy Ibrahim, p. 121
Chapter 7: Ḥaḍramī Arabic Lexicon
Abdullah Hassan Al-Saqqaf, p. 139

Part III: Western Arabic Dialects
Chapter 8: Dialectal Variation in Moroccan Arabic
Moha Ennaji, p. 171
Chapter 9: Formation and Evolution of Andalusi Arabic and its Imprint on Modern Northern Morocco
Ángeles Vicente, p. 185
Chapter 10: The Phonetic Implementation of Falling Pitch Accents in Dialectal Maltese: A Preliminary Study of the Intonation of Gozitan Żebbuġi
Alexandra Vella, p. 211

Index, p. 239




5-1-5 Gokhan Tur, Renato De Mori, Spoken Language Understanding: Systems for Extracting Semantic Information from Speech

Title: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech

Editors: Gokhan Tur and Renato De Mori

Web: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470688246.html

Brief Description:

Spoken language understanding (SLU) is an emerging field at the intersection of speech and language processing, investigating human/machine and human/human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances, and their applications are vast, from voice search on mobile devices to meeting summarization, attracting interest from both the commercial and academic sectors.

Both human/machine and human/human communications can benefit from the application of SLU, using differing tasks and approaches to better understand and utilize such communications. This book covers the state-of-the-art approaches for the most popular SLU tasks with chapters written by well-known researchers in the respective fields. Key features include:

Presents a fully integrated view of the two distinct disciplines of speech processing and language processing for SLU tasks.

Defines what is possible today for SLU as an enabling technology for enterprise (e.g., customer care centers or company meetings), and consumer (e.g., entertainment, mobile, car, robot, or smart environments) applications and outlines the key research areas.

Provides a unique source of distilled information on methods for computer modeling of semantic information in human/machine and human/human conversations.

This book can be successfully used for graduate courses in electronics engineering, computer science or computational linguistics. Moreover, technologists interested in processing spoken communications will find it a useful source of collated information on the topic, drawn from the two distinct disciplines of speech processing and language processing under the new area of SLU.


5-1-6 Jody Kreiman, Diana Van Lancker Sidtis, Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception

Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception
Jody Kreiman, Diana Van Lancker Sidtis
ISBN: 978-0-631-22297-2
Hardcover
512 pages
May 2011, Wiley-Blackwell

Foundations of Voice Studies provides a comprehensive description and analysis of the multifaceted role that voice quality plays in human existence.

•Offers a unique interdisciplinary perspective on all facets of voice perception, illustrating why listeners hear what they do and how they reach conclusions based on voice quality
•Integrates voice literature from a multitude of sources and disciplines
•Supplemented with practical and approachable examples, including a companion website with sound files, available on publication at www.wiley.com/go/voicestudies
•Explores the choice of various voices in advertising and broadcasting, and voice perception in singing voices and forensic applications
•Provides a straightforward and thorough overview of vocal physiology and control



5-1-7 G. Nick Clements and Rachid Ridouane, Where Do Phonological Features Come From?

 

Where Do Phonological Features Come From?

Edited by G. Nick Clements and Rachid Ridouane

CNRS & Sorbonne-Nouvelle

This volume offers a timely reconsideration of the function, content, and origin of phonological features, in a set of papers that is theoretically diverse yet thematically strongly coherent. Most of the papers were originally presented at the International Conference 'Where Do Features Come From?' held at the Sorbonne University, Paris, October 4-5, 2007. Several invited papers are included as well. The articles discuss issues concerning the mental status of distinctive features, their role in speech production and perception, the relation they bear to measurable physical properties in the articulatory and acoustic/auditory domains, and their role in language development. Multiple disciplinary perspectives are explored, including those of general linguistics, phonetic and speech sciences, and language acquisition. The larger goal was to address current issues in feature theory and to take a step towards synthesizing recent advances in order to present a current 'state of the art' of the field.

 

 


5-1-8 Dorothea Kolossa and Reinhold Haeb-Umbach, Robust Speech Recognition of Uncertain or Missing Data
Title: Robust Speech Recognition of Uncertain or Missing Data
Editors: Dorothea Kolossa and Reinhold Haeb-Umbach
Publisher: Springer
Year: 2011
ISBN 978-3-642-21316-8
Link:
http://www.springer.com/engineering/signals/book/978-3-642-21316-8?detailsPage=authorsAndEditors

Automatic speech recognition suffers from a lack of robustness with
respect to noise, reverberation and interfering speech. The growing
field of speech recognition in the presence of missing or uncertain
input data seeks to ameliorate those problems by using not only a
preprocessed speech signal but also an estimate of its reliability to
selectively focus on those segments and features that are most reliable
for recognition. This book presents the state of the art in recognition
in the presence of uncertainty, offering examples that utilize
uncertainty information for noise robustness, reverberation robustness,
simultaneous recognition of multiple speech signals, and audiovisual
speech recognition.

The book is appropriate for scientists and researchers in the field of
speech recognition, who will find an overview of the state of the art in
robust speech recognition; for professionals working in speech recognition,
who will find strategies for improving recognition results under various
conditions of mismatch; and for lecturers of advanced courses on speech
processing or speech recognition, who will find both a reference and a
comprehensive introduction to the field. The book assumes an
understanding of the fundamentals of speech recognition using Hidden
Markov Models.

5-1-9 Mohamed Embarki and Christelle Dodane, La coarticulation

LA COARTICULATION

 

Mohamed Embarki et Christelle Dodane

Des indices à la représentation

Speech is made up of complex articulatory gestures that overlap in space and time. These overlaps, conceptualized under the term coarticulation, spare no articulator: they can be traced in the movements of the jaw, the lips, the tongue, the velum and the vocal folds. Coarticulation is also expected by the listener, and coarticulated segments are better perceived. It is involved in the cognitive and linguistic processes of speech encoding and decoding. Far more than a mere process, coarticulation is a structured research field with concepts and models of its own. This collective volume brings together original contributions by international researchers who approach coarticulation from motor, acoustic, perceptual and linguistic points of view. It is the first book on this question published in French, and the first to explore it across different languages.

 

 

Series: Langue & Parole, L'Harmattan

ISBN: 978-2-296-55503-7 • 25 € • 260 pages

 

 

Mohamed Embarki

is maître de conférences (HDR) in phonetics at the Université de Franche-Comté (Besançon) and a member of Laseldi (E.A. 2281). His work deals with the (co)articulatory and acoustic aspects of modern Arabic dialects and with their sociophonetic motivations.

Christelle Dodane

is maître de conférences in phonetics at the Université Paul-Valéry (Montpellier 3) and is affiliated with the DIPRALANG laboratory (E.A. 739). Her research deals with language communication in young children (12-36 months), in particular the role of prosody in the transition from the pre-linguistic to the linguistic level, in the construction of early syntax, and in child-directed speech.


5-1-10 Ben Gold, Nelson Morgan, Dan Ellis, Speech and Audio Signal Processing: Processing and Perception of Speech and Music [Digital]

Speech and Audio Signal Processing: Processing and Perception of Speech and Music [Digital], by Ben Gold, Nelson Morgan, Dan Ellis

http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/product-description/1118142888


5-2 Database
5-2-1 ELRA - Language Resources Catalogue - Update (2011-09)

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 4 new Speech Resources from the GlobalPhone corpus are now available in its catalogue.
Moreover, an updated version of the Venice Italian Treebank (VIT) has also been released. 

1) New Language Resources:

The GlobalPhone Corpus:
The GlobalPhone corpus was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language-independent and language-adaptive speech recognition as well as for language identification tasks. The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 19 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian (ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German (ELRA-S0198), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Spanish (Latin America) (ELRA-S0203), Swedish (ELRA-S0204), Tamil (ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Vietnamese (ELRA-S0322). In each language, about 100 sentences were read by each of about 100 speakers. The read texts were selected from national newspapers available on the Internet to provide a large vocabulary (up to 65,000 words). The read articles cover national and international political news as well as economic news.

Special prices are offered for a combined purchase of several GlobalPhone languages (5 languages, 10 languages, 15 languages or 19 languages).

Four new languages are available from the GlobalPhone corpus:
ELRA-S0319 GlobalPhone Bulgarian
For more information, see: http://catalog.elra.info/product_info.php?products_id=1141
ELRA-S0320 GlobalPhone Polish
For more information, see: http://catalog.elra.info/product_info.php?products_id=1142
ELRA-S0321 GlobalPhone Thai
For more information, see: http://catalog.elra.info/product_info.php?products_id=1143
ELRA-S0322 GlobalPhone Vietnamese
For more information, see: http://catalog.elra.info/product_info.php?products_id=1144


2) Update of ELRA-W0040 Venice Italian Treebank (VIT):
The new version of VIT has a totally revised constituent-based representation and a completely new dependency-based representation, produced by semi-automatic procedures.

The Venice Italian Treebank (VIT) contains about 272,000 words distributed over six different domains: bureaucratic, political, economic and financial, literary, scientific, and news. In addition, some 60,000 tokens of spoken dialogues in different Italian varieties were annotated.
The annotation follows general X-bar criteria with 29 constituency labels and 102 PoS tags. VIT is also made available in a broad annotation version with 10 constituency labels and 22 PoS tags for machine-learning purposes. The format is plain text with square bracketing. In addition, a UPenn-style version readable by the open-source query tool CorpusSearch is provided.
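The "plain text with square bracketing" format mentioned above is a standard bracketed constituency encoding. As an illustrative sketch (the sentence and labels below are invented for the example, not drawn from VIT's actual tag set), such a bracketing can be parsed into nested (label, children) tuples with a few lines of code:

```python
import re

def parse_brackets(s):
    """Parse a square-bracketed constituency string into (label, children)
    tuples; leaves are plain token strings."""
    # Tokenize into brackets and whitespace-separated symbols
    tokens = re.findall(r'\[|\]|[^\[\]\s]+', s)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == '['
        pos += 1
        label = tokens[pos]      # constituent label follows '['
        pos += 1
        children = []
        while tokens[pos] != ']':
            if tokens[pos] == '[':
                children.append(parse())   # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1                 # consume closing ']'
        return (label, children)

    return parse()

# Hypothetical Italian sentence, not an actual VIT tree
tree = parse_brackets('[S [NP [ART la] [N parola]] [VP [V parla]]]')
```

The same recursive-descent idea extends directly to the broad-annotation version with its reduced label set.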

For more information, see: http://catalog.elra.info/product_info.php?products_id=831


5-2-2 ELRA - Language Resources Catalogue - Special Offer

*****************************************************************
ELRA - Language Resources Catalogue - Special Offer
*****************************************************************

ELRA is happy to announce that 2 Speecon resources can be obtained on favourable conditions for purchases made before 31 October 2011.

ELRA-S0297 Hungarian Speecon database 
The Hungarian Speecon database comprises the recordings of 555 adult Hungarian speakers and 50 child Hungarian speakers who uttered respectively over 290 items and 210 items (read and spontaneous).
SPECIAL PRICES AVAILABLE.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1094

ELRA-S0298 Czech Speecon database
The Czech Speecon database comprises the recordings of 550 adult Czech speakers and 50 child Czech speakers who uttered respectively over 290 items and 210 items (read and spontaneous).
SPECIAL PRICES AVAILABLE.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1095


For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html 


5-2-3 LDC Newsletter (September 2011)

In this newsletter:

Cataloging the communication of Asian Elephants

New publications:

2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1

2008 NIST Speaker Recognition Evaluation Training Set Part 2

French Gigaword Third Edition

Cataloging the communication of Asian Elephants

LDC distributes a broad selection of databases, the majority of which are used for human language research and technology development.  Our corpus catalog also includes the vocalizations of other animal species. We'd like to highlight the intriguing work behind one such animal communication corpus, Asian Elephant Vocalizations LDC2010S05.

Asian Elephant Vocalizations contains audio recordings of vocalizations by Asian Elephants (Elephas maximus) in Uda Walawe National Park, Sri Lanka.  The data was collected by Shermin de Silva as part of her doctoral thesis at the University of Pennsylvania. Recordings were made using a Fostex field recorder with a Sennheiser 'shot-gun' microphone.  In addition, de Silva utilized a second dictation microphone that allows observers to narrate what's happening without talking over the elephant recording.  The digital files were then downloaded and visualized using the Praat TextGrid Editor,  a tool originally developed for studying human speech which has since been adopted by elephant researchers.  With Praat, trained annotators are able to characterize call types and extract particular segments for later analysis. 
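As a sketch of what such annotations look like downstream of Praat, the fragment below imitates Praat's long-form TextGrid interval syntax (the call labels are invented for the example, not de Silva's actual call types) and extracts the labelled, i.e. non-empty, intervals:

```python
import re

# A hypothetical fragment in Praat's long TextGrid format
TEXTGRID_SNIPPET = '''
        intervals [1]:
            xmin = 0.0
            xmax = 1.25
            text = "rumble"
        intervals [2]:
            xmin = 1.25
            xmax = 2.0
            text = ""
        intervals [3]:
            xmin = 2.0
            xmax = 3.4
            text = "trumpet"
'''

def labelled_intervals(textgrid_text):
    """Yield (start, end, label) for every interval with a non-empty label."""
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "([^"]*)"')
    for m in pattern.finditer(textgrid_text):
        start, end, label = float(m.group(1)), float(m.group(2)), m.group(3)
        if label:                       # skip unlabelled (silent) stretches
            yield (start, end, label)

calls = list(labelled_intervals(TEXTGRID_SNIPPET))
```

A regex pass like this is only a sketch; real work on TextGrids would normally go through Praat itself or a dedicated parsing library.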

Until recently, the majority of research on the behavior of wild elephants focused on one species, the African savannah elephant. There has been comparatively less study of communication in Asian elephants, primarily because the habitat in which Asian elephants typically live makes them more difficult to study than African savannah elephants. Asian and African elephants diverged from one another approximately six million years ago and evolved separately in very distinct environments. de Silva's work has shown that Asian elephants have highly dynamic social lives that are markedly different from those of African elephants. Asian elephants tend to form smaller, fragmented groups on a day-to-day basis but maintain long-term pools of companions over many years. Because communication in elephants appears to be largely socially motivated, differences in social behavior and ecology may also be a source of differences in their vocal behavior and repertoire.

de Silva and her colleagues study elephant communication as an opportunity to understand the evolution of social behavior and communication in a system that is very different from our own primate experience.  Human language is only one manifestation of communication in the natural world. Perhaps this is why it is fitting to place animal vocalizations side-by-side with human speech in LDC's catalog.   In this way, we can better understand how human language relates to the communicative capabilities of other species.

For further information on Shermin de Silva's current research at the Elephant Forest and Environment Conservation Trust visit:

Web:  http://elephantresearch.net
Blog: http://elephantresearch.net/fieldnotes/



New Publications

(1) 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 was developed by researchers at the Department of Computer Science and Engineering, University of South Florida (USF), Tampa, Florida and the Multimodal Information Group at the National Institute of Standards and Technology (NIST). It contains approximately fifteen hours of meeting room video data collected in 2005 and 2006 and annotated for the VACE (Video Analysis and Content Extraction) 2006 face and person tracking tasks.

The VACE program was established to develop novel algorithms for automatic video content extraction, multi-modal fusion, and event understanding. During VACE Phases I and II, the program made significant progress in the automated detection and tracking of moving objects including faces, hands, people, vehicles and text in four primary video domains: broadcast news, meetings, street surveillance, and unmanned aerial vehicle motion imagery. Initial results were also obtained on automatic analysis of human activities and understanding of video sequences.

Three performance evaluations were conducted under the auspices of the VACE program between 2004 and 2007. In 2006, the VACE program and the European Union's Computers in the Human Interaction Loop (CHIL) collaborated to hold the CLassification of Events, Activities and Relationships (CLEAR) Evaluation. This was an international effort to evaluate systems designed to analyze people, their identities, activities, interactions and relationships in human-human interaction scenarios, as well as related scenarios. The VACE program contributed the evaluation infrastructure (e.g., data, scoring, tools) for a specific set of tasks, and the CHIL consortium, coordinated by the Karlsruhe Institute of Technology, contributed a separate set of evaluation infrastructure.

The meeting room data used for the 2006 test set was collected by the following sites in 2005 and 2006: Carnegie Mellon University (USA), University of Edinburgh (Scotland), IDIAP Research Institute (Switzerland), NIST (USA), Netherlands Organization for Applied Scientific Research (Netherlands) and Virginia Polytechnic Institute and State University (USA). Each site had its own independent camera setup, illuminations, viewpoints, people and topics. Most of the datasets included High-Definition (HD) recordings, but those were subsequently formatted to MPEG-2 for the evaluation.

2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 is distributed on 9 DVD-ROM.

2011 Subscription Members will automatically receive two copies of this corpus. 2011 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $2500.

*


(2) 2008 NIST Speaker Recognition Evaluation Training Set Part 2 was developed by LDC and NIST (National Institute of Standards and Technology).  It contains 950 hours of multilingual telephone speech and English interview speech along with transcripts and other materials used as training data in the 2008 NIST Speaker Recognition Evaluation (SRE).  SRE is part of an ongoing series of evaluations conducted by NIST. These evaluations are an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text independent speaker recognition. To this end the evaluation is designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible to those wishing to participate.

The 2008 evaluation was distinguished from prior evaluations, in particular those in 2005 and 2006, by including not only conversational telephone speech data but also conversational speech data of comparable duration recorded over a microphone channel involving an interview scenario.

The speech data in this release was collected in 2007 by LDC at its Human Subjects Data Collection Laboratories in Philadelphia and by the International Computer Science Institute (ICSI) at the University of California, Berkeley. This collection was part of the Mixer 5 project, which was designed to support the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones and in different communicative situations and/or in multiple languages. Mixer participants were native English speakers and bilingual English speakers. The telephone speech in this corpus is predominantly English; all interview segments are in English. Telephone speech represents approximately 523 hours of the data, and microphone speech represents the other 427 hours.

The telephone speech segments are summed-channel excerpts of roughly 5 minutes taken from longer original conversations. The interview material consists of single-channel interview segments of at least 8 minutes taken from longer interview sessions. English-language transcripts were produced using an automatic speech recognition (ASR) system.

2008 NIST Speaker Recognition Evaluation Training Set Part 2 is distributed on 7 DVD-ROM.

2011 Subscription Members will automatically receive two copies of this corpus. 2011 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $2000.

*

(3) French Gigaword Third Edition is a comprehensive archive of newswire text data that has been acquired over several years by LDC. This third edition updates French Gigaword Second Edition (LDC2009T28) and adds material collected from January 1, 2009 through December 31, 2010.

The two distinct international sources of French newswire in this edition, and the time spans of collection covered for each, are as follows:

Agence France-Presse (afp_fre) May 1994 - Dec. 2010

Associated Press French Service (apw_fre) Nov. 1994 - Dec. 2010

All text data are presented in SGML form, using a very simple, minimal markup structure; all text consists of printable ASCII, white space, and printable code points in the 'Latin1 Supplement' character table, as defined by the Unicode Standard (ISO 10646) for the 'accented' characters used in French. The Supplement/accented characters are presented in UTF-8 encoding.

The overall totals for each source are summarized below. Note that the 'Totl-MB' numbers show the amount of data when the files are uncompressed (approximately 5.7 gigabytes in total); the 'Gzip-MB' column shows totals for compressed file sizes as stored on the DVD-ROM; the 'K-wrds' numbers are simply the number of whitespace-separated tokens (of all types) after all SGML tags are eliminated.
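The token-counting rule just described (whitespace-separated tokens left after SGML tags are removed) can be sketched as follows; the sample document and its id are made up, a guess at the general minimal markup style rather than the corpus's exact schema:

```python
import re

# A made-up document in a minimal SGML style (hypothetical id and text)
DOC = '''<DOC id="AFP_FRE_20090101.0001" type="story">
<HEADLINE>
Exemple de depeche
</HEADLINE>
<TEXT>
<P>
Ceci est la premiere phrase de la depeche.
</P>
<P>
Et voici la seconde.
</P>
</TEXT>
</DOC>'''

def k_words(sgml_text):
    """Count whitespace-separated tokens after all SGML tags are removed."""
    no_tags = re.sub(r'<[^>]+>', ' ', sgml_text)  # strip every <...> tag
    return len(no_tags.split())

count = k_words(DOC)  # 3 headline tokens + 8 + 4 body tokens = 15
```

Applied over all files of a source and divided by 1000, this is the kind of figure reported in the 'K-wrds' column below.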

Source    #Files  Gzip-MB  Totl-MB  K-wrds   #DOCs
afp_fre   195     1503     4255     641381   2356888
apw_fre   194     489      1446     221470   801075
TOTAL     389     1992     5701     862851   3157963

French Gigaword Third Edition is distributed on 1 DVD-ROM.

2011 Subscription Members will automatically receive two copies of this corpus. 2011 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$4500.


5-2-4 Speechocean October 2011 update

Speechocean has about 200 large language resources, and some of these databases can be used freely by its members for academic research purposes. As an ISCA member, Speechocean will also be glad to share these databases with other ISCA members.

www.speechocean.com

Speechocean - Language Resource Catalogue - New Released (2011-10)

Speechocean, a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents, covering the fields of text-to-speech, automatic speech recognition, text, machine translation, web search, videos, images, etc.

Speechocean is glad to announce that more speech resources have been released:

Turkish speech recognition Database (Desktop) --- 201 speakers 

This Turkish desktop speech recognition database was collected by Speechocean's project team in Turkey. It is one of the databases of our Speech Data Desktop (SDD) project, which presently contains collections in 30 languages.
It contains the voices of 201 different native speakers (104 males, 97 females), balanced in distribution across age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment and read 300 phonetically rich sentences randomly selected from a specially designed pool.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/789.html

 

Turkish speech recognition Database (In-car) --- 316 speakers

This Turkish in-car speech recognition database was collected by Speechocean's project team in Turkey. It is one of the databases of our Speech Data Car (SDC) project, which presently contains collections in more than 30 languages.
It contains the voices of 316 different native speakers, balanced in distribution across age (mainly 16-30, 31-45, 46-60), gender (156 males, 160 females) and regional accent.

The script was specially designed to provide material for both training and testing of many classes of speech recognizers, and contains 320 utterances covering 15 categories and 35 sub-categories per speaker. Each speaker was recorded in two environments across three variations (parked, city driving and highway driving), with recording conditions such as engine running, fan on/off, window up/down, etc. A total of 320 utterances were recorded for each speaker across the two environments (160 utterances and spontaneous sentences per environment).

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/793.html

 

France French Speech Recognition Database (Desktop) --- 200 speakers

This France French desktop speech recognition database was collected by Speechocean’s project team in France. It is part of Speechocean’s Speech Data - Desktop (SDD) Project, which currently covers collections in 30 languages.
It contains the voices of 200 native speakers (100 males, 100 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment; 500 utterances, including spontaneous speech, covering 13 categories and about 40 sub-categories such as contact names, directory assistance names, application words, album titles and query words, were recorded for each speaker.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/796.html

 

Spain Spanish Speech Recognition Database (Desktop) --- 210 speakers

This Spain Spanish desktop speech recognition database was collected by Speechocean’s project team in Spain. It is part of Speechocean’s Speech Data - Desktop (SDD) Project, which currently covers collections in 30 languages.
It contains the voices of 210 native speakers (102 males, 108 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment; 500 utterances, including spontaneous speech, covering 13 categories and about 40 sub-categories such as contact names, directory assistance names, application words, album titles and query words, were recorded for each speaker.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/795.html

 

UK English Speech Recognition Database (Desktop) --- 200 speakers

This UK English desktop speech recognition database was collected by Speechocean’s project team in the UK. It is part of Speechocean’s Speech Data - Desktop (SDD) Project, which currently covers collections in 30 languages.
It contains the voices of 200 native speakers (106 males, 94 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, reading 300 phonetically rich sentences randomly selected from a specially designed pool.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/792.html

 

Portugal Portuguese Speech Recognition Database (Desktop) --- 200 speakers

This Portugal Portuguese desktop speech recognition database was collected by Speechocean’s project team in Portugal. It is part of Speechocean’s Speech Data - Desktop (SDD) Project, which currently covers collections in 30 languages.
It contains the voices of 200 native speakers (101 males, 99 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, reading 300 phonetically rich sentences randomly selected from a specially designed pool.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/791.html

 

Swedish Speech Recognition Database (Desktop) --- 200 speakers

This Swedish desktop speech recognition database was collected by Speechocean’s project team in Sweden. It is part of Speechocean’s Speech Data - Desktop (SDD) Project, which currently covers collections in 30 languages.
It contains the voices of 200 native speakers (118 males, 82 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, reading 300 phonetically rich sentences randomly selected from a specially designed pool.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/790.html

 

Canadian French Desktop Speech Recognition Corpus (200 speakers) Launched in Canada

At a client's urgent request, this Canadian French desktop speech recognition database (200 speakers) was collected by Speechocean’s project team in Canada. It is part of Speechocean’s Desktop Speech Data Project.
It contains the voices of 200 native speakers (100 males, 100 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment; 500 utterances, including spontaneous speech, were recorded per speaker.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

 For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/733.html

 

Chinese Mandarin In-car Speech Recognition Database Successfully Released!

The Chinese Mandarin in-car speech recognition database was released under catalogue serial number King-ASR-122. It was made for tuning and testing of in-car speech recognition systems and belongs to SPC’s Multi-language In-car Speech Data Project.
The database, collected in Mainland China, contains the voices of 100 native speakers (50 males, 50 females), balanced by age (mainly 18-30 (62), 31-45 (28) and 46-60 (10)), gender (50% male, 50% female) and regional accent (Northern 60%, Wu 10%, Xiang 5%, Gan 5%, Kejia 5%, Min 5%, Cantonese 10%).

The script was specially designed to provide material for both training and testing of many classes of speech recognizers; it contains 320 utterances covering 15 categories and 35 sub-categories for each speaker.
Each speaker was recorded under two environments in three variations (parked, city driving and highway driving), with various recording conditions such as motor running, fan on/off and windows up/down. In total, 320 utterances were recorded for each speaker under the two environments (160 utterances and spontaneous sentences per environment).

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/781.html

 

The American Spanish Mobile Speech Recognition Database Successfully Released!

 

The American Spanish mobile speech recognition database was released under catalogue serial number King-ASR-119. It was made for tuning and testing of IVR/mobile speech recognition systems and belongs to SPC’s Multi-language Mobile Speech Data Project.
The database, collected in America, contains the voices of 40 native speakers (21 males, 19 females), balanced by age (mainly 16-30, 31-45 and 46-60), gender and regional accent.

 

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/779.html

 

 

Visit our on-line Catalogue: http://www.speechocean.com/en-Product-Catalogue/Index.html

For more information about our databases and services, please visit our website: www.speechocean.com

If you have any inquiry regarding our databases and services, please feel free to contact us:

XiangFeng Cheng mailto:Chengxianfeng@speechocean.com

Marta Gherardi mailto:Marta@speechocean.com


5-3 Software
5-3-1Matlab toolbox for glottal analysis

I am pleased to announce that our Matlab toolbox for glottal analysis is now available on the web at:

 

http://tcts.fpms.ac.be/~drugman/Toolbox/

 

This toolbox includes the following modules:

 

- Pitch and voiced-unvoiced decision estimation

- Speech polarity detection

- Glottal Closure Instant determination

- Glottal flow estimation

 
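The toolbox itself is implemented in Matlab; as a language-neutral illustration of what the first module does, here is a minimal autocorrelation-based pitch estimator with a voiced/unvoiced decision in Python. This is a sketch of the general technique, not the toolbox's actual algorithm, and the threshold value is an arbitrary assumption.

```python
# Illustrative sketch (not the toolbox's algorithm): a minimal
# autocorrelation-based pitch estimator with a voiced/unvoiced decision.
import numpy as np

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Return (f0_hz, is_voiced) for one speech frame.

    The frame is mean-removed and autocorrelated; the strongest peak
    within the plausible pitch-lag range gives the period estimate.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0, False          # silent frame: no energy at all
    ac = ac / ac[0]                # normalise so that ac[0] == 1
    lag_min = int(fs / fmax)       # shortest plausible pitch period
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    voiced = ac[lag] > voicing_threshold
    return (fs / lag if voiced else 0.0), voiced

# A 200 Hz sine sampled at 16 kHz should come out voiced at ~200 Hz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs    # one 40 ms analysis frame
f0, voiced = estimate_pitch(np.sin(2 * np.pi * 200 * t), fs)
print(round(f0), voiced)  # 200 True
```

Real systems refine this basic scheme considerably (e.g. with dynamic programming across frames to smooth the pitch track), which is where dedicated tools such as this toolbox come in.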

I am also glad to share my PhD thesis, entitled “Glottal Analysis and its Applications”:

http://tcts.fpms.ac.be/~drugman/files/DrugmanPhDThesis.pdf

 

where you will find applications in speech synthesis, speaker recognition, voice pathology detection, and expressive speech analysis.

 

I hope this will be useful to you, and I look forward to seeing you soon.

 

Thomas Drugman
