ISCA - International Speech
Communication Association



ISCApad #173

Sunday, November 11, 2012 by Chris Wellekens

5 Resources
5-1 Books
5-1-1 Dorothea Kolossa and Reinhold Haeb-Umbach: Robust Speech Recognition of Uncertain or Missing Data
Title: Robust Speech Recognition of Uncertain or Missing Data
Editors: Dorothea Kolossa and Reinhold Haeb-Umbach
Publisher: Springer
Year: 2011
ISBN 978-3-642-21316-8
Link:
http://www.springer.com/engineering/signals/book/978-3-642-21316-8?detailsPage=authorsAndEditors

Automatic speech recognition suffers from a lack of robustness with
respect to noise, reverberation and interfering speech. The growing
field of speech recognition in the presence of missing or uncertain
input data seeks to ameliorate those problems by using not only a
preprocessed speech signal but also an estimate of its reliability to
selectively focus on those segments and features that are most reliable
for recognition. This book presents the state of the art in recognition
in the presence of uncertainty, offering examples that utilize
uncertainty information for noise robustness, reverberation robustness,
simultaneous recognition of multiple speech signals, and audiovisual
speech recognition.

The book is appropriate for scientists and researchers in the field of
speech recognition who will find an overview of the state of the art in
robust speech recognition, professionals working in speech recognition
who will find strategies for improving recognition results in various
conditions of mismatch, and lecturers of advanced courses on speech
processing or speech recognition who will find a reference and a
comprehensive introduction to the field. The book assumes an
understanding of the fundamentals of speech recognition using Hidden
Markov Models.

5-1-2 Mohamed Embarki and Christelle Dodane: La coarticulation

LA COARTICULATION

 

Mohamed Embarki and Christelle Dodane

Des indices à la représentation

Speech is made up of complex articulatory gestures that overlap in space and time. These overlaps, conceptualized by the term coarticulation, spare no articulator. They can be observed in the movements of the jaw, the lips, the tongue, the velum and the vocal folds. Coarticulation is also expected by the listener: coarticulated segments are better perceived. It plays a part in the cognitive and linguistic processes of speech encoding and decoding. Far more than a simple process, coarticulation is a structured research field with its own concepts and models. This collective volume brings together previously unpublished contributions by international researchers addressing coarticulation from motor, acoustic, perceptual and linguistic points of view. It is the first book published in French on this topic and the first to explore it across different languages.

 

 

Series: Langue & Parole, L'Harmattan

ISBN: 978-2-296-55503-7 • 25 € • 260 pages

 

 

Mohamed Embarki

is a senior lecturer (maître de conférences, HDR) in phonetics at the University of Franche-Comté (Besançon) and a member of Laseldi (E.A. 2281). His work focuses on the (co)articulatory and acoustic aspects of modern Arabic varieties and on their sociophonetic motivations.

Christelle Dodane

is a lecturer (maître de conférences) in phonetics at Paul-Valéry University (Montpellier 3) and is affiliated with the DIPRALANG laboratory (E.A. 739). Her research focuses on language communication in young children (12-36 months), and in particular on the role of prosody in the transition from the pre-linguistic to the linguistic level, in the construction of early syntax and in child-directed speech.


5-1-3 Ben Gold, Nelson Morgan, Dan Ellis: Speech and Audio Signal Processing: Processing and Perception of Speech and Music [Digital]

Speech and Audio Signal Processing: Processing and Perception of Speech and Music [2nd edition], by Ben Gold, Nelson Morgan, Dan Ellis

Digital copy:  http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/product-description/1118142888

Hardcopy available: http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/0470195363/ref=sr_1_1?s=books&ie=UTF8&qid=1319142964&sr=1-1


5-1-4 Video Proceedings ERMITES 2011
The video proceedings of the ERMITES 2011 workshop, 'Décomposition Parcimonieuse, Contraction et Structuration pour l'Analyse de Scènes' (Sparse Decomposition, Contraction and Structuring for Scene Analysis), are online at: http://glotin.univ-tln.fr/ERMITES11

They include (in .mpg format) some twenty hours of lectures by:

Y. Bengio, Montréal
    « Unsupervised Learning of Deep Representations »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Y_Bengio_1sur4.mp4 ...

S. Mallat, Paris
    « Scattering & Matching Pursuit for Acoustic Sources Separation »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Mallat_1sur3.mp4 ...

J.-P. Haton, Nancy
    « Scene Analysis and Stochastic Speech Recognition »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_JP_Haton_1sur4.mp4 ...

M. Kowalski, Paris
    « Sparsity and structure for audio signal: a *-lasso therapy »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Kowalski_1sur5.mp4 ...

O. Adam, Paris
    « Estimation of Whale Population Density by Analysis of Their Songs »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Adam.mp4

X. Halkias, New-York
    « Detection and Tracking of Dolphin Vocalizations »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Halkias.mp4

J. Razik, Toulon
    « Sparse coding : from speech to whales »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Razik.mp4

H. Glotin, Toulon
    « Tracking and Reconstruction of Cetacean Behaviour Using Passive Acoustics »

PS: ERMITES 2012 will focus on vision (Y. Lecun, Y. Thorpe, P. Courrieu, M Perreira, M. Van Gerven,...)

5-1-5 Zeki Majeed Hassan and Barry Heselwood (Eds): Instrumental Studies in Arabic Phonetics

Instrumental Studies in Arabic Phonetics
Edited by Zeki Majeed Hassan and Barry Heselwood
University of Gothenburg / University of Leeds
[Current Issues in Linguistic Theory, 319] 2011. xii, 365 pp.
Publishing status: Available
Hardbound – Available
ISBN 978 90 272 4837 4 | EUR 110.00 | USD 165.00
e-Book – Forthcoming
ISBN 978 90 272 8322 1 | EUR 110.00 | USD 165.00
Brought together in this volume are fourteen studies using a range of modern instrumental methods – acoustic and articulatory – to investigate the phonetics of several North African and Middle Eastern varieties of Arabic. Topics covered include syllable structure, quantity, assimilation, guttural and emphatic consonants and their pharyngeal and laryngeal mechanisms, intonation, and language acquisition. In addition to presenting new data and new descriptions and interpretations, a key aim of the volume is to demonstrate the depth of objective analysis that instrumental methods can enable researchers to achieve. A special feature of many chapters is the use of more than one type of instrumentation to give different perspectives on phonetic properties of Arabic speech which have fascinated scholars since medieval times. The volume will be of interest to phoneticians, phonologists and Arabic dialectologists, and provides a link between traditional qualitative accounts of spoken Arabic and modern quantitative methods of instrumental phonetic analysis.

Acknowledgements  vii – viii
List of contributors  ix – x
Transliteration and transcription symbols for Arabic  xi – xii
Introduction
Barry Heselwood and Zeki Majeed Hassan 1 – 26
Part I. Issues in syntagmatic structure
Preliminary study of Moroccan Arabic word-initial consonant clusters and syllabification using electromagnetic articulography
Adamantios I. Gafos, Philip Hoole and Chakir Zeroual 27 – 46
An acoustic phonetic study of quantity and quantity complementarity in Swedish and Iraqi Arabic
Zeki Majeed Hassan 47 – 62
Assimilation of /l/ to /r/ in Syrian Arabic: An electropalatographic and acoustic study
Barry Heselwood, Sara Howard and Rawya Ranjous 63 – 98
Part II. Guttural consonants
A study of the laryngeal and pharyngeal consonants in Jordanian Arabic using nasoendoscopy, videofluoroscopy and spectrography
Barry Heselwood and Feda Al-Tamimi 99
A phonetic study of guttural laryngeals in Palestinian Arabic using laryngoscopic and acoustic analysis
Kimary N. Shahin 129 – 140
Airflow and acoustic modelling of pharyngeal and uvular consonants in Moroccan Arabic
Mohamed Yeou and Shinji Maeda 141 – 162
Part III. Emphasis and coronal consonants
Nasoendoscopic, videofluoroscopic and acoustic study of plain and emphatic coronals in Jordanian Arabic
Feda Al-Tamimi and Barry Heselwood 163 – 192
Acoustic and electromagnetic articulographic study of pharyngealisation: Coarticulatory effects as an index of stylistic and regional variation in Arabic
Mohamed Embarki, Slim Ouni, Mohamed Yeou, M. Christian Guilleminot and Sallal Al-Maqtari 193 – 216
Investigating the emphatic feature in Iraqi Arabic: Acoustic and articulatory evidence of coarticulation
Zeki Majeed Hassan and John H. Esling 217 – 234
Glottalisation and neutralisation in Yemeni Arabic and Mehri: An acoustic study
Janet C.E. Watson and Alex Bellem 235 – 256
The phonetics of localising uvularisation in Ammani-Jordanian Arabic: An acoustic study
Bushra Adnan Zawaydeh and Kenneth de Jong 257 – 276
EMA, endoscopic, ultrasound and acoustic study of two secondary articulations in Moroccan Arabic: Labial-velarisation vs. emphasis
Chakir Zeroual, John H. Esling and Philip Hoole 277 – 298
Part IV. Intonation and acquisition
Acoustic cues to focus and givenness in Egyptian Arabic
Sam Hellmuth 299 – 324
Acquisition of Lebanese Arabic and Yorkshire English /l/ by bilingual and monolingual children: A comparative spectrographic study
Ghada Khattab 325 – 354
Appendix: Phonetic instrumentation used in the studies  355 – 358


5-1-6 G. Bailly, P. Perrier & E. Vatikiotis-Bateson (eds): Audiovisual Speech Processing

'Audiovisual Speech Processing', edited by G. Bailly, P. Perrier & E. Vatikiotis-Bateson, published by Cambridge University Press

'When we speak, we configure the vocal tract which shapes the visible motions of the face
and the patterning of the audible speech acoustics. Similarly, we use these visible and
audible behaviors to perceive speech. This book showcases a broad range of research
investigating how these two types of signals are used in spoken communication, how they
interact, and how they can be used to enhance the realistic synthesis and recognition of
audible and visible speech. The volume begins by addressing two important questions about
human audiovisual performance: how auditory and visual signals combine to access the
mental lexicon and where in the brain this and related processes take place. It then
turns to the production and perception of multimodal speech and how structures are
coordinated within and across the two modalities. Finally, the book presents overviews
and recent developments in machine-based speech recognition and synthesis of AV speech.'



5-1-7 Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.): Speech Planning and Dynamics, Publisher P. Lang

Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.)

Speech Planning and Dynamics

Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2012. 277 pp., 50 fig., 8 tables

Speech Production and Perception. Vol. 1

Edited by Susanne Fuchs and Pascal Perrier

Print:

ISBN 978-3-631-61479-2 hb.

SFR 60.00 / €* 52.95 / €** 54.50 / € 49.50 / £ 39.60 / US$ 64.95

eBook:

ISBN 978-3-653-01438-9

SFR 63.20 / €* 58.91 / €** 59.40 / € 49.50 / £ 39.60 / US$ 64.95

Order online: www.peterlang.com


5-1-8 Video archive of Odyssey Speaker and Language Recognition Workshop, Singapore 2012
Odyssey Speaker and Language Recognition Workshop 2012, the workshop of ISCA SIG Speaker and Language Characterization, was held in Singapore on 25-28 June 2012. Odyssey 2012 is glad to announce that its video recordings have been included in the ISCA Video Archive. http://www.isca-speech.org/iscaweb/index.php/archive/video-archive

5-2 Database
5-2-1 ELRA - Language Resources Catalogue - Update (2012-07)

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is happy to announce that 2 new Speech Telephone Resources are now available in its catalogue.
Moreover, an updated version of the Bilingual Collocational Dictionary (Horst Bogatz) has also been released.
   
    1) New Language Resources:
     
      ELRA-S0343 VERIF1DE
   
The speech corpus VERIF1DE contains 20 recordings (sessions) of 150 German speakers each over the telephone network (10 sessions over fixed network and 10 sessions over GSM). Each session contains 40 single recordings, mainly speech read from a prompt sheet.
  
For more information, see: http://catalog.elra.info/product_info.php?products_id=1169
   
    ELRA-S0344 LILA Hindi Belt database
   
The LILA Hindi Belt database comprises 2,023 Hindi speakers (1,011 males and 1,012 females, all speakers with Hindi as first language) recorded over the Indian mobile telephone network. Each speaker uttered 83 read and spontaneous items.
   
For more information, see: http://catalog.elra.info/product_info.php?products_id=1170
   
    2) Updated Language Resource:
     
    ELRA-M0013 Bilingual Collocational Dictionary (Horst Bogatz)
   
This new release contains 69,000 English headwords (instead of 40,000 for the previous release).
The bilingual English-German collocational dictionary consists of around 69,000 English headwords, including concepts expressed with more than one word (e.g. 'the awareness of the environment' or 'lame duck') and hyphenated compounds. It contains verbs, adjectives, synonyms and phrases that collocate with the headword. It provides the German equivalents for the headwords as well as their English synonyms.
    For more information, see: http://catalog.elra.info/product_info.php?products_id=451
    
    For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org
   
    Visit our On-line Catalogue: http://catalog.elra.info
    Visit the Universal Catalogue: http://universal.elra.info
    Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html


5-2-2 LDC Newsletter (October 2012)

In this newsletter:

-  Fall 2012 LDC Data Scholarship Recipients  -
-  LDC Exhibiting at NWAV 41  -
-  LDC 20th Anniversary Workshop Wrap-up  -
-  LDC 20th Anniversary Podcasts  -
-  Language Resource Wiki  -

New publications:

-  GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire  -
-  GALE Phase 2 Arabic Broadcast News Parallel Text  -


   

Fall 2012 LDC Data Scholarship Recipients

   

LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:

   

     

Jaffar Atwan - National University of Malaysia (Malaysia), PhD candidate, Information Science and Technology. Jaffar has been awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in information retrieval.

Sarath Chandar - Indian Institute of Technology, Madras (India), MS candidate, Computer Science and Engineering. Sarath has been awarded a copy of Treebank-3 (LDC99T42) for his work in grammar induction.

Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD candidate, Electrical and Computer Engineering. Kuruvachan has been awarded a copy of Fisher English Part 2 (LDC2005S13/T19) and 2008 NIST Speaker Recognition Evaluation data (LDC2011S05/07/08/11) for his work in speaker recognition.

Eduardo Motta - Pontifícia Universidade Católica do Rio de Janeiro (Brazil), PhD candidate, Information Sciences. Eduardo has been awarded a copy of English Web Treebank (LDC2012T13) for his work in machine learning.

Genevieve Sapijaszko - University of Central Florida (USA), PhD candidate, Electrical and Computer Engineering. Genevieve has been awarded a copy of TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) and YOHO Speaker Verification (LDC94S16) for her work in digital signal processing.

John Steinberg - Temple University (USA), MS candidate, Electrical and Computer Engineering. John has been awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15) and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his work in speech recognition.

   

   

LDC Exhibiting at NWAV 41

   

   

The conference runs from October 25-28 and the exhibition hall will be open from October 26-28, 2012. Please stop by to say hello!
     

   

     

   

LDC 20th Anniversary Workshop Wrap-up

   

In early September, LDC hosted a workshop entitled “The Future of Language Resources” in celebration of our 20th anniversary. Visit the Program page to browse speaker abstracts and to access PDFs of the presentations. Thanks to the speakers and attendees for making the workshop a success!

   

     

   

LDC 20th Anniversary Podcasts

   

To further celebrate our 20th Anniversary, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast.

   

Other podcasts will be published via the LDC blog, so stay tuned to that space.
       

   

Language Resource Wiki

   

The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages but especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following Sign Languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.
       
       

   

New publications

       

   

(1) GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire was developed by LDC and contains 169,080 tokens of word-aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.

   

Some approaches to statistical machine translation include the incorporation of linguistic knowledge in word-aligned text as a means to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations by using minimum-match and attachment annotation approaches. A set of word tags and alignment link tags are designed in the tagging scheme to describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation.

   

     

The Chinese word alignment tasks consisted of the following components:

-  Identifying, aligning, and tagging 8 different types of links
-  Identifying, attaching, and tagging local-level unmatched words
-  Identifying and tagging sentence/discourse-level unmatched words
-  Identifying and tagging all instances of Chinese 的(DE) except when they were a part of a semantic link.

   

   

GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire is distributed via web download.

   

2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1750.

   

*

   

(2) GALE Phase 2 Arabic Broadcast News Parallel Text was developed by LDC, and along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Modern Standard Arabic source text and corresponding English translations selected from broadcast news (BN) data collected by LDC between 2005 and 2007 and transcribed by LDC or under its direction.

   

GALE Phase 2 Arabic Broadcast News Parallel Text includes seven source-translation pairs, comprising 29,210 words of Arabic source text and its English translation. Data is drawn from six distinct Arabic programs broadcast between 2005 and 2007 from Abu Dhabi TV, based in Abu Dhabi, United Arab Emirates; Al Alam News Channel, based in Iran; Aljazeera, a regional broadcast programmer based in Doha, Qatar; Dubai TV, based in Dubai, United Arab Emirates; and Kuwait TV, a national television station based in Kuwait. The BN programming in this release focuses on current events topics.

   

The files in this release were transcribed by LDC staff and/or transcription vendors under contract to LDC in accordance with the Quick Rich Transcription guidelines developed by LDC. Transcribers indicated sentence boundaries in addition to transcribing the text. Data was manually selected for translation according to several criteria, including linguistic features, transcription features and topic features. The transcribed and segmented files were then reformatted into a human-readable translation format and assigned to translation vendors. Translators followed LDC's Arabic to English translation guidelines. Bilingual LDC staff performed quality control procedures on the completed translations.

   

GALE Phase 2 Arabic Broadcast News Parallel Text is distributed via web download.
       
2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1750.

   

 


5-2-3 Speechocean January 2012 update

Speechocean - Language Resource Catalogue - New Releases (01-2012)

Speechocean, as a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents covering the fields of Text to Speech, Automatic Speech Recognition, Text, Machine Translation, Web Search, Videos, Images etc.

 

Speechocean is glad to announce that more Speech Resources have been released:

 

Chinese and English Mixing Speech Synthesis Database (Female)

The Chinese Mandarin TTS Speech Corpus contains the read speech of a native Chinese Female professional broadcaster recorded in a studio with high SNR (>35dB) over two channels (AKG C4000B microphone and Electroglottography (EGG) sensor). 
The Corpus includes the following categories:
1.    Basic Mandarin sub-corpus: 5,000 utterances carefully designed to take all kinds of linguistic phenomena into account. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus can be used for R&D on HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
2.    Complementary Mandarin sub-corpus: 10,000 utterances carefully designed to take all kinds of linguistic phenomena into account. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus complements the Basic Mandarin sub-corpus and can be used for R&D on large-scale concatenative TTS;
3.    Mandarin Neutral sub-corpus: 380 Chinese bi-syllable words embedded in carrier sentences;
4.    Mandarin ERHUA sub-corpus: 290 Chinese Erhua syllables embedded in carrier sentences;
5.    Mandarin Digit-String sub-corpus: 1,250 three-digit utterances that take into account the two pronunciations of 1, i.e. “yi1” and “yao1”;
6.    Mandarin Question sub-corpus: 300 question sentences with commonly used question markers, for example “吗”, “么”, “呢”, etc.;
7.    Mandarin exclamatory sub-corpus: 200 exclamatory sentences with commonly used exclamatory markers, for example “呀”, “啊”, “吧”, “啦”, etc.;
8.    Chinese English sentence sub-corpus: 1,000 sentences carefully designed for bi-phone coverage. All sentences were extracted from the news channels of Voice of America (VOA), etc. Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 12 words on average), and they are phonetically annotated with SAMPA. This sub-corpus can be used for R&D on HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
9.    Chinese English words sub-corpus: about 6,000 commonly used English words embedded in carrier sentences;
10.    Chinese English Abbreviation sub-corpus: about 200 utterances that cover not only the alphabet but also combinations of characters and digits, such as “MP4”;
11.    Chinese English Letter sub-corpus: 26 carrier utterances with each letter embedded at the beginning, middle and end;
12.    Chinese Greek Letter sub-corpus: 24 carrier utterances with each letter embedded at the beginning, middle and end.

All speech data are segmented and labeled at the phone level. A pronunciation lexicon and pitch extracted from the EGG signal can also be provided on demand.

 

France French Speech Recognition Corpus (desktop) – 50 speakers

This France French desktop speech recognition database was collected by SpeechOcean in France. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

UK English Speech Recognition Corpus (desktop) – 50 speakers

This UK English desktop speech recognition database was collected by SpeechOcean in England. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

US English Speech Recognition Corpus (desktop) – 50 speakers

This US English desktop speech recognition database was collected by SpeechOcean in America. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (25 males, 25 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

Italian Speech Recognition Corpus (desktop) – 50 speakers

This Italian desktop speech recognition database was collected by SpeechOcean in Italy. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (23 males, 27 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

For more information about our databases and services, please visit our website www.speechocean.com or our on-line catalogue at http://www.speechocean.com/en-Product-Catalogue/Index.html

If you have any inquiries regarding our databases and services, please feel free to contact us:

Xianfeng Cheng mailto: Chengxianfeng@speechocean.com

Marta Gherardi mailto: Marta@speechocean.com

 

 


5-2-4 Appen ButlerHill

 

Appen ButlerHill 

A global leader in linguistic technology solutions

RECENT CATALOG ADDITIONS—MARCH 2012

1. Speech Databases

1.1 Telephony

Language                 Database Type   Catalogue Code  Speakers  Status
Bahasa Indonesia         Conversational  BAH_ASR001      1,002     Available
Bengali                  Conversational  BEN_ASR001      1,000     Available
Bulgarian                Conversational  BUL_ASR001      217       Available shortly
Croatian                 Conversational  CRO_ASR001      200       Available shortly
Dari                     Conversational  DAR_ASR001      500       Available
Dutch                    Conversational  NLD_ASR001      200       Available
Eastern Algerian Arabic  Conversational  EAR_ASR001      496       Available
English (UK)             Conversational  UKE_ASR001      1,150     Available
Farsi/Persian            Scripted        FAR_ASR001      789       Available
Farsi/Persian            Conversational  FAR_ASR002      1,000     Available
French (EU)              Conversational  FRF_ASR001      563       Available
French (EU)              Voicemail       FRF_ASR002      550       Available
German                   Voicemail       DEU_ASR002      890       Available
Hebrew                   Conversational  HEB_ASR001      200       Available shortly
Italian                  Conversational  ITA_ASR003      200       Available shortly
Italian                  Voicemail       ITA_ASR004      550       Available
Kannada                  Conversational  KAN_ASR001      1,000     In development
Pashto                   Conversational  PAS_ASR001      967       Available
Portuguese (EU)          Conversational  PTP_ASR001      200       Available shortly
Romanian                 Conversational  ROM_ASR001      200       Available shortly
Russian                  Conversational  RUS_ASR001      200       Available
Somali                   Conversational  SOM_ASR001      1,000     Available
Spanish (EU)             Voicemail       ESO_ASR002      500       Available
Turkish                  Conversational  TUR_ASR001      200       Available
Urdu                     Conversational  URD_ASR001      1,000     Available

1.2 Wideband

Language           Database Type  Catalogue Code  Speakers  Status
English (US)       Studio         USE_ASR001      200       Available
French (Canadian)  Home/Office    FRC_ASR002      120       Available
German             Studio         DEU_ASR001      127       Available
Thai               Home/Office    THA_ASR001      100       Available
Korean             Home/Office    KOR_ASR001      100       Available

2. Pronunciation Lexica

Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:

- Pronunciation lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)
- Part-of-speech tagged lexica providing grammatical and semantic labels
- Other reference text-based materials, including spelling/misspelling lists, spell-check dictionaries, mappings of colloquial language to standard forms, and orthographic normalization lists

Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com

3. Named Entity Corpora

Language       Catalogue Code  Words
Arabic         ARB_NER001      500,000
English        ENI_NER001      500,000
Farsi/Persian  FAR_NER001      500,000
Korean         KOR_NER001      500,000
Japanese       JPY_NER001      500,000
Russian        RUS_NER001      500,000
Mandarin       MAN_NER001      500,000
Urdu           URD_NER001      500,000

Description: These NER corpora contain text material from a variety of sources and are tagged for the following named entities: Person, Organization, Location, Nationality, Religion, Facility, Geo-Political Entity, Titles, Quantities.

4. Other Language Resources

- Morphological analyzers: Farsi/Persian and Urdu
- Arabic thesaurus
- Language analysis documentation: multiple languages

 

For additional information on these resources, please contact: sales@appenbutlerhill.com

5. Customized Requests and Package Configurations

Appen Butler Hill is committed to providing a low-risk, high-quality, reliable solution and has worked in 130+ languages to date, supporting both large global corporations and government organizations.

We would be glad to discuss any customized requests or package configurations and to prepare a customized proposal to meet your needs.

6. Contact Information

Prithivi Pradeep

Business Development Manager

ppradeep@appenbutlerhill.com

+61 2 9468 6370

Tom Dibert

Vice President, Business Development, North America

tdibert@appenbutlerhill.com

+1-315-339-6165

www.appenbutlerhill.com


5-3 Software
5-3-1Matlab toolbox for glottal analysis

I am pleased to announce that a Matlab toolbox for glottal analysis is now available on the web at:

http://tcts.fpms.ac.be/~drugman/Toolbox/

This toolbox includes the following modules:

- Pitch estimation and voiced/unvoiced decision
- Speech polarity detection
- Glottal Closure Instant determination
- Glottal flow estimation

I am also glad to share my PhD thesis, entitled “Glottal Analysis and its Applications”:

http://tcts.fpms.ac.be/~drugman/files/DrugmanPhDThesis.pdf

There you will find applications in speech synthesis, speaker recognition, voice pathology detection, and expressive speech analysis.

I hope this will be useful to you, and I hope to see you soon,

Thomas Drugman


5-3-2ROCme!: a free tool for audio corpora recording and management

ROCme!: new free software for recording and managing audio corpora.

ROCme! provides streamlined, autonomous, paperless management of the recording of read-speech corpora.

Key features:
- free
- compatible with Windows and Mac
- configurable interface for collecting speaker metadata
- the speaker scrolls through the sentences on screen and records them autonomously
- configurable audio format

Available for download at:
www.ddl.ish-lyon.cnrs.fr/rocme


5-3-3VocalTractLab 2.0 : A tool for articulatory speech synthesis

VocalTractLab 2.0 : A tool for articulatory speech synthesis

It is my pleasure to announce the release of the new major version 2.0 of VocalTractLab. VocalTractLab is an articulatory speech synthesizer and a tool to visualize and explore the mechanism of speech production with regard to articulation, acoustics, and control. It is available from http://www.vocaltractlab.de/index.php?page=vocaltractlab-download .
Compared to version 1.0, the new version brings many improvements to the implemented models of the vocal tract, the vocal folds, the acoustic simulation, and articulatory control, as well as to the user interface. Most importantly, the new version comes with a manual.

If you like, give it a try. Reports on bugs and any other feedback are welcome.

Peter Birkholz




