ISCA - International Speech
Communication Association



ISCApad #174

Sunday, December 09, 2012 by Chris Wellekens

5 Resources
5-1 Books
5-1-1 Ben Gold, Nelson Morgan, Dan Ellis: Speech and Audio Signal Processing: Processing and Perception of Speech and Music [Digital]

Speech and Audio Signal Processing: Processing and Perception of Speech and Music [2nd edition], Ben Gold, Nelson Morgan, Dan Ellis

Digital copy:  http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/product-description/1118142888

Hardcopy available: http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/0470195363/ref=sr_1_1?s=books&ie=UTF8&qid=1319142964&sr=1-1


5-1-2 Video Proceedings ERMITES 2011
The video proceedings of the ERMITES 2011 workshop, 'Décomposition Parcimonieuse, Contraction et Structuration pour l'Analyse de Scènes' (Sparse Decomposition, Contraction and Structuring for Scene Analysis), are online at: http://glotin.univ-tln.fr/ERMITES11

They include (in .mpg format) some twenty hours of lectures by:

Y. Bengio, Montréal
    «Apprentissage Non-Supervisé de Représentations Profondes »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Y_Bengio_1sur4.mp4 ...

S. Mallat, Paris
    « Scattering & Matching Pursuit for Acoustic Sources Separation »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Mallat_1sur3.mp4 ...

J.-P. Haton, Nancy
    « Analyse de Scène et Reconnaissance Stochastique de la Parole »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_JP_Haton_1sur4.mp4 ...

M. Kowalski, Paris
    « Sparsity and structure for audio signal: a *-lasso therapy »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Kowalski_1sur5.mp4 ...

O. Adam, Paris
    « Estimation de Densité de Population de Baleines par Analyse de leurs Chants »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Adam.mp4

X. Halkias, New-York
    « Detection and Tracking of Dolphin Vocalizations »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Halkias.mp4

J. Razik, Toulon
    « Sparse coding : from speech to whales »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Razik.mp4

H. Glotin, Toulon
   « Suivi & reconstruction du comportement de cétacés par acoustique passive »

PS: ERMITES 2012 will focus on vision (Y. Lecun, Y. Thorpe, P. Courrieu, M Perreira, M. Van Gerven,...)

5-1-3 Zeki Majeed Hassan and Barry Heselwood (Eds): Instrumental Studies in Arabic Phonetics

Instrumental Studies in Arabic Phonetics
Edited by Zeki Majeed Hassan and Barry Heselwood
University of Gothenburg / University of Leeds
[Current Issues in Linguistic Theory, 319] 2011. xii, 365 pp.
Publishing status: Available
Hardbound – Available
ISBN 978 90 272 4837 4 | EUR 110.00 | USD 165.00
e-Book – Forthcoming
ISBN 978 90 272 8322 1 | EUR 110.00 | USD 165.00
Brought together in this volume are fourteen studies using a range of modern instrumental methods – acoustic and articulatory – to investigate the phonetics of several North African and Middle Eastern varieties of Arabic. Topics covered include syllable structure, quantity, assimilation, guttural and emphatic consonants and their pharyngeal and laryngeal mechanisms, intonation, and language acquisition. In addition to presenting new data and new descriptions and interpretations, a key aim of the volume is to demonstrate the depth of objective analysis that instrumental methods can enable researchers to achieve. A special feature of many chapters is the use of more than one type of instrumentation to give different perspectives on phonetic properties of Arabic speech which have fascinated scholars since medieval times. The volume will be of interest to phoneticians, phonologists and Arabic dialectologists, and provides a link between traditional qualitative accounts of spoken Arabic and modern quantitative methods of instrumental phonetic analysis.

Acknowledgements  vii – viii
List of contributors  ix – x
Transliteration and transcription symbols for Arabic  xi – xii
Introduction
Barry Heselwood and Zeki Majeed Hassan 1 – 26
Part I. Issues in syntagmatic structure
Preliminary study of Moroccan Arabic word-initial consonant clusters and syllabification using electromagnetic articulography
Adamantios I. Gafos, Philip Hoole and Chakir Zeroual 27 – 46
An acoustic phonetic study of quantity and quantity complementarity in Swedish and Iraqi Arabic
Zeki Majeed Hassan 47 – 62
Assimilation of /l/ to /r/ in Syrian Arabic: An electropalatographic and acoustic study
Barry Heselwood, Sara Howard and Rawya Ranjous 63 – 98
Part II. Guttural consonants
A study of the laryngeal and pharyngeal consonants in Jordanian Arabic using nasoendoscopy, videofluoroscopy and spectrography
Barry Heselwood and Feda Al-Tamimi 99 – 128
A phonetic study of guttural laryngeals in Palestinian Arabic using laryngoscopic and acoustic analysis
Kimary N. Shahin 129 – 140
Airflow and acoustic modelling of pharyngeal and uvular consonants in Moroccan Arabic
Mohamed Yeou and Shinji Maeda 141 – 162
Part III. Emphasis and coronal consonants
Nasoendoscopic, videofluoroscopic and acoustic study of plain and emphatic coronals in Jordanian Arabic
Feda Al-Tamimi and Barry Heselwood 163 – 192
Acoustic and electromagnetic articulographic study of pharyngealisation: Coarticulatory effects as an index of stylistic and regional variation in Arabic
Mohamed Embarki, Slim Ouni, Mohamed Yeou, M. Christian Guilleminot and Sallal Al-Maqtari 193 – 216
Investigating the emphatic feature in Iraqi Arabic: Acoustic and articulatory evidence of coarticulation
Zeki Majeed Hassan and John H. Esling 217 – 234
Glottalisation and neutralisation in Yemeni Arabic and Mehri: An acoustic study
Janet C.E. Watson and Alex Bellem 235 – 256
The phonetics of localising uvularisation in Ammani-Jordanian Arabic: An acoustic study
Bushra Adnan Zawaydeh and Kenneth de Jong 257 – 276
EMA, endoscopic, ultrasound and acoustic study of two secondary articulations in Moroccan Arabic: Labial-velarisation vs. emphasis
Chakir Zeroual, John H. Esling and Philip Hoole 277 – 298
Part IV. Intonation and acquisition
Acoustic cues to focus and givenness in Egyptian Arabic
Sam Hellmuth 299 – 324
Acquisition of Lebanese Arabic and Yorkshire English /l/ by bilingual and monolingual children: A comparative spectrographic study
Ghada Khattab 325 – 354
Appendix: Phonetic instrumentation used in the studies  355 – 358


5-1-4 G. Bailly, P. Perrier & E. Vatikiotis-Bateson (Eds): Audiovisual Speech Processing

'Audiovisual Speech Processing', edited by G. Bailly, P. Perrier & E. Vatikiotis-Bateson, is published by Cambridge University Press.

'When we speak, we configure the vocal tract which shapes the visible motions of the face
and the patterning of the audible speech acoustics. Similarly, we use these visible and
audible behaviors to perceive speech. This book showcases a broad range of research
investigating how these two types of signals are used in spoken communication, how they
interact, and how they can be used to enhance the realistic synthesis and recognition of
audible and visible speech. The volume begins by addressing two important questions about
human audiovisual performance: how auditory and visual signals combine to access the
mental lexicon and where in the brain this and related processes take place. It then
turns to the production and perception of multimodal speech and how structures are
coordinated within and across the two modalities. Finally, the book presents overviews
and recent developments in machine-based speech recognition and synthesis of AV speech. '



5-1-5 Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.): Speech Planning and Dynamics, published by Peter Lang

Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.)

Speech Planning and Dynamics

Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2012. 277 pp., 50 fig., 8 tables

Speech Production and Perception. Vol. 1

Edited by Susanne Fuchs and Pascal Perrier

Print:

ISBN 978-3-631-61479-2 hb.

SFR 60.00 / €* 52.95 / €** 54.50 / € 49.50 / £ 39.60 / US$ 64.95

eBook :

ISBN 978-3-653-01438-9

SFR 63.20 / €* 58.91 / €** 59.40 / € 49.50 / £ 39.60 / US$ 64.95

Order online: www.peterlang.com


5-1-6 Video archive of Odyssey Speaker and Language Recognition Workshop, Singapore 2012
Odyssey Speaker and Language Recognition Workshop 2012, the workshop of ISCA SIG Speaker and Language Characterization, was held in Singapore on 25-28 June 2012. Odyssey 2012 is glad to announce that its video recordings have been included in the ISCA Video Archive. http://www.isca-speech.org/iscaweb/index.php/archive/video-archive

5-2 Database
5-2-1 ELRA - Language Resources Catalogue - Update (2012-07)

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is happy to announce that 2 new Speech Telephone Resources are now available in its catalogue. Moreover, an updated version of the Bilingual Collocational Dictionary (Horst Bogatz) has also been released.

1) New Language Resources:

ELRA-S0343 VERIF1DE

The speech corpus VERIF1DE contains 20 recordings (sessions) from each of 150 German speakers over the telephone network (10 sessions over fixed network and 10 sessions over GSM). Each session contains 40 single recordings, mainly speech read from a prompt sheet.

For more information, see: http://catalog.elra.info/product_info.php?products_id=1169

ELRA-S0344 LILA Hindi Belt database

The LILA Hindi Belt database comprises 2,023 Hindi speakers (1,011 males and 1,012 females, all speakers with Hindi as first language) recorded over the Indian mobile telephone network. Each speaker uttered 83 read and spontaneous items.

For more information, see: http://catalog.elra.info/product_info.php?products_id=1170

2) Updated Language Resource:

ELRA-M0013 Bilingual Collocational Dictionary (Horst Bogatz)

This new release contains 69,000 English headwords (instead of 40,000 for the previous release).
The bilingual English-German collocational dictionary consists of around 69,000 English headwords, including concepts expressed with more than one word (e.g. 'the awareness of the environment' or 'lame duck') and hyphenated compounds. It contains verbs, adjectives, synonyms and phrases that collocate with the headword. It provides the German equivalents for the headwords as well as their English synonyms.
For more information, see: http://catalog.elra.info/product_info.php?products_id=451

For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html


5-2-2 LDC Newsletter (November 2012)

In this newsletter:

  • Spring 2013 LDC Data Scholarship Program
  • Invitation to Join for Membership Year 2013
  • Why become an LDC member?
  • 2012 User Survey Results
  • LDC to Close for Thanksgiving Break

New publications:

  • Annotated English Gigaword
  • Chinese-English Semiconductor Parallel Text
  • GALE Phase 2 Arabic Newswire Parallel Text

Spring 2013 LDC Data Scholarship Program

Applications are now being accepted through January 15, 2013, 11:59PM EST for the Spring 2013 LDC Data Scholarship program! The LDC Data Scholarship program provides university students with access to LDC data at no cost. During previous program cycles, LDC has awarded no-cost copies of LDC data to over 25 individual students and student research groups.

This program is open to students pursuing both undergraduate and graduate studies in an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay. The selection process is highly competitive.

The application consists of two parts:

(1) Data Use Proposal. Applicants must submit a proposal describing their intended use of the data. The proposal should state which data the student plans to use and how the data will benefit their research project, and give information on the proposed methodology or algorithm.

Applicants should consult the LDC Corpus Catalog for a complete list of data distributed by LDC. Due to certain restrictions, a handful of LDC corpora are available only to members of the Consortium. Applicants are advised to select a maximum of one to two datasets; students may apply for additional datasets during the following cycle once they have completed processing of the initial datasets and published or presented work in some juried venue.

(2) Letter of Support. Applicants must submit one letter of support from their thesis adviser or department chair. The letter must verify the student's need for data and confirm that the department or university lacks the funding to pay the full Non-member Fee for the data or to join the consortium.

For further information on application materials and program rules, please visit the LDC Data Scholarship page.

Students can email their applications to the LDC Data Scholarship program. Decisions will be sent by email from the same address.

The deadline for the Spring 2013 program cycle is January 15, 2013, 11:59PM EST.

   

 

   

Invitation to Join for Membership Year 2013

Membership Year (MY) 2013 is open for joining! We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the consortium. For MY2013, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year.

The details of our early renewal discounts for MY2013 are as follows:

  • Organizations who joined for MY2012 will receive a 5% discount when renewing. This discount will apply throughout 2013, regardless of time of renewal. MY2012 members renewing before March 1, 2013 will receive an additional 5% discount, for a total 10% discount off the membership fee.

  • New members as well as organizations who did not join for MY2012, but who held membership in any of the previous MYs (1993-2011), will also be eligible for a 5% discount provided that they join/renew before March 1, 2013.

   

The following table provides exact pricing information.

                                MY2013 Fee   MY2013 Fee with    MY2013 Fee with
                                             5% Discount*       10% Discount**
Not-for-Profit/US Government
  Standard                      US$2400      US$2280            US$2160
  Subscription                  US$3850      US$3658            US$3465
For-Profit
  Standard                      US$24000     US$22800           US$21600
  Subscription                  US$27500     US$26125           US$24750

*  For new members, MY2012 Members renewing for MY2013, and any previous year Member who renews before March 1, 2013

** For MY2012 Members renewing before March 1, 2013
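
As a rough cross-check of the discounted fees, the arithmetic can be sketched in a few lines of Python. The function and dictionary names are illustrative only, and the round-half-up rule is an assumption inferred from the US$3658 figure (5% off US$3850 is US$3657.50); LDC does not state its rounding rule.

```python
# Illustrative sketch; names are hypothetical, not part of any LDC tool.
def discounted_fee(base_fee_usd: int, discount_percent: int) -> int:
    """Apply a percentage discount and round half up to a whole dollar.

    Round-half-up is an assumption inferred from the published figures.
    Integer arithmetic avoids floating-point surprises at the .50 boundary.
    """
    return (base_fee_usd * (100 - discount_percent) + 50) // 100


# Base MY2013 fees as listed in the newsletter.
MY2013_FEES = {
    ("Not-for-Profit/US Government", "Standard"): 2400,
    ("Not-for-Profit/US Government", "Subscription"): 3850,
    ("For-Profit", "Standard"): 24000,
    ("For-Profit", "Subscription"): 27500,
}

for (category, tier), fee in MY2013_FEES.items():
    print(category, tier, fee, discounted_fee(fee, 5), discounted_fee(fee, 10))
```

Under that rounding assumption, the computed values agree with every discounted fee listed.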
       
       
Publications for MY2013 are still being planned; here are the working titles of data sets we intend to provide:

  • Arabic Treebank - Weblog
  • Hispanic-English Speech
  • Chinese-English Biomedical Parallel Text
  • Maninkakan Lexicon
  • GALE data – all phases and tasks
  • OpenMT 2008-2012 Progress Set

In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.

This past year, LDC members who joined early or kept their membership current saved almost US$70,000 collectively on membership fees. Be sure to keep an eye on your mail - all previous and current LDC members will be sent an invitation to join letter and renewal invoice for MY2013. Renew early for MY2013 to save today!

   

 

   

Why become an LDC member?

LDC is offering early renewal discounts on membership fees for Membership Year 2013, making now a good time to consider joining or renewing membership. LDC membership has the following advantages:

  • LDC membership provides cost-effective access to an extensive and growing catalog that spans 20 years and includes over 500 multilingual speech, text, and video resources. Even if your organization only needs a few datasets from a given membership year, membership is often the most economical way to obtain current corpora. Additionally, the generous discounts that member organizations receive on older corpora reduce the cost of acquiring such datasets.

  • All members enjoy unlimited use of LDC data within their organizations. For universities, there is no difference in cost between a departmental membership and one that is university-wide. Departments can therefore combine resources and establish one LDC membership for use by the entire university community. Likewise, for-profit members with multiple branches can maintain one membership for use by their entire organizations.

For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. In the case of a small group of corpora, commercial licenses must be obtained separately from the owners of the data.

   

     

   

2012 User Survey Results

Earlier this year, LDC sent a survey to its user communities. Like previous iterations in 2006 and 2007, the survey solicited community input and suggestions on key LDC-related topics, including:

  • Satisfaction levels with LDC’s data, homepage and Catalog
  • Reflections on LDC’s 20th Anniversary year
  • Suggestions for future publications
  • Speculations on the future of HLT-related fields, specifically on mobile technologies, cloud computing, social networking and open data

Survey respondents were generally satisfied with LDC’s data, membership options, homepage and Catalog, though there were requests for additional data options and data acquisition methods. Some of the data respondents requested are already in our pipeline for the end of 2012 or for Membership Year (MY) 2013, so please be on the lookout for Publications updates. Respondents were also very supportive of LDC’s 20th Anniversary, posting testimonials and well-wishes in the 20th Anniversary section.

LDC would like to thank all survey participants. Survey participants will receive access to full survey results shortly.

   

 

   

LDC to Close for Thanksgiving Break

LDC will be closed on Thursday, November 22, 2012 and Friday, November 23, 2012 in observance of the US Thanksgiving Holiday. Our offices will reopen on Monday, November 26, 2012.

   

 

   


     
       
New publications

(1) Annotated English Gigaword was developed by Johns Hopkins University's Human Language Technology Center of Excellence. It adds automatically-generated syntactic and discourse structure annotation to English Gigaword Fifth Edition (LDC2011T07) and also contains an API and tools for reading the dataset's XML files. The goal of the annotation is to provide a standardized corpus for knowledge extraction and distributional semantics which enables broader involvement in large-scale knowledge-acquisition efforts by researchers.

Annotated English Gigaword contains the nearly ten million documents (over four billion words) of the original English Gigaword Fifth Edition from seven news sources:

  • Agence France-Presse, English Service (afp_eng)
  • Associated Press Worldstream, English Service (apw_eng)
  • Central News Agency of Taiwan, English Service (cna_eng)
  • Los Angeles Times/Washington Post Newswire Service (ltw_eng)
  • Washington Post/Bloomberg Newswire Service (wpb_eng)
  • New York Times Newswire Service (nyt_eng)
  • Xinhua News Agency, English Service (xin_eng)

The following layers of annotation were added:

  • Tokenized and segmented sentences
  • Treebank-style constituent parse trees
  • Syntactic dependency trees
  • Named entities
  • In-document coreference chains

The annotation was performed in a three-step process: (1) the data was preprocessed and sentences selected for annotation (sentences with more than 100 tokens were excluded); (2) syntactic parses were derived; and (3) the parsed output was post-processed to derive syntactic dependencies, named entities and coreference chains. Over 183 million sentences were parsed.

Annotated English Gigaword is distributed on one hard drive.

2012 Subscription Members will automatically receive one copy of this data on hard drive. 2012 Standard Members may request a copy as part of their 16 free membership corpora. 2011 Members who licensed English Gigaword Fifth Edition (LDC2011T07) may request a no-cost copy of Annotated English Gigaword. Non-member organizations who licensed English Gigaword Fifth Edition may request a copy of Annotated English Gigaword for the US$200 media fee. Non-member organizations without a license to English Gigaword Fifth Edition may obtain this data for US$6000.

   

*

   

(2) Chinese-English Semiconductor Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a collection of abstracts from scientific articles on semiconductors published in Mandarin and translated into English by translators with particular expertise in the technical area. Translators were instructed to err on the side of literal translation if required, but to maintain the technical writing style of the source and to make the resulting English as natural as possible. The translators followed specific guidelines for translation, and those are included in this distribution.

There are 2,169 lines of parallel Mandarin and English, with a total of 125,302 characters of Mandarin and 64,851 words of English, presented in a separate UTF-8 plain text file for each language. The sentences were translated in sequential order and are presented in a scrambled order, such that lines at identical line numbers in the two files remain translations of each other. For example, the 31st line of the English file is a translation of the 31st line of the Mandarin file. The original line sequence is not provided.
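
Because the two files are aligned purely by line number, pairing them up is straightforward. The following is a minimal sketch of that idea, not a tool shipped with the corpus: the file names and sample contents are hypothetical stand-ins, and the only facts taken from the description are the UTF-8 encoding and the line-for-line alignment.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Read a UTF-8 text file, splitting on newline characters only."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_parallel(source_path, target_path):
    """Pair line i of the source file with line i of the target file."""
    source = read_lines(source_path)
    target = read_lines(target_path)
    if len(source) != len(target):
        # Misaligned files would silently pair the wrong sentences.
        raise ValueError(f"line counts differ: {len(source)} vs {len(target)}")
    return list(zip(source, target))


# Demo on two tiny stand-in files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    zh_file = Path(tmp) / "mandarin.txt"
    en_file = Path(tmp) / "english.txt"
    zh_file.write_text("半导体\n晶体管\n", encoding="utf-8")
    en_file.write_text("semiconductor\ntransistor\n", encoding="utf-8")
    pairs = read_parallel(zh_file, en_file)

print(pairs[1])
```

Checking that both files have the same number of lines before pairing is a cheap safeguard: with the real corpus, both counts would be expected to equal 2,169.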

   

Chinese-English Semiconductor Parallel Text is distributed via web download.

2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1500.

   

*
     

   

     

   

(3) GALE Phase 2 Arabic Newswire Parallel Text was developed by LDC. Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Modern Standard Arabic source text and corresponding English translations selected from newswire data collected in 2007 by LDC and transcribed by LDC or under its direction.

GALE Phase 2 Arabic Newswire Parallel Text includes 400 source-translation pairs, comprising 181,704 tokens of Arabic source text and its English translation. Data is drawn from six distinct Arabic newswire sources: Al Ahram, Al Hayat, Al-Quds Al-Arabi, An Nahar, Asharq Al-Awsat and Assabah.

The files in this release were transcribed by LDC staff and/or transcription vendors under contract to LDC in accordance with the Quick Rich Transcription guidelines developed by LDC. Transcribers indicated sentence boundaries in addition to transcribing the text. Data was manually selected for translation according to several criteria, including linguistic features, transcription features and topic features. The transcribed and segmented files were then reformatted into a human-readable translation format and assigned to translation vendors. Translators followed LDC's Arabic to English translation guidelines. Bilingual LDC staff performed quality control procedures on the completed translations.

GALE Phase 2 Arabic Newswire Parallel Text is distributed via web download.

2012 Subscription Members will automatically receive two copies of this data on disc. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1750.
     

   

 

   


   


5-2-3 Speechocean January 2012 update

Speechocean - Language Resource Catalogue - New Releases (01-2012)

Speechocean, as a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents covering the fields of Text to Speech, Automatic Speech Recognition, Text, Machine Translation, Web Search, Videos, Images etc.

Speechocean is glad to announce that more Speech Resources have been released:

 

Chinese and English Mixing Speech Synthesis Database (Female)

The Chinese Mandarin TTS Speech Corpus contains the read speech of a native Chinese Female professional broadcaster recorded in a studio with high SNR (>35dB) over two channels (AKG C4000B microphone and Electroglottography (EGG) sensor). 
The Corpus includes the following categories:
1.    Basic Mandarin sub-corpus: 5,000 utterances carefully designed to cover all kinds of linguistic phenomena. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts containing negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus can be used for R&D of HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
2.    Complementary Mandarin sub-corpus: 10,000 utterances carefully designed to cover all kinds of linguistic phenomena. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts containing negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus complements the Basic Mandarin sub-corpus and can be used for R&D of large-scale concatenative TTS;
3.    Mandarin Neutral sub-corpus: 380 Chinese bi-syllable words embedded in carrier sentences;
4.    Mandarin ERHUA sub-corpus: 290 Chinese Erhua syllables embedded in carrier sentences;
5.    Mandarin Digit-String sub-corpus: 1,250 three-digit utterances covering the different pronunciations of 1, i.e. “yi1” and “yao1”;
6.    Mandarin Question sub-corpus: 300 question sentences with commonly used question markers, for example “吗”, “么”, “呢”, etc.;
7.    Mandarin Exclamatory sub-corpus: 200 exclamatory sentences with commonly used exclamatory markers, for example “呀”, “啊”, “吧”, “啦”, etc.;
8.    Chinese English sentence sub-corpus: 1,000 sentences carefully designed for bi-phone coverage. All sentences were extracted from the news channels of Voice of America (VOA), etc. Prompts containing negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 12 words on average), and they are phonetically annotated with SAMPA. This sub-corpus can be used for R&D of HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
9.    Chinese English words sub-corpus: about 6,000 commonly used English words embedded in carrier sentences;
10.    Chinese English Abbreviation sub-corpus: about 200 utterances covering not only the alphabet but also combinations of characters and digits, such as “MP4”;
11.    Chinese English Letter sub-corpus: 26 carrier utterances, with each letter embedded at the beginning, middle and end;
12.    Chinese Greek Letter sub-corpus: 24 carrier utterances, with each letter embedded at the beginning, middle and end.

All speech data are segmented and labeled at the phone level. A pronunciation lexicon and pitch extracted from the EGG signal can also be provided on request.

 

France French Speech Recognition Corpus (desktop) – 50 speakers

This France French desktop speech recognition database was collected by SpeechOcean in France. It is one of the databases of our Speech Data Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1 kHz, 16-bit uncompressed PCM and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
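
As a rough illustration of the storage format described above (44.1 kHz, 16-bit uncompressed PCM), the following Python sketch decodes a buffer of samples and computes its duration. It is not part of the SpeechOcean distribution: the announcement does not say whether the files carry a header, which byte order they use, or how the two microphone channels are stored, so the headerless, little-endian, single-channel layout and the function names `decode_pcm16` and `duration_seconds` are assumptions made for illustration only.

```python
import struct

SAMPLE_RATE_HZ = 44_100   # "44.1K" in the corpus description
BYTES_PER_SAMPLE = 2      # 16-bit samples


def decode_pcm16(raw: bytes) -> list[int]:
    """Decode headerless signed 16-bit PCM (assumed little-endian) into integers."""
    n = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{n}h", raw[: n * BYTES_PER_SAMPLE]))


def duration_seconds(raw: bytes, channels: int = 1) -> float:
    """Duration of a raw PCM buffer, assuming the given channel count."""
    return len(raw) / (BYTES_PER_SAMPLE * channels * SAMPLE_RATE_HZ)


# A synthetic one-second buffer of silence stands in for a corpus file.
silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
samples = decode_pcm16(silence)
print(len(samples), duration_seconds(silence))
```

If the delivered files turn out to be WAV files with a header, Python's standard `wave` module would be the more appropriate reader.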

 

UK English Speech Recognition Corpus (desktop) – 50 speakers

This UK English desktop speech recognition database was collected by SpeechOcean in England. It is one of the databases of our Speech Data Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1 kHz, 16-bit uncompressed PCM and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

US English Speech Recognition Corpus (desktop) – 50 speakers

This US English desktop speech recognition database was collected by SpeechOcean in America. It is one of the databases of our Speech Data Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (25 males, 25 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1 kHz, 16-bit uncompressed PCM and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

Italian Speech Recognition Corpus (desktop) – 50 speakers

This Italian desktop speech recognition database was collected by SpeechOcean in Italy. It is one of the databases of our Speech Data Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (23 males, 27 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1 kHz, 16-bit uncompressed PCM and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

For more information about our databases and services, please visit our website www.speechocean.com or our online catalogue at http://www.speechocean.com/en-Product-Catalogue/Index.html

If you have any inquiries regarding our databases and services, please feel free to contact us:

Xianfeng Cheng: Chengxianfeng@speechocean.com

Marta Gherardi: Marta@speechocean.com

 

 


5-2-4Appen ButlerHill

 

Appen ButlerHill 

A global leader in linguistic technology solutions

RECENT CATALOG ADDITIONS—MARCH 2012

1. Speech Databases

1.1 Telephony

Language                 Database Type   Catalogue Code  Speakers  Status
Bahasa Indonesia         Conversational  BAH_ASR001      1,002     Available
Bengali                  Conversational  BEN_ASR001      1,000     Available
Bulgarian                Conversational  BUL_ASR001      217       Available shortly
Croatian                 Conversational  CRO_ASR001      200       Available shortly
Dari                     Conversational  DAR_ASR001      500       Available
Dutch                    Conversational  NLD_ASR001      200       Available
Eastern Algerian Arabic  Conversational  EAR_ASR001      496       Available
English (UK)             Conversational  UKE_ASR001      1,150     Available
Farsi/Persian            Scripted        FAR_ASR001      789       Available
Farsi/Persian            Conversational  FAR_ASR002      1,000     Available
French (EU)              Conversational  FRF_ASR001      563       Available
French (EU)              Voicemail       FRF_ASR002      550       Available
German                   Voicemail       DEU_ASR002      890       Available
Hebrew                   Conversational  HEB_ASR001      200       Available shortly
Italian                  Conversational  ITA_ASR003      200       Available shortly
Italian                  Voicemail       ITA_ASR004      550       Available
Kannada                  Conversational  KAN_ASR001      1,000     In development
Pashto                   Conversational  PAS_ASR001      967       Available
Portuguese (EU)          Conversational  PTP_ASR001      200       Available shortly
Romanian                 Conversational  ROM_ASR001      200       Available shortly
Russian                  Conversational  RUS_ASR001      200       Available
Somali                   Conversational  SOM_ASR001      1,000     Available
Spanish (EU)             Voicemail       ESO_ASR002      500       Available
Turkish                  Conversational  TUR_ASR001      200       Available
Urdu                     Conversational  URD_ASR001      1,000     Available

1.2 Wideband

Language                 Database Type   Catalogue Code  Speakers  Status
English (US)             Studio          USE_ASR001      200       Available
French (Canadian)        Home/Office     FRC_ASR002      120       Available
German                   Studio          DEU_ASR001      127       Available
Thai                     Home/Office     THA_ASR001      100       Available
Korean                   Home/Office     KOR_ASR001      100       Available

2. Pronunciation Lexica

Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:

Pronunciation Lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)

Part-of-speech tagged Lexica providing grammatical and semantic labels

Other reference text-based materials, including spelling/misspelling lists, spell-check dictionaries, mappings of colloquial language to standard forms, and orthographic normalization lists.

Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com

3. Named Entity Corpora

These NER Corpora contain text material from a variety of sources and are tagged for the following Named Entities: Person, Organization, Location, Nationality, Religion, Facility, Geo-Political Entity, Titles, Quantities.

Language        Catalogue Code  Words
Arabic          ARB_NER001      500,000
English         ENI_NER001      500,000
Farsi/Persian   FAR_NER001      500,000
Korean          KOR_NER001      500,000
Japanese        JPY_NER001      500,000
Russian         RUS_NER001      500,000
Mandarin        MAN_NER001      500,000
Urdu            URD_NER001      500,000


4. Other Language Resources

Morphological Analyzers – Farsi/Persian & Urdu

Arabic Thesaurus

Language Analysis Documentation – multiple languages

 

For additional information on these resources, please contact: sales@appenbutlerhill.com

5. Customized Requests and Package Configurations

Appen Butler Hill is committed to providing a low-risk, high-quality, reliable solution and has worked in 130+ languages to date, supporting both large global corporations and government organizations.

We would be glad to discuss any customized requests or package configurations and to prepare a customized proposal to meet your needs.

6. Contact Information

Prithivi Pradeep

Business Development Manager

ppradeep@appenbutlerhill.com

+61 2 9468 6370

Tom Dibert

Vice President, Business Development, North America

tdibert@appenbutlerhill.com

+1-315-339-6165

                                                         www.appenbutlerhill.com


5-3 Software
5-3-1Matlab toolbox for glottal analysis

I am pleased to announce that a Matlab toolbox for glottal analysis is now available on the web at:

 

http://tcts.fpms.ac.be/~drugman/Toolbox/

 

This toolbox includes the following modules:

 

- Pitch and voiced-unvoiced decision estimation

- Speech polarity detection

- Glottal Closure Instant determination

- Glottal flow estimation

 

I am also glad to share my PhD thesis, entitled “Glottal Analysis and its Applications”:

http://tcts.fpms.ac.be/~drugman/files/DrugmanPhDThesis.pdf

 

where you will find applications in speech synthesis, speaker recognition, voice pathology detection, and expressive speech analysis.

 

I hope this will be useful to you, and I hope to see you soon,

 

Thomas Drugman


5-3-2ROCme!: a free tool for audio corpora recording and management

ROCme!: new free software for recording and managing audio corpora.

ROCme! provides streamlined, autonomous and paperless management of the recording of read corpora.

Key features:
- free
- compatible with Windows and Mac
- configurable interface for collecting speaker metadata
- the speaker scrolls through the sentences on screen and records them autonomously
- configurable audio format

Available for download at:
www.ddl.ish-lyon.cnrs.fr/rocme

 

5-3-3VocalTractLab 2.0 : A tool for articulatory speech synthesis


It is my pleasure to announce the release of the new major version 2.0 of VocalTractLab. VocalTractLab is an articulatory speech synthesizer and a tool to visualize and explore the mechanism of speech production with regard to articulation, acoustics, and control. It is available from http://www.vocaltractlab.de/index.php?page=vocaltractlab-download .
Compared to version 1.0, the new version brings many improvements to the implemented models of the vocal tract and vocal folds, the acoustic simulation, articulatory control, and the user interface. Most importantly, the new version comes with a manual.

If you like, give it a try. Reports on bugs and any other feedback are welcome.

Peter Birkholz
