ISCA - International Speech
Communication Association


ISCApad Archive  »  2013  »  ISCApad #175  »  Resources

ISCApad #175

Thursday, January 10, 2013 by Chris Wellekens

5 Resources
5-1 Books
5-1-1Ben Gold, Nelson Morgan, Dan Ellis :Speech and Audio Signal Processing: Processing and Perception of Speech and Music [Digital]

Speech and Audio Signal Processing: Processing and Perception of Speech and Music [2nd edition]  Ben GoldNelson Morgan, Dan Ellis

Digital copy:  http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/product-description/1118142888

Hardcopy available: http://www.amazon.com/Speech-Audio-Signal-Processing-Perception/dp/0470195363/ref=sr_1_1?s=books&ie=UTF8&qid=1319142964&sr=1-1

Back  Top

5-1-2Video Proceedings ERMITES 2011
Actes vidéo des journées ERMITES 2011 'Décomposition Parcimonieuse, Contraction et Structuration pour l'Analyse de Scènes', sont en ligne sur :   http://glotin.univ-tln.fr/ERMITES11

On y retrouve (en .mpg) la vingtaine d'heure des conférences de :

Y. Bengio, Montréal
    «Apprentissage Non-Supervisé de Représentations Profondes »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Y_Bengio_1sur4.mp4 ...

S. Mallat, Paris
    « Scattering & Matching Pursuit for Acoustic Sources Separation »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Mallat_1sur3.mp4 ...

J.-P. Haton, Nancy
    « Analyse de Scène et Reconnaissance Stochastique de la Parole »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_JP_Haton_1sur4.mp4 ...

M. Kowalski, Paris
    « Sparsity and structure for audio signal: a *-lasso therapy »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Kowalski_1sur5.mp4 ...

O. Adam, Paris
    « Estimation de Densité de Population de Baleines par Analyse de
leurs Chants »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Adam.mp4

X. Halkias, New-York
    « Detection and Tracking of Dolphin Vocalizations »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Halkias.mp4

J. Razik, Toulon
    « Sparse coding : from speech to whales »
     http://lsis.univ-tln.fr/~glotin/ERMITES_2011_Razik.mp4

H. Glotin, Toulon
   « Suivi & reconstruction du comportement de cétacés par acoustique passive »

ps : ERMITES 2012 portera sur la vision (Y. Lecun, Y. Thorpe, P.
Courrieu, M Perreira, M. Van Gerven,...)
Back  Top

5-1-3Zeki Majeed Hassan and Barry Heselwood (Eds): Instrumental Studies in Arabic Phonetics

Instrumental Studies in Arabic Phonetics
Edited by Zeki Majeed Hassan and Barry Heselwood
University of Gothenburg / University of Leeds
[Current Issues in Linguistic Theory, 319] 2011. xii, 365 pp.
Publishing status: Available
Hardbound – Available
ISBN 978 90 272 4837 4 | EUR 110.00 | USD 165.00
e-Book – Forthcoming Ordering information
ISBN 978 90 272 8322 1 | EUR 110.00 | USD 165.00
Brought together in this volume are fourteen studies using a range of modern instrumental methods – acoustic and articulatory – to investigate the phonetics of several North African and Middle Eastern varieties of Arabic. Topics covered include syllable structure, quantity, assimilation, guttural and emphatic consonants and their pharyngeal and laryngeal mechanisms, intonation, and language acquisition. In addition to presenting new data and new descriptions and interpretations, a key aim of the volume is to demonstrate the depth of objective analysis that instrumental methods can enable researchers to achieve. A special feature of many chapters is the use of more than one type of instrumentation to give different perspectives on phonetic properties of Arabic speech which have fascinated scholars since medieval times. The volume will be of interest to phoneticians, phonologists and Arabic dialectologists, and provides a link between traditional qualitative accounts of spoken Arabic and modern quantitative methods of instrumental phonetic analysis.

Acknowledgements  vii – viii
List of contributors  ix – x
Transliteration and transcription symbols for Arabic  xi – xii
Introduction
Barry Heselwood and Zeki Majeed Hassan 1 – 26
Part I. Issues in syntagmatic structure
Preliminary study of Moroccan Arabic word-initial consonant clusters and syllabification using electromagnetic articulography
Adamantios I. Gafos, Philip Hoole and Chakir Zeroual 27 – 46
An acoustic phonetic study of quantity and quantity complementarity in Swedish and Iraqi Arabic
Zeki Majeed Hassan 47 – 62
Assimilation of /l/ to /r/ in Syrian Arabic: An electropalatographic and acoustic study
Barry Heselwood, Sara Howard and Rawya Ranjous 63 – 98
Part II. Guttural consonants
A study of the laryngeal and pharyngeal consonants in Jordanian Arabic using nasoendoscopy, videofluoroscopy and spectrography
Barry Heselwood and Feda Al-Tamimi 99
A phonetic study of guttural laryngeals in Palestinian Arabic using laryngoscopic and acoustic analysis
Kimary N. Shahin 129 – 140
Airflow and acoustic modelling of pharyngeal and uvular consonants in Moroccan Arabic
Mohamed Yeou and Shinji Maeda 141 – 162
Part III. Emphasis and coronal consonants
Nasoendoscopic, videofluoroscopic and acoustic study of plain and emphatic coronals in Jordanian Arabic
Feda Al-Tamimi and Barry Heselwood 163 – 192
Acoustic and electromagnetic articulographic study of pharyngealisation: Coarticulatory effects as an index of stylistic and regional variation in Arabic
Mohamed Embarki, Slim Ouni, Mohamed Yeou, M. Christian Guilleminot and Sallal Al-Maqtari 193 – 216
Investigating the emphatic feature in Iraqi Arabic: Acoustic and articulatory evidence of coarticulation
Zeki Majeed Hassan and John H. Esling 217 – 234
Glottalisation and neutralisation in Yemeni Arabic and Mehri: An acoustic study
Janet C.E. Watson and Alex Bellem 235 – 256
The phonetics of localising uvularisation in Ammani-Jordanian Arabic: An acoustic study
Bushra Adnan Zawaydeh and Kenneth de Jong 257 – 276
EMA, endoscopic, ultrasound and acoustic study of two secondary articulations in Moroccan Arabic: Labial-velarisation vs. emphasis
Chakir Zeroual, John H. Esling and Philip Hoole 277 – 298
Part IV. Intonation and acquisition
Acoustic cues to focus and givenness in Egyptian Arabic
Sam Hellmuth 299 – 324
Acquisition of Lebanese Arabic and Yorkshire English /l/ by bilingual and monolingual children: A comparative spectrographic study
Ghada Khattab 325 – 354
Appendix: Phonetic instrumentation used in the studies  355 – 358

Back  Top

5-1-4G. Bailly, P. Perrier & E. Vatikiotis-Batesonn eds : Audiovisual Speech Processing

'Audiovisual
Speech Processing' édité par G. Bailly, P. Perrier & E. Vatikiotis-Batesonn chez
Cambridge University Press ?

'When we speak, we configure the vocal tract which shapes the visible motions of the face
and the patterning of the audible speech acoustics. Similarly, we use these visible and
audible behaviors to perceive speech. This book showcases a broad range of research
investigating how these two types of signals are used in spoken communication, how they
interact, and how they can be used to enhance the realistic synthesis and recognition of
audible and visible speech. The volume begins by addressing two important questions about
human audiovisual performance: how auditory and visual signals combine to access the
mental lexicon and where in the brain this and related processes take place. It then
turns to the production and perception of multimodal speech and how structures are
coordinated within and across the two modalities. Finally, the book presents overviews
and recent developments in machine-based speech recognition and synthesis of AV speech. '


Back  Top

5-1-5Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.): Speech Planning and Dynamics, Publisher P.Lang

Fuchs, Susanne / Weirich, Melanie / Pape, Daniel / Perrier, Pascal (eds.)

Speech Planning and Dynamics

Frankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien, 2012. 277 pp., 50 fig., 8 tables

Speech Production and Perception. Vol. 1

Edited by Susanne Fuchs and Pascal Perrier

Imprimé :

ISBN 978-3-631-61479-2 hb.

SFR 60.00 / €* 52.95 / €** 54.50 / € 49.50 / £ 39.60 / US$ 64.95

eBook :

ISBN 978-3-653-01438-9

SFR 63.20 / €* 58.91 / €** 59.40 / € 49.50 / £ 39.60 / US$ 64.95

Commander en ligne : www.peterlang.com

Back  Top

5-1-6Video archive of Odyssey Speaker and Language Recognition Workshop, Singapore 2012
Odyssey Speaker and Language Recognition Workshop 2012, the workshop of ISCA SIG Speaker and Language Characterization, was held in Singapore on 25-28 June 2012. Odyssey 2012 is glad to announce that its video recordings have been included in the ISCA Video Archive. http://www.isca-speech.org/iscaweb/index.php/archive/video-archive
Back  Top

5-1-7Tuomas Virtanen, Rita Singh, Bhiksha Raj (editors),Techniques for Noise Robustness in Automatic Speech Recognition,Wiley

Techniques for Noise Robustness in Automatic Speech Recognition
Tuomas Virtanen, Rita Singh, Bhiksha Raj (editors)
ISBN: 978-1-1199-7088-0
Publisher: Wiley

Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Key features:

*Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.
*Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments.
*Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR.
*Includes contributions from top ASR researchers from leading research units in the field

Back  Top

5-2 Database
5-2-1ELRA - Language Resources Catalogue - Update (2012-12)

ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is happy to announce that 4 new Written Corpora are now available in its catalogue.

 ELRA-W0059 LT Corpus The LT Corpus is composed of 70 fiction texts from Portuguese renowned authors. The corpus contains 1,781,083 tokens. The texts date from before 1940. The corpus is delivered in one file, in two different formats. The txt version has one sentence per line, an identification number for each text and no further annotation. The cqpweb file is one token per line, followed by pos tag and lemma, and is annotated for NP chunks. For more information, see: http://catalog.elra.info/product_info.php?products_id=1178
ELRA-W0060 PTPARL Corpus The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The corpus contains 1,000,441 tokens. The corpus is delivered in one file, in two different formats. The txt version has one sentence per line, an identification number for each text and no further annotation. The cqpweb file is one token per line, followed by pos tag and lemma, and is annotated for NP chunks.For more information, see: http://catalog.elra.info/product_info.php?products_id=1179
ELRA-W0061 CINTIL-DependencyBank The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) that are used for regression testing of the computational grammar that supported the annotation of the corpus. For more information, see: http://catalog.elra.info/product_info.php?products_id=1180
ELRA-W0062 CINTIL-DeepBank The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus. For more information, see: http://catalog.elra.info/product_info.php?products_id=1181
For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org
Visit our On-line Catalogue: http://catalog.elra.info Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html

 

 




 

 

Back  Top

5-2-2LDC Newsletter (December 2012)

In this newsletter:

- Spring 2013 LDC Data Scholarship Program - deadline approaching!  -

-  Two New LDC Podcasts for your Listening Pleasure  -

-  Penn Discourse Treebank Version 2.0 Update  -
-  LDC to close for Winter Break  -

New publications:

-  GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web  -

-  Russian-English Computer Security Parallel Text  -

 


 

 Spring 2013 LDC Data Scholarship Program - deadline approaching!

The deadline for the Spring 2013 LDC Data Scholarship Program is one month away!   Student applications are being accepted now through January 15, 2013, 11:59PM EST.  The LDC Data Scholarship program provides university students with access to LDC data at no cost.  This program is open to students pursuing both undergraduate and graduate studies in an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay. 
Students will need to complete an application which consists of a data use proposal and letter of support from their adviser.  For further information on application materials and program rules, please visit the LDC Data Scholarship
page. 

Students can email their applications to the LDC Data Scholarship program. Decisions will be sent by email from the same address.

 

Two New LDC Podcasts for your Listening Pleasure

Two new podcasts are available on a theLDC blog continuing the  20th Anniversary series. The first features Natalia Bragilevskaya, LDC’s Business Administrator, Membership Coordinator Ilya Ahtaridis and Marian Reed, Marketing Coordinator. They recall the early days of LDC and describe the growth of sponsored projects work and LDC’s interactions with its membership.

Click here for Natalia, Ilya and Marian’s podcast.

The third podcast in the series introduces the community to two  LDC’ researchers Yiwola Awoyale and Moussa Bamba, whose work focuses on West African languages.

Yiwola has been teaching Linguistics, Yoruba language studies and various aspects of African linguistics since 1975. At LDC, he developed the Global Yoruba Lexical Database, a set of related dictionaries based on Yoruba and its diaspora. Moussa’s work in the Manding languages of the Niger-Congo family has resulted in the release of the Mawukakan Lexicon, to be followed by similar resources for Maninkakan, Bambara, and Jula.

In their podcast, Yiwola and Moussa discuss how they came  to LDC, their current research and how it benefits multiple communities.Click here for Yiwola and Moussa’s podcast.

Other podcasts will be published via the LDC blog, so stay tuned to that space.

Penn Discourse Treebank Version 2.0 Update

 The developers of the Penn Discourse Treebank Version 2.0 LDC2008T05 (PDTB) have updated this release to add metadata to the Wall Street Journal (WSJ) news stories in the corpus. The goal is to aid understanding PDTB files as texts and to support distinguishing texts from different genres within the WSJ. The metadata includes the following fields:

  • DD: the date the article appeared in the WSJ

  • AN: unique identifier for the article

  • HL: the column name (for regular features such as Who's News, Marketing & Media, Technology), its headline and by-line

  • SO: the source of the article

  • IN: manually-assigned codes or keywords for the article

  • CO: manually-assigned codes for companies or other organizations

  • DATELINE: normally the location where the article was filed, but sometimes has very unexpected contents

  • GV: Branch of Government or Government Agency mentioned in the article

  • SBREAKS: the byte position of section breaks present in the file

  • ARTICLEBREAK: separates files that contain more than one article

All new downloads of PDTB will contain the complete updated corpus.  Current PDTB licensees can re-download the file to obtain the updated data.

 

LDC to close for Winter Break

LDC will be closed from Monday, December 24, 2012 through Tuesday, January 1, 2013 in accordance with the University of Pennsylvania Winter Break Policy.  Our offices will reopen on Wednesday, January 2, 2013.  Requests received for membership renewals and corpora during the Winter Break will be processed at that time.
Best wishes for a happy and safe holiday season!

New publications

(1) GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Webwas developed by LDC and contains 154,541 tokens of word aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE(Global Autonomous Language Exploitation) program.

Some approaches to statistical machine translation include the incorporation of linguistic knowledge in word aligned text as a means to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations by using minimum-match and attachment annotation approaches. A set of word tags and alignment link tags are designed in the tagging scheme to describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation.

GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web (LDC2012T16) and GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web (LDC2012T20) are also available through LDC.
This release consists of Chinese source web data (newsgroup, weblog) collected by LDC in 2008 and 2009. The distribution by words, character tokens and segments appears below:

Language

Files

Words

CharTokens

Segments

Chinese

1249

103027

154541

4842

Note that all token counts are based on the Chinese data only. One token is equivalent to one character and one word is equivalent to 1.5 characters.

The Chinese word alignment tasks consisted of the following components:

  • Identifying, aligning, and tagging 8 different types of links

  • Identifying, attaching, and tagging local-level unmatched words

  • Identifying and tagging sentence/discourse-level unmatched words

  • Identifying and tagging all instances of Chinese (DE) except when they were a part of a semantic link.

GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web is distributed via web download.

2012 Subscription Members will automatically receive two copies of this data on disc.  2012 Standard Members may request a copy as part of their 16 free membership corpora.  Non-members may license this data for US$1750.

*

(2) Russian-English Computer Security Parallel Textwas developed by The MITRE Corporation. It consists of parallel sentences from a set of computer security reports published in Russian and translated into English by translators with particular expertise in the technical area. Translators were instructed to err on the side of literal translation if required, but to maintain the technical writing style of the source and to make the resulting English as natural as possible. The translators followed specific guidelines for translation, and those are included in this distribution.

There are 6,276 lines of parallel Russian and English, with a total of 60,059 words of Russian and 76,437 words of English, presented in a separate UTF-8 plain text file for each language. The sentences were translated in sequential order and presented in a scrambled order, such that parallel sentences at identical line numbers are translations. For example, the 31st line of the English file is a translation of the 31st line of the Russian file. The original line sequence is not provided. 1,694 untranslated lines (such as code snippets) are included as a separate file.

Russian-English Computer Security Parallel Text is distributed via web download.

2012 Subscription Members will automatically receive two copies of this data on disc.  2012 Standard Members may request a copy as part of their 16 free membership corpora.  Non-members may license this data for US$1500.

 


 

 

Back  Top

5-2-3Speechocean January 2012 update

Speechocean - Language Resource Catalogue - New Released (01- 2012)

Speechocean, as a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents covering the fields of Text to Speech, Automatic Speech Recognition, Text, Machine Translation, Web Search, Videos, Images etc.

 

Speechocean is glad to announce that more Speech Resources has been released:

 

Chinese and English Mixing Speech Synthesis Database (Female)

The Chinese Mandarin TTS Speech Corpus contains the read speech of a native Chinese Female professional broadcaster recorded in a studio with high SNR (>35dB) over two channels (AKG C4000B microphone and Electroglottography (EGG) sensor). 
The Corpus includes the following categories:
1.    Basic Mandarin sub-corpus: including 5,000 utterances which were carefully designed considering all kinds of linguistic phenomena. All sentences were declarative and extracted from News channels of People's Daily, China Daily, etc. The prompts with negative words were carefully excluded. ONLY suitable length sentences were accepted (7~20 words, in average 14 words). This sub-corpus can be used for R&D of HMM-based TTS, Limit domain TTS and Small-scale concatenative TTS;
2.    Complementary Mandarin sub-corpus: including 10,000 utterances which were carefully designed considering all kinds of linguistic phenomena. All sentences were declarative and extracted from News channels of People's Daily, China Daily, etc. The prompts with negative words are carefully excluded. ONLY suitable length sentences were accepted (7~20 words, average 14 words). This sub-corpus is a complementary corpus for Basic Mandarin sub-corpus and can be used for R&D of Large-scale concatenative TTS;
3.    Mandarin Neutral sub-corpus: including 380 Chinese bi-syllable words which embedded in carrier sentences;
4.    Mandarin ERHUA sub-corpus: including 290 Chinese Erhua syllables which embedded in carrier sentences;
5.    Mandarin Digit-String sub-corpus: including 1250 utterances with 3-digit length which considered the different pronunciation of 1, i.e. “yi1” and “yao1”.
6.    Mandarin Question sub-corpus: including 300 question sentences with common used question mark, for example “吗”, “么”, “呢”, and etc.;
7.    Mandarin exclamatory sub-corpus: including 200 exclamatory sentences with common used exclamatory mark, for example “呀”, “啊”, “吧”, “啦”, and etc.;
8.    Chinese English sentence sub-corpus: including 1,000 sentences which were carefully designed considering bi-phone coverage. All sentences were extracted from News channels of Voice of America (VOA), and etc. The prompts with negative words are carefully excluded. ONLY suitable length sentences were accepted (7~20 words, in average 12 words) and phonetically annotated with SAMPA. This sub-corpus can be used for R&D of HMM-based TTS, Limit domain TTS and Small-scale concatenative TTS;
9.    Chinese English words sub-corpus: including about 6,000 commonly used English words which embedded in carrier sentence;
10.    Chinese English Abbreviation sub-corpus: including about 200 utterances which considered not only the alphabet coverage, but also the combination of character and digit, such as “MP4”;
11.    Chinese English Letter sub-corpus: including 26 carrier utterances with each letter embedded in the Beginning, Middle and End;
12.    Chinese Greek Letter sub-corpus: including 24 carrier utterances with each letter embedded in the Beginning, Middle and End.

All speech data are segmented and labeled on phone level. Pronunciation lexicon and pitch extract from EEG can also be provided based on demands.

 

France French Speech Recognition Corpus (desktop) – 50 speakers

This France French desktop speech recognition database was collected by SpeechOcean in France. This database is one of our databases of Speech Data ----Desktop Project (SDD) which contains the database collections for 30 languages presently. 

It contains the voices of 50 different native speakers who were balanced distributed by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accents. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1K 16Bit uncompressed PCM format and accompanied by an ASCII SAM label file which contains the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

UK English Speech Recognition Corpus (desktop) – 50 speakers

This UK English desktop speech recognition database was collected by SpeechOcean in England. This database is one of our databases of Speech Data ----Desktop Project (SDD) which contains the database collections for 30 languages presently. 

It contains the voices of 50 different native speakers who were balanced distributed by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accents. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1K 16Bit uncompressed PCM format and accompanied by an ASCII SAM label file which contains the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

US English Speech Recognition Corpus (desktop) – 50 speakers

This US English desktop speech recognition database was collected by SpeechOcean in America. This database is one of our databases of Speech Data ----Desktop Project (SDD) which contains the database collections for 30 languages presently. 

It contains the voices of 50 different native speakers who were balanced distributed by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (25 males, 25 females) and regional accents. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1K 16Bit uncompressed PCM format and accompanied by an ASCII SAM label file which contains the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

Italian Speech Recognition Corpus (desktop) – 50 speakers

This Italian desktop speech recognition database was collected by SpeechOcean in Italy. This database is one of our databases of Speech Data ----Desktop Project (SDD) which contains the database collections for 30 languages presently. 

It contains the voices of 50 different native speakers who were balanced distributed by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (23 males, 27 females) and regional accents. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored as 44.1K 16Bit uncompressed PCM format and accompanied by an ASCII SAM label file which contains the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

 

For more information about our Database and Services please visit our website www.Speechocen.com or visit our on-line Catalogue at http://www.speechocean.com/en-Product-Catalogue/Index.html

If you have any inquiry regarding our databases and service please feel free to contact us:

Xianfeng Cheng mailto: Chengxianfeng@speechocean.com

Marta Gherardi mailto: Marta@speechocean.com

 

 

Back  Top

5-2-4Appen ButlerHill

 

Appen ButlerHill 

A global leader in linguistic technology solutions

RECENT CATALOG ADDITIONS—MARCH 2012

1. Speech Databases

1.1 Telephony

1.1 Telephony

Language

Database Type

Catalogue Code

Speakers

Status

Bahasa Indonesia

Conversational

BAH_ASR001

1,002

Available

Bengali

Conversational

BEN_ASR001

1,000

Available

Bulgarian

Conversational

BUL_ASR001

217

Available shortly

Croatian

Conversational

CRO_ASR001

200

Available shortly

Dari

Conversational

DAR_ASR001

500

Available

Dutch

Conversational

NLD_ASR001

200

Available

Eastern Algerian Arabic

Conversational

EAR_ASR001

496

Available

English (UK)

Conversational

UKE_ASR001

1,150

Available

Farsi/Persian

Scripted

FAR_ASR001

789

Available

Farsi/Persian

Conversational

FAR_ASR002

1,000

Available

French (EU)

Conversational

FRF_ASR001

563

Available

French (EU)

Voicemail

FRF_ASR002

550

Available

German

Voicemail

DEU_ASR002

890

Available

Hebrew

Conversational

HEB_ASR001

200

Available shortly

Italian

Conversational

ITA_ASR003

200

Available shortly

Italian

Voicemail

ITA_ASR004

550

Available

Kannada

Conversational

KAN_ASR001

1,000

In development

Pashto

Conversational

PAS_ASR001

967

Available

Portuguese (EU)

Conversational

PTP_ASR001

200

Available shortly

Romanian

Conversational

ROM_ASR001

200

Available shortly

Russian

Conversational

RUS_ASR001

200

Available

Somali

Conversational

SOM_ASR001

1,000

Available

Spanish (EU)

Voicemail

ESO_ASR002

500

Available

Turkish

Conversational

TUR_ASR001

200

Available

Urdu

Conversational

URD_ASR001

1,000

Available

1.2 Wideband

Language

Database Type

Catalogue Code

Speakers

Status

English (US)

Studio

USE_ASR001

200

Available

French (Canadian)

Home/ Office

FRC_ASR002

120

Available

German

Studio

DEU_ASR001

127

Available

Thai

Home/Office

THA_ASR001

100

Available

Korean

Home/Office

KOR_ASR001

100

Available

2. Pronunciation Lexica

Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:

Pronunciation Lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)

Part-of-speech tagged Lexica providing grammatical and semantic labels

Other reference text based materials including spelling/mis-spelling lists, spell-check dictionar-ies, mappings of colloquial language to standard forms, orthographic normalization lists.

Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com

3. Named Entity Corpora

Language

Catalogue Code

Words

Description

Arabic

ARB_NER001

500,000

These NER Corpora contain text material from a vari-ety of sources and are tagged for the following Named Entities: Person, Organization, Location, Na-tionality, Religion, Facility, Geo-Political Entity, Titles, Quantities

English

ENI_NER001

500,000

Farsi/Persian

FAR_NER001

500,000

Korean

KOR_NER001

500,000

Japanese

JPY_NER001

500,000

Russian

RUS_NER001

500,000

Mandarin

MAN_NER001

500,000

Urdu

URD_NER001

500,000

3. Named Entity Corpora

Language

Catalogue Code

Words

Description

Arabic

ARB_NER001

500,000

These NER Corpora contain text material from a vari-ety of sources and are tagged for the following Named Entities: Person, Organization, Location, Na-tionality, Religion, Facility, Geo-Political Entity, Titles, Quantities

English

ENI_NER001

500,000

Farsi/Persian

FAR_NER001

500,000

Korean

KOR_NER001

500,000

Japanese

JPY_NER001

500,000

Russian

RUS_NER001

500,000

Mandarin

MAN_NER001

500,000

Urdu

URD_NER001

500,000

4. Other Language Resources

Morphological Analyzers – Farsi/Persian & Urdu

Arabic Thesaurus

Language Analysis Documentation – multiple languages

 

For additional information on these resources, please contact: sales@appenbutlerhill.com

5. Customized Requests and Package Configurations

Appen Butler Hill is committed to providing a low risk, high quality, reliable solution and has worked in 130+ languages to-date supporting both large global corporations and Government organizations.

We would be glad to discuss to any customized requests or package configurations and prepare a cus-tomized proposal to meet your needs.

6. Contact Information

Prithivi Pradeep

Business Development Manager

ppradeep@appenbutlerhill.com

+61 2 9468 6370

Tom Dibert

Vice President, Business Development, North America

tdibert@appenbutlerhill.com

+1-315-339-6165

                                                         www.appenbutlerhill.com

Back  Top

5-2-5OFROM 1er corpus de français de Suisse romande
Nous souhaiterions vous signaler la mise en ligne d'OFROM, premier corpus de français parlé en Suisse romande. L'archive est, dans version actuelle, d'une durée d'environ 15 heures. Elle est transcrite en orthographe standard dans le logiciel Praat. Un concordancier permet d'y effectuer des recherches, et de télécharger les extraits sonores associés aux transcriptions. 
 
Pour accéder aux données et consulter une description plus complète du corpus, nous vous invitons à vous rendre à l'adresse suivante : http://www.unine.ch/ofrom
Back  Top

5-3 Software
5-3-1Matlab toolbox for glottal analysis

I am pleased to announce you that we made a Matlab toolbox for glottal analysis now available on the web at:

 

http://tcts.fpms.ac.be/~drugman/Toolbox/

 

This toolbox includes the following modules:

 

- Pitch and voiced-unvoiced decision estimation

- Speech polarity detection

- Glottal Closure Instant determination

- Glottal flow estimation

 

By the way, I am also glad to send you my PhD thesis entitled “Glottal Analysis and its Applications”:

http://tcts.fpms.ac.be/~drugman/files/DrugmanPhDThesis.pdf

 

where you will find applications in speech synthesis, speaker recognition, voice pathology detection, and expressive speech analysis.

 

Hoping that this might be useful to you, and to see you soon,

 

Thomas Drugman

Back  Top

5-3-2ROCme!: a free tool for audio corpora recording and management

ROCme!: nouveau logiciel gratuit pour l'enregistrement et la gestion de corpus audio.

Le logiciel ROCme! permet une gestion rationalisée, autonome et dématérialisée de l’enregistrement de corpus lus.

Caractéristiques clés :
- gratuit
- compatible Windows et Mac
- interface paramétrable pour le recueil de métadonnées sur les locuteurs
- le locuteur fait défiler les phrases à l'écran et les enregistre de façon autonome
- format audio paramétrable

Téléchargeable à cette adresse :
www.ddl.ish-lyon.cnrs.fr/rocme

 
Back  Top

5-3-3VocalTractLab 2.0 : A tool for articulatory speech synthesis

VocalTractLab 2.0 : A tool for articulatory speech synthesis

It is my pleasure to announce the release of the new major version 2.0 of VocalTractLab. VocalTractLab is an articulatory speech synthesizer and a tool to visualize and explore the mechanism of speech production with regard to articulation, acoustics, and control. It is available from http://www.vocaltractlab.de/index.php?page=vocaltractlab-download .
Compared to version 1.0, the new version brings many improvements in terms of the implemented models of the vocal tract, the vocal folds, the acoustic simulation, and articulatory control, as well as in terms of the user interface. Most importantly, the new version comes together with a manual.

If you like, give it a try. Reports on bugs and any other feedback are welcome.

Peter Birkholz

Back  Top



 Organisation  Events   Membership   Help 
 > Board  > Interspeech  > Join - renew  > Sitemap
 > Legal documents  > Workshops  > Membership directory  > Contact
 > Logos      > FAQ
       > Privacy policy

© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.

Powered by ISCA