ISCA - International Speech
Communication Association



ISCApad #258

Tuesday, December 10, 2019 by Chris Wellekens

5 Resources
5-1 Books
5-1-1Bäckström, Tom (with Guillaume Fuchs, Sascha Disch, Christian Uhle and Jeremie Lecomte), 'Speech Coding with Code-Excited Linear Prediction', Springer


 Speech Coding with Code-Excited Linear Prediction

Author: Bäckström, Tom

Invited chapters from: Guillaume Fuchs, Sascha Disch, Christian Uhle and Jeremie Lecomte

Publisher: Springer

http://www.springer.com/gp/book/9783319502021


5-1-2Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey (Eds), 'New Era for Robust Speech Recognition', Springer.

Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey (Eds), 'New Era for Robust Speech Recognition', Springer.

https://link.springer.com/book/10.1007%2F978-3-319-64680-0


5-1-3Fabrice Marsac, Rudolph Sock, CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole, L'Harmattan, France

We are pleased to announce the publication of the thematic volume « CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole » in the Dixit Grammatica collection (L'Harmattan, France):
 
- CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole – 1. Phonétique, Phonologie (edited by Camille Fauth, Jean-Paul Meyer, Fabrice Marsac & Rudolph Sock) • ISBN: 978-2-343-14277-7 • 5 March 2018 • 172 pages • http://www.editionsharmattan.fr/index.asp?navig=catalogue&obj=livre&no=59200&razSqlClone=1
 
- CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole – 2. Syntaxe, Sémantique (edited by Angelina Aleksandrova, Céline Benninger, Anne Theissen, Fabrice Marsac & Jean-Paul Meyer) • ISBN: 978-2-343-14278-4 • 5 March 2018 • 300 pages • http://www.editionsharmattan.fr/index.asp?navig=catalogue&obj=livre&no=59201&razSqlClone=1
 
- CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole – 3. Didactique, Traductologie-Interprétation (edited by Jean-Paul Meyer, Mária Pal'ová & Fabrice Marsac) • ISBN: 978-2-343-14279-1 • 5 March 2018 • 200 pages • http://www.editionsharmattan.fr/index.asp?navig=catalogue&obj=livre&no=59202&razSqlClone=1
 
This collective work, comprising three complementary volumes, gathers studies that constitute the written record of papers presented at the international conference of the same name, held at the University of Strasbourg (France) in July 2015. The volumes contain original and innovative work on the complex dynamics of the consecutivity–simultaneity pair within the Language Sciences. The deliberately interdisciplinary content concerns not only the full range of disciplines within the Language Sciences, but also related scientific disciplines concerned with resolutely linguistic questions. The editors of this thematic volume hope that the various linguistic perspectives adopted will offer readers an up-to-date state of knowledge on the issues addressed. It goes without saying that both the authors and the editors will welcome constructive feedback from readers.
 
 
Fabrice Marsac and Rudolph Sock, Series Editors of Dixit Grammatica


 


5-1-4Emmanuel Vincent (Editor), Tuomas Virtanen (Editor), Sharon Gannot (Editor), 'Audio Source Separation and Speech Enhancement', Wiley

Emmanuel Vincent (Editor), Tuomas Virtanen (Editor), Sharon Gannot (Editor),

Audio Source Separation and Speech Enhancement


https://www.wiley.com/en-us/Audio+Source+Separation+and+Speech+Enhancement-p-9781119279891

ISBN: 978-1-119-27989-1

October 2018

504 pages



This book provides a unifying view of source separation and enhancement, including but not limited to array processing, matrix factorization, and deep learning based methods, as well as speech and music applications, with consistent notation and terminology across all chapters.


5-1-5Jen-Tzung Chien, 'Source Separation and Machine Learning', Academic Press

Jen-Tzung Chien, 'Source Separation and Machine Learning', Academic Press

Source Separation and Machine Learning presents the fundamentals in adaptive learning
algorithms for Blind Source Separation (BSS) and emphasizes the importance of machine
learning perspectives. It illustrates how BSS problems are tackled through adaptive
learning algorithms and model-based approaches using the latest information on mixture
signals to build a BSS model that is seen as a statistical model for a whole system.
Looking at different models, including independent component analysis (ICA), nonnegative
matrix factorization (NMF), nonnegative tensor factorization (NTF), and deep neural
network (DNN), the book addresses how they have evolved to deal with multichannel and
single-channel source separation.

Key features:
• Emphasizes modern model-based Blind Source Separation (BSS), which closely connects the latest research topics of BSS and machine learning
• Includes coverage of Bayesian learning, sparse learning, online learning, discriminative learning and deep learning
• Presents a number of case studies of model-based BSS, using a variety of learning algorithms that provide solutions for the construction of BSS systems
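
As a generic illustration of the NMF approach mentioned above (a sketch, not code from the book), a nonnegative matrix V — standing in for a magnitude spectrogram — can be factored into spectral bases W and activations H with the classic multiplicative updates for the Euclidean cost:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (frequency x time) into W @ H
    using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps  # spectral basis (dictionary)
    H = rng.random((rank, n_cols)) + eps  # activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "mixture" of two nonnegative spectral patterns (a real application
# would use the magnitude spectrogram of a mixture signal instead)
V = np.outer([1.0, 0.0, 2.0], [1, 1, 0, 0]) + np.outer([0.0, 3.0, 1.0], [0, 0, 1, 1])
W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(W @ H - V) / np.linalg.norm(V)
```

In a separation setting, each column of W models one source's spectral pattern, and masking or Wiener-style filtering of the mixture with the per-source reconstructions W[:, k] @ H[k, :] recovers the source estimates.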

https://www.elsevier.com/books/source-separation-and-machine-learning/chien/978-0-12-804566-4


5-1-6Ingo Feldhausen, « Methods in prosody: A Romance language perspective », Language Science Press (open access)

We are pleased to announce the publication of a peer-reviewed collection devoted to research methods in prosody, entitled « Methods in prosody: A Romance language perspective ».

It is published by Language Science Press, an open-access publisher. The book can be downloaded free of charge via the following link:

http://langsci-press.org/catalog/book/183

The table of contents is as follows:

---------------------------------------------------------------------------------------------------------

Introduction
Ingo Feldhausen, Jan Fliessbach & Maria del Mar Vanrell (p. iii)

Foreword
Pilar Prieto (p. vii)

I Large corpora and spontaneous speech

1) Using large corpora and computational tools to describe prosody: An exciting challenge for the future with some (important) pending problems to solve
Juan María Garrido Almiñana (p. 3)

2) Intonation of pronominal subjects in Porteño Spanish: Analysis of spontaneous speech
Andrea Pešková (p. 45)

II Approaches to prosodic analysis

3) Multimodal analyses of audio-visual information: Some methods and issues in prosody research
Barbara Gili Fivela (p. 83)

4) The realizational coefficient: Devising a method for empirically determining prominent positions in Conchucos Quechua
Timo Buchholz & Uli Reich (p. 123)

5) On the role of prosody in disambiguating wh-exclamatives and wh-interrogatives in Cosenza Italian
Olga Kellert, Daniele Panizza & Caterina Petrone (p. 165)

III Elicitation methods

6) The Discourse Completion Task in Romance prosody research: Status quo and outlook
Maria del Mar Vanrell, Ingo Feldhausen & Lluïsa Astruc (p. 191)

7) Describing the intonation of speech acts in Brazilian Portuguese: Methodological aspects
João Antônio de Moraes & Albert Rilliard (p. 229)

Indexes (p. 263)

---------------------------------------------------------------------------------------------------------

Please feel free to share news of this publication with colleagues who may be interested.

Best regards,

Ingo Feldhausen
(Co-editor of the volume)


5-1-7Nigel Ward, 'Prosodic Patterns in English Conversation', Cambridge University Press, 2019

Prosodic Patterns in English Conversation

Nigel G. Ward, Professor of Computer Science, University of Texas at El Paso

Cambridge University Press, 2019.

 

Spoken language is more than words: it includes the prosodic features and patterns that speakers use, subconsciously, to frame meanings and achieve interactional goals. Thanks to the application of simple processing techniques to spoken dialog corpora, this book goes beyond intonation to describe how pitch, timing, intensity and voicing properties combine to form meaningful temporal configurations: prosodic constructions. Combining new findings with hitherto-scattered observations from diverse research traditions, this book enumerates twenty of the principal prosodic constructions of English.  

 

http://www.cambridge.org/ward/

nigel@utep.edu    http://www.cs.utep.edu/nigel/   


5-1-8J.H.Esling, Scott R.Moisik, Allison Benner, Lise Crevier-Buchman, 'Voice Quality: the Laryngeal Articulator Model', Cambridge University Press

Voice Quality

The Laryngeal Articulator Model

Hardback 978-1-108-49842-5

John H. Esling, University of Victoria, British Columbia

Scott R. Moisik, Nanyang Technological University, Singapore

Allison Benner, University of Victoria, British Columbia

Lise Crevier-Buchman, Centre National de la Recherche Scientifique (CNRS), Paris



The first description of voice quality production in forty years, this book provides a new framework for its study: The Laryngeal Articulator Model. Informed by instrumental examinations of the laryngeal articulatory mechanism, it revises our understanding of articulatory postures to explain the actions, vibrations and resonances generated in the epilarynx and pharynx. It focuses on the long-term auditory-articulatory component of accent in the languages of the world, explaining how voice quality relates to segmental and syllabic sounds. Phonetic illustrations of phonation types and of laryngeal and oral vocal tract articulatory postures are provided. Extensive video and audio material is available on a companion website. The book presents computational simulations, the laryngeal and voice quality foundations of infant speech acquisition, speech/voice disorders and surgeries that entail compensatory laryngeal articulator adjustment, and an exploration of the role of voice quality in sound change and of the larynx in the evolution of speech.

 

1. Voice and voice quality; 2. Voice quality classification; 3. Instrumental case studies and computational simulations of voice quality; 4. Linguistic, paralinguistic and extralinguistic illustrations of voice quality; 5. Phonological implications of voice quality theory; 6. Infant acquisition of speech and voice quality; 7. Clinical illustrations of voice quality; 8. Laryngeal articulation and voice quality in sound change, language ontogeny.


5-1-9Albert di Cristo, 'Les langues naturelles', HAL open archive


Albert di Cristo, Les langues naturelles. Première partie : La structure informationnelle et ses déterminants, 2019, 548 pp.

https://hal-amu.archives-ouvertes.fr/hal-02149640

This work is the first part of a broad study devoted to the ways in which natural languages condition information, and to the role prosody plays in expressing this conditioning. This first part analyses, in its various (chiefly epistemological) aspects, the notion of information structure, notably in its relations with grammar, and examines in detail the determinants that form the backbone of that structure. In this perspective, the discussions cover, beyond the notions of theme, topic and 'given', those of focus, focalization and contrast, which receive in-depth analysis. The discussions seek to grasp these notions in terms of their formal properties, their functionality and the meanings they help convey. An entire chapter of this first part is devoted to the study of questioning and to how the organization of information is managed in that activity. The work includes a bibliography of more than two thousand references.

The work will be completed by a second part, currently in preparation, dealing essentially with prosody and its role in the conditioning of information.
 


5-1-10Benjamin Weiss, 'Talker Quality in Human and Machine Interaction - Modeling the Listener’s Perspective in Passive and Interactive Scenarios'. T-Labs Series in Telecommunication Services. Springer Nature, Cham. (2020)

Benjamin Weiss (2020): 'Talker Quality in Human and Machine Interaction - Modeling the Listener’s Perspective in Passive and Interactive Scenarios'. T-Labs Series in Telecommunication Services. Springer Nature, Cham.

This book presents the background, the state of research, and the author's own contributions to the assessment and prediction of talker quality as it is constituted in voice perception and in dialog. Starting from theories and empirical findings on human interaction, major results and approaches are transferred to the domain of human-computer interaction. The book's main aim is to contribute to the evaluation of spoken interaction, both between humans and between human and computer, and in particular of the quality subsequently attributed to the speaking system or person on the basis of the listening and interactive experience.

 

https://rd.springer.com/book/10.1007/978-3-030-22769-2


5-1-11W.F.Katz, P.F.Assman, 'The Routledge Handbook of Phonetics', Routledge.

 

The Routledge Handbook of Phonetics
Edited by William F. Katz and Peter F. Assmann

The Routledge Handbook of Phonetics provides a comprehensive and up-to-date compilation of research, history and techniques in phonetics. With contributions from 41 prominent authors from North America, Europe, Australia and Japan, and including over 130 figures to illustrate key points, this handbook covers all the most important areas in the field, including:

  • The history and scope of techniques used, including speech synthesis, vocal tract imaging techniques, and obtaining information on under-researched languages from language archives;
  • The physiological bases of speech and hearing, including auditory, articulatory, and neural explanations of hearing, speech, and language processes;
  • Theories and models of speech perception and production related to the processing of consonants, vowels, prosody, tone, and intonation;

 

 

5-2 Database
5-2-1Linguistic Data Consortium (LDC) update (November 2019 and December 2019)

 

In this newsletter: (November 2019)

Join LDC for Membership Year 2020
Spring 2020 Data Scholarship Program


New Publications:


DEFT English Committed Belief Annotation
CALLFRIEND American English-Non-Southern Dialect Second Edition
TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017
IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b

 

Join LDC for Membership Year 2020

Membership Year 2020 (MY2020) is open, and discounts are available for those who keep their membership current and join early in the year. Current MY2019 members who renew their LDC membership by March 2, 2020 will receive a 10% discount off the membership fee; new or returning organizations receive a 5% discount through the same date.


In addition to receiving new publications, current LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 800 holdings. Current-year for-profit members may use most data for commercial applications.

Plans for MY2020 publications are in progress. Among the expected releases are:

  • Abstract Meaning Representation (AMR) Annotation Release 3.0: semantic treebank of over 59,000 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums; updates the second version (LDC2017T10) with new annotations
  • TAC KBP: English sentiment slot filling, surprise slot filling, nugget detection and coreference, and event argument data in all languages (English, Chinese and Spanish)
  • DEFT Chinese ERE: Chinese discussion forum data annotated for entities, relations and events
  • LibriVox Spanish: 73 hours of Spanish audiobook read speech and transcripts
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Dhuluo, Javanese and Mongolian
  • HAVIC Med Training data: web video, metadata, and annotations for developing multimedia systems
  • RATS Speaker Identification: conversational telephone speech in Levantine Arabic, Pashto, Urdu, Farsi and Dari on degraded audio signals with annotation of speech segments for speaker identification
  • BOLT: discussion forums, SMS/chat, conversational telephone speech, word-aligned, tagged and co-reference data in all languages (Chinese, Egyptian Arabic, and English)

It’s also not too late to join for MY2018 (through December 31, 2019) and MY2019 (through December 31, 2020). Data sets from those years include Concretely Annotated New York Times and English Gigaword, DIRHA English WSJ Audio, BOLT English Treebank – Discussion Forum, First DIHARD Challenge Development and Evaluation releases, Penn Discourse Treebank Version 3.0, and 2016 NIST Speaker Recognition Evaluation Test Set.

For full descriptions of all LDC data sets, browse our Catalog.  

Visit Join LDC for details on membership, user accounts and payment.

Spring 2020 Data Scholarship Program
Applications are now being accepted through January 15, 2020 for the Spring 2020 LDC Data Scholarship program which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarship page for more information about program rules and submission requirements.

New publications:

 

(1) DEFT English Committed Belief Annotation was developed by LDC and consists of approximately 950,000 words of English discussion forum text annotated for 'committed belief,' which marks the level of commitment displayed by the author to the truth of the propositions expressed in the text.

DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to address remaining capability gaps in state-of-the-art natural language processing technologies related to inference, causal relationships, and anomaly detection. LDC supported the DEFT program by collecting, creating, and annotating a variety of data sources.

DEFT English Committed Belief Annotation is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $2000.

*

(2) CALLFRIEND American English-Non-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of non-Southern dialects of American English. This second edition updates the audio files to wav format, simplifies the directory structure, and adds documentation and metadata. The first edition is available as CALLFRIEND American English-Non-Southern Dialect (LDC96S46).

All data was collected before July 1997. Participants could speak with a person of their choice on any topic; most called family members and friends. All calls originated in North America. The recorded conversations last up to 30 minutes.

CALLFRIEND American English-Non-Southern Dialect Second Edition is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1000.

*

(3) TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 was developed by LDC and contains Chinese, English, and Spanish data produced in support of the TAC KBP Cold Start evaluation track conducted from 2012 to 2017. This corpus includes source documents, queries, assessments, manual runs, and final assessments.

In the Cold Start track, systems were evaluated on their ability to construct a new knowledge base (KB) from information provided in a text collection, in combination with technologies developed in other TAC KBP tracks -- slot filling, information extraction, question answering, and entity discovery and linking. Cold Start systems were required to find all entities in the text; ideally, the KB included every person, organization, and geo-political entity, as well as all the targeted relations between them. To facilitate the evaluation of those KBs, LDC annotators created sets of queries, human-generated responses to the queries, and assessments of both human and system responses.

The source data in this release is comprised of English and Spanish newswire and web text collected by LDC for the 2012, 2014, and 2015 evaluations, and the 2016 pilot collection. The source collections for the 2016 and 2017 evaluations, which include Chinese data, are available in TAC KBP Evaluation Source Corpora 2016-2017 (LDC2019T12). The archived 2013 Cold Start source data collection is available from NIST upon request.

TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1000.

*

(4) IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 204 hours of Amharic conversational and scripted telephone speech collected in 2014 along with corresponding transcripts.

The Amharic speech in this release represents the Addis Ababa, Shewa, and Gondar dialect regions of Ethiopia. The gender distribution among speakers is approximately equal; speakers' ages range from 16 years to 60 years. Calls were made using different telephones (e.g., mobile, landline) from a variety of environments including the street, a home or office, a public place, and inside a vehicle.

IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $25.


Membership Office

Linguistic Data Consortium

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104

 

 

In this newsletter: (December 2019)
LDC Membership Discounts for MY2020 Still Available
Spring 2020 Data Scholarship Program – deadline approaching
Introducing LanguageARC: A Citizen Linguist Portal

New Publications:
Magic Data Chinese Mandarin Conversational Speech
BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training
TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017



LDC Membership Discounts for MY2020 Still Available
Join LDC while membership savings are still available. Now through March 2, 2020, current MY2019 members who renew their LDC membership receive a 10% discount off the membership fee. New or returning member organizations receive a 5% discount through March 2. Membership remains the most economical way to access LDC releases. Visit Join LDC for details on membership options and benefits.


Spring 2020 Data Scholarship Program – deadline approaching
Students can apply for the Spring 2020 Data Scholarship Program now through January 15, 2020. The LDC Data Scholarship program provides students with no-cost access to LDC data. For more information on application requirements and program rules, please visit LDC Data Scholarships


Introducing LanguageARC: A Citizen Linguist Portal
LanguageARC is a citizen science website for languages developed with a grant from the National Science Foundation (no. 170377). Contributors to this online community – “citizen linguists” – participate in a variety of tasks and activities that support linguistic research, such as identifying accents from audio clips, recording “tongue twisters,” and translating English sentences into other languages. Data collected from LanguageARC will be made freely available to the research community. New collection and annotation projects will be added on an ongoing basis, and researchers will soon be able to create their own LanguageARC projects with an easy-to-use Project Builder Toolkit. All are encouraged to explore the site and participate in the community. Comments, questions and suggestions are welcome via the site’s Contact page.



New publications:

(1) Magic Data Chinese Mandarin Conversational Speech was developed by Beijing Magic Data Technology Co., Ltd. and consists of approximately 10 hours of Mandarin conversational speech from 60 speakers. Each conversation was recorded on multiple devices and is presented in multiple forms, resulting in a total of approximately 60 hours of audio with corresponding transcripts.


All participants were native speakers of Mandarin in Mainland China from accent regions across the country. Speakers were paired for conversations on a range of topics, including travel, fitness, games, sports, and pets. Metadata such as topic, collection date, mobile device, and speaker demographic information is available in the documentation accompanying this release.

Magic Data Chinese Mandarin Conversational Speech is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $750.

*

(2) BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was developed by LDC and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations.

This release contains Egyptian Arabic source text message and chat conversations collected using two methods: new collection via LDC's collection platform, and donation of SMS or chat archives from BOLT collection participants. The source data is released as BOLT Egyptian Arabic SMS/Chat and Transliteration (LDC2017T07).

The BOLT word alignment task was built on treebank annotation. Egyptian Arabic source tree tokens were automatically extracted from tree files in LDC’s BOLT Egyptian Arabic Treebank, which had been tagged for part-of-speech and syntactically annotated. That data was then aligned and annotated for the word alignment task.

BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1750.

*

(3) TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 was developed by LDC and contains training and evaluation data produced in support of the TAC KBP Entity Discovery and Linking (EDL) tasks in 2016 and 2017. This corpus includes queries, knowledge base (KB) links, equivalence class clusters for NIL entities, and entity type information for each of the queries. The EDL reference KB, to which EDL data are linked, is available separately in TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015 (LDC2019T02).

The goal of the EDL track is to conduct end-to-end entity extraction, linking and clustering. For producing gold standard data, given a document collection, annotators (1) extract (identify and classify) entity mentions (queries), link them to nodes in a reference KB and (2) perform cross-document co-reference on within-document entity clusters that cannot be linked to the KB.

Source data for the annotations consists of Chinese, English and Spanish newswire and discussion forum text collected by LDC and is available in TAC KBP Evaluation Source Corpora 2016-2017 (LDC2019T12).

TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1500.

*

 



5-2-2ELRA - Language Resources Catalogue - Update (October 2019)
In the framework of a distribution agreement between ELRA and the CJK Dictionary Institute, Inc., ELRA is happy to announce the distribution of 29 Monolingual Lexicons and 20 Multilingual Lexicons, suitable for a large variety of natural language processing applications. Monolingual Lexicons are available for Arabic, Cantonese, Simplified and Traditional Chinese, Japanese, Korean, Persian and Spanish and Multilingual lexicons include those languages as well as some other European languages (English, German, French, Italian, Portuguese and Russian) and Asian languages (Vietnamese, Indonesian, Thai). Possible applications include information retrieval, morphological analysis, word segmentation, named entity recognition, machine translation, etc. All lexicons are made available in tab-delimited, UTF-8 encoded text files.
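
As a rough sketch of working with such files (the file content and two-column layout below are hypothetical, not taken from the ELRA catalog), a tab-delimited, UTF-8 encoded lexicon can be loaded in a few lines of Python:

```python
import csv
import io

# Hypothetical two-column lexicon: headword<TAB>reading.
# The entries and layout are illustrative only.
sample = "廣州\tgwong2 zau1\n香港\thoeng1 gong2\n"

def load_lexicon(stream):
    """Parse a tab-delimited lexicon stream into a headword -> reading dict."""
    reader = csv.reader(stream, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}

lexicon = load_lexicon(io.StringIO(sample))
# A real catalog file would be opened with: open(path, encoding="utf-8")
```

Using the `csv` module with a tab delimiter (rather than naive `split("\t")`) keeps the parser robust if a lexicon quotes fields containing literal tabs.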

The following list of lexicons is available:

1) Monolingual Lexicons:

Cantonese Readings Database, ELRA ID: ELRA-L0101, ISLRN: 634-690-317-631-5
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0101
Chinese Phonological Database, ELRA ID: ELRA-L0102, ISLRN: 968-547-869-011-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0102
Simplified to Traditional Chinese Conversion, ELRA ID: ELRA-L0103, ISLRN: 151-342-562-705-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0103
Hanzi Pinyin Database for Simplified Chinese, ELRA ID: ELRA-L0104, ISLRN: 292-895-602-975-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0104
Database of Chinese Name Variants, ELRA ID: ELRA-L0105, ISLRN: 379-237-021-386-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0105
Database of Chinese Full Names, ELRA ID: ELRA-L0106, ISLRN: 356-835-468-182-0
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0106
Chinese Lexical Database, ELRA ID: ELRA-L0107, ISLRN: 500-068-723-953-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0107
Chinese Morphological Database, ELRA ID: ELRA-L0108, ISLRN: 279-636-746-963-2
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0108
Comprehensive Wordlist of Simplified Chinese, ELRA ID: ELRA-L0109, ISLRN: 159-767-888-341-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0109
Comprehensive Word List of Traditional Chinese, ELRA ID: ELRA-L0110, ISLRN: 378-715-589-213-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0110
Japanese Phonological Database, ELRA ID: ELRA-L0111, ISLRN: 169-903-096-259-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0111
Japanese Lexical Database, ELRA ID: ELRA-L0112, ISLRN: 162-212-767-492-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0112
Japanese Morphological Database, ELRA ID: ELRA-L0113, ISLRN: 212-935-180-069-7
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0113
Japanese Orthographical Database, ELRA ID: ELRA-L0114, ISLRN: 261-356-756-593-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0114
Japanese Companies and Organizations, ELRA ID: ELRA-L0115, ISLRN: 570-674-242-221-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0115
Database of Japanese Name Variants, ELRA ID: ELRA-L0116, ISLRN: 850-674-726-461-2
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0116
Comprehensive Word List of Japanese, ELRA ID: ELRA-L0117, ISLRN: 145-375-006-102-6
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0117
Korean Lexical Database, ELRA ID: ELRA-L0118, ISLRN: 702-121-344-159-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0118
Comprehensive Word List of Korean, ELRA ID: ELRA-L0119, ISLRN: 652-932-407-045-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0119
Arabic Full Form Lexicon, ELRA ID: ELRA-L0120, ISLRN: 968-827-909-119-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0120
Database of Arabic Plurals, ELRA ID: ELRA-L0121, ISLRN: 414-072-749-098-5
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0121
Database of Arab Names, ELRA ID: ELRA-L0122, ISLRN: 998-153-793-831-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0122
Database of Arab Names in Arabic, ELRA ID: ELRA-L0123, ISLRN: 126-981-976-765-2
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0123
Database of Foreign Names in Arabic, ELRA ID: ELRA-L0124, ISLRN: 130-493-475-689-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0124
Database of Arabic Place Names, ELRA ID: ELRA-L0125, ISLRN: 916-541-123-321-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0125
Comprehensive Database of Chinese Personal Names, ELRA ID: ELRA-L0126, ISLRN: 797-857-604-135-5
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0126
Database of Persian Names, ELRA ID: ELRA-L0127, ISLRN: 739-878-734-567-6
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0127
Spanish Full-form Lexicon (Monolingual), ELRA ID: ELRA-L0128, ISLRN: 866-578-477-474-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0128
Database of Chinese Names, ELRA ID: ELRA-L0129, ISLRN: 792-499-131-789-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0129
 
2) Bilingual/Multilingual Lexicons:

Simplified Chinese-English Technical Terms, ELRA ID: ELRA-M0053, ISLRN: 418-191-947-016-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0053
Simplified Chinese-to-English Dictionary, ELRA ID: ELRA-M0054, ISLRN: 694-156-385-534-4
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0054
English-to-Simplified Chinese Dictionary, ELRA ID: ELRA-M0055, ISLRN: 407-348-028-638-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0055
Chinese-English Database of Proverbs and Idioms (Chengyu), ELRA ID: ELRA-M0056, ISLRN: 506-728-933-717-0
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0056
Chinese-Japanese Technical Terms Dictionary, ELRA ID: ELRA-M0057, ISLRN: 079-503-057-574-0
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0057
Chinese-English Database of Proper Nouns, ELRA ID: ELRA-M0058, ISLRN: 638-295-493-483-2
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0058
Chinese-Japanese Database of Proper Nouns, ELRA ID: ELRA-M0059, ISLRN: 951-838-928-664-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0059
Spanish Full-form Lexicon (Bilingual), ELRA ID: ELRA-M0060, ISLRN: 942-238-032-826-7
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0060
Japanese-English Dictionary, ELRA ID: ELRA-M0061, ISLRN: 854-879-959-652-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0061
English-Japanese Dictionary, ELRA ID: ELRA-M0062, ISLRN: 233-968-157-290-2
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0062
Multilingual Database of Japanese Points-of-Interest 1, ELRA ID: ELRA-M0063, ISLRN: 902-666-654-661-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0063
Multilingual Database of Japanese Points-of-Interest 2, ELRA ID: ELRA-M0064, ISLRN: 268-160-514-957-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0064
Japanese-English Database of Proper Nouns, ELRA ID: ELRA-M0065, ISLRN: 104-268-721-502-8
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0065
Japanese - English Dictionary of Technical Terms, ELRA ID: ELRA-M0066, ISLRN: 499-497-806-398-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0066
Korean-Japanese Dictionary of Technical Terms, ELRA ID: ELRA-M0067, ISLRN: 584-164-296-035-1
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0067
Korean-English Database of Proper Nouns, ELRA ID: ELRA-M0068, ISLRN: 408-409-094-493-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0068
Korean-Japanese Database of Proper Nouns, ELRA ID: ELRA-M0069, ISLRN: 265-620-933-123-5
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0069
Korean-Chinese Database of Proper Nouns, ELRA ID: ELRA-M0070, ISLRN: 207-127-841-003-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0070
Comprehensive Word Lists for Chinese, Japanese, Korean and Arabic, ELRA ID: ELRA-M0071, ISLRN: 476-146-877-598-3
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0071
Multilingual Proper Noun Database, ELRA ID: ELRA-M0072, ISLRN: 340-315-642-771-9
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0072
 
About CJK Dictionary Institute, Inc.
The CJK Dictionary Institute, Inc. (CJKI) specializes in CJK lexicography.  The principal activity of CJKI is the development and continuous expansion of lexical databases of general vocabulary, proper nouns and technical terms for CJK languages (Chinese, Japanese, Korean), including Chinese dialects such as Cantonese and Hakka, containing millions of entries. CJKI also developed databases and romanization systems of Arabic proper nouns, a comprehensive Spanish-English dictionary, a Chinese-Vietnamese names dictionary, and various others. In addition, CJKI offers a full range of professional consulting services on CJK linguistics and lexicography.
To find out more about CJKI, please visit the website: http://www.cjk.org/cjk/index.htm

About ELRA
The European Language Resources Association (ELRA) is a non-profit-making organisation founded by the European Commission in 1995, with the mission of providing a clearing house for Language Resources and promoting Human Language Technologies. Language Resources covering various fields of HLT (including Multimodal, Speech, Written, Terminology) and a great number of languages are available from the ELRA catalogue. ELRA's strong involvement in the fields of Language Resources  and Language Technologies is also emphasized at the LREC conference, organized every other year since 1998.
To find out more about ELRA, please visit the website: http://www.elra.info


For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements








Back  Top

5-2-3Speechocean – update (August 2019)

 

English Speech Recognition Corpus - Speechocean

 

At present, Speechocean has produced more than 24,000 hours of English speech recognition corpora, including some rare corpora recorded by children. These corpora were recorded by 23,000 speakers in total. Please see the table below:

 

Name                          Speakers    Hours
American English              8,441       8,029
Indian English                2,394       3,540
British English               2,381       3,029
Australian English            1,286       1,954
Chinese (Mainland) English    3,478       1,513
Canadian English              1,607       1,309
Japanese English              1,005       902
Singapore English             404         710
Russian English               230         492
Romanian English              201         389
French English                225         378
Chinese (Hong Kong) English   200         378
Italian English               213         366
Portuguese English            201         341
Spanish English               200         326
German English                196         306
Korean English                116         207
Indonesian English            402         126
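As a quick sanity check, the headline totals (more than 24,000 hours from 23,000 speakers) can be reproduced by summing the per-corpus figures above. A minimal sketch, with two obvious spellings ('Spainish English', 'Portugal English') normalized to 'Spanish English' and 'Portuguese English':

```python
# (speakers, hours) per corpus, transcribed from the table above.
corpora = {
    "American English": (8441, 8029),
    "Indian English": (2394, 3540),
    "British English": (2381, 3029),
    "Australian English": (1286, 1954),
    "Chinese (Mainland) English": (3478, 1513),
    "Canadian English": (1607, 1309),
    "Japanese English": (1005, 902),
    "Singapore English": (404, 710),
    "Russian English": (230, 492),
    "Romanian English": (201, 389),
    "French English": (225, 378),
    "Chinese (Hong Kong) English": (200, 378),
    "Italian English": (213, 366),
    "Portuguese English": (201, 341),
    "Spanish English": (200, 326),
    "German English": (196, 306),
    "Korean English": (116, 207),
    "Indonesian English": (402, 126),
}

total_speakers = sum(s for s, _ in corpora.values())
total_hours = sum(h for _, h in corpora.values())
print(total_speakers, total_hours)  # -> 23180 24295
```

The sums come out at 23,180 speakers and 24,295 hours, consistent with the rounded totals quoted in the announcement.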

 

 

If you have any further inquiries, please do not hesitate to contact us.

Web: en.speechocean.com

Email: marketing@speechocean.com

Back  Top

5-2-4Google's Language Model benchmark
 Here is a brief description of the project.

'The purpose of the project is to make available a standard training and test setup for language modeling experiments.

The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here.

This also means that your results on this data set are reproducible by the research community at large.

Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models:

  • unpruned Katz (1.1B n-grams),
  • pruned Katz (~15M n-grams),
  • unpruned Interpolated Kneser-Ney (1.1B n-grams),
  • pruned Interpolated Kneser-Ney (~15M n-grams)

 

Happy benchmarking!'
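Given per-word log-probability values like those released for the baseline models, corpus perplexity is straightforward to compute. The sketch below assumes one base-10 log-probability per word; the actual file layout of the benchmark's released values is not shown here.

```python
def perplexity(logprobs, base=10.0):
    """Perplexity from per-word log-probabilities:
    PPL = base ** (-(1/N) * sum(log_base p(w_i)))
    """
    n = len(logprobs)
    return base ** (-sum(logprobs) / n)

# Toy example: four words, each assigned probability 0.1
# (log10 0.1 = -1), giving a perplexity of exactly 10.
print(perplexity([-1.0, -1.0, -1.0, -1.0]))  # -> 10.0
```

This is the standard metric used to compare the Katz and Kneser-Ney baselines above, so reproducing it locally lets you verify results against the published log-probability files.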

Back  Top

5-2-5Forensic database of voice recordings of 500+ Australian English speakers

Forensic database of voice recordings of 500+ Australian English speakers

We are pleased to announce that the forensic database of voice recordings of 500+ Australian English speakers is now published.

The database was collected by the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales as part of the Australian Research Council funded Linkage Project on making demonstrably valid and reliable forensic voice comparison a practical everyday reality in Australia. The project was conducted in partnership with: Australian Federal Police,  New South Wales Police,  Queensland Police, National Institute of Forensic Sciences, Australasian Speech Sciences and Technology Association, Guardia Civil, Universidad Autónoma de Madrid.

The database includes multiple non-contemporaneous recordings of most speakers. Each speaker is recorded in three different speaking styles representative of some common styles found in forensic casework. Recordings were made under high-quality conditions, and extraneous noises and crosstalk have been manually removed. The high-quality audio can be processed to reflect recording conditions found in forensic casework.

The database can be accessed at: http://databases.forensic-voice-comparison.net/

Back  Top

5-2-6Audio and Electroglottographic speech recordings

 

Audio and Electroglottographic speech recordings from several languages

We are happy to announce the public availability of speech recordings made as part of the UCLA project 'Production and Perception of Linguistic Voice Quality'.

http://www.phonetics.ucla.edu/voiceproject/voice.html

Audio and EGG recordings are available for Bo, Gujarati, Hmong, Mandarin, Black Miao, Southern Yi, Santiago Matatlan/ San Juan Guelavia Zapotec; audio recordings (no EGG) are available for English and Mandarin. Recordings of Jalapa Mazatec extracted from the UCLA Phonetic Archive are also posted. All recordings are accompanied by explanatory notes and wordlists, and most are accompanied by Praat textgrids that locate target segments of interest to our project.

Analysis software developed as part of the project – VoiceSauce for audio analysis and EggWorks for EGG analysis – and all project publications are also available from this site. All preliminary analyses of the recordings using these tools (i.e. acoustic and EGG parameter values extracted from the recordings) are posted on the site in large data spreadsheets.

All of these materials are made freely available under a Creative Commons Attribution-NonCommercial-ShareAlike-3.0 Unported License.

This project was funded by NSF grant BCS-0720304 to Pat Keating, Abeer Alwan and Jody Kreiman of UCLA, and Christina Esposito of Macalester College.

Pat Keating (UCLA)

Back  Top

5-2-7EEG-face tracking- audio 24 GB data set Kara One, Toronto, Canada

We are making 24 GB of a new dataset, called Kara One, freely available. This database combines 3 modalities (EEG, face tracking, and audio) during imagined and articulated speech using phonologically-relevant phonemic and single-word prompts. It is the result of a collaboration between the Toronto Rehabilitation Institute (in the University Health Network) and the Department of Computer Science at the University of Toronto.

 

In the associated paper (abstract below), we show how to accurately classify imagined phonological categories solely from EEG data. Specifically, we obtain up to 90% accuracy in classifying imagined consonants from imagined vowels and up to 95% accuracy in classifying stimulus from active imagination states using advanced deep-belief networks.

 

Data from 14 participants are available here: http://www.cs.toronto.edu/~complingweb/data/karaOne/karaOne.html.

 

If you have any questions, please contact Frank Rudzicz at frank@cs.toronto.edu.

 

Best regards,

Frank

 

 

PAPER Shunan Zhao and Frank Rudzicz (2015) Classifying phonological categories in imagined and articulated speech. In Proceedings of ICASSP 2015, Brisbane Australia

ABSTRACT This paper presents a new dataset combining 3 modalities (EEG, facial, and audio) during imagined and vocalized phonemic and single-word prompts. We pre-process the EEG data, compute features for all 3 modalities, and perform binary classification of phonological categories using a combination of these modalities. For example, a deep-belief network obtains accuracies over 90% on identifying consonants, which is significantly more accurate than two baseline support vector machines. We also classify between the different states (resting, stimuli, active thinking) of the recording, achieving accuracies of 95%. These data may be used to learn multimodal relationships, and to develop silent-speech and brain-computer interfaces.

 

Back  Top

5-2-8TORGO data base free for academic use.

In the spirit of the season, I would like to announce the immediate availability of the TORGO database free, in perpetuity for academic use. This database combines acoustics and electromagnetic articulography from 8 individuals with speech disorders and 7 without, and totals over 18 GB. These data can be used for multimodal models (e.g., for acoustic-articulatory inversion), models of pathology, and augmented speech recognition, for example. More information (and the database itself) can be found here: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html.

Back  Top

5-2-9Datatang

Datatang is a leading global data provider specializing in customized data solutions, focusing on the collection, annotation and crowdsourcing of a variety of speech, image and text data.

 

Summary of the new datasets (2018) and a brief plan for 2019.

 

 

 

• Speech data (with annotation) completed in 2018:

 

Language                      Dataset Length (Hours)
French                        794
British English               800
Spanish                       435
Italian                       1,440
German                        1,800
Spanish (Mexico/Colombia)     700
Brazilian Portuguese          1,000
European Portuguese           1,000
Russian                       1,000

 

• Ongoing speech projects for 2019:

 

Type                           Project Name
Europeans speak English        1,000 Hours - Spanish Speak English
                               1,000 Hours - French Speak English
                               1,000 Hours - German Speak English
Call Center Speech             1,000 Hours - Call Center Speech
Off-the-shelf data expansion   1,000 Hours - Chinese Speak English
                               1,500 Hours - Mixed Chinese and English Speech Data

 

 

 

On top of the above, more speech data collections are planned, such as Japanese speech data, children's speech data, dialect speech data, and so on.

 

In addition, we will continue to provide these data at a competitive price while maintaining a high accuracy rate.

 

 

 

If you have any questions or need more details, do not hesitate to contact us at jessy@datatang.com.

 

We would be happy to send you a sample or a specification of the data.


Back  Top

5-2-10Fearless Steps Corpus (University of Texas, Dallas)

Fearless Steps Corpus

John H.L. Hansen, Abhijeet Sangwan, Lakshmish Kaushik, Chengzhu Yu Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas at Dallas (UTD), Richardson, Texas, U.S.A.


NASA's Apollo program is one of mankind's great achievements of the 20th century. CRSS, UT-Dallas has undertaken an enormous Apollo data digitization initiative, in which we proposed to digitize the Apollo mission speech data (~100,000 hours) and develop spoken language technology (SLT) algorithms to analyze and understand various aspects of conversational speech. Towards this goal, a new 30-track analog audio decoder was designed to decode 30-track Apollo analog tapes; it is mounted on the NASA Soundscriber in place of the original single-channel decoder. Using the new decoder, all 30 channels of data can be decoded simultaneously, reducing the digitization time significantly.
We have digitized 19,000 hours of data from Apollo missions (including the entire Apollo-11 mission and most of the Apollo-13, Apollo-1, and Gemini-8 missions). This audio archive is named the 'Fearless Steps Corpus', one of the largest naturalistic audio corpora of its kind. Automated transcripts are generated by Apollo-mission-specific Deep Neural Network (DNN) based Automatic Speech Recognition (ASR) systems along with mission-specific language models. A Speaker Identification (SID) system was designed to identify the speakers, and a complete diarization pipeline has been established to study and develop various SLT tasks.
We will release this corpus for public use as part of our outreach, and encourage the SLT community to use this opportunity to build naturalistic spoken language technology systems. The data provide ample opportunity to set up challenging tasks in various SLT areas. As part of this outreach, we will organize the 'Fearless Steps Challenge' at the upcoming INTERSPEECH 2018, for which we define and propose 5 tasks. The guidelines and challenge data will be released in Spring 2018 and will be available for download free of charge. The five challenges are: (1) Automatic Speech Recognition, (2) Speaker Identification, (3) Speech Activity Detection, (4) Speaker Diarization, and (5) Keyword Spotting and Joint Topic/Sentiment Detection.
We look forward to your participation (John.Hansen@utdallas.edu).

Back  Top

5-2-11SIWIS French Speech Synthesis Database
The SIWIS French Speech Synthesis Database includes high-quality French speech recordings and associated text files, aimed at building TTS systems and at investigating multiple speaking styles and emphasis. A total of 9,750 utterances from various sources, such as parliament debates and novels, were uttered by a professional French voice talent. A subset of the database contains emphasised words in many different contexts. The database includes more than ten hours of speech data and is freely available.
 
Back  Top

5-2-12JLCorpus - Emotional Speech corpus with primary and secondary emotions
JLCorpus - Emotional Speech corpus with primary and secondary emotions:
 

To further the understanding of the wide array of emotions embedded in human speech, we are introducing an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance is intended to facilitate studies comparing emotion-related formant and glottal source features. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots, but there are very few existing speech resources for studying them; this work adds a speech corpus containing some secondary emotions.

Please use the corpus for emotional speech related studies. When you use it please include the citation as:

Jesin James, Li Tian, Catherine Watson, 'An Open Source Emotional Speech Corpus for Human Robot Interaction Applications', in Proc. Interspeech, 2018.

To access the whole corpus, including the recording support files, use the following link: https://www.kaggle.com/tli725/jl-corpus (if you have already installed the Kaggle API, you can download it with: kaggle datasets download -d tli725/jl-corpus)

Or if you simply want the raw audio+txt files, click the following link: https://www.kaggle.com/tli725/jl-corpus/downloads/Raw%20JL%20corpus%20(unchecked%20and%20unannotated).rar/4

The corpus was evaluated in a large-scale human perception test with 120 participants. The links to the surveys are given below. For the primary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_8ewmOCgOFCHpAj3

For Secondary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_eVDINp8WkKpsPsh

These surveys will give an overall idea about the type of recordings in the corpus.

The perceptually verified and annotated JL corpus will be given public access soon.

Back  Top

5-2-13OPENGLOT –An open environment for the evaluation of glottal inverse filtering

OPENGLOT –An open environment for the evaluation of glottal inverse filtering

 

OPENGLOT is a publicly available database designed primarily for the evaluation of glottal inverse filtering algorithms. In addition, the database can be used to evaluate formant estimation methods. OPENGLOT consists of four repositories. Repository I contains synthetic glottal flow waveforms, and speech signals generated by using the Liljencrants–Fant (LF) waveform as an excitation and an all-pole vocal tract model. Repository II contains glottal flow and speech pressure signals generated using physical modelling of human speech production. Repository III contains pairs of glottal excitation and speech pressure signals generated by exciting 3D-printed plastic vocal tract replicas with LF excitations via a loudspeaker. Finally, Repository IV contains multichannel recordings (speech pressure signal, EGG, high-speed video of the vocal folds) of natural speech production.
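The source-filter setup behind Repository I (a glottal excitation passed through an all-pole vocal tract filter) can be sketched in a few lines of pure Python. This is a deliberately crude illustration: it uses an impulse-train excitation as a stand-in for the actual LF waveform, and two made-up formant frequencies and bandwidths, not values from OPENGLOT.

```python
import math

fs = 8000   # sample rate (Hz)
f0 = 100    # fundamental frequency (Hz)
n = fs      # one second of signal

# Crude glottal excitation: an impulse train at f0 (stand-in for LF pulses).
excitation = [1.0 if i % (fs // f0) == 0 else 0.0 for i in range(n)]

# All-pole vocal tract: build the denominator polynomial from two
# illustrative formant resonances, each a conjugate pole pair.
a = [1.0]
for freq, bw in [(700, 100), (1200, 120)]:   # formant frequency / bandwidth (Hz)
    r = math.exp(-math.pi * bw / fs)         # pole radius (< 1, so stable)
    theta = 2 * math.pi * freq / fs          # pole angle
    sec = [1.0, -2 * r * math.cos(theta), r * r]
    out = [0.0] * (len(a) + len(sec) - 1)    # polynomial multiplication
    for i, x in enumerate(a):
        for j, y in enumerate(sec):
            out[i + j] += x * y
    a = out

# Direct-form all-pole IIR filter: y[t] = x[t] - sum_k a[k] * y[t-k]
speech = []
for t in range(n):
    y = excitation[t]
    for k in range(1, len(a)):
        if t - k >= 0:
            y -= a[k] * speech[t - k]
    speech.append(y)
print(len(speech))  # -> 8000
```

Glottal inverse filtering is the reverse problem: estimating the excitation from `speech` alone, which is what the repositories above provide ground truth for.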

 

OPENGLOT is available at:

http://research.spa.aalto.fi/projects/openglot/

Back  Top

5-2-14Corpus Rhapsodie

We are pleased to announce the publication of a book devoted to the Rhapsodie treebank, a 33,000-word corpus of spoken French finely annotated for prosody and syntax.

Access to the publication: https://benjamins.com/catalog/scl.89 (see the attached flyer)

Access to the treebank: https://www.projet-rhapsodie.fr/
The freely accessible data are distributed under a Creative Commons licence.
The site also provides access to the annotation guides.

Back  Top

5-2-15The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus), Boulder Learning Inc.

The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus) is the world's largest English children's speech corpus. It is freely available to the research community for research use. Companies can acquire the corpus for $10,000. The MyST Corpus was collected over a 10-year period, with support from over $9 million in grants from the US National Science Foundation and Department of Education, awarded to Boulder Learning Inc. (Wayne Ward, Principal Investigator).

The MyST corpus contains speech collected from 1,374 third, fourth and fifth grade students. The students engaged in spoken dialogs with a virtual science tutor in 8 areas of science. A total of 11,398 student sessions of 15 to 20 minutes produced a total of 244,069 utterances. 42% of the utterances have been transcribed at the word level. The corpus is partitioned into training and test sets to support comparison of research results across labs. All parents and students signed consent forms, approved by the University of Colorado's Institutional Review Board, that authorize distribution of the corpus for research and commercial use.

The MyST children's speech corpus contains approximately ten times as many spoken utterances as all other English children's speech corpora combined (see https://en.wikipedia.org/wiki/List_of_children%27s_speech_corpora).

Additional information about the corpus, and instructions for how to acquire the corpus (and samples of the speech data) can be found on the Boulder Learning Web site at http://boulderlearning.com/request-the-myst-corpus/.   

Back  Top

5-2-16HARVARD speech corpus - native British English speaker
  • HARVARD speech corpus - native British English speaker, digital re-recording
 
Back  Top

5-2-17Magic Data Technology Kid Voice TTS Corpus in Mandarin Chinese (November 2019)

Magic Data Technology Kid Voice TTS Corpus in Mandarin Chinese

 

Magic Data Technology is one of the leading artificial intelligence data service providers in the world. The company is committed to providing a wide range of customized data services in the fields of speech recognition, intelligent imaging and natural language understanding.

 

This corpus was recorded by a four-year-old Chinese girl born in Beijing, China. We have published 15 minutes of speech data from the corpus for non-commercial use.

 

The contents and the corresponding descriptions of the corpus:

  • The corpus contains 15 minutes of speech data, recorded in an NC-20 acoustic studio.

  • The speaker is 4 years old and was born in Beijing.

  • Detailed information such as speech data coding and speaker information is preserved in the metadata file.

  • The corpus is in a natural kid-speech style.

  • Annotation includes four parts: pronunciation proofreading, prosody labeling, phone boundary labeling and POS Tagging.

  • The annotation accuracy is higher than 99%.

  • For phone labeling, the database annotates not only the boundaries of phonemes but also the boundaries of silent parts.

 

The corpus aims to help researchers in the TTS field. It is part of a much bigger dataset (the 2.3-hour MAGICDATA Kid Voice TTS Corpus in Mandarin Chinese) recorded in the same environment. This is the first release of this voice.

 

Please note that this corpus has been released with the authorization of the speaker's parents.

 

Samples are available.

Do not hesitate to contact us for any questions.

Website: http://www.imagicdatatech.com/index.php/home/dataopensource/data_info/id/360

E-mail: business@magicdatatech.com

Back  Top

5-3 Software
5-3-1Release of the version 2 of FASST (Flexible Audio Source Separation Toolbox).
Release of version 2 of FASST (Flexible Audio Source Separation Toolbox): http://bass-db.gforge.inria.fr/fasst/
This toolbox is intended to speed up the conception, and to automate the implementation, of new model-based audio source separation algorithms. Compared to version 1, it adds:
  • a core in C++
  • user scripts in MATLAB or Python
  • speedups
  • multichannel audio input
We provide two examples:
  1. two-channel instantaneous NMF
  2. real-world speech enhancement (2nd CHiME Challenge, Track 1)
Back  Top

5-3-2Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.

We are glad to announce the public release of Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.


It can be used e.g. for making music or for singing voice pedagogy.

A wide variety of voices is available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, roaring, etc. voices. All the features of vocal sounds are entirely under control, as the synthesis method is based on a mathematical model of voice production rather than on prerecorded segments.

The instrument is controlled using chironomy, i.e. hand gestures, with the help of interfaces such as a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, breathiness, etc. can easily and continuously be controlled during performance, and special voices can be prepared in advance or loaded from presets.

Check out the capabilities of Cantor Digitalis through performance extracts from the ensemble Chorus Digitalis:
http://youtu.be/_LTjM3Lihis?t=13s.

In practice, this release provides:
  • the synthesizer application
  • the source code in the form of a Max package (GPL-like license)
  • documentation for the musician and for the developer
What do you need?
  • a Mac running OS X
  • ideally a Wacom graphic tablet, but it also works with your computer mouse
  • for developers, the Max software
Interested?
  • To download Cantor Digitalis, click here
  • To subscribe to the Cantor Digitalis newsletter and/or the forum list, or to contact the developers, click here
  • To learn about Chorus Digitalis, the Cantor Digitalis ensemble, and watch videos of performances, click here
  • For more details about Cantor Digitalis, click here
 
Regards,
 
The Cantor Digitalis team (who loves feedback — cantordigitalis@limsi.fr)
Christophe d'Alessandro, Lionel Feugère, Olivier Perrotin
http://cantordigitalis.limsi.fr/
Back  Top

5-3-3MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP

 

We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and cross-lingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.
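The kind of distance measure such toolkits expose can be illustrated with plain cosine similarity over embedding vectors. This is a generic sketch with made-up toy vectors, not the MultiVec API:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional 'embeddings'; real vectors are learned from corpora
# and typically have hundreds of dimensions.
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]
car = [-0.5, 0.7, 0.1]
print(cosine(cat, dog) > cosine(cat, car))  # -> True
```

Because cosine similarity ignores vector magnitude, it is the usual choice for comparing learned word or document representations.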

 

The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.

 

When you use this toolkit, please cite:

 

@InProceedings{MultiVecLREC2016,
  Title     = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  Author    = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  Year      = {2016},
  Month     = {May}
}

 

The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf

 

Best regards,

 

Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier

Back  Top

5-3-4An android application for speech data collection LIG_AIKUMA
We are pleased to announce the release of LIG_AIKUMA, an android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application (AIKUMA) initially developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of a French-German project (ANR/DFG BULB - Breaking the Unwritten Language Barrier). 
 
The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collection in Congo-Brazzaville, resulting in over 80 hours of speech.
 
Users who just want to use the app without access to the code can download it directly from the forge: https://forge.imag.fr/frs/download.php/706/MainActivity.apk
Code is also available on demand (contact elodie.gauthier@imag.fr and laurent.besacier@imag.fr).
 
More details on LIG_AIKUMA can be found in the following paper: http://www.sciencedirect.com/science/article/pii/S1877050916300448
Back  Top

5-3-5Web services via ALL GO from IRISA-CNRS

It is our pleasure to introduce A||GO (https://allgo.inria.fr/ or http://allgo.irisa.fr/), a platform providing a collection of web services for the automatic analysis of various data, including multimedia content across modalities. The platform builds on the back-end web service deployment infrastructure developed and maintained by Inria's Service for Experimentation and Development (SED). Originally dedicated to multimedia content, A||GO progressively broadened to other fields such as computational biology, networks and telecommunications, computational graphics or computational physics.

As part of the CNRS PlaSciDo initiative [1], the Linkmedia team at IRISA / Inria Rennes is making available via A||GO a number of web services devoted to multimedia content analysis across modalities (language, audio, image, video). The web services currently provided include research results from the Linkmedia team as well as contributions from a number of partners. A list of the services available to date is given below, and the current state is available at
https://www-linkmedia.irisa.fr/software along with demo videos. Most web services are interoperable, facilitating the implementation of a multimedia content analysis processing chain, and are free to use for trial, prototyping or lab work. A brief and free account creation step will allow you to execute the web-services using either the graphical interface or a command line via a dedicated API.

We expect the number of web services to grow over time and invite interested parties to contact us should they wish to contribute to the multimedia web service offer of A||GO.

List of multimedia content analysis tools currently available on A||GO:
- Audio Processing
        SaMuSa: music/speech segmentation
        SilAD: silence detection
        Radi.sh: repeated audio motif discovery
        LORIA STS v2: speech transcription for the French language from LORIA
        Multi channel BSS locate: audio source localization toolbox from IRISA-PANAMA
        A-spade: audio declipper from IRISA-PANAMA
        Transvox: voice faker from LORIA
- Natural Language Processing
        NERO: named entity recognition
        TermEx: keywords/indexing terms detection
        Otis!: topic segmentation
        Hi-tost: hierarchical topic structuring
- Video Processing
        Vidseg: video shot segmentation
        HUFA: face detection and tracking
Shortcuts to Linkmedia services are also available here:
https://www-linkmedia.irisa.fr/software/
 
For more information, don't hesitate to contact us (contact-multimedia-allgo@irisa.fr).

 
Gabriel Sargent and Guillaume Gravier
--
Linkmedia
IRISA - CNRS
Rennes, France

Back  Top

5-3-6Clickable map - Illustrations of the IPA


We have produced a clickable map showing the Illustrations of the International Phonetic
Alphabet.

The map is being updated with each new issue of the Journal of the International Phonetic
Association.

https://richardbeare.github.io/marijatabain/ipa_illustrations_all.html

Marija Tabain - La Trobe University, Australia
Richard Beare - Monash University & MCRI, Australia

Back  Top

5-3-7LIG-Aikuma running on mobile phones and tablets

 


Dear all,

LIG is pleased to inform you that the website for the app Lig-Aikuma is online: https://lig-aikuma.imag.fr/
At the same time, an update of Lig-Aikuma (V3) was made available (see the website).

LIG-AIKUMA is a free Android app running on various mobile phones and tablets. The app proposes a range of different speech collection modes (recording, respeaking, translation and elicitation) and offers the possibility to share recordings between users. LIG-AIKUMA is built upon the initial AIKUMA app developed by S. Bird & F. Hanke (see https://en.wikipedia.org/wiki/Aikuma for more information).

Improvements of the app:

  • Visual upgrade:
    + Waveform visualizer on the Respeaking and Translation modes (possibility to zoom in/out the audio signal)
    + File explorer included in all modes, to facilitate the navigation between files
    + New Share mode to share recordings between devices (by Bluetooth, Mail, NFC if available)
    + French and German languages available: in addition to English, the application now supports French and German. By default, Lig-Aikuma uses the language of the phone/tablet.
    + New, more consistent icons to distinguish all types of files (audio, text, image, video)
  • Conceptual upgrade:
    + New name for the root project: ligaikuma. Henceforth, all data will be stored in this directory instead of 'aikuma' (as in previous versions of the app). This change does not raise compatibility issues. In each mode's file explorer, the default position is this root directory; just go back once with the left grey arrow (on the lower left of the screen) and select the 'aikuma' directory to access your old recordings
    + Generation of a PDF consent form (from information filled in the metadata form) that can be signed by linguist and speaker using a PDF annotation tool (such as the Adobe Fill & Sign mobile app)
    + Generation of a CSV file which can be imported into the ELAN software: it automatically creates a segmented tier matching the respeaking or translation session, and marks segments containing no speech with a 'non-speech' label
    + Geolocation of the recordings
    + Respeak an elicited file: it is now possible to use, in Respeaking or Translation mode, an audio file initially recorded in Elicitation mode
  • Structural upgrade:
    + Undo button on Elicitation to erase/redo the current recording
    + Improved session backup in Elicitation
    + Non-speech button in Respeaking and Translation modes to indicate by a comment that the segment does not contain speech (but noise or silence, for instance)
    + Automatic speaker profile creation to quickly fill in the metadata fields when several sessions involve the same speaker
Best regards,

Elodie Gauthier & Laurent Besacier
Back  Top

5-3-8Python Library
We are pleased to announce the public release of the first Python library for converting numbers written out in French into their digit representation.

The parser is robust and can segment and substitute number expressions within a stream of words, such as a conversation. It recognizes the different variants of the language (quatre-vingt-dix / nonante…) and handles ordinals as well as integers, decimal numbers and formal sequences (phone numbers, credit card numbers…).

We hope this tool will be useful to those who, like us, do natural language processing in French.

The library is released under the MIT license, which allows very liberal use.
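To illustrate the kind of conversion such a parser performs, here is a minimal, self-contained sketch handling common French cardinals, including the Belgian/Swiss variants; it is purely illustrative and is not the announced library's API:

```python
# Illustrative only: a tiny French number-word parser (common forms only,
# not exhaustive). The announced library is far more robust than this sketch.

WORDS = {
    "zéro": 0, "un": 1, "une": 1, "deux": 2, "trois": 3, "quatre": 4,
    "cinq": 5, "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10,
    "onze": 11, "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15,
    "seize": 16, "vingt": 20, "trente": 30, "quarante": 40, "cinquante": 50,
    "soixante": 60,
    # Belgian/Swiss variants
    "septante": 70, "huitante": 80, "nonante": 90,
}

def french_to_int(text: str) -> int:
    """Convert a simple French cardinal, e.g. 'quatre-vingt-dix-sept', to an int."""
    total, current = 0, 0
    for tok in text.lower().replace("-", " ").split():
        if tok == "et":                        # "vingt et un"
            continue
        if tok in ("vingt", "vingts") and current == 4:
            current = 80                       # quatre-vingt(s) = 4 x 20
        elif tok in ("cent", "cents"):
            current = (current or 1) * 100
        elif tok == "mille":
            total += (current or 1) * 1000
            current = 0
        else:
            current += WORDS[tok]
    return total + current

print(french_to_int("quatre-vingt-dix-sept"))  # 97
print(french_to_int("nonante-sept"))           # 97
print(french_to_int("deux mille dix-neuf"))    # 2019
```

Note how "soixante-dix" and "quatre-vingt-dix" fall out of the additive rule (60 + 10, 80 + 10), which is exactly the segmentation problem the library solves at scale.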
 
 
-- 
Romuald Texier-Marcadé
Back  Top




© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.
