Li Deng and Dong Yu, Deep Learning: Methods and Applications, Foundations and Trends in Signal Processing
Foundations and Trends in Signal Processing (www.nowpublishers.com/sig) has published the following issue:
Volume 7, Issue 3-4
Deep Learning: Methods and Applications
By Li Deng and Dong Yu (Microsoft Research, USA)
http://dx.doi.org/10.1561/2000000039
O.Niebuhr, R.Skarnitzl, 'Tackling the Complexity in Speech', Prague University Press
Tackling the Complexity in Speech
Author: Oliver Niebuhr, Radek Skarnitzl (eds)
Publisher: Univerzita Karlova v Praze, Filozofická fakulta
Release year: 2015
ISBN: 978-80-7308-558-2
Series: Opera Facultatis philosophicae
Pages: 230
The present volume is meant to give the reader an impression of the range of questions and topics that are currently the subject of international research in the discovery of complexity, the organization of complexity, and the modelling of complexity. These are the main sections of our volume, and each section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. We therefore hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully be inspired by reading about the questions, phenomena, and communicative functions currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations.
Barbosa, P. A. and Madureira, S. Manual de Fonética Acústica Experimental. Aplicações a dados do português. 591 p. São Paulo: Cortez, 2015. [In Portuguese]
This manual of Experimental Acoustic Phonetics is conceived for undergraduate and graduate classes in areas such as Acoustic Phonetics, Phonology, Communications Engineering, Music, Acoustic Physics, and Speech Therapy, among others. Starting with a theoretical and methodological presentation of acoustic phonetics theory and techniques in five chapters, including a chapter on experimental methods, the book follows with detailed acoustic analyses of all classes of sounds, using audio files from both European and Brazilian Portuguese as data. All analyses are explained step by step using Praat. The audio files are available on the book's web site and can be downloaded. All techniques can, of course, be applied to any language. Proposed exercises at the end of each chapter allow the teacher to evaluate the students' progress.
Damien Nouvel (Inalco), Maud Ehrmann (EPFL), Sophie Rosset (CNRS). Les entités nommées pour le traitement automatique des langues
Les entités nommées pour le traitement automatique des langues (Named Entities for Natural Language Processing)
Damien Nouvel, Inalco; Maud Ehrmann, EPFL; Sophie Rosset, CNRS
The book is available as an ebook at 9.90 euros (price for individuals only; PDF readable on any device; available only from iste-editions.fr). The print version is available at 40.00 euros.
The digitised, connected world produces large quantities of data. Automatically analysing natural language is a major challenge for applications such as Web search, news tracking, text mining, monitoring, opinion analysis, etc.
Research in information extraction has shown the importance of certain units, such as names of persons, places and organisations, dates and amounts. The processing of these elements, known as "named entities", has led to the development of algorithms and resources used by computer systems.
Both theoretical and practical, this book offers tools for defining these entities, identifying them, linking them to knowledge bases, and evaluating the resulting systems.
Contents
1. Named entities for information access 2. Named entities as referential units 3. Resources for named entities 4. Recognising named entities 5. Linking named entities to knowledge bases 6. Evaluating named entity recognition
168 pages - October 2015 Paperback ISBN 978-1-78405-104-4
An overview on the challenging new topic of phase-aware signal processing
Speech communication technology is a key factor in human-machine interaction, digital hearing aids, mobile telephony, and automatic speech/speaker recognition. With the proliferation of these applications, there is a growing requirement for advanced methodologies that can push the limits of the conventional solutions relying on processing the signal magnitude spectrum.
Single-Channel Phase-Aware Signal Processing in Speech Communication provides a comprehensive guide to phase signal processing: it reviews the history of phase importance in the literature, basic problems in phase processing, and the fundamentals of phase estimation, together with several applications that demonstrate the usefulness of phase processing.
Key features:
Analysis of recent advances demonstrating the positive impact of phase-based processing in pushing the limits of conventional methods.
Offers unique coverage of the historical context, fundamentals of phase processing and provides several examples in speech communication.
Provides a detailed review of many references and discusses the existing signal processing techniques required to deal with phase information in different applications involved with speech.
The book supplies various examples and MATLAB® implementations delivered within the PhaseLab toolbox.
Single-Channel Phase-Aware Signal Processing in Speech Communication is a valuable single source for students, non-expert DSP engineers, academics and graduate students.
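As a toy illustration of the book's central point (a sketch of our own, unrelated to the PhaseLab MATLAB code): the magnitude spectrum alone does not determine a signal, but magnitude combined with phase reconstructs it exactly.

```python
import cmath

# Pure-Python DFT/inverse DFT on a short real signal, splitting the
# spectrum into magnitude and phase and recombining them. Keeping both
# components gives perfect reconstruction; discarding phase would not.

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]   # period-4 alternation
X = dft(x)
magnitude = [abs(c) for c in X]
phase = [cmath.phase(c) for c in X]

# Recombine magnitude and phase: exact reconstruction of x.
y = idft([m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)])
assert all(abs(a - b) < 1e-9 for a, b in zip(x, y))
```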
ELRA - Language Resources Catalogue - Update (April 2017)
We are happy to announce that 1 Evaluation Package, 1 Written Corpus, 3 Desktop/Microphone Speech Resources and 1 Broadcast Speech Resource are now available in our catalogue.
The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV data, selected to include mostly unplanned speech and a reasonable proportion of multiple-speaker data. All data were carefully transcribed, including named entity annotation. This package includes the material that was used for the ETAPE evaluation campaign: resources, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of this evaluation package is to enable external players to evaluate their own systems and compare their results with those obtained during the campaign itself. For more information, see: http://catalog.elra.info/product_info.php?products_id=1299
The Danish Propbank (DPB) is an 87,000-token treebank from a variety of genres, annotated with morphosyntactic and semantic information, namely propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. There are over 12,000 frames with 32,000 role instances. The corpus has also been annotated with 20 Named Entity classes and a 200-category semantic ontology for nouns. For more information, see http://catalog.elra.info/product_info.php?products_id=1300
This extended version of the Bulgarian Pronunciation Dictionary called Bulgarian-Dict260k contains pronunciations of more than 260,000 word forms. For more information, see: http://catalog.elra.info/product_info.php?products_id=1301
The Accented English part of the GlobalPhone resources contains 63 recording sessions of Bulgarian, Chinese, German, and Indian native speakers reading 37 English sentences each, produced in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the newspaper domain. For more information, see: http://catalog.elra.info/product_info.php?products_id=1302
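The "16kHz PCM" encoding described above can be checked programmatically with the Python standard library; a minimal sketch (the tone and in-memory buffer are illustrative, not GlobalPhone data):

```python
import io
import math
import struct
import wave

# Write one second of a 440 Hz tone as 16 kHz, 16-bit mono PCM and
# verify the header fields on read-back.

RATE = 16000
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(RATE)     # 16 kHz
    frames = b"".join(
        struct.pack("<h", int(30000 * math.sin(2 * math.pi * 440 * t / RATE)))
        for t in range(RATE)
    )
    w.writeframes(frames)

buf.seek(0)
with wave.open(buf, "rb") as r:
    assert r.getframerate() == 16000
    assert r.getsampwidth() == 2 and r.getnchannels() == 1
    n_frames = r.getnframes()  # one second of audio -> 16000 frames
```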
ELRA-S0390 Parallel EMG-Acoustic English GlobalPhone ISLRN: 910-309-096-523-6
The parallel EMG-Acoustic English GlobalPhone language resource contains 63 recording sessions from 8 speakers articulating speech in three speaking modes (audible, whispered, and silent), each reading three times 50 English sentences in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the newspaper domain. Speech is recorded in a parallel fashion, i.e. synchronously, by a standard close-talking microphone and by surface electrodes capturing the activity of the articulatory muscles in the face (electromyography, EMG). For more information, see: http://catalog.elra.info/product_info.php?products_id=1303
This Frisian corpus consists of 203 audio segments of approximately 5 minutes each, extracted from various radio programs covering a time span of almost 50 years (1966-2015), which adds a longitudinal dimension to the database. The content of the recordings is very diverse, including radio programs about culture, history, literature, sports, nature, agriculture, politics, society and languages. There are 309 identified speakers in the FAME! Speech Corpus, 21 of whom appear at least 3 times in the database. The total duration of the manually annotated radio broadcasts is 18 hours, 33 minutes and 57 seconds. For more information, see: http://catalog.elra.info/product_info.php?products_id=1304
For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org
If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.
A global leader in linguistic technology solutions
RECENT CATALOG ADDITIONS—MARCH 2012
1. Speech Databases
1.1 Telephony
Language | Database Type | Catalogue Code | Speakers | Status
Bahasa Indonesia | Conversational | BAH_ASR001 | 1,002 | Available
Bengali | Conversational | BEN_ASR001 | 1,000 | Available
Bulgarian | Conversational | BUL_ASR001 | 217 | Available shortly
Croatian | Conversational | CRO_ASR001 | 200 | Available shortly
Dari | Conversational | DAR_ASR001 | 500 | Available
Dutch | Conversational | NLD_ASR001 | 200 | Available
Eastern Algerian Arabic | Conversational | EAR_ASR001 | 496 | Available
English (UK) | Conversational | UKE_ASR001 | 1,150 | Available
Farsi/Persian | Scripted | FAR_ASR001 | 789 | Available
Farsi/Persian | Conversational | FAR_ASR002 | 1,000 | Available
French (EU) | Conversational | FRF_ASR001 | 563 | Available
French (EU) | Voicemail | FRF_ASR002 | 550 | Available
German | Voicemail | DEU_ASR002 | 890 | Available
Hebrew | Conversational | HEB_ASR001 | 200 | Available shortly
Italian | Conversational | ITA_ASR003 | 200 | Available shortly
Italian | Voicemail | ITA_ASR004 | 550 | Available
Kannada | Conversational | KAN_ASR001 | 1,000 | In development
Pashto | Conversational | PAS_ASR001 | 967 | Available
Portuguese (EU) | Conversational | PTP_ASR001 | 200 | Available shortly
Romanian | Conversational | ROM_ASR001 | 200 | Available shortly
Russian | Conversational | RUS_ASR001 | 200 | Available
Somali | Conversational | SOM_ASR001 | 1,000 | Available
Spanish (EU) | Voicemail | ESO_ASR002 | 500 | Available
Turkish | Conversational | TUR_ASR001 | 200 | Available
Urdu | Conversational | URD_ASR001 | 1,000 | Available
1.2 Wideband
Language | Database Type | Catalogue Code | Speakers | Status
English (US) | Studio | USE_ASR001 | 200 | Available
French (Canadian) | Home/Office | FRC_ASR002 | 120 | Available
German | Studio | DEU_ASR001 | 127 | Available
Thai | Home/Office | THA_ASR001 | 100 | Available
Korean | Home/Office | KOR_ASR001 | 100 | Available
2. Pronunciation Lexica
Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:
Pronunciation Lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)
Part-of-speech tagged Lexica providing grammatical and semantic labels
Other reference text-based materials, including spelling/mis-spelling lists, spell-check dictionaries, mappings of colloquial language to standard forms, and orthographic normalization lists.
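As an illustration of what such a pronunciation lexicon entry might look like, here is a minimal parser for a hypothetical tab-separated, ARPAbet-like format with syllable boundaries and numeric stress marks (the format and the entry are invented for illustration; they are not Appen Butler Hill's actual format):

```python
# Hypothetical lexicon line: word <TAB> syllables separated by " - ",
# with stress marked on vowels (1 = primary, 2 = secondary, 0 = none).

def parse_entry(line):
    word, pron = line.split("\t")
    syllables = [syl.split() for syl in pron.split(" - ")]
    # Index of the syllable carrying primary stress (a phone ending in "1").
    primary = next(i for i, syl in enumerate(syllables)
                   if any(ph.endswith("1") for ph in syl))
    return word, syllables, primary

entry = "dictionary\tD IH1 K - SH AH0 - N EH2 - R IY0"
word, syllables, primary = parse_entry(entry)
# word == "dictionary", four syllables, primary stress on the first
```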
Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com
3. Named Entity Corpora
Language | Catalogue Code | Words
Arabic | ARB_NER001 | 500,000
English | ENI_NER001 | 500,000
Farsi/Persian | FAR_NER001 | 500,000
Korean | KOR_NER001 | 500,000
Japanese | JPY_NER001 | 500,000
Russian | RUS_NER001 | 500,000
Mandarin | MAN_NER001 | 500,000
Urdu | URD_NER001 | 500,000
These NER corpora contain text material from a variety of sources and are tagged for the following named entities: Person, Organization, Location, Nationality, Religion, Facility, Geo-Political Entity, Titles, Quantities.
4. Other Language Resources
Morphological Analyzers – Farsi/Persian & Urdu
Arabic Thesaurus
Language Analysis Documentation – multiple languages
For additional information on these resources, please contact: sales@appenbutlerhill.com
5. Customized Requests and Package Configurations
Appen Butler Hill is committed to providing a low-risk, high-quality, reliable solution and has worked in 130+ languages to date, supporting both large global corporations and Government organizations.
We would be glad to discuss any customized requests or package configurations and prepare a customized proposal to meet your needs.
6. Contact Information
Prithivi Pradeep
Business Development Manager
ppradeep@appenbutlerhill.com
+61 2 9468 6370
Tom Dibert
Vice President, Business Development, North America
We would like to announce the release of OFROM, the first corpus of French spoken in French-speaking Switzerland. In its current version, the archive amounts to about 15 hours of speech, transcribed in standard orthography in the Praat software. A concordancer makes it possible to search the corpus and to download the sound extracts associated with the transcriptions.
To access the data and read a fuller description of the corpus, please visit: http://www.unine.ch/ofrom.
We are happy to announce the release of DEMAND, a set of real-world 16-channel noise recordings designed for the evaluation of microphone array processing techniques.
1.5 h of noise data were recorded in 18 different indoor and outdoor environments and are available under the terms of the Creative Commons Attribution-ShareAlike License.
Joachim Thiemann (CNRS - IRISA) Nobutaka Ito (University of Tokyo) Emmanuel Vincent (Inria Nancy - Grand Est)
Support for finalising oral or multimodal corpora for distribution, valorisation and permanent archiving
The IRCOM consortium of the TGIR Corpus and the EquipEx ORTOLANG are joining forces to offer technical and financial support for finalising oral or multimodal corpora, with a view to their distribution and long-term preservation through the EquipEx ORTOLANG. This call does not concern the creation of new corpora but the finalisation of existing corpora that are not yet available in electronic form. By finalisation we mean deposit in a public digital repository and entry into a permanent archiving circuit. In this way, the speech data enriched by your research can be reused, cited and cumulatively enriched in turn to support the development of new knowledge, under the conditions of use that you choose (a usage licence is selected for each deposited corpus).
This call is subject to several conditions (see below), and financial support is limited to 3,000 euros per project. Requests will be processed in the order in which they are received by IRCOM. Requests from EA units or small teams without dedicated "corpus" technical support will be treated as a priority. Requests may be submitted from 1 September 2013 to 31 October 2013. Funding decisions rest with the IRCOM steering committee. Requests not processed in 2013 may be processed in 2014. If you have doubts about the eligibility of your project, do not hesitate to contact us so that we can examine your request and adapt our future offers.
To compensate for the wide disparity in computing skills among the people and working groups producing corpora, IRCOM offers personalised assistance with corpus finalisation. This assistance will be provided by an IRCOM engineer according to the requests made, adapted to the type of need, whether technical or financial.
The conditions for proposing a corpus for finalisation and obtaining IRCOM support are:
Being able to take all decisions concerning the use and distribution of the corpus (intellectual property in particular).
Having all information concerning the sources of the corpus and the consent of the people recorded or filmed.
Granting a right of free use of the data or, at minimum, free access for scientific research.
Requests may concern any type of processing: processing of nearly finalised corpora (conversion, anonymisation), alignment of already transcribed corpora, conversion from word-processing formats, digitisation of old media. For any request requiring substantial manual work, applicants must contribute human or financial resources commensurate with those provided by IRCOM and ORTOLANG.
IRCOM is aware of the exceptional and exploratory nature of this initiative. It should also be recalled that this funding is reserved for corpora that are already largely complete and cannot support creation ex nihilo. Given these resource limitations, the most advanced corpus proposals may be treated as a priority, in agreement with the IRCOM steering committee. There is, however, no "theoretical" limit on the requests that can be made, as IRCOM can redirect requests outside its competence to other parties.
Responses to this call should be sent to ircom.appel.corpus@gmail.com, using the two-page form below. In all cases, a personalised reply will be sent by IRCOM.
Proposals must present the corpora concerned, information on usage and ownership rights, and the nature of the formats or media used.
This call is organised under the responsibility of IRCOM, with joint financial participation from IRCOM and the EquipEx ORTOLANG.
For further information, note that the IRCOM website (http://ircom.corpus-ir.fr) is open and offers resources to the community: a glossary, an inventory of units and corpora, software resources (tutorials, comparisons, conversion tools), working group activities, training news, etc.
IRCOM invites units to inventory their oral and multimodal corpora - 70 projects already listed - to gain better visibility of the resources already available, even if not all of them are finalised.
The IRCOM steering committee
Please use this form to respond to the call. Thank you.
Response to the call for the finalisation of an oral or multimodal corpus
Corpus name:
Contact person:
Email address:
Telephone number:
Nature of the corpus data:
Are there recordings?
Which media? Audio, video, other...
What is the total length of the recordings? Number of tapes, number of hours, etc.
What type of medium?
What format (if known)?
Are there transcriptions?
In what format? (paper, word processor, transcription software)
What quantity (in hours, number of words, or number of transcriptions)?
Do you have metadata (statement of copyright and usage rights)?
Do you have a precise description of the people recorded?
Do you have informed-consent statements for the people who were recorded? In (approximately) what year were the recordings made?
What is the language of the recordings?
Does the corpus include recordings of children or of people with a language disorder or pathology?
If so, what population is concerned?
To advise you efficiently and as quickly as possible, we need examples of the transcriptions or recordings in your possession. We will contact you about this, but you can already send us by email a sample of the data you have (transcriptions, metadata, address of a web page containing the recordings).
Thank you in advance for your interest in our proposal. For any further information, please contact Martine Toda martine.toda@ling.cnrs.fr or ircom.appel.corpus@gmail.com.
Rhapsodie: a Prosodic and Syntactic Treebank for Spoken French
We are pleased to announce that Rhapsodie, a syntactic and prosodic treebank of spoken French created with the aim of modeling the interface between prosody, syntax and discourse in spoken French, is now available at http://www.projet-rhapsodie.fr/
The Rhapsodie treebank is made up of 57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and a 33,000-word corpus) endowed with an orthographic, phoneme-aligned transcription.
The corpus is representative of different genres (private and public speech; monologues and dialogues; face-to-face interviews and broadcasts; more or less interactive discourse; descriptive, argumentative and procedural samples, variations in planning type).
The corpus samples have been mainly drawn from existing corpora of spoken French and partially created within the frame of the Rhapsodie project. We would especially like to thank the coordinators of the CFPP2000, PFC, ESLO and C-Prom projects, as well as Piet Mertens, Mathieu Avanzi, Anne Lacheret and Nicolas Obin.
The sound samples (waves, MP3, cleaned and stylized pitch), the orthographic transcriptions (txt), the macrosyntactic annotations (txt), the prosodic annotations (xml, textgrid) as well as the metadata (xml and html) can be freely downloaded under the terms of the Creative Commons licence Attribution - Noncommercial - Share Alike 3.0 France.
Microsyntactic annotation will be available soon.
The metadata are searchable on line through a browser.
The prosodic annotation can be explored on line through the Rhapsodie Query Language.
The tutorials of transcription, annotations and Rhapsodie Query Language are available on the site.
The Rhapsodie team (Modyco, Université Paris Ouest Nanterre):
Sylvain Kahane, Anne Lacheret, Paola Pietrandrea, Atanas Tchobanov, Arthur Truong.
Annotation of “Hannah and her sisters” by Woody Allen.
We have created and made publicly available a dense audio-visual person-oriented ground-truth annotation of a feature movie (100 minutes long): “Hannah and her sisters” by Woody Allen.
The annotation includes
• Face tracks in video (densely annotated, i.e., in each frame, and person-labeled)
• Speech segments in audio (person-labeled)
• Shot boundaries in video
The annotation can be useful for evaluating
• Person-oriented video-based tasks (e.g., face tracking, automatic character naming, etc.)
• Person-oriented audio-based tasks (e.g., speaker diarization or recognition)
• Person-oriented multimodal-based tasks (e.g., audio-visual character naming)
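As a sketch of how such person-labeled segments can feed an evaluation, here is a toy scorer (our own illustration, not part of the dataset) that measures the fraction of reference time covered by a hypothesis segment carrying the same person label:

```python
# Segments are (start, end, label) tuples in seconds. Hypothesis segments
# are assumed non-overlapping within a label, so matched time is not
# double-counted. Labels and times below are illustrative.

def labeled_overlap(ref, hyp):
    total = sum(end - start for start, end, _ in ref)
    matched = 0.0
    for rs, re_, rlabel in ref:
        for hs, he, hlabel in hyp:
            if hlabel == rlabel:
                matched += max(0.0, min(re_, he) - max(rs, hs))
    return matched / total

ref = [(0.0, 10.0, "hannah"), (10.0, 14.0, "elliot")]
hyp = [(0.0, 5.0, "hannah"), (5.0, 14.0, "elliot")]
score = labeled_overlap(ref, hyp)  # 5 s + 4 s matched out of 14 s
```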
Details on the Hannah dataset and access to it can be obtained there:
Text to Speech Synthesis: over an hour of speech synthesis samples from 1968 to 2001 by 25 French, Canadian, US, Belgian, Swedish and Swiss systems
33 ans de synthèse de la parole à partir du texte: une promenade sonore (1968-2001) (33 years of Text-to-Speech Synthesis in French: an audio tour, 1968-2001), Christophe d'Alessandro. Article published in Volume 42, No. 1/2001 of Traitement Automatique des Langues (TAL, Editions Hermes), pp. 297-321.
The purpose of the project is to make available a standard training and test setup for language modeling experiments.
The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here.
This also means that your results on this data set are reproducible by the research community at large.
Besides the scripts needed to rebuild the training/held-out data, the distribution also provides log-probability values for each word in each of ten held-out data sets, for each of several baseline models.
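Per-word log-probabilities of this kind are conventionally summarised as perplexity; a minimal sketch, assuming base-10 log-probabilities (adjust `base` if the released files use natural logs):

```python
# Perplexity = base ** (-(1/N) * sum of per-word log-probabilities).
# Lower perplexity means the model assigns higher probability to the
# held-out words.

def perplexity(logprobs, base=10.0):
    n = len(logprobs)
    return base ** (-sum(logprobs) / n)

# Four words, each assigned probability 0.1 by the model:
ppl = perplexity([-1.0, -1.0, -1.0, -1.0])  # -> 10.0
```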
International Standard Language Resource Number (ISLRN) (ELRA Press release)
Press Release - Immediate - Paris, France, December 13, 2013
Establishing the International Standard Language Resource Number (ISLRN)
12 major NLP organisations announce the establishment of the ISLRN, a Persistent Unique Identifier, to be assigned to each Language Resource.
On November 18, 2013, 12 NLP organisations agreed to announce the establishment of the International Standard Language Resource Number (ISLRN), a Persistent Unique Identifier to be assigned to each Language Resource. Experiment replicability, an essential feature of scientific work, would be enhanced by such a unique identifier. Set up by ELRA, LDC and AFNLP/Oriental-COCOSDA, the ISLRN Portal will provide unique identifiers using a standardised nomenclature, as a service free of charge for all Language Resource providers. It will be supervised by a steering committee composed of representatives of participating organisations and enlarged whenever necessary.
For more information on ELRA and the ISLRN, please contact: Khalid Choukri choukri@elda.org
For more information on ELDA, please contact: Hélène Mazo mazo@elda.org
Opening of the ISLRN Portal
ELRA, LDC, and AFNLP/Oriental-COCOSDA announce the opening of the ISLRN Portal @ www.islrn.org.
Further to the establishment of the International Standard Language Resource Number (ISLRN) as a unique and universal identification schema for Language Resources on November 18, 2013, ELRA, LDC and AFNLP/Oriental-COCOSDA now announce the opening of the ISLRN Portal (www.islrn.org). As a service free of charge for all Language Resource providers and under the supervision of a steering committee composed of representatives of participating organisations, the ISLRN Portal provides unique identifiers using a standardised nomenclature.
Overview
The 13-digit ISLRN format is: XXX-XXX-XXX-XXX-X. It can be allocated to any Language Resource; its composition is neutral and does not carry any semantics about the type or nature of the Language Resource. The ISLRN is a randomly created number with a check digit computed with the Verhoeff algorithm.
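The announcement does not publish ELRA's exact digit-ordering convention, but the textbook Verhoeff scheme it names can be sketched as follows (the position convention and the example numbers are our own illustration, not ELRA's implementation):

```python
# Verhoeff check-digit tables: D is the dihedral-group D5 multiplication
# table, P the position-dependent permutation, INV the inverse map.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

def verhoeff_check_digit(number: str) -> str:
    """Check digit to append to `number` (hyphens are ignored)."""
    digits = [int(d) for d in number.replace("-", "")]
    c = 0
    for i, d in enumerate(reversed(digits)):
        c = D[c][P[(i + 1) % 8][d]]
    return str(INV[c])

def verhoeff_valid(number: str) -> bool:
    """True if the trailing digit of `number` is a valid check digit."""
    digits = [int(d) for d in number.replace("-", "")]
    c = 0
    for i, d in enumerate(reversed(digits)):
        c = D[c][P[i % 8][d]]
    return c == 0
```

For example, the Verhoeff check digit for "236" is "3", so "2363" validates while "2364" does not; the scheme catches all single-digit errors and adjacent transpositions.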
Two types of external players may interact with the ISLRN Portal: Visitors and Providers. Visitors may browse the web site and search for the ISLRN of a given Language Resource by its name or, if it exists, by its number. Providers are registered and own credentials; they can request a new ISLRN for a given Language Resource and may, after moderation, become certified in order to import metadata in XML format.
The functionalities that can be accessed by Visitors are:
The functionalities that can be accessed by Providers, once they have signed up, are:
- Log in
- Request an ISLRN according to the metadata of a given resource
- Request to become a certified provider so as to import XML files containing metadata
- Import one or more metadata descriptions in XML to request ISLRN(s) (only for certified providers)
- Edit pending requests
- Access previous requests
- Contact a Moderator or an Administrator
- Edit Providers' own profile
ISLRN request is handled by moderators within 5 working days. Contact: islrn@elda.org
Background
The International Standard Language Resource Number (ISLRN) is a unique and universal identification schema for Language Resources, providing each Language Resource with a unique identifier under a standardised nomenclature. It also ensures that Language Resources are correctly identified, and consequently recognised with proper references, for their usage in applications, in R&D projects, in product evaluation and benchmarking, as well as in documents and scientific papers. Moreover, it is a major step in the interconnected world that Human Language Technologies (HLT) have become: unique resources must be identified as such, and meta-catalogues need a common identification format to manage data correctly.
The ISLRN is not intended to replace local and specific identifiers; it is not meant to be a legal deposit nor an obligation, but rather an essential best practice. For instance, a resource distributed by several data centres will keep each "local" data-centre identifier but will have a single, unique ISLRN.
About ELRA
The European Language Resources Association (ELRA) is a non-profit organisation founded by the European Commission in 1995, with the mission of providing a clearing house for language resources and promoting Human Language Technologies (HLT). To find out more about ELRA, please visit www.elra.info.
About LDC Founded in 1992, the Linguistic Data Consortium (LDC) is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC's host institution. To find out more about LDC, please visit www.ldc.upenn.edu.
About AFNLP The mission of the Asian Federation of Natural Language Processing (AFNLP) is to promote and enhance R&D relating to the computational analysis and the automatic processing of all languages of importance to the Asian region by assisting and supporting like-minded organizations and institutions through information sharing, conference organization, research and publication co-ordination, and other forms of support. To find out more about AFNLP, please visit www.afnlp.org.
About Oriental-COCOSDA The International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, Oriental-COCOSDA, has been established to encourage and promote international interaction and cooperation in the foundation areas of Spoken Language Processing, especially for Speech Input/Output. To find out more about Oriental-COCOSDA, please visit our web site: www.cocosda.org
Speechocean: A global language resources and data services supplier
About Speechocean
Speechocean is one of the world’s well-known providers of language-related resources and services in the fields of Human Computer Interaction and Human Language Technology. At present, we can provide data services covering 110+ languages and dialects across the world.
KingLine Data Center --- Data Sharing Platform
The KingLine Data Center is operated and supervised by Speechocean and is mainly focused on creating and providing language resources for the research and development of human language technology.
These diversified corpora are widely used for research and development in the fields of Speech Recognition, Speech Synthesis, Natural Language Processing, Machine Translation, Web Search, etc. All corpora are openly accessible to users all over the world, whether from scientific research institutions, enterprises, or individuals.
The Uighur Telephone Speech Recognition Corpus was collected in China. It contains the voices of 300 different speakers (150 males, 150 females) with a balanced distribution across age groups (mainly 18-35, 36-45, >46), gender and regional accents (for details, please see the technical document). The script contains approximately 120,000 utterances in total (for more details of the script structure design, please check the specification), specially designed to provide materials for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet environment (home/office). A telephone platform (IVR) was used for speech collection. Each utterance was stored in a separate, uncompressed file. A pronunciation lexicon is available. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.
The Hindi Mobile Speech Recognition Corpus was collected in India. It contains the voices of 200 different speakers (108 males, 92 females) with a balanced distribution across age groups (16-30, 31-45, 46-60), gender and regional accents (for details, please see the technical document). More than 20 topics were included in total (for more details of the script structure design, please check the specification), specially designed to provide materials for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment. Mobile platforms (iOS, Android and Windows) were used for speech collection. Each utterance was stored in a separate, uncompressed file. A pronunciation lexicon with a phonemic transcription in SAMPA is available, all manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.
The Kids Mandarin Speech Recognition Corpus was collected in Liaoning and Hebei Provinces, China. It contains the voices of 575 different native speakers (283 males, 292 females) with a balanced distribution across age (4-9 years old), gender and regional accents (for details, please see the technical document). The script contains approximately 396,000 utterances in total (for more details of the script structure design, please check the specification), specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office room. Mobile phones, desktop computers and smart TVs were used for speech collection. Each utterance is stored in a separate, uncompressed file. A pronunciation lexicon with a phonemic transcription in Pinyin is available, all manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.
The Hong Kong English Mobile Speech Recognition Corpus was collected in Hong Kong. It contains the voices of 200 different speakers (99 males, 101 females) with a balanced distribution across age groups (18-30, 31-45, 46-60), gender and regional accents (for details, please see the technical document). The script contains approximately 179,406 utterances in total (for more details of the script structure design, please check the specification), specially designed to provide materials for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet environment. Mobile platforms (iOS, Android and Windows) were used for speech collection. Each utterance was stored in a separate, uncompressed file. A pronunciation lexicon is available with a phonemic transcription in Hepburn. All manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.
kidLUCID: London UCL Children’s Clear Speech in Interaction Database
We are delighted to announce the availability of a new corpus of spontaneous speech for children aged 9 to 14 years inclusive, produced as part of the ESRC-funded project on ‘Speaker-controlled Variability in Children's Speech in Interaction’ (PI: Valerie Hazan).
Speech recordings (a total of 288 conversations) are available for 96 child participants (46M, 50F, range 9;0 to 15;0 years), all native southern British English speakers. Participants were recorded in pairs while completing the diapix spot-the-difference picture task in which the pair verbally compared two scenes, only one of which was visible to each talker. High-quality digital recordings were made in sound-treated rooms. For each conversation, a stereo audio recording is provided with each speaker on a separate channel together with a Praat Textgrid containing separate word- and phoneme-level segmentations for each speaker.
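The word- and phoneme-level segmentations ship as standard Praat long-format TextGrid files. As a rough illustration only (our own simplification, not a tool distributed with the corpus), labelled intervals can be pulled out of such a file with a few regular expressions; `read_intervals` below is a hypothetical helper, not a full TextGrid parser:

```python
import re

def read_intervals(textgrid_text):
    """Extract (xmin, xmax, label) triples from a long-format Praat
    TextGrid, grouped by tier name. A minimal sketch, not a full parser."""
    tiers = {}
    current = None   # name of the tier we are currently inside
    pending = []     # xmin/xmax values seen since the last label
    for line in textgrid_text.splitlines():
        line = line.strip()
        m = re.match(r'name = "(.*)"', line)
        if m:  # a new tier starts
            current = m.group(1)
            tiers[current] = []
            continue
        m = re.match(r'(xmin|xmax) = ([\d.]+)', line)
        if m and current is not None:
            pending.append(float(m.group(2)))
            continue
        m = re.match(r'text = "(.*)"', line)
        if m and current is not None and len(pending) >= 2:
            # the last two numbers seen are this interval's bounds
            tiers[current].append((pending[-2], pending[-1], m.group(1)))
            pending = []
    return tiers
```

Real analyses would more likely use Praat itself or an established TextGrid library; the sketch only shows the file's interval structure.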
There are six recordings per speaker pair made in the following conditions:
NOB (No barrier): both speakers heard each other normally
VOC (Vocoder): one conversational partner heard the other's speech after it had been processed in real time through a noise-excited three-channel vocoder
BAB (Babble): one conversational partner heard the other's speech in a background of adult multi-talker babble at an approximate SNR of 0 dB.
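Mixing at an SNR of 0 dB, as in the BAB condition, amounts to scaling the babble so that speech and noise have equal average power before summing. A minimal NumPy sketch of that computation (our own illustration, not the project's actual processing chain):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio of the
    mixture equals `snr_db`, then add it to `speech`."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # snr_db = 10*log10(p_speech / (gain**2 * p_noise))  =>  solve for gain
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

At `snr_db=0` the gain makes the two signals equally powerful, so neither dominates the mixture.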
We hope that these tables will promote wider dissemination of the datasets and software tools available in our community and help newcomers select the most suitable dataset or software for a given experiment. We plan to provide additional tables on, e.g., room impulse response datasets or speaker recognition software in the future.
We highly welcome your input, especially additional tables/entries and reproducible baselines for each dataset. It just takes a few minutes thanks to the simple wiki interface.
International Standard Language Resource Number (ISLRN) implemented by ELRA and LDC
ELRA and LDC partner to implement ISLRN process and assign identifiers to all the Language Resources in their catalogues.
Following the meeting of the largest NLP organizations, the NLP12, and their endorsement of the International Standard Language Resource Number (ISLRN), ELRA and LDC partnered to implement the ISLRN process and to assign identifiers to all the Language Resources (LRs) in their catalogues. The ISLRN web portal was designed to enable the assignment of unique identifiers as a service free of charge for all Language Resource providers. To enhance the use of ISLRN, ELRA and LDC have collaborated to provide the ISLRN 13-digit ID to all the Language Resources distributed in their respective catalogues. Anyone who is searching the ELRA and LDC catalogues can see that each Language Resource is now identified by both the data centre ID and the ISLRN. All providers and users of such LRs should refer to the ISLRN in their own publications and whenever referring to the LR.
ELRA and LDC will continue their joint involvement in ISLRN through active participation in this web service.
The International Standard Language Resource Number (ISLRN) aims to provide unique identifiers using a standardised nomenclature, thus ensuring that LRs are correctly identified, and consequently, recognised with proper references for their usage in applications within R&D projects, product evaluation and benchmarking, as well as in documents and scientific papers. Moreover, this is a major step in the networked and shared world that Human Language Technologies (HLT) has become: unique resources must be identified as such and meta-catalogues need a common identification format to manage data correctly.
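The 13-digit identifiers assigned under this scheme follow a fixed surface form. As a hedged illustration (our own sketch, assuming the grouping published on the ISLRN portal, i.e. XXX-XXX-XXX-XXX-X), a format check might look like this; `looks_like_islrn` is a hypothetical helper and does not query the registry:

```python
import re

# Assumed grouping: 13 digits as XXX-XXX-XXX-XXX-X (per the ISLRN portal).
# This checks only the surface form; it cannot confirm that an identifier
# was actually assigned to a resource.
ISLRN_PATTERN = re.compile(r"\d{3}-\d{3}-\d{3}-\d{3}-\d")

def looks_like_islrn(identifier: str) -> bool:
    return ISLRN_PATTERN.fullmatch(identifier.strip()) is not None
```

Actual verification of an identifier would go through the portal at http://www.islrn.org.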
Representatives of the major Natural Language Processing and Computational Linguistics organizations met in Paris on 18 November 2013 to harmonize and coordinate their activities within the field. The results of this coordination are expressed in the Paris Declaration: http://www.elra.info/NLP12-Paris-Declaration.html.
*** About ELRA *** The European Language Resources Association (ELRA) is a non-profit making organisation founded by the European Commission in 1995, with the mission of providing a clearing house for language resources and promoting Human Language Technologies (HLT). To find out more about ELRA, please visit our web site: http://www.elra.info
*** About LDC ***
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and research laboratories that creates and distributes linguistic resources for language-related education, research and technology development.
ISLRN adopted by Joint Research Center (JRC) of the European Commission
JRC, the EC's Joint Research Centre, an important LR player: First to adopt the ISLRN initiative
The Joint Research Centre (JRC), the European Commission's in-house science service, is the first organisation to adopt the International Standard Language Resource Number (ISLRN) initiative and has requested ISLRN 13-digit unique identifiers for its Language Resources (LR). Thus, anyone who is using JRC LRs may now refer to this number in their own publications.
The International Standard Language Resource Number (ISLRN) aims to provide unique identifiers using a standardised nomenclature, thus ensuring that LRs are correctly identified, and consequently, recognised with proper references for their usage in applications within R&D projects, product evaluation and benchmarking, as well as in documents and scientific papers. Moreover, this is a major step in the networked and shared world that Human Language Technologies (HLT) have become: unique resources must be identified as such and meta-catalogues need a common identification format to manage data correctly. The ISLRN portal can be accessed at http://www.islrn.org.
*** About the JRC ***
As the Commission's in-house science service, the Joint Research Centre's mission is to provide EU policies with independent, evidence-based scientific and technical support throughout the whole policy cycle. Within its research in the field of global security and crisis management, the JRC develops open source intelligence and analysis systems that can automatically harvest and analyse a huge amount of multi-lingual information from the internet-based sources. In this context, the JRC has developed Language Technology resources and tools that can be used for highly multilingual text analysis and cross-lingual applications. To find out more about JRC's research in open source information monitoring, please visit https://ec.europa.eu/jrc/en/research-topic/internet-surveillance-systems. To access media monitoring applications directly, go to http://emm.newsbrief.eu/overview.html.
Forensic database of voice recordings of 500+ Australian English speakers
We are pleased to announce that the forensic database of voice recordings of 500+ Australian English speakers is now published.
The database was collected by the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales as part of the Australian Research Council funded Linkage Project on making demonstrably valid and reliable forensic voice comparison a practical everyday reality in Australia. The project was conducted in partnership with: Australian Federal Police, New South Wales Police, Queensland Police, National Institute of Forensic Sciences, Australasian Speech Sciences and Technology Association, Guardia Civil, Universidad Autónoma de Madrid.
The database includes multiple non-contemporaneous recordings of most speakers. Each speaker is recorded in three different speaking styles representative of some common styles found in forensic casework. Recordings were made under high-quality conditions, and extraneous noises and crosstalk have been manually removed. The high-quality audio can be processed to reflect recording conditions found in forensic casework.
Audio and Electroglottographic speech recordings from several languages
We are happy to announce the public availability of speech recordings made as part of the UCLA project 'Production and Perception of Linguistic Voice Quality'.
Audio and EGG recordings are available for Bo, Gujarati, Hmong, Mandarin, Black Miao, Southern Yi, Santiago Matatlan/ San Juan Guelavia Zapotec; audio recordings (no EGG) are available for English and Mandarin. Recordings of Jalapa Mazatec extracted from the UCLA Phonetic Archive are also posted. All recordings are accompanied by explanatory notes and wordlists, and most are accompanied by Praat textgrids that locate target segments of interest to our project.
Analysis software developed as part of the project – VoiceSauce for audio analysis and EggWorks for EGG analysis – and all project publications are also available from this site. All preliminary analyses of the recordings using these tools (i.e. acoustic and EGG parameter values extracted from the recordings) are posted on the site in large data spreadsheets.
All of these materials are made freely available under a Creative Commons Attribution-NonCommercial-ShareAlike-3.0 Unported License.
This project was funded by NSF grant BCS-0720304 to Pat Keating, Abeer Alwan and Jody Kreiman of UCLA, and Christina Esposito of Macalester College.
The License Wizard is designed to:
- support the right-holders in finding the appropriate licenses under which to share/distribute their Language Resources, and
- clarify the legal obligations applicable in various licensing situations.
Currently, the License Wizard allows the user to choose among several licenses that exist for the use of Language Resources: ELRA, Creative Commons and META-SHARE. More will be added.
The License Wizard works as a web configurator that helps Right Holders/Users:
- to select a number of legal features and obtain the user license adapted to their selection;
- to define which user licenses they would like to select in order to distribute their Language Resources;
- to integrate the user license terms into a Distribution Agreement that could be proposed to ELRA or META-SHARE for further distribution through the ELRA Catalogue of Language Resources (http://catalogue.elra.info, www.meta-share.eu).
Background

From the very beginning, ELRA has come across all types of legal issues that arise when exchanging and sharing Language Resources. The association has devoted huge efforts to streamlining the licensing processes while continuously monitoring the impact of regulation changes on the HLT community's activities. The first major step was to come up with a few licenses for both the research and the industrial sectors to use the resources available within the ELRA catalogue. Recently, its strong involvement in the META-SHARE infrastructure led to designing and drafting a small set of licenses, inspired by the ELRA licenses but also accounting for the new trends of permissive licenses and free resources, represented in particular by the Creative Commons.
Kara One: a 24 GB EEG, face-tracking and audio data set (Toronto, Canada)
We are making 24 GB of a new dataset, called Kara One, freely available. This database combines 3 modalities (EEG, face tracking, and audio) during imagined and articulated speech using phonologically-relevant phonemic and single-word prompts. It is the result of a collaboration between the Toronto Rehabilitation Institute (in the University Health Network) and the Department of Computer Science at the University of Toronto.
In the associated paper (abstract below), we show how to accurately classify imagined phonological categories solely from EEG data. Specifically, we obtain up to 90% accuracy in classifying imagined consonants from imagined vowels and up to 95% accuracy in classifying stimulus from active imagination states using advanced deep-belief networks.
PAPER Shunan Zhao and Frank Rudzicz (2015) Classifying phonological categories in imagined and articulated speech. In Proceedings of ICASSP 2015, Brisbane Australia
ABSTRACT This paper presents a new dataset combining 3 modalities (EEG, facial, and audio) during imagined and vocalized phonemic and single-word prompts. We pre-process the EEG data, compute features for all 3 modalities, and perform binary classification of phonological categories using a combination of these modalities. For example, a deep-belief network obtains accuracies over 90% on identifying consonants, which is significantly more accurate than two baseline support vector machines. We also classify between the different states (resting, stimuli, active thinking) of the recording, achieving accuracies of 95%. These data may be used to learn multimodal relationships, and to develop silent-speech and brain-computer interfaces.
In the spirit of the season, I would like to announce the immediate availability of the TORGO database free, in perpetuity for academic use. This database combines acoustics and electromagnetic articulography from 8 individuals with speech disorders and 7 without, and totals over 18 GB. These data can be used for multimodal models (e.g., for acoustic-articulatory inversion), models of pathology, and augmented speech recognition, for example. More information (and the database itself) can be found here: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html.
Datatang is a leading global data provider specializing in customized data solutions, focusing on speech, image, and text data collection, annotation, and crowdsourcing services.
1. Speech data collection
2. Speech data synthesis
3. Speech data transcription
I’ve attached our company introduction for reference, as well as the available speech data lists as follows:
- US English Speech Data: 300 people, about 200 hours
- Uyghur Speech Data: 2,500 people, about 1,000 hours
- German Speech Data: 100 people, about 40 hours
- French Speech Data: 100 people, about 40 hours
- Spanish Speech Data: 100 people, about 40 hours
- Korean Speech Data: 100 people, about 40 hours
- Italian Speech Data: 100 people, about 40 hours
- Thai Speech Data: 100 people, about 40 hours
- Portuguese Speech Data: 300 people, about 100 hours
- Chinese Mandarin Speech Data: 4,000 people, about 1,200 hours
- Chinese Speaking English Speech Data: 3,700 people, 720 hours
- Cantonese Speech Data: 5,000 people, about 1,400 hours
- Japanese Speech Data: 800 people, about 270 hours
- Chinese Mandarin In-car Speech Data: 690 people, about 245 hours
- Shanghai Dialect Speech Data: 2,500 people, about 1,000 hours
- Southern Fujian Dialect Speech Data: 2,500 people, about 1,000 hours
- Sichuan Dialect Speech Data: 2,500 people, about 860 hours
- Henan Dialect Speech Data: 400 people, about 150 hours
- Northeastern Dialect Speech Data: 300 people, 80 hours
- Suzhou Dialect Speech Data: 270 people, about 110 hours
- Hangzhou Dialect Speech Data: 400 people, about 170 hours
- Non-Native Speaking Chinese Speech Data: 1,100 people, about 73 hours
- Real-world Call Center Chinese Speech Data: more than 5,000 people, 650 hours
- Mobile-end Real-world Voice Assistant Chinese Speech Data: more than 2,000,000 people, 4,000 hours
- Heavy Accent Chinese Speech Data: 2,000 people, more than 1,000 hours
If any datasets are of particular interest to you, we can also provide samples and pricing.
April 2017 marks the beginning of LDC’s 25th year as the leader in language resource development and distribution. Founded in 1992, the Consortium has grown from a data repository to a vibrant data center that creates, shares and archives language resources. The Catalog continues to grow, boasting over 700 titles in more than 90 languages. With the support of members, licensees, sponsors and collaborators, LDC has distributed over 120,000 copies of data to more than 3,500 organizations worldwide. Our heartfelt thanks for your support as we continue our mission to provide large quantities of diverse data, research program support and high quality member services.
LDC data and commercial technology development
Any organization wishing to use LDC data to develop or test products for commercialization or use LDC data in any commercial product or for any commercial purpose, must first license the data as a For-Profit Member. Once the data is licensed under the For-Profit Membership, the organization retains perpetual rights to use the data for commercial technology development. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit our Licensing page for more information.
The telephone speech segments include two-channel excerpts of approximately 10 seconds and 5 minutes. There are also summed-channel excerpts in the range of 5 minutes. The microphone excerpts are 3-15 minutes in duration. As in prior evaluations, intervals of silence were not removed.
The 2010 evaluation includes not only conversational telephone speech (CTS) recorded over ordinary telephone channels for the core training and test conditions, but also CTS and conversational interview speech recorded over a room microphone channel. Unlike prior evaluations, some of the conversational telephone style speech was collected in a manner to produce particularly high, or particularly low, vocal effort on the part of the speaker of interest. In addition to evaluation data, this package also consists of answer keys, trial and train files, development data and evaluation documentation.
2010 NIST Speaker Recognition Evaluation Test Set is distributed via hard drive.
2017 Subscription Members will receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US $4000.
*
(2) BOLT Egyptian Arabic SMS/Chat and Transliteration was developed by LDC and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving native speakers of Egyptian Arabic. The corpus contains 5,691 conversations totaling 1,029,248 words across 262,026 messages. Messages were natively written in either Arabic orthography or romanized Arabizi. A total of 1,856 Arabizi conversations (287,022 words) were transliterated from the original romanized Arabizi script into standard Arabic orthography and then reviewed, corrected and normalized by LDC annotators according to 'Conventional Orthography for Dialectal Arabic' (CODA).
The BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.
BOLT Egyptian Arabic SMS/Chat and Transliteration is distributed via web download.
2017 Subscription Members will receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US $1750.
*
(3) CHiME2 Grid was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge and contains approximately 120 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-world environments.
CHiME2 Grid reflects the small vocabulary track of the CHiME2 Challenge. The target utterances were taken from the Grid corpus and consist of 34 speakers reading simple 6-word sequences. The data is divided into training, development and test sets.
CHiME2 Grid is distributed via web download.
2017 Subscription Members will receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US $50.
ROCme!: a free tool for audio corpora recording and management
ROCme! enables streamlined, autonomous and paperless recording and management of read-speech corpora.

Key features:
- free
- compatible with Windows and Mac
- configurable interface for collecting speaker metadata
- speakers scroll through the sentences on screen and record themselves autonomously
- configurable audio format
VocalTractLab 2.0 : A tool for articulatory speech synthesis
It is my pleasure to announce the release of the new major version 2.0 of VocalTractLab. VocalTractLab is an articulatory speech synthesizer and a tool to visualize and explore the mechanism of speech production with regard to articulation, acoustics, and control. It is available from http://www.vocaltractlab.de/index.php?page=vocaltractlab-download . Compared to version 1.0, the new version brings many improvements in terms of the implemented models of the vocal tract, the vocal folds, the acoustic simulation, and articulatory control, as well as in terms of the user interface. Most importantly, the new version comes together with a manual.
If you like, give it a try. Reports on bugs and any other feedback are welcome.
Bob signal-processing and machine learning toolbox (v. 1.2.0)
The release 1.2.0 of the Bob signal-processing and machine learning toolbox is available. Bob provides efficient implementations of several machine learning algorithms as well as a framework to help researchers publish reproducible research.
The previous release of Bob provided:
* image, video and audio IO interfaces such as jpg, avi, wav;
* database accessors such as FRGC, Labeled Faces in the Wild, and many others;
* image processing: Local Binary Patterns (LBPs), Gabor jets, SIFT;
* machines and trainers such as Support Vector Machines (SVMs), k-Means, Gaussian Mixture Models (GMMs), Inter-Session Variability modelling (ISV), Joint Factor Analysis (JFA), Probabilistic Linear Discriminant Analysis (PLDA), and the Bayesian intra/extra (personal) classifier.

The new release of Bob brings the following features and improvements:
* unified implementation of Local Binary Patterns (LBPs);
* Histograms of Oriented Gradients (HOG) implementation;
* total variability (i-vector) implementation;
* conjugate-gradient-based implementation of logistic regression;
* improved multi-layer perceptron implementation (back-propagation can now be easily used in combination with any optimizer, e.g. L-BFGS);
* pseudo-inverse-based method for Linear Discriminant Analysis;
* covariance-based method for Principal Component Analysis;
* whitening and within-class covariance normalization techniques;
* module for object detection and keypoint localization (bob.visioner);
* module for audio processing, including feature extraction such as LFCC and MFCC;
* improved extensions (satellite packages) that now support both Python and C++ code within an easy-to-use framework;
* improved documentation and new tutorials;
* support for Intel's MKL (in addition to ATLAS);
* extended platform support (Arch Linux).

This release represents a major milestone in Bob, with plenty of functionality improvements (>640 commits in total) and plenty of bug fixes.
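To give a feel for two of the techniques listed above, covariance-based PCA and whitening can be sketched in a few lines of plain NumPy. This is a stand-alone illustration under our own naming, not Bob's actual API:

```python
import numpy as np

def pca_whiten(X, eps=1e-12):
    """Covariance-based PCA followed by whitening.

    X: (n_samples, n_features) data matrix.
    Returns data whose sample covariance is (close to) the identity.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (symmetric)
    W = eigvecs / np.sqrt(eigvals + eps)    # PCA rotation + variance scaling
    return Xc @ W
```

Dropping the `1/sqrt(eigvals)` scaling leaves ordinary PCA; keeping it yields whitened features, as used for instance before within-class covariance normalization.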
• Sources and Documentation
• Binary packages:
  • Ubuntu: 10.04, 12.04, 12.10 and 13.04
  • Mac OS X: works with 10.6 (Snow Leopard), 10.7 (Lion) and 10.8 (Mountain Lion)

For instructions on how to install the pre-packaged version on Ubuntu or OS X, consult our quick installation instructions (N.B. the OS X MacPorts port has not yet been upgraded; this will be done very soon, cf. https://trac.macports.org/ticket/39831).

Best regards,
Elie Khoury (on behalf of the Biometric Group at Idiap led by Sebastien Marcel)
---
Dr. Elie Khoury, Postdoctoral Researcher, Biometric Person Recognition Group
IDIAP Research Institute (Switzerland) Tel : +41 27 721 77 23
COVAREP: A Cooperative Voice Analysis Repository for Speech Technologies
======================
CALL for contributions
======================
We are pleased to announce the creation of an open-source repository of advanced speech processing algorithms called COVAREP (A Cooperative Voice Analysis Repository for Speech Technologies). COVAREP has been created as a GitHub project (https://github.com/covarep/covarep) where researchers in speech processing can store original implementations of published algorithms.
Over the past few decades a vast array of advanced speech processing algorithms have been developed, often offering significant improvements over the existing state-of-the-art. Such algorithms can have a reasonably high degree of complexity and, hence, can be difficult to accurately re-implement based on article descriptions. Another issue is the so-called 'bug magnet effect' with re-implementations frequently having significant differences from the original. The consequence of all this has been that many promising developments have been under-exploited or discarded, with researchers tending to stick to conventional analysis methods.
By developing the COVAREP repository we are hoping to address this by encouraging authors to include original implementations of their algorithms, thus resulting in a single de facto version for the speech community to refer to.
We envisage a range of benefits to the repository:
1) Reproducible research: COVAREP will allow fairer comparison of algorithms in published articles.
2) Encouraged usage: the free availability of these algorithms will encourage researchers from a wide range of speech-related disciplines (both in academia and industry) to exploit them for their own applications.
3) Feedback: as a GitHub project users will be able to offer comments on algorithms, report bugs, suggest improvements etc.
SCOPE
We welcome contributions from a wide range of speech processing areas, including (but not limited to): Speech analysis, synthesis, conversion, transformation, enhancement, speech quality, glottal source/voice quality analysis, etc.
REQUIREMENTS
In order to achieve a reasonable standard of consistency and homogeneity across algorithms, we have compiled a list of requirements for prospective contributors to the repository. We do not, however, intend these requirements to be so strict as to discourage contributions.
• Only published work can be added to the repository.
• The code must be available as open source.
• Algorithms should be coded in Matlab; however, we strongly encourage authors to make the code compatible with Octave in order to maximize usability.
• Contributions must comply with the Coding Convention (see the GitHub site for the convention and a template). The convention applies only to normalizing the inputs/outputs and the documentation; there is no restriction on the content of the functions (though comments are obviously encouraged).
LICENCE
Getting contributing institutions to agree to a homogeneous IP policy would be close to impossible. As a result, COVAREP is a repository and not a toolbox, and each algorithm will have its own licence associated with it. Although flexible regarding licence types, contributions will need a licence that is compatible with the repository, i.e. {GPL, LGPL, X11, Apache, MIT} or similar. We encourage contributors to try to obtain LGPL licences from their institutions in order to be more industry friendly.
CONTRIBUTE!
We believe that the COVAREP repository has great potential benefit to the speech research community, and we hope that you will consider contributing your published algorithms to it. If you have any questions, comments, issues, etc. regarding COVAREP, please contact us at one of the email addresses below. Please forward this email to others who may be interested.
Existing contributions include: algorithms for spectral envelope modelling, adaptive sinusoidal modelling, fundamental frequency estimation, voicing decision and glottal closure instant detection, and methods for detecting non-modal phonation types.
Release of version 2 of FASST (Flexible Audio Source Separation Toolbox)
This toolbox (http://bass-db.gforge.inria.fr/fasst/) is intended to speed up the conception and automate the implementation of new model-based audio source separation algorithms. It has the following additions compared to version 1:
* Core in C++
* User scripts in MATLAB or Python
* Speedup
* Multichannel audio input
We provide two examples:
1. two-channel instantaneous NMF
2. real-world speech enhancement (2nd CHiME Challenge, Track 1)
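For readers unfamiliar with the first example, the core idea of NMF-based separation can be sketched in a few lines. The following is an illustrative toy sketch (not FASST code, and not its actual model): a magnitude spectrogram is factored with Kullback-Leibler multiplicative updates, assigning a few basis columns to each source, and the sources are reconstructed with soft Wiener-like masks.

```python
import numpy as np

def nmf_separate(V, n_sources=2, rank=4, n_iter=200, seed=0):
    """Toy NMF separation of a magnitude spectrogram V (freq x time):
    KL multiplicative updates, then one Wiener-like mask per source."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K = n_sources * rank
    eps = 1e-9
    W = rng.random((F, K)) + eps          # spectral bases
    H = rng.random((K, T)) + eps          # temporal activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    # Each source keeps its own block of bases; masks sum to ~1
    WH = W @ H + eps
    sources = []
    for s in range(n_sources):
        idx = slice(s * rank, (s + 1) * rank)
        mask = (W[:, idx] @ H[idx, :]) / WH
        sources.append(mask * V)
    return sources

# Toy mixture of two synthetic "sources" with distinct spectral patterns
rng = np.random.default_rng(1)
A = np.outer(rng.random(64), rng.random(100))
B = np.outer(rng.random(64), rng.random(100))
est = nmf_separate(A + B)
print(len(est), est[0].shape)  # 2 (64, 100)
```

Because the masks partition the model spectrogram, the estimated sources sum back to the mixture; FASST itself generalizes this idea to multichannel, structured source models.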
Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.
We are glad to announce the public release of Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.
It can be used e.g. for making music or for singing voice pedagogy.
A wide variety of voices are available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, roaring, etc. voices. All features of the vocal sound are entirely under control, as the synthesis method is based on a mathematical model of voice production rather than prerecorded segments.
The instrument is controlled using chironomy, i.e. hand gestures, with the help of interfaces such as a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, and breathiness can easily and continuously be controlled during performance, and special voices can be prepared in advance or stored as presets.
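To give a flavour of what "synthesis from a mathematical model of voice production" means, here is a deliberately simplified source-filter sketch in Python (this is not the Cantor Digitalis model; the frequencies, bandwidth, and impulse-train source are illustrative assumptions): a glottal pulse train at the fundamental frequency is passed through cascaded second-order formant resonators, so pitch, vowel (formant frequencies), and duration are all free parameters rather than prerecorded material.

```python
import numpy as np

def toy_sing(f0=220.0, formants=(700, 1200, 2600), dur=0.5, sr=16000):
    """Toy source-filter synthesis: impulse-train glottal source
    filtered by cascaded second-order formant resonators."""
    n = int(dur * sr)
    src = np.zeros(n)
    src[::int(sr / f0)] = 1.0            # glottal pulses at the fundamental
    y = src
    for fc in formants:                  # one resonator per formant
        bw = 80.0                        # assumed fixed 80 Hz bandwidth
        r = np.exp(-np.pi * bw / sr)     # pole radius from bandwidth
        a1 = -2 * r * np.cos(2 * np.pi * fc / sr)
        a2 = r * r
        out = np.zeros(n)
        for i in range(n):               # y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
            out[i] = y[i]
            if i >= 1: out[i] -= a1 * out[i - 1]
            if i >= 2: out[i] -= a2 * out[i - 2]
        y = out
    return y / np.max(np.abs(y))         # normalize to [-1, 1]

wave = toy_sing()
print(wave.shape)  # (8000,)
```

In a gesture-controlled instrument, parameters like `f0` and the formant set would be updated continuously from tablet position and pressure instead of being fixed arguments.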
MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP
We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different granularity levels (word level or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and aims to be fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.
The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
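The most common of the word-level distance measures mentioned above is cosine similarity between embedding vectors. The sketch below illustrates the idea generically with NumPy and made-up 4-dimensional vectors; it does not use the MultiVec API itself (see the project page for the toolkit's actual C++/Python interfaces), and the word vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical low-dimensional embeddings, for illustration only
king  = np.array([0.90, 0.10, 0.40, 0.80])
queen = np.array([0.85, 0.20, 0.45, 0.75])
apple = np.array([0.10, 0.90, 0.70, 0.05])

# Semantically close words should score higher than unrelated ones
print(cosine_similarity(king, queen) > cosine_similarity(king, apple))  # True
```

In a real model the vectors would have hundreds of dimensions and be loaded from a trained monolingual or bilingual embedding file.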
When you use this toolkit, please cite:
@InProceedings{MultiVecLREC2016,
Title = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
Author = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
Year = {2016}
}
An android application for speech data collection LIG_AIKUMA
We are pleased to announce the release of LIG_AIKUMA, an android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application (AIKUMA) initially developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of a French-German project (ANR/DFG BULB - Breaking the Unwritten Language Barrier).
The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and offers a range of speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collection in Congo-Brazzaville, resulting in a total of over 80 hours of speech.
(2017-09-20) 11th Oxford Dysfluency Conference , Oxford, UK
11th Oxford Dysfluency Conference Challenge and Change
20-23 September 2017 | St Catherine’s College, Oxford, UK
The 11th Oxford Dysfluency Conference (ODC), under the theme ‘Challenge and Change’, is to be held at St Catherine’s College, Oxford, from 20 to 23 September 2017. ODC has a reputation as one of the leading international scientific conferences in the field of dysfluency.
The conference brings together researchers and clinicians, providing a showcase and forum for discussion and collegial debate about the most current and innovative research and clinical practices. Throughout the history of ODC, the primary aim has been to bridge the gap between research and clinical practice.
The conference seeks to promote research that informs management, with interventions that are supported by sound theory and which inform future research.
In 2017, the goal is to encourage discussion and debate that will challenge and enhance our perspectives and understanding of research; the nature of stuttering and/or cluttering; and management across the ages.