ISCApad #254 |
Saturday, August 10, 2019 by Chris Wellekens |
5-1-1 | Bäckström, Tom (with Guillaume Fuchs, Sascha Disch, Christian Uhle and Jeremie Lecomte), 'Speech Coding with Code-Excited Linear Prediction', Springer
| |||||
5-1-2 | Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey (Eds), 'New Era for Robust Speech Recognition', Springer. https://link.springer.com/book/10.1007%2F978-3-319-64680-0
| |||||
5-1-3 | Fabrice Marsac, Rudolph Sock, 'CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole', L'Harmattan, France. We are pleased to announce the publication of the thematic volume 'CONSÉCUTIVITÉ ET SIMULTANÉITÉ en Linguistique, Langues et Parole' in the Dixit Grammatica collection (L'Harmattan, France).
| |||||
5-1-4 | Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot (Editors), 'Audio Source Separation and Speech Enhancement', Wiley. ISBN: 978-1-119-27989-1, October 2018, 504 pages.
| |||||
5-1-5 | Jen-Tzung Chien, 'Source Separation and Machine Learning', Academic Press
| |||||
5-1-6 | Ingo Feldhausen, 'Methods in prosody: A Romance language perspective', Language Science Press (open access). We are pleased to announce the publication of a peer-reviewed volume devoted to research methods in prosody, entitled 'Methods in prosody: A Romance language perspective'. It is published by Language Science Press, an open-access publisher. The book can be downloaded free of charge at: http://langsci-press.org/catalog/book/183
The table of contents is as follows:
Introduction
Foreword
I Large corpora and spontaneous speech
1) Using large corpora and computational tools to describe prosody: An …
2) Intonation of pronominal subjects in Porteño Spanish: Analysis of …
II Approaches to prosodic analysis
3) Multimodal analyses of audio-visual information: Some methods and …
4) The realizational coefficient: Devising a method for empirically …
5) On the role of prosody in disambiguating wh-exclamatives and …
III Elicitation methods
6) The Discourse Completion Task in Romance prosody research: Status …
7) Describing the intonation of speech acts in Brazilian Portuguese: …
Indexes
Please feel free to circulate this announcement to colleagues who may be interested. Best regards, Ingo Feldhausen
| |||||
5-1-7 | Nigel Ward, 'Prosodic Patterns in English Conversation', Cambridge University Press, 2019. Nigel G. Ward is Professor of Computer Science at the University of Texas at El Paso.
Spoken language is more than words: it includes the prosodic features and patterns that speakers use, subconsciously, to frame meanings and achieve interactional goals. Thanks to the application of simple processing techniques to spoken dialog corpora, this book goes beyond intonation to describe how pitch, timing, intensity and voicing properties combine to form meaningful temporal configurations: prosodic constructions. Combining new findings with hitherto-scattered observations from diverse research traditions, this book enumerates twenty of the principal prosodic constructions of English.
http://www.cambridge.org/ward/ nigel@utep.edu http://www.cs.utep.edu/nigel/
| |||||
5-1-8 | J.H. Esling, Scott R. Moisik, Allison Benner, Lise Crevier-Buchman, 'Voice Quality: The Laryngeal Articulator Model', Cambridge University Press. Hardback 978-1-108-49842-5.
John H. Esling, University of Victoria, British Columbia; Scott R. Moisik, Nanyang Technological University, Singapore; Allison Benner, University of Victoria, British Columbia; Lise Crevier-Buchman, Centre National de la Recherche Scientifique (CNRS), Paris.
The first description of voice quality production in forty years, this book provides a new framework for its study: the Laryngeal Articulator Model. Informed by instrumental examinations of the laryngeal articulatory mechanism, it revises our understanding of articulatory postures to explain the actions, vibrations and resonances generated in the epilarynx and pharynx. It focuses on the long-term auditory-articulatory component of accent in the languages of the world, explaining how voice quality relates to segmental and syllabic sounds. Phonetic illustrations of phonation types and of laryngeal and oral vocal tract articulatory postures are provided. Extensive video and audio material is available on a companion website. The book presents computational simulations, the laryngeal and voice quality foundations of infant speech acquisition, speech/voice disorders and surgeries that entail compensatory laryngeal articulator adjustment, and an exploration of the role of voice quality in sound change and of the larynx in the evolution of speech.
1. Voice and voice quality; 2. Voice quality classification; 3. Instrumental case studies and computational simulations of voice quality; 4. Linguistic, paralinguistic and extralinguistic illustrations of voice quality; 5. Phonological implications of voice quality theory; 6. Infant acquisition of speech and voice quality; 7. Clinical illustrations of voice quality; 8. Laryngeal articulation and voice quality in sound change, language ontogeny.
| |||||
5-1-9 | Albert di Cristo, 'Les langues naturelles', HAL open archive. https://hal-amu.archives-ouvertes.fr/hal-02149640
This book is the first part of a vast work devoted to the study of how natural languages package information, and to the role prosody plays in expressing that packaging. This first part analyses, in its various (chiefly epistemological) aspects, the notion of information structure, notably in its relations with grammar, and examines in detail the determinants that form the framework of this structure. In this perspective, the discussions cover, besides the notions of theme, topic and 'given', those of focus, focalization and contrast, which are analysed in depth. The discussions seek to grasp these notions from the standpoint of their formal properties, their functionality and the meanings they help to convey. An entire chapter of this first part is devoted to the study of questioning and to the way the organisation of information is managed in that activity. The book includes a bibliography of more than two thousand references. It will be complemented by a second part, currently being written, dealing essentially with prosody and its role in the packaging of information.
|
5-2-1 | Linguistic Data Consortium (LDC) update (July 2019) In this newsletter:
Fall 2019 LDC Data Scholarship Program
LDC data and commercial technology development
New Publications:
The DKU-JNU-EMA Electromagnetic Articulography Database
Fall 2019 LDC Data Scholarship Program
Student applications for the Fall 2019 LDC Data Scholarship program are being accepted now through September 15, 2019. This scholarship program provides eligible students with access to LDC data at no cost. Students must complete an application consisting of a data use proposal and letter of support from their advisor.
For application requirements and program rules, please visit the LDC Data Scholarship page.
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.
(1) The DKU-JNU-EMA Electromagnetic Articulography Database was developed by Duke Kunshan University and Jinan University and contains approximately 10 hours of articulography and speech data in Mandarin, Cantonese, Hakka, and Teochew Chinese from two to seven native speakers for each dialect.
Articulatory measurements were made using the NDI electromagnetic articulography wave research system to capture real-time vocal tract variable trajectories. Subjects had six sensors placed in various locations in their mouth and one reference sensor was placed on the bridge of their nose. For simultaneous recording of speech signals, subjects also wore a head-mounted close-talk microphone.
Speakers engaged in four different types of recording sessions: one in which they read complete sentences or short texts, and three sessions in which they read related words of a specific common consonant, vowel, or tone.
DKU-JNU-EMA Electromagnetic Articulography Database is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1000.
*
(2) Phrase Detectives Corpus Version 2 was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 407,000 tokens across 537 documents anaphorically-annotated by the Phrase Detectives Game, an online interactive 'game-with-a-purpose' (GWAP) designed to collect data about English anaphoric coreference.
This release constitutes a new version of the Phrase Detectives Corpus (LDC2017T08), adding significantly more annotated tokens to the data set and supplying players’ judgments and a silver label annotation based on the probabilistic aggregation method for anaphoric information for each markable.
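As a rough illustration of what such an aggregation step does (the actual probabilistic method used for the corpus is more sophisticated), a silver label can be derived by weighting each player's judgment by an estimate of that player's reliability. A minimal sketch; all names and inputs below are hypothetical:

    from collections import defaultdict

    def silver_label(judgments, player_accuracy):
        # judgments: list of (player_id, label) pairs for one markable.
        # player_accuracy: per-player reliability estimate in (0, 1);
        # both inputs are hypothetical, for illustration only.
        scores = defaultdict(float)
        for player, label in judgments:
            scores[label] += player_accuracy.get(player, 0.5)
        # The label with the highest total weighted support wins.
        return max(scores, key=scores.get)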
The documents in the corpus are taken from Wikipedia articles and from narrative text in Project Gutenberg. The annotation is a simplified form of the coding scheme used in The ARRAU Corpus of Anaphoric Information (LDC2013T22).
Phrase Detectives Corpus Version 2 is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data at no cost.
*
(3) First DIHARD Challenge Evaluation - Nine Sources was developed by LDC and contains approximately 18 hours of English and Chinese speech data along with corresponding annotations used in support of the First DIHARD Challenge.
The First DIHARD Challenge was an attempt to reinvigorate work on diarization through a shared task focusing on 'hard' diarization; that is, speech diarization for challenging corpora where there was an expectation that existing state-of-the-art systems would fare poorly. As such, it included speech from a wide sampling of domains representing diversity in number of speakers, speaker demographics, interaction style, recording quality, and environmental conditions as follows (all sources are in English unless otherwise indicated):
This release, when combined with First DIHARD Challenge Evaluation - SEEDLingS (LDC2019S13), contains the evaluation set audio data and annotation as well as the official scoring tool. The development data for the First DIHARD Challenge is also available from LDC as Eight Sources (LDC2019S09) and SEEDLingS (LDC2019S10).
First DIHARD Challenge Evaluation - Nine Sources is distributed via web download.
2019 Subscription Members will automatically receive copies of this corpus. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $300.
*
(4) First DIHARD Challenge Evaluation – SEEDLingS was developed by Duke University and LDC and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the First DIHARD Challenge.
The source data was drawn from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings for SEEDLingS were generated in the home environment of 44 infants from 6-18 months of age in the Rochester, New York area. A subset of that data was annotated by LDC for use in the First DIHARD Challenge.
This release, when combined with First DIHARD Challenge Evaluation - Nine Sources (LDC2019S12), contains the evaluation set audio data and annotation as well as the official scoring tool. The development data for the First DIHARD Challenge is also available from LDC as Eight Sources (LDC2019S09) and SEEDLingS (LDC2019S10).
First DIHARD Challenge Evaluation – SEEDLingS is distributed via web download.
2019 Subscription Members will receive copies of this corpus provided they have submitted a completed copy of the special license agreement. 2019 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $50.
*
Membership Office
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
| ||||||||||||||||||||||||||||||||||||
5-2-2 | ELRA - Language Resources Catalogue - Update (July 2019) We are happy to announce that 2 new Speech resources and 3 new Terminological Resources are now available in our catalogue.
ELRA-S0406 Glissando-sp
ISLRN: 024-286-962-247-6
Glissando-sp includes more than 12 hours of speech in Spanish, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 professional speakers and 20 non-professional speakers: 4 'news broadcaster' professional speakers (2 male and 2 female), 4 'advertising' professional speakers (2 male and 2 female), and 20 non-professional speakers (10 male and 10 female). Glissando-sp is made up of three subcorpora: readings of real news texts (provided by the 'Cadena Ser' radio station), interactions between two speakers oriented to a specific goal in the domain of information requests, and conversations between people who have some degree of familiarity with each other. For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0406/
ELRA-S0407 Glissando-ca
ISLRN: 780-617-066-913-1
Glissando-ca includes more than 12 hours of speech in Catalan, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 professional speakers and 20 non-professional speakers: 4 'news broadcaster' professional speakers (2 male and 2 female), 4 'advertising' professional speakers (2 male and 2 female), and 20 non-professional speakers (10 male and 10 female). Glissando-ca is made up of three subcorpora: readings of real news texts (provided by the 'Cadena Ser' radio station), interactions between two speakers oriented to a specific goal in the domain of information requests, and conversations between people who have some degree of familiarity with each other. For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0407/
ELRA-T0378 English-Persian database of idioms and expressions
ISLRN: 387-435-142-983-6
This database consists of about 30,000 bilingual parallel sentences and phrases in English and Persian (15,000 in each language). It comes with software through which users can search for a word, phrase or chunk and retrieve all idioms and expressions related to the query. The database is presented in Access format and the software runs on Windows systems. For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-T0378/
ELRA-T0379 English-Persian terminology database of computer and IT
ISLRN: 760-940-374-770-6
This bilingual terminology database consists of around 25,000 terms in the fields of computer engineering, computer science and information technology. It comes with software through which users can search for a word, phrase or chunk and retrieve all entries related to the query. The database is presented in Access format and the software runs on Windows systems. For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-T0379/
ELRA-T0380 English-Persian terminology database of management and economics
ISLRN: 188-448-142-468-5
This bilingual terminology database consists of around 15,000 terms in the field of management and economics. It comes with software through which users can search for a word, phrase or chunk and retrieve all entries related to the query. The main database is presented in Access format and the software runs on Windows systems. For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-T0380/
For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org). If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/
| ||||||||||||||||||||||||||||||||||||
5-2-3 | Speechocean – update (July 2019)
141 Hours of Free Data! Join the Speech Challenge Right Now!
Introduction of OLR 2019
As an international event, the Oriental Language Recognition (OLR) Challenge series, organized by Speechocean and Tsinghua University, aims at boosting language recognition technology for oriental languages. Following the success of the challenges held in the past three years, OLR 2019 follows the same theme but will be more challenging and more interesting.
A 141-hour speech recognition corpus covering 16 languages is completely free for every participant. Come and join the challenge right now!
Data Details
Test Tasks
Task 1: Short-utterance LID, where the test utterances are as short as 1 second.
Task 2: Cross-channel LID, where test data is in different channels from the training set.
Task 3: Zero-resource LID, where no resources are provided for training before inference, but several reference utterances are provided for each language.
Important Dates
Organization Committees
Tsinghua University
Speechocean
Xiamen University
Duke-Kunshan University
Northwestern Polytechnical University
Registration Procedure
Please send an email to olr19@cslt.org with the following information:
-- Team name
-- Institute
-- Participants
-- Contact person
-- Homepage of person / organization / company (if a homepage is not available, a link to any of your online papers in the speech field is fine)
| ||||||||||||||||||||||||||||||||||||
5-2-4 | Google's Language Model benchmark. An LM benchmark is available at: https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark
Here is a brief description of the project.
'The purpose of the project is to make available a standard training and test setup for language modeling experiments. The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here. This also means that your results on this data set are reproducible by the research community at large. Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models:
ArXiv paper: http://arxiv.org/abs/1312.3005
Happy benchmarking!'
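Since the release includes per-word log-probability files, a corpus-level perplexity can be recomputed from them directly. A minimal sketch, assuming one natural-log probability per line (the file name is hypothetical; if the files use base-10 logs, replace math.exp(-m) with 10 ** -m):

    import math

    def perplexity(logprob_file):
        # Read one log-probability per non-empty line.
        with open(logprob_file) as f:
            logprobs = [float(line) for line in f if line.strip()]
        # Perplexity is the exponentiated negative mean log-likelihood.
        return math.exp(-sum(logprobs) / len(logprobs))

    print(perplexity('heldout-logprobs.txt'))  # hypothetical file name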
| ||||||||||||||||||||||||||||||||||||
5-2-5 | Forensic database of voice recordings of 500+ Australian English speakers
| ||||||||||||||||||||||||||||||||||||
5-2-6 | Audio and Electroglottographic speech recordings
We are happy to announce the public availability of speech recordings made as part of the UCLA project 'Production and Perception of Linguistic Voice Quality'. http://www.phonetics.ucla.edu/voiceproject/voice.html Audio and EGG recordings are available for Bo, Gujarati, Hmong, Mandarin, Black Miao, Southern Yi, and Santiago Matatlan / San Juan Guelavia Zapotec; audio recordings (no EGG) are available for English and Mandarin. Recordings of Jalapa Mazatec extracted from the UCLA Phonetic Archive are also posted. All recordings are accompanied by explanatory notes and wordlists, and most are accompanied by Praat textgrids that locate target segments of interest to the project. Analysis software developed as part of the project – VoiceSauce for audio analysis and EggWorks for EGG analysis – and all project publications are also available from this site. Preliminary analyses of the recordings using these tools (i.e. acoustic and EGG parameter values extracted from the recordings) are posted on the site in large data spreadsheets. All of these materials are made freely available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. This project was funded by NSF grant BCS-0720304 to Pat Keating, Abeer Alwan and Jody Kreiman of UCLA, and Christina Esposito of Macalester College. Pat Keating (UCLA)
| ||||||||||||||||||||||||||||||||||||
5-2-7 | EEG-face tracking- audio 24 GB data set Kara One, Toronto, Canada We are making 24 GB of a new dataset, called Kara One, freely available. This database combines 3 modalities (EEG, face tracking, and audio) during imagined and articulated speech using phonologically-relevant phonemic and single-word prompts. It is the result of a collaboration between the Toronto Rehabilitation Institute (in the University Health Network) and the Department of Computer Science at the University of Toronto.
In the associated paper (abstract below), we show how to accurately classify imagined phonological categories solely from EEG data. Specifically, we obtain up to 90% accuracy in classifying imagined consonants from imagined vowels and up to 95% accuracy in classifying stimulus from active imagination states using advanced deep-belief networks.
Data from 14 participants are available here: http://www.cs.toronto.edu/~complingweb/data/karaOne/karaOne.html.
If you have any questions, please contact Frank Rudzicz at frank@cs.toronto.edu.
Best regards, Frank
PAPER: Shunan Zhao and Frank Rudzicz (2015) Classifying phonological categories in imagined and articulated speech. In Proceedings of ICASSP 2015, Brisbane, Australia. ABSTRACT: This paper presents a new dataset combining 3 modalities (EEG, facial, and audio) during imagined and vocalized phonemic and single-word prompts. We pre-process the EEG data, compute features for all 3 modalities, and perform binary classification of phonological categories using a combination of these modalities. For example, a deep-belief network obtains accuracies over 90% on identifying consonants, which is significantly more accurate than two baseline support vector machines. We also classify between the different states (resting, stimuli, active thinking) of the recording, achieving accuracies of 95%. These data may be used to learn multimodal relationships, and to develop silent-speech and brain-computer interfaces.
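For readers who want a starting point, here is a minimal sketch in the spirit of the paper's support-vector-machine baselines: binary classification of imagined consonants versus imagined vowels from precomputed EEG feature vectors. The feature files and their layout are assumptions, not the Kara One distribution format:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical precomputed features: one row per trial.
    X = np.load('eeg_features.npy')   # shape (n_trials, n_features)
    y = np.load('labels.npy')         # 0 = imagined consonant, 1 = imagined vowel

    # Standardize features, then evaluate an RBF-kernel SVM with 5-fold CV.
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
    print('mean accuracy:', cross_val_score(clf, X, y, cv=5).mean())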
| ||||||||||||||||||||||||||||||||||||
5-2-8 | TORGO database free for academic use. In the spirit of the season, I would like to announce the immediate availability of the TORGO database, free in perpetuity for academic use. This database combines acoustics and electromagnetic articulography from 8 individuals with speech disorders and 7 without, and totals over 18 GB. These data can be used for multimodal models (e.g., for acoustic-articulatory inversion), models of pathology, and augmented speech recognition, for example. More information (and the database itself) can be found here: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html.
| ||||||||||||||||||||||||||||||||||||
5-2-9 | Datatang. Datatang is a leading global data provider specializing in customized data solutions, with a focus on the collection, annotation and crowdsourcing of a variety of speech, image and text data.
Summary of the new datasets (2018) and a brief plan for 2019.
- Speech data (with annotation) completed in 2018
- Speech projects ongoing in 2019
On top of the above, more speech data collections are planned, such as Japanese speech data, children's speech data, dialect speech data and so on.
We will continue to provide these data at competitive prices while maintaining a high accuracy rate.
If you have any questions or need more details, do not hesitate to contact us at jessy@datatang.com. We would be happy to send you a sample or specification of the data.
| ||||||||||||||||||||||||||||||||||||
5-2-10 | Fearless Steps Corpus (University of Texas at Dallas). John H.L. Hansen, Abhijeet Sangwan, Lakshmish Kaushik, Chengzhu Yu, Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering, The University of Texas at Dallas (UTD), Richardson, Texas, U.S.A.
| ||||||||||||||||||||||||||||||||||||
5-2-11 | SIWIS French Speech Synthesis Database. The SIWIS French Speech Synthesis Database includes high quality French speech recordings and associated text files, aimed at building TTS systems and at investigating multiple styles and emphasis. A total of 9750 utterances from various sources such as parliament debates and novels were uttered by a professional French voice talent. A subset of the database contains emphasised words in many different contexts. The database includes more than ten hours of speech data and is freely available.
| ||||||||||||||||||||||||||||||||||||
5-2-12 | JLCorpus - Emotional Speech corpus with primary and secondary emotions:
To further the understanding of the wide array of emotions embedded in human speech, we are introducing an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance is meant to facilitate comparison studies of emotion-related formant and glottal source features. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots, but there are very few existing speech resources for studying them; this work adds a speech corpus containing some secondary emotions.
Please use the corpus for emotional speech related studies. When you use it, please include the citation: Jesin James, Li Tian, Catherine Watson, 'An Open Source Emotional Speech Corpus for Human Robot Interaction Applications', in Proc. Interspeech, 2018.
To access the whole corpus, including the recording supporting files, follow this link: https://www.kaggle.com/tli725/jl-corpus (if you have already installed the Kaggle API, you can download it with: kaggle datasets download -d tli725/jl-corpus). If you simply want the raw audio+txt files, use: https://www.kaggle.com/tli725/jl-corpus/downloads/Raw%20JL%20corpus%20(unchecked%20and%20unannotated).rar/4
The corpus was evaluated in a large-scale human perception test with 120 participants. The links to the surveys are here:
For the primary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_8ewmOCgOFCHpAj3
For the secondary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_eVDINp8WkKpsPsh
These surveys give an overall idea of the type of recordings in the corpus. The perceptually verified and annotated JL corpus will be given public access soon.
| ||||||||||||||||||||||||||||||||||||
5-2-13 | OPENGLOT – An open environment for the evaluation of glottal inverse filtering
OPENGLOT is a publicly available database that was designed primarily for the evaluation of glottal inverse filtering algorithms. In addition, the database can be used in evaluating formant estimation methods. OPENGLOT consists of four repositories. Repository I contains synthetic glottal flow waveforms, and speech signals generated by using the Liljencrants–Fant (LF) waveform as an excitation, and an all-pole vocal tract model. Repository II contains glottal flow and speech pressure signals generated using physical modelling of human speech production. Repository III contains pairs of glottal excitation and speech pressure signals generated by exciting a 3D-printed plastic vocal tract replica with LF excitations via a loudspeaker. Finally, Repository IV contains multichannel recordings (speech pressure signal, EGG, high-speed video of the vocal folds) from natural production of speech.
OPENGLOT is available at: http://research.spa.aalto.fi/projects/openglot/
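Because Repositories I-III pair each speech signal with its reference glottal excitation, an inverse filtering algorithm can be scored by comparing its estimate against that reference. One plausible (but unofficial) error measure is a waveform SNR after amplitude normalisation; a minimal sketch:

    import numpy as np

    def waveform_snr_db(reference, estimate):
        # Normalise both glottal flow waveforms to unit peak amplitude,
        # then measure the energy ratio of reference to residual error.
        reference = reference / np.max(np.abs(reference))
        estimate = estimate / np.max(np.abs(estimate))
        error = reference - estimate
        return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

Higher values indicate a closer match; in practice the two signals should be time-aligned before scoring.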
| ||||||||||||||||||||||||||||||||||||
5-2-14 | Corpus Rhapsodie. We are pleased to announce the publication of a book devoted to …
| ||||||||||||||||||||||||||||||||||||
5-2-15 | The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus), Boulder Learning Inc. The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus) is the world's largest English children's speech corpus. It is freely available to the research community for research use. Companies can acquire the corpus for $10,000.
The MyST Corpus was collected over a 10-year period, with support from over $9 million in grants from the US National Science Foundation and Department of Education, awarded to Boulder Learning Inc. (Wayne Ward, Principal Investigator). The MyST corpus contains speech collected from 1,374 third, fourth and fifth grade students. The students engaged in spoken dialogs with a virtual science tutor in 8 areas of science. A total of 11,398 student sessions of 15 to 20 minutes produced a total of 244,069 utterances, 42% of which have been transcribed at the word level. The corpus is partitioned into training and test sets to support comparison of research results across labs. All parents and students signed consent forms, approved by the University of Colorado's Institutional Review Board, that authorize distribution of the corpus for research and commercial use.
The MyST children's speech corpus contains approximately ten times as many spoken utterances as all other English children's speech corpora combined (see https://en.wikipedia.org/wiki/List_of_children%27s_speech_corpora). Additional information about the corpus, and instructions for how to acquire it (and samples of the speech data), can be found on the Boulder Learning web site at http://boulderlearning.com/request-the-myst-corpus/.
|
5-3-1 | Release of version 2 of FASST (Flexible Audio Source Separation Toolbox). http://bass-db.gforge.inria.fr/fasst/
This toolbox is intended to speed up the conception, and to automate the implementation, of new model-based audio source separation algorithms. It has the following additions compared to version 1:
* Core in C++
* User scripts in MATLAB or Python
* Speedup
* Multichannel audio input
We provide 2 examples (a concept sketch of the first follows below):
1. two-channel instantaneous NMF
2. real-world speech enhancement (2nd CHiME Challenge, Track 1)
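As a concept sketch of the model family behind example 1 (and emphatically not the FASST API), the following factorizes a magnitude spectrogram with NMF and builds soft masks by grouping components per source; the round-robin grouping here is arbitrary, where FASST would learn or constrain it:

    import numpy as np
    from sklearn.decomposition import NMF

    def nmf_soft_masks(mag_spec, n_sources=2, n_components=16):
        # Factor the magnitude spectrogram as |X| ~= W @ H.
        model = NMF(n_components=n_components, init='nndsvda', max_iter=300)
        W = model.fit_transform(mag_spec)   # (n_freq, n_components)
        H = model.components_               # (n_components, n_time)
        # Arbitrarily assign components to sources round-robin; a real
        # system would group them via training data or model constraints.
        parts = [W[:, s::n_sources] @ H[s::n_sources, :] for s in range(n_sources)]
        total = np.sum(parts, axis=0) + 1e-12
        return [p / total for p in parts]   # soft masks summing to 1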
| |||||
5-3-2 | Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures. We are glad to announce the public release of Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures. It can be used e.g. for making music or for singing voice pedagogy. A wide variety of voices are available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, roaring, etc. voices. All the features of vocal sounds are entirely under control, as the synthesis method is based on a mathematical model of voice production, without prerecorded segments. The instrument is controlled using chironomy, i.e. hand gestures, with the help of interfaces like a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, breathiness etc. can easily and continuously be controlled during performance, and special voices can be prepared in advance or using presets. Check out the capabilities of Cantor Digitalis through performance extracts from the ensemble Chorus Digitalis: http://youtu.be/_LTjM3Lihis?t=13s. In practice, this release provides:
Regards,
The Cantor Digitalis team (who loves feedback — cantordigitalis@limsi.fr) Christophe d'Alessandro, Lionel Feugère, Olivier Perrotin http://cantordigitalis.limsi.fr/
| |||||
5-3-3 | MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP
We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification. The toolkit also includes C++ and Python libraries, that you can use to query bilingual and monolingual models.
The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
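To make the 'distance measures between words and sequences of words' concrete, here is a plain-NumPy illustration of the usual approach (cosine similarity, with sequences embedded by averaging); this is illustrative only and does not use the MultiVec API:

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def sequence_vector(words, embeddings):
        # embeddings: dict mapping word -> vector (hypothetical input);
        # averaging word vectors is one simple sequence representation.
        vectors = [embeddings[w] for w in words if w in embeddings]
        return np.mean(vectors, axis=0)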
When you use this toolkit, please cite:
@InProceedings{MultiVecLREC2016,
  Title     = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  Author    = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  Year      = {2016},
  Month     = {May}
}
The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf
Best regards,
Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier
| |||||
5-3-4 | An android application for speech data collection LIG_AIKUMA We are pleased to announce the release of LIG_AIKUMA, an android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application (AIKUMA) initially developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of a French-German project (ANR/DFG BULB - Breaking the Unwritten Language Barrier).
The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collections in Congo-Brazzaville resulting in a total of over 80 hours of speech.
Users who just want to use the app without access to the code can download it directly from the forge direct link: https://forge.imag.fr/frs/download.php/706/MainActivity.apk
Code is also available on demand (contact elodie.gauthier@imag.fr and laurent.besacier@imag.fr).
More details on LIG_AIKUMA can be found on the following paper: http://www.sciencedirect.com/science/article/pii/S1877050916300448
| |||||
5-3-5 | Web services via ALL GO from IRISA-CNRS It is our pleasure to introduce A||GO (https://allgo.inria.fr/ or http://allgo.irisa.fr/), a platform providing a collection of web-services for the automatic analysis of various data, including multimedia content across modalities. The platform builds on the back-end web service deployment infrastructure developed and maintained by Inria's Service for Experimentation and Development (SED). Originally dedicated to multimedia content, A||GO progressively broadened to other fields such as computational biology, networks and telecommunications, computational graphics or computational physics.
| |||||
5-3-6 | Clickable map - Illustrations of the IPA
| |||||
5-3-7 | LIG-Aikuma running on mobile phones and tablets
| |||||
5-3-8 | Python Library. We are pleased to announce the public availability of the first Python library for converting numbers written out in French into their digit representation.
The parser is robust and can segment and substitute number expressions in a stream of words, such as a conversation. It recognizes the different variants of the language (quatre-vingt-dix / nonante…) and converts ordinals as well as integers, decimal numbers and formal sequences (phone numbers, credit card numbers…).
We hope this tool will be useful to those who, like us, do natural language processing in French.
The library is released under the MIT license, which allows very liberal use.
Sources: https://github.com/allo-media/text2num
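A usage sketch of the library announced above. The function names follow the project's README (text_to_num package) as I recall it; treat them as indicative and check the repository for the current API:

    from text_to_num import alpha2digit, text2num

    # Convert a single French number word to an integer.
    print(text2num('quatre-vingt-quinze', 'fr'))   # -> 95

    # Substitute number expressions inside a stream of words.
    print(alpha2digit("j'ai rendez-vous le vingt-trois mars à quinze heures", 'fr'))
    # -> "j'ai rendez-vous le 23 mars à 15 heures"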
|