ISCA - International Speech
Communication Association

ISCApad #310

Tuesday, April 09, 2024 by Chris Wellekens

5-3 Software

5-3-1

Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.

We are glad to announce the public release of the Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.

It can be used e.g. for making music or for singing voice pedagogy.

A wide variety of voices is available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, or roaring voices. All features of the vocal sound are entirely under the performer's control, because the synthesis method is based on a mathematical model of voice production and uses no prerecorded segments.

The instrument is controlled using chironomy, i.e. hand gestures, through interfaces such as a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, and breathiness can be controlled easily and continuously during performance, and special voices can be prepared in advance or selected from presets.

Check out the capabilities of Cantor Digitalis through performance extracts from the ensemble Chorus Digitalis:
http://youtu.be/_LTjM3Lihis?t=13s.

In practice, this release provides:

the synthesizer application
the source code in the form of a Max package (GPL-like license)
documentation for musicians and for developers

What do you need?

a Mac running OS X
ideally a Wacom graphic tablet, but it also works with your computer mouse
for developers, the Max software

Interested?

To download the Cantor Digitalis, click here
To subscribe to the Cantor Digitalis newsletter and/or the forum list, or to contact the developers, click here
To learn about the Chorus Digitalis, the Cantor Digitalis ensemble, and watch videos of performances, click here
For more details about the Cantor Digitalis, click here

Regards,

The Cantor Digitalis team (who love feedback: cantordigitalis@limsi.fr)
Christophe d'Alessandro, Lionel Feugère, Olivier Perrotin
http://cantordigitalis.limsi.fr/

5-3-2

MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP

We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online), and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and between sequences of words. The toolkit is written in C++ and aims to be fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and crosslingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.

The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
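To give a rough idea of what a distance measure between words or sequences of words can look like, here is a minimal Python sketch. It does not use the MultiVec API: the function names, the toy embedding table, and the choice of cosine similarity over averaged word vectors are all assumptions made for illustration, and MultiVec's own measures may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def sequence_vector(words, embeddings):
    """Represent a word sequence as the average of its known word vectors.

    Averaging is an assumed, simple composition scheme, not MultiVec's method.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no word of the sequence has an embedding")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, for illustration only.
    toy_embeddings = {
        "chat": [0.9, 0.1],
        "chien": [0.8, 0.2],
        "voiture": [0.1, 0.9],
    }
    print(cosine_similarity(toy_embeddings["chat"], toy_embeddings["chien"]))
    print(cosine_similarity(
        sequence_vector(["chat", "chien"], toy_embeddings),
        toy_embeddings["voiture"],
    ))
```

With real models, the vectors would come from a trained monolingual or bilingual model rather than from a hand-written table.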

When you use this toolkit, please cite:

@InProceedings{MultiVecLREC2016,
  Title = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  Author = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  Year = {2016},
  Month = {May}
}

The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf

Best regards,

Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier

5-3-3

An Android application for speech data collection LIG_AIKUMA

We are pleased to announce the release of LIG_AIKUMA, an Android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application (AIKUMA) initially developed by Steven Bird and colleagues. Features were added to the app to facilitate the collection of parallel speech data in line with the requirements of a French-German project (ANR/DFG BULB - Breaking the Unwritten Language Barrier).

The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and offers a range of speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collection in Congo-Brazzaville, resulting in a total of over 80 hours of speech.

Users who just want to use the app without access to the code can download it directly from the forge: https://forge.imag.fr/frs/download.php/706/MainActivity.apk

The code is also available on request (contact elodie.gauthier@imag.fr and laurent.besacier@imag.fr).

More details on LIG_AIKUMA can be found in the following paper: http://www.sciencedirect.com/science/article/pii/S1877050916300448

5-3-4

Web services via ALL GO from IRISA-CNRS

It is our pleasure to introduce A||GO (https://allgo.inria.fr/ or http://allgo.irisa.fr/), a platform providing a collection of web services for the automatic analysis of various data, including multimedia content across modalities. The platform builds on the back-end web service deployment infrastructure developed and maintained by Inria's Service for Experimentation and Development (SED). Originally dedicated to multimedia content, A||GO has progressively broadened to other fields such as computational biology, networks and telecommunications, computational graphics, and computational physics.

As part of the CNRS PlaSciDo initiative [1], the Linkmedia team at IRISA / Inria Rennes is making available via A||GO a number of web services devoted to multimedia content analysis across modalities (language, audio, image, video). The web services currently provided include research results from the Linkmedia team as well as contributions from a number of partners. A list of the services available to date is given below, and the current state is available at https://www-linkmedia.irisa.fr/software along with demo videos. Most web services are interoperable, which facilitates the implementation of a multimedia content analysis processing chain, and are free to use for trial, prototyping or lab work. After a brief and free account creation step, you can execute the web services using either the graphical interface or a command line via a dedicated API.

We expect the number of web services to grow over time and invite interested parties to contact us should they wish to contribute to the multimedia web service offering of A||GO.

List of multimedia content analysis tools currently available on A||GO:
- Audio Processing
        SaMuSa: music/speech segmentation
        SilAD: silence detection
        Radi.sh: repeated audio motif discovery
        LORIA STS v2: speech transcription for the French language from LORIA
        Multi channel BSS locate: audio source localization toolbox from IRISA-PANAMA
        A-spade: audio declipper from IRISA-PANAMA
        Transvox: voice faker from LORIA
- Natural Language Processing
        NERO: named entity recognition
        TermEx: keywords/indexing terms detection
        Otis!: topic segmentation
        Hi-tost: hierarchical topic structuring
- Video Processing
        Vidseg: video shot segmentation
        HUFA: face detection and tracking
Shortcuts to Linkmedia services are also available here: https://www-linkmedia.irisa.fr/software/

For more information don't hesitate to contact us (contact-multimedia-allgo@irisa.fr).

Gabriel Sargent and Guillaume Gravier
--
Linkmedia
IRISA - CNRS
Rennes, France

5-3-5

Clickable map - Illustrations of the IPA

We have produced a clickable map showing the Illustrations of the International Phonetic Alphabet.

The map is updated with each new issue of the Journal of the International Phonetic Association.

https://richardbeare.github.io/marijatabain/ipa_illustrations_all.html

Marija Tabain - La Trobe University, Australia
Richard Beare - Monash University & MCRI, Australia

5-3-6

LIG-Aikuma running on mobile phones and tablets

Dear all,

LIG is pleased to inform you that the website for the app Lig-Aikuma is online: https://lig-aikuma.imag.fr/

At the same time, an update of Lig-Aikuma (V3) has been made available (see the website).

LIG-AIKUMA is a free Android app running on various mobile phones and tablets. The app offers a range of speech collection modes (recording, respeaking, translation and elicitation) and makes it possible to share recordings between users. LIG-AIKUMA is built upon the initial AIKUMA app developed by S. Bird & F. Hanke (see https://en.wikipedia.org/wiki/Aikuma for more information).

Improvements of the app:

Visual upgrade:
+ Waveform visualizer on the Respeaking and Translation modes (possibility to zoom in/out the audio signal)
+ File explorer included in all modes, to facilitate the navigation between files
+ New Share mode to share recordings between devices (by Bluetooth, Mail, NFC if available)
+ French and German available: in addition to English, the application now supports French and German. By default, Lig-Aikuma uses the language of the phone/tablet.
+ New, more consistent icons to distinguish all types of files (audio, text, image, video)

Conceptual upgrade:
+ New name for the root project directory: ligaikuma. Note: henceforth, all data will be stored in this directory instead of "aikuma" (used in previous versions of the app). This change does not cause compatibility issues. In the file explorer of each mode, the default location is this root directory. To access your old recordings, go back once with the left grey arrow (at the lower left of the screen) and select the "aikuma" directory
+ Generation of a PDF consent form (from the information filled in the metadata form) that can be signed by the linguist and the speaker using a PDF annotation tool (such as the Adobe Fill & Sign mobile app)
+ Generation of a CSV file that can be imported into the Elan software: it automatically creates a segmented tier, matching the segmentation made during a respeaking or translation session. Segments that contain no speech are marked with a "non-speech" label.
+ Geolocation of the recordings
+ Respeak an elicit file: an audio file initially recorded in Elicitation mode can now be used in Respeaking or Translation mode
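
As an illustration of the CSV export mentioned among the conceptual upgrades, the Python sketch below writes one row per segment, with a "non-speech" label on segments that contain no speech. The column layout (tier, begin_ms, end_ms, annotation), the tier name, and the function name are assumptions made for this example; the actual file produced by Lig-Aikuma may be organized differently.

```python
import csv
import io

def segments_to_csv(segments, tier="respeaking"):
    """Serialize (begin_ms, end_ms, has_speech) segments as CSV text.

    The header and column order are hypothetical, chosen for illustration.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tier", "begin_ms", "end_ms", "annotation"])
    for begin_ms, end_ms, has_speech in segments:
        # Segments without speech carry an explicit label; others stay empty
        # so that the annotation can be filled in later.
        label = "" if has_speech else "non-speech"
        writer.writerow([tier, begin_ms, end_ms, label])
    return buffer.getvalue()

if __name__ == "__main__":
    demo_segments = [(0, 1500, True), (1500, 2000, False), (2000, 4200, True)]
    print(segments_to_csv(demo_segments), end="")
```

When importing a file of this kind, the annotation tool has to be told which columns hold the tier name, the begin and end times, and the annotation text.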

Structural upgrade:
+ Undo button on Elicitation to erase/redo the current recording
+ Improved session backup in Elicitation mode
+ Non-speech button in Respeaking and Translation modes to indicate with a comment that the segment does not contain speech (but noise or silence, for instance)
+ Automatic speaker profile creation, to fill in the metadata quickly when several sessions involve the same speaker

Best regards,

Elodie Gauthier & Laurent Besacier

5-3-7

Python Library

We are happy to announce the public release of the first Python library for converting numbers written out in French into their digit representation.

The parser is robust and can segment and substitute number expressions in a stream of words, such as a conversation. It recognizes the different variants of the language (quatre-vingt-dix / nonante, etc.) and handles ordinals as well as integers, decimal numbers, and formal sequences (phone numbers, credit card numbers, etc.).

We hope this tool will be useful to those who, like us, do natural language processing in French.

The library is distributed under the MIT license, which allows very free use.

PyPI: https://pypi.org/project/text2num/
Sources: https://github.com/allo-media/text2num
Doc: http://text2num.readthedocs.io/
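
To make the task concrete, here is a toy Python sketch that converts a single French number expression into an integer. It is not the text2num implementation and does not use its API: the function name and word tables are hypothetical, and the sketch only covers cardinal numbers below one million, without the ordinals, decimals, or stream segmentation described above.

```python
# Toy converter for ONE French cardinal expression (0 to 999,999).
# Hypothetical names throughout; this is not how text2num is implemented.

UNITS = {
    "zéro": 0, "un": 1, "une": 1, "deux": 2, "trois": 3, "quatre": 4,
    "cinq": 5, "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10,
    "onze": 11, "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15,
    "seize": 16,
}
TENS = {
    "vingt": 20, "trente": 30, "quarante": 40, "cinquante": 50,
    "soixante": 60,
    # Regional variants (Belgium, Switzerland).
    "septante": 70, "huitante": 80, "octante": 80, "nonante": 90,
}

def french_to_int(text):
    """Convert e.g. 'quatre-vingt-dix' or 'nonante' to 90."""
    words = text.lower().replace("-", " ").split()
    total = 0        # value of completed thousands
    current = 0      # value of the group being built (below 1000)
    previous = None  # previous significant word, to detect "quatre vingt"
    for word in words:
        if word == "et":
            continue  # "vingt et un": the conjunction carries no value
        if word in ("vingts", "cents"):
            word = word[:-1]  # plural marks as in "quatre-vingts", "deux cents"
        if word in UNITS:
            current += UNITS[word]
        elif word == "vingt" and previous == "quatre":
            current += 80 - 4  # "quatre-vingt" means 4 x 20, not 4 + 20
        elif word in TENS:
            current += TENS[word]
        elif word == "cent":
            current = max(current, 1) * 100
        elif word == "mille":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
        previous = word
    return total + current

if __name__ == "__main__":
    for phrase in ["quatre-vingt-dix", "nonante", "mille neuf cent quatre-vingt-quatre"]:
        print(phrase, "->", french_to_int(phrase))
```

The special case for "quatre-vingt" shows why a plain word-to-value lookup is not enough for French: the same word "vingt" is additive in "vingt-quatre" and multiplicative in "quatre-vingt".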

--

Romuald Texier-Marcadé

http://www.allo-media.fr

5-3-8

Assessment of motor speech disorders: MONPAGE version 2.0.s

Dear colleagues,

After several years of work, we are pleased to announce the online release of the MONPAGE protocol for the assessment of motor speech disorders, version 2.0.s, now normed and validated.

This tool, made freely available to the community, is intended for the clinical assessment of mild to moderate motor speech disorders (dysarthria and apraxia of speech) in French-speaking adults. It was developed by a group of Belgian, Swiss, French, and Québécois researchers and clinicians. It is a speech assessment battery comprising a computerized administration (with recording of the patients' productions) and semi-automatic perceptual and acoustic analyses. Getting started with it requires basic skills in acoustic phonetics.

The presentation, references, resources, and download links for MonPaGe-2.0.s are available at: https://lpp.in2p3.fr/monpage/
We recommend starting by reading the user manual.

Please feel free to spread the word!
On behalf of the MonPaGe team,
Véronique

Prof. Véronique Delvaux, PhD
FNRS Research Associate at UMONS
Lecturer, UMONS & ULB
Service de Métrologie et Sciences du Langage
Local ?1.7, Place du Parc, 18, 7000 Mons
+3265373140

5-3-9

VocalTractLab3D: articulatory synthesis software.

Hello everyone,

I am writing to let you know that the articulatory synthesis software VocalTractLab3D is now online and freely available to all:

https://vocaltractlab.de/index.php?page=vocaltractlab-download

VocalTractLab is articulatory synthesis software developed mainly by Peter Birkholz at the Chair of Speech Technology and Cognitive Systems at the University of Dresden.

During my postdoc I worked on developing a special version, VocalTractLab3D, which includes efficient 3D acoustic simulations driven by a graphical interface.

Unlike the acoustic simulations commonly used in speech research, which rely on a 1D approach based on the area function, 3D simulations describe the acoustic field in all spatial dimensions and take into account the precise 3D shape of the vocal tract.

They are therefore more accurate, particularly at high frequencies (from about 2-3 kHz upward).

Their limitation, however, is computation time. In our project we worked to push back this limit, and our software can run this type of simulation in a reasonable time (about 1 hour for a static geometry with an accurate solution).

Another common limitation of 3D simulations is the need to master fairly technical simulation methods such as finite elements, finite differences, and others. This often involves using a programming language.

We also worked to push back this limit to make this type of simulation accessible to as many people as possible: in VocalTractLab3D these simulations are driven by a graphical interface, and it is not necessary to understand exactly how the method works in order to compute transfer functions or acoustic fields.

If enough people are interested, I can give an online presentation of the software to explain in more detail what it consists of, what it can be used for, and how to use it.

Write to me if you are interested.

Please also feel free to contact me if you have any questions about this software.

Best regards,

Rémi Blandin
