ISCApad #244
Friday, October 12, 2018 by Chris Wellekens
PhD Project – Call for Applications

Situated Learning for Collaboration across Language Barriers

People working in development are often deployed to remote locations where they work alongside locals who speak an unwritten minority language. Outsiders and locals share know-how and pick up phrases in each other's languages; in doing so, they perform a kind of situated learning of language and culture. This situation is found across the world: in developing countries, in border zones, and in indigenous communities.

This project will develop computational tools to help people work together across language barriers. The research will be evaluated in terms of the quality of the social interaction, the mutual acquisition of language and culture, the effectiveness of cross-lingual collaboration, and the quantity of translated speech data collected. The ultimate goal is to contribute to the grand task of documenting the world's languages. The project will involve working between France and Australia, and will include fieldwork with a remote indigenous community.

We are looking for outstanding and highly motivated candidates to work on a PhD on this subject. Competencies in two or more of the following areas are mandatory:

• machine learning for natural language processing;
• speech processing for interactive systems;
• participatory design;
• mobile software development;
• documenting and describing unwritten languages.

The project will build on previous work in the following areas:

• mobile platforms for collecting spoken language data [6, 7];
• respeaking as a technique for improving the value of recordings made 'in the wild', and as an alternative to traditional transcription practices [12, 13];
• machine learning of structure in phrase-aligned bilingual speech recordings [2, 3, 4, 8, 9, 10, 11];
• participatory design of mobile technologies for working with minority languages [5];
• managing multilingual databases of text, speech, and images [1].

Some recent indicative PhD theses include:

• Computer Supported Collaborative Language Documentation (Florian Hanke, 2017);
• Automatic Understanding of Unwritten Languages (Oliver Adams, 2018);
• Collecter, Transcrire, Analyser : quand la Machine Assiste le Linguiste dans son Travail de Terrain [Collecting, Transcribing, Analysing: When the Machine Assists the Linguist in Fieldwork] (Elodie Gauthier, 2018);
• Enriching Endangered Language Resources using Translations (Antonios Anastasopoulos, in prep.);
• Digital Tool Deployment for Language Documentation (Mat Bettinson, in prep.);
• Bayesian and Neural Modeling for Multi-Level and Crosslingual Alignment (Pierre Godard, in prep.).
[2] Oliver Adams, Graham Neubig, Trevor Cohn, and Steven Bird. Learning a translation model from word lattices. In Interspeech 2016, pages 2518–2522, 2016.
[3] Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, and Adam Lopez. Spoken term discovery for language documentation using translations. In Proceedings of the Workshop on Speech-Centric NLP, pages 53–58, 2017.
[4] Antonios Anastasopoulos and David Chiang. A case study on using speech-to-translation alignments for language documentation. In Proceedings of the Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 170–178, 2017.
[5] Steven Bird. Designing mobile applications for endangered languages. In Kenneth Rehg and Lyle Campbell, editors, The Oxford Handbook of Endangered Languages. Oxford University Press, 2018.
[6] Steven Bird, Florian R. Hanke, Oliver Adams, and Haejoong Lee. Aikuma: A mobile app for collaborative language documentation. In Proceedings of the Workshop on the Use of Computational Methods in the Study of Endangered Languages. ACL, 2014.
[7] David Blachon, Elodie Gauthier, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, and Annie Rialland. Parallel speech collection for under-resourced language studies using the Lig-Aikuma mobile device app. In Proceedings of the Fifth Workshop on Spoken Language Technologies for Under-resourced Languages, volume 81, pages 61–66, 2016.
[8] V. H. Do, N. F. Chen, B. P. Lim, and M. A. Hasegawa-Johnson. Multitask learning for phone recognition of underresourced languages using mismatched transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26:501–514, 2018.
[9] Ewan Dunbar, Xuan Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, and Emmanuel Dupoux. The zero resource speech challenge 2017. In 2017 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2017.
[10] Long Duong, Antonios Anastasopoulos, David Chiang, Steven Bird, and Trevor Cohn. An attentional model for speech translation without transcription. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 949–959, 2016.
[11] Pierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, Hélène Bonneau-Maynard, Guy-Noël Kouarata, Kevin Löser, Annie Rialland, and François Yvon. Preliminary experiments on unsupervised word discovery in Mboshi. In Interspeech 2016, 2016.
[12] Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra. Using multiple versions of speech input in phone recognition. In ICASSP 2013, pages 7591–7595. IEEE, 2013.
[13] Anthony C. Woodbury. Defining documentary linguistics. Language Documentation and Description, 1:35–51, 2003.