ISCA - International Speech
Communication Association


ISCApad #194

Monday, August 04, 2014 by Chris Wellekens

3-1-19 (2014-09-14) INTERSPEECH 2014 Plenary talks

-            INTERSPEECH 2014 - SINGAPORE           -
-               September 14-18, 2014               -

ISCA, COLIPS and the Organizing Committee of INTERSPEECH 2014
are proud to announce that INTERSPEECH 2014 will feature
five plenary talks by internationally renowned experts.

- keynote speech
  by the ISCA Medallist 2014

- 'Decision Learning in Data Science:
  Where John Nash Meets Social Media'
  by Professor K. J. Ray Liu

- 'Language Diversity: Speech Processing In A Multi-Lingual Context'
  by Dr. Lori Lamel

- 'Sound Patterns In Language'
  by Professor William Shi-Yuan WANG 王士元

- 'Achievements and Challenges of Deep Learning
  From Speech Analysis And Recognition To Language
  And Multimodal Processing'
  by Dr. Li DENG

Details of the keynote speeches and biographies of the presenters are given below.

Looking forward to welcoming you to Singapore,
The Organizing Committee

* On Monday, 15th of September                                         *

The ISCA Medallist 2014 will give a keynote speech.
The name of the Medallist and subject of the talk will be
disclosed on the first day of INTERSPEECH 2014.

* On Tuesday morning, 16th of September                                *

Professor K. J. Ray Liu
Department of Electrical and Computer Engineering
University of Maryland, College Park

will give a presentation on:

'Decision Learning in Data Science: Where John Nash Meets Social Media'


    With the increasing ubiquity and power of mobile devices,
    as well as the prevalence of social media, more and more
    activities in our daily life are being recorded, tracked,
    and shared, creating the notion of “social media”.
    Such abundant and still growing real life data, known as
    “big data”, provide a tremendous research opportunity in many fields.
    To analyze, learn and understand such user-generated big data,
    machine learning has been an important tool and various
    machine learning algorithms have been developed.
    However, user-generated big data is the outcome of users’
    decisions, actions and socio-economic interactions, which are
    highly dynamic. Because they do not consider users’ local
    behaviours and interests, existing learning approaches tend to
    focus on optimizing a global objective function at the
    macro-economic level while totally ignoring users’ local
    decisions at the micro-economic level. As such, there is a growing
    need to bridge machine/social learning with strategic decision
    making, two traditionally distinct research disciplines, so as
    to jointly consider both global phenomena and local effects
    and better understand, model and analyze the newly arising
    issues in the emerging social media. In this talk, we present
    the notion of “decision learning”, which can involve users’
    behaviours and interactions by combining learning with strategic
    decision making.
    We will discuss some examples from social media with real data to
    show how decision learning can be used to better analyze users’
    optimal decisions from a user’s perspective as well as to design a
    mechanism from the system designer’s perspective
    to achieve a desirable outcome.

Biography of the speaker

    Dr. K. J. Ray Liu was named a Distinguished Scholar-Teacher
    of the University of Maryland in 2007, where he is Christine Kim
    Eminent Professor of Information Technology.
    He leads the Maryland Signals and Information Group conducting
    research encompassing broad areas of signal processing and
    communications with recent focus on cooperative communications,
    cognitive networking, social learning and decision making,
    and information forensics and security. Dr. Liu has received
    numerous honours and awards including IEEE Signal Processing
    Society 2009 Technical Achievement Award and various best paper
    awards from IEEE Signal Processing, Communications, and Vehicular
    Technology Societies, and EURASIP. A Fellow of the IEEE and AAAS,
    he is recognized by Thomson Reuters as an ISI Highly Cited Researcher.
    Dr. Liu was the President of IEEE Signal Processing Society,
    the Editor-in-Chief of IEEE Signal Processing Magazine and
    the founding Editor-in-Chief of EURASIP Journal on Advances
    in Signal Processing. Dr. Liu also received various research
    and teaching recognitions from the University of Maryland,
    including Poole and Kent Senior Faculty Teaching Award,
    Outstanding Faculty Research Award, and Outstanding Faculty
    Service Award, all from A. James Clark School of Engineering;
    and Invention of the Year Award (three times)
    from Office of Technology Commercialization.

* On Tuesday afternoon, 16th of September                              *

Dr. Lori Lamel
Senior Research Scientist (DR1), LIMSI-CNRS

will give a presentation on

'Language Diversity: Speech Processing In A Multi-Lingual Context'


    Speech processing encompasses a variety of technologies
    that automatically process speech for some downstream processing.
    These technologies include identifying the language or dialect
    spoken, the person speaking, what is said and how it is said.
    The downstream processing may be limited to a transcription or
    to a transcription enhanced with additional meta-data, or may
    be used to carry out an action or interpreted within a spoken
    dialogue system or more generally for analytics.  With the
    availability of large spoken multimedia or multimodal data there is
    growing interest in using such technologies to provide structure
    and random access to particular segments. Automatic tools can also
    serve to annotate large corpora for exploitation in linguistic
    studies of spoken language, such as acoustic-phonetics,
    pronunciation variation and diachronic evolution,
    permitting the validation of hypotheses and models.
    In this talk I will present some of my experience with speech
    processing in multiple languages, drawing upon progress in the
    context of several research projects, most recently the Quaero
    program and the IARPA Babel program, both of which address the
    development of technologies in a variety of languages, with the aim
    of highlighting some recent research directions and challenges.

Biography of the speaker

    I am a senior research scientist (DR1) at the CNRS, which I joined as
    a permanent researcher at LIMSI in October 1991.
    I received my Ph.D. degree in Electrical Engineering and Computer Science
    in May 1988 from the Massachusetts Institute of Technology.
    My research activities focus on large vocabulary,
    speaker-independent, continuous speech recognition in multiple
    languages, with a recent focus on low-resourced languages; lightly
    supervised and unsupervised acoustic model training methods;
    studies in acoustic-phonetics; and lexical and pronunciation
    modelling. I contributed to the design and realization of large
    speech corpora (TIMIT, BREF, TED). I have been actively involved
    in research projects, most recently leading the activities on
    speech processing in the OSEO Quaero program, and I am currently
    co-principal investigator for LIMSI as part of the IARPA Babel
    Babelon team led by BBN.
    I served on the Steering Committee for Interspeech 2013 as
    co-technical program chair along with Pascal Perrier, and I am now
    serving on the Technical Program Committee of Interspeech 2014.

* On Wednesday, 17th of September                                      *

Professor William Shi-Yuan WANG 王士元
Centre for Language and Human Complexity,
Chinese University of Hong Kong
Professor Emeritus, University of California at Berkeley
Honorary Professor, Peking University
Academician, Academia Sinica

will give a presentation about

'Sound Patterns In Language'


    In contrast to other species, humans are unique in having developed
    thousands of diverse languages which are not mutually
    intelligible. However, any infant can learn any language with ease,
    because all languages are based upon common biological
    infrastructures of sensori-motor, memorial, and cognitive
    faculties.  While languages may differ significantly in the sounds
    they use, the overall organization is largely the same.
    It is divided into a discrete segmental system for building words
    and a continuous prosodic system for expressing phrasing,
    attitudes, and emotions. Within this organization, I will discuss a
    class of languages called 'tone languages', which make special use
    of F0 to build words.  Although the best known of these is Chinese,
    tone languages are found in many parts of the world, and operate on
    different principles. I will also comment on relations between
    sound patterns in language and sound patterns in music, the two
    worlds of sound universal to our species.

Biography of the speaker

    William S-Y. Wang received his early schooling in China, and his
    PhD from the University of Michigan.  He was appointed
    Professor of Linguistics at the University of California at
    Berkeley in 1965, and taught there for 30 years.
    Currently he is in the Department of Electronic Engineering and in
    the Department of Linguistics and Modern Languages of the Chinese
    University of Hong Kong, and Director of the newly established
    Joint Research Centre for Language and Human Complexity. His
    primary interest is the evolution of language from a
    multidisciplinary perspective.

* On Thursday, 18th of September                                      *

Dr. Li DENG
Principal Researcher and Research Manager
Deep Learning Technology Centre,
Microsoft Research, Redmond, USA

will give a presentation on

'Achievements and Challenges of Deep Learning
From Speech Analysis And Recognition To Language And Multimodal Processing'


    Artificial neural networks have been around for over half a century
    and their applications to speech processing have been almost as
    long, yet it was not until 2010 that their real impact was
    made by a deep form of such networks, built upon part of the
    earlier work on (shallow) neural nets and (deep) graphical models
    developed by both speech and machine learning communities. This
    keynote will first reflect on the path to this transformative
    success, sparked by speech analysis using deep learning methods
    on spectrogram-like raw features and then progressing rapidly to
    speech recognition with increasingly larger vocabularies and scale.
    The role of well-timed academic-industrial collaboration will be
    highlighted, as will the advances in big data, big compute, and
    the seamless integration between the application-domain knowledge
    of speech and general principles of deep learning. Then, an
    overview will be given on sweeping achievements of deep learning in
    speech recognition since its initial success in 2010 (as well as in
    image recognition and computer vision since 2012). Such
    achievements have resulted in across-the-board, industry-wide
    deployment of deep learning. The final part of the talk will look
    ahead towards stimulating new challenges of deep learning ---
    making intelligent machines capable of not only hearing (speech)
    and seeing (vision), but also of thinking with a “mind”; i.e.
    reasoning and inference over complex, hierarchical relationships
    and knowledge sources that comprise a vast number of entities
    and semantic concepts in the real world based in part on
    multi-sensory data from the user.  To this end, language and multimodal
    processing --- joint exploitation and learning from text,
    speech/audio, and image/video --- is evolving into a new frontier
    of deep learning, beginning to be embraced by a mixture of research
    communities including speech and spoken language processing,
    natural language processing, computer vision, machine learning,
    information retrieval, cognitive science, artificial intelligence,
    and data/knowledge management. A review of recently published studies
    will be provided on deep learning applied to selected language and
    multimodal processing tasks, with a trace back to the relevant
    early connectionist modelling and neural network literature and
    with future directions in this new exciting deep learning frontier
    discussed and analyzed.

Biography of the speaker

    Li Deng received his Ph.D. from the University of Wisconsin-Madison.
    He was a tenured professor (1989-1999) at the University of
    Waterloo, Ontario, Canada, and then joined Microsoft Research,
    Redmond, where he is currently a Principal Research Manager of its
    Deep Learning Technology Centre.
    Since 2000, he has also been an affiliate full professor at the
    University of Washington, Seattle, teaching computer speech
    processing. He has been granted over 60 US or international
    patents, and has received numerous awards and honours
    bestowed by IEEE, ISCA, ASA, and Microsoft including the latest
    IEEE SPS Best Paper Award (2013) on deep neural nets for speech
    recognition. He has authored or co-authored four books, including the
    latest one, Deep Learning: Methods and Applications. He is a
    Fellow of the Acoustical Society of America, a Fellow of the IEEE,
    and a Fellow of the ISCA. He served as the Editor-in-Chief
    for IEEE Signal Processing Magazine (2009-2011), and currently serves as
    Editor-in-Chief for IEEE Transactions on Audio, Speech and Language
    Processing. His recent research interests and activities have been
    focused on deep learning and machine intelligence applied to
    large-scale text analysis and to speech/language/image
    multimodal processing, advancing his earlier work with
    collaborators on speech analysis and recognition using deep neural
    networks since 2009.

