Three Funded PhD Research Studentships at the Centre for Speech Technology Research, University of Edinburgh.
Please see http://www.cstr.ed.ac.uk/opportunities for full details, eligibility requirements, application procedure and deadlines.
1. Embedding enhancement information in the speech signal
Speech becomes harder to understand in the presence of noise and other distortions, such as telephone channels. This is especially true for people with a hearing impairment. It is difficult to enhance the intelligibility of a received speech+noise mixture, or of distorted speech, even with the relatively sophisticated enhancement algorithms that modern hearing aids can run. A clever way around this problem might be for the sender to add extra information to the original speech signal, before the noise or distortion is introduced. The receiver (e.g., a hearing aid) would then use this information to assist speech enhancement, as in the sketch below.
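To make the idea concrete, here is a minimal sketch in Python (assuming NumPy and SciPy; purely illustrative, not the method the studentship would develop). The sender derives side information from the clean signal; the receiver uses it to build a Wiener-style gain instead of estimating the clean speech blindly. How the side information is actually embedded in the signal, and how it survives the channel, is exactly the open question, and is elided here.

    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    rng = np.random.default_rng(0)
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 440 * t)            # stand-in for a clean speech signal
    noisy = clean + 0.5 * rng.standard_normal(fs)  # the mixture the receiver picks up

    # Sender side: derive side information from the *clean* signal and send it
    # along with the audio (the embedding step itself is elided in this sketch).
    _, _, S_clean = stft(clean, fs=fs, nperseg=512)
    side_info = np.abs(S_clean) ** 2               # clean-speech power spectrum

    # Receiver side (e.g. a hearing aid): estimate the noise from the mixture and
    # the transmitted side information, then apply a Wiener-style gain.
    _, _, S_noisy = stft(noisy, fs=fs, nperseg=512)
    noise_est = np.maximum(np.abs(S_noisy) ** 2 - side_info, 0.0)
    gain = side_info / (side_info + noise_est + 1e-12)
    _, enhanced = istft(gain * S_noisy, fs=fs)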
Funding: Marie Sklodowska-Curie fellowship
2. Broadcast Quality End-to-end Speech Synthesis
Advances in neural networks, made jointly in the fields of automatic speech recognition and speech synthesis, amongst others, have led to a new understanding of their capabilities as generative models. Neural networks can now directly generate synthetic speech waveforms, without the quality limitations of a vocoder. We have made separate advances, using neural networks to discover representations of spoken and written language that have applications in lightly-supervised text processing for almost any language, and in adaptation of speaker identity and style. The project will combine these techniques into a single end-to-end model for speech synthesis. This will require new techniques for learning from both text and speech data, which may have other applications, such as automatic speech recognition.
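For a rough flavour of direct waveform generation, here is a minimal sketch assuming PyTorch, in the spirit of dilated causal convolution models such as WaveNet; the architecture, layer sizes, and names are hypothetical, not the model the project would build.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalConvStack(nn.Module):
        """Dilated causal convolutions over 8-bit (e.g. mu-law) waveform samples."""
        def __init__(self, channels=64, levels=256, dilations=(1, 2, 4, 8)):
            super().__init__()
            self.embed = nn.Embedding(levels, channels)
            self.convs = nn.ModuleList(
                nn.Conv1d(channels, channels, kernel_size=2, dilation=d)
                for d in dilations
            )
            self.out = nn.Conv1d(channels, levels, kernel_size=1)
            self.dilations = dilations

        def forward(self, x):                  # x: (batch, time) integer sample ids
            h = self.embed(x).transpose(1, 2)  # -> (batch, channels, time)
            for d, conv in zip(self.dilations, self.convs):
                h = torch.relu(conv(F.pad(h, (d, 0))))  # left-pad keeps it causal
            return self.out(h)                 # logits over the *next* sample

    # Each output position is a distribution over the sample that follows it, so
    # the model can be trained with cross-entropy and sampled autoregressively.
    model = CausalConvStack()
    logits = model(torch.randint(0, 256, (1, 1000)))   # shape (1, 256, 1000)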
Funding: EPSRC Industrial CASE award (in collaboration with the BBC)
3. Automatic Extraction of Rich Metadata from Broadcast Speech
The research studentship will be concerned with learning to automatically extract rich metadata from broadcast television recordings, using speech recognition and natural language processing techniques. We will build on recent advances in convolutional and recurrent neural networks, using architectures that learn representations jointly from both acoustic and textual data. The project will build on our current work in the rich transcription of broadcast speech using neural-network-based speech recognition systems, along with neural network approaches to machine reading and summarisation. In particular, we are interested in developing approaches to transcribing broadcast speech in a way appropriate to the context. This may include compression or distillation of the content (perhaps to fit within the constraints of subtitling), transforming conversational speech into a form that is easier to read as text, or transcribing broadcast speech in a way suited to a particular reading age.
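As a toy illustration of one of these directions (compression to fit subtitling constraints), here is a sketch in pure Python that extractively compresses a transcript under a character budget using a simple word-frequency score. The function and the scoring heuristic are hypothetical, for illustration only; the project itself concerns neural approaches to this kind of task.

    from collections import Counter

    def compress(transcript: str, budget: int = 70) -> str:
        """Keep high-scoring sentences within a subtitle-style character budget."""
        sentences = [s.strip() for s in transcript.split('.') if s.strip()]
        freq = Counter(w.lower() for s in sentences for w in s.split())
        # Score each sentence by the mean transcript-wide frequency of its words
        scored = sorted(
            sentences,
            key=lambda s: sum(freq[w.lower()] for w in s.split()) / len(s.split()),
            reverse=True,
        )
        kept, used = [], 0
        for s in scored:
            if used + len(s) + 2 <= budget:    # stay within the character budget
                kept.append(s)
                used += len(s) + 2
        # Restore the original sentence order for readability
        return '. '.join(s for s in sentences if s in kept) + '.'

    print(compress("The cabinet met today. Ministers discussed the budget. "
                   "The weather was mild. The budget vote is on Friday."))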
Funding: EPSRC Industrial CASE award (in collaboration with the BBC)