
Key Technology Research Of Dynamic Captioning Without Script

Posted on: 2016-12-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y T Chen  Full Text: PDF
GTID: 2308330473960204  Subject: Signal and Information Processing
Abstract/Summary:
More than 66 million people worldwide have a hearing impairment, which makes it difficult for them to understand video content. Captioning technology helps to a certain degree by presenting the content, the identity of the characters and the meaning of the dialogue in sync with video playback. However, most existing captioning technologies are far from satisfactory in helping hearing-impaired users enjoy videos. This thesis proposes dynamic captioning without a script, which draws on several technologies, including speaker segmentation and clustering, blind source separation, automatic speech recognition and face detection.

Unlike static captioning, the dynamic captioning described in this thesis automatically places subtitles around the speakers' faces instead of at a fixed location such as the bottom of the screen. This helps hearing-impaired viewers recognize the speaking character more quickly and follow both the plot of the story and the dialogue between speakers. They can enjoy the video better because they are no longer distracted by constantly switching between the picture and the words.

The scheme focuses on script independence: it applies audio-visual technology to convert speech directly into text through automatic speech recognition. Because it no longer depends on a subtitle-script file, it can be applied more widely than existing dynamic captioning.

Dynamic captioning without a script has three main components: 1. subtitle-face matching; 2. script location; 3. interface design and error correction. This thesis presents the first component, which is the basis of the work and the core of the whole system. Its accuracy affects both the feasibility of the system and the working time required by the third component.

The thesis argues theoretically, and through improved experiments in speaker segmentation and clustering and in blind source separation, that faces can be successfully matched to their corresponding speech. Finally, subtitle-face matching is completed by using speech recognition technology to turn the speech into words.
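As an illustration of the idea of placing a subtitle around the speaker's face instead of at the bottom of the screen, the following is a minimal sketch and not the method used in the thesis. It assumes that a face detector has already returned a bounding box in pixel coordinates; the names `Box` and `place_caption`, the `margin` parameter and the "below the face, otherwise above" rule are all hypothetical choices made for this example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def place_caption(face: Box, caption_w: int, caption_h: int,
                  frame_w: int, frame_h: int, margin: int = 8) -> Box:
    """Return a caption box centred under the face, or above it if there is no room below.

    Assumed rule for illustration only; the thesis does not specify its layout rule here.
    """
    # Centre the caption horizontally on the face, then clamp it inside the frame.
    x = face.x + face.w // 2 - caption_w // 2
    x = max(0, min(x, frame_w - caption_w))

    # Prefer the area just below the face.
    y = face.y + face.h + margin
    if y + caption_h > frame_h:
        # Not enough room below: fall back to the area just above the face.
        y = face.y - margin - caption_h
    # Final clamp so the caption never leaves the frame vertically.
    y = max(0, min(y, frame_h - caption_h))

    return Box(x, y, caption_w, caption_h)
```

For example, with a 640x360 frame, a face at `Box(100, 50, 80, 80)` and a 200x40 caption, the caption is placed directly below the face; a face close to the bottom edge causes the caption to move above the face instead.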
Keywords/Search Tags:hearing impairment, voice, script independent, dynamic captioning