
Research Of Lip-reading Recognition Based On Long Short-term Memory

Posted on: 2018-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Ma
Full Text: PDF
GTID: 2348330542979708
Subject: Control engineering
Abstract/Summary:
Lip-reading recognition uses a computer to analyze video of people speaking and to recognize what they are saying from the movements of their lips. It is feasible because visual speech information is an important carrier of conversation. People tend to watch a speaker's lip movements to help them understand speech, and in noisy environments visual speech information plays a more important role, sometimes the only one. If we can design suitable algorithms to analyze the visual information in speech, a computer can perform lip-reading recognition with high accuracy. However, even when the spoken content is the same, visual speech information varies with the appearance of the lips, the speaking style of different people, and the background. This variability is a major challenge for lip-reading recognition.

To address it, we propose a new approach to lip-reading recognition based on Long Short-Term Memory (LSTM), which aims to learn invariant spatio-temporal features from speech video automatically and thereby improve recognition accuracy. We evaluate the approach on three public databases (GRID, MRIALC and OuluVS) for recognition of isolated words or phrases in speaker-independent experiments. On GRID and MRIALC, our approach improves accuracy by more than 30% over the conventional approach. On OuluVS, its accuracy is comparable to the state of the art. Our main contributions are as follows:

1. Unlike most previous approaches, which process the appearance of the lips, we compute the positions of lip landmarks, which describe the dynamics of the lip shape, and use them as the feature of the lip-reading video. This feature has within-class consistency and between-class distinctiveness.
2. We use LSTM to process the visual features of the speech video, which lets it learn spatio-temporal features that are discriminative and generalize well. Our results indicate that the approach effectively addresses the variability of visual speech information.
3. We discuss why LSTM can be applied to lip-reading recognition. Our approach may also inspire the use of LSTM in other sequential tasks similar to lip-reading recognition.
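
The pipeline outlined in the abstract, lip-landmark positions per frame fed to an LSTM, can be illustrated with a minimal sketch. The abstract does not specify the landmark detector, the preprocessing, or the network configuration, so everything below is an assumption made for illustration: the centroid-and-width normalization in `normalize_landmarks`, the hidden size, the random (untrained) weights, and all function names are hypothetical and are not taken from the thesis. Only the gate equations in `lstm_step` follow the standard LSTM formulation.

```python
import math
import random


def normalize_landmarks(points):
    """Center one frame's lip landmarks on their centroid and scale by mouth width.

    `points` is a list of (x, y) tuples. This normalization is an illustrative
    assumption, not the preprocessing used in the thesis.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Fall back to 1.0 if all x coordinates coincide, to avoid dividing by zero.
    width = (max(x for x, _ in points) - min(x for x, _ in points)) or 1.0
    feats = []
    for x, y in points:
        feats.extend([(x - cx) / width, (y - cy) / width])
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W, b):
    """One standard LSTM step; W maps [x; h] to the four stacked gate pre-activations."""
    n = len(h)
    v = x + h  # concatenate input features and previous hidden state
    z = [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(t) for t in z[0:n]]              # input gate
    f = [sigmoid(t) for t in z[n:2 * n]]          # forget gate
    o = [sigmoid(t) for t in z[2 * n:3 * n]]      # output gate
    g = [math.tanh(t) for t in z[3 * n:4 * n]]    # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode_sequence(frames, hidden_size=8, seed=0):
    """Run an untrained, randomly initialized LSTM over a sequence of landmark frames."""
    rng = random.Random(seed)
    feats = [normalize_landmarks(frame) for frame in frames]
    d = len(feats[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d + hidden_size)]
         for _ in range(4 * hidden_size)]
    b = [0.0] * (4 * hidden_size)
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in feats:
        h, c = lstm_step(x, h, c, W, b)
    return h  # final hidden state summarizing the whole sequence
```

In a trained system, the final hidden state would typically be passed to a classifier over the word or phrase vocabulary, and the weights would be learned from labeled video rather than drawn at random. The normalization step shows one plausible way in which landmark features can be made insensitive to where the mouth sits in the frame and how large it appears.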
Keywords/Search Tags: Lip-reading recognition, Long Short-Term Memory, Computer Vision