Research Of Lip-reading Recognition Based On Long Short-term Memory

Posted on:2018-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:N Ma

GTID:2348330542979708

Subject:Control engineering

Abstract/Summary:

Lip-reading recognition uses a computer to analyze videos of people speaking and to recognize what is being said from the movements of their lips. It is feasible because visual speech information is an important carrier of conversation. People tend to watch the lip movements of others to help them understand speech, and in noisy environments visual speech information plays a more important role, sometimes the only one. If reasonable algorithms can be designed to analyze the visual information in speech, a computer could perform lip-reading recognition with high accuracy. However, visual speech information varies with the appearance of the lips, the speaking style of different people, and the background, even when the content of the speech is the same. This is a great challenge for lip-reading recognition.

To address this variability, we propose a new approach for lip-reading recognition based on Long Short-Term Memory (LSTM), which tries to learn invariant spatio-temporal features from speech video automatically and to improve recognition accuracy. Our approach is evaluated on three public databases (GRID, MRIALC and OuluVS) for lip-reading recognition of isolated words or phrases in speaker-independent experiments. On GRID and MRIALC, the accuracy of our approach exceeds that of the conventional approach by more than 30%. On OuluVS, the accuracy is comparable to the state of the art.

Our main contributions are as follows:

1. Unlike most previous approaches, which process the appearance information of the lips, we compute the positions of lip landmarks, which describe the dynamic information of the lip shape, as the feature of the lip-reading video. This feature has within-class consistency and between-class distinctiveness.

2. We use LSTM to process the visual features of speech video, which can learn spatio-temporal features that are discriminative and generalize well. Our results indicate that our approach handles the variability of visual speech information effectively.

3. We discuss the reasons why LSTM can be used in lip-reading recognition. Our approach could also inspire the use of LSTM in other sequential tasks similar to lip-reading recognition.
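
The pipeline described above (per-frame lip landmark positions fed to an LSTM that classifies the whole sequence) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract gives no network sizes, landmark count, or preprocessing details, so the function names (`normalize_landmarks`, `init_lstm`, `lstm_step`, `classify_sequence`), the centering and mouth-width scaling, the layer sizes, and the toy input are all assumptions made for illustration. The weights are random and untrained, so the output only shows the data flow, not a meaningful prediction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def normalize_landmarks(frame):
    """Center (x, y) landmarks on their mean and divide by mouth width.

    Assumed preprocessing (not stated in the abstract): it removes the
    effect of face position and scale so only lip shape remains.
    """
    xs = [p[0] for p in frame]
    ys = [p[1] for p in frame]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    width = (max(xs) - min(xs)) or 1.0  # avoid division by zero
    feat = []
    for x, y in frame:
        feat.extend([(x - cx) / width, (y - cy) / width])
    return feat


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_lstm(input_size, hidden_size, num_classes, seed=0):
    """Random, untrained parameters for one LSTM layer plus a linear output."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W": {k: mat(hidden_size, input_size + hidden_size) for k in "ifog"},
        "b": {k: [0.0] * hidden_size for k in "ifog"},
        "W_out": mat(num_classes, hidden_size),
        "b_out": [0.0] * num_classes,
        "hidden_size": hidden_size,
    }


def lstm_step(params, x, h, c):
    """One standard LSTM step: input (i), forget (f), output (o) gates and candidate (g)."""
    z = x + h  # concatenate the input features and the previous hidden state
    pre = {
        k: [a + b for a, b in zip(matvec(params["W"][k], z), params["b"][k])]
        for k in "ifog"
    }
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def classify_sequence(params, frames):
    """Run the LSTM over all frames; softmax over classes from the last hidden state."""
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    for frame in frames:
        h, c = lstm_step(params, normalize_landmarks(frame), h, c)
    logits = [a + b for a, b in zip(matvec(params["W_out"], h), params["b_out"])]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input (hypothetical): 5 frames, 4 landmarks per frame, mouth opening over time.
frames = [
    [(10.0 + t, 20.0), (30.0 - t, 20.0), (20.0, 15.0 - t), (20.0, 25.0 + t)]
    for t in range(5)
]
params = init_lstm(input_size=8, hidden_size=6, num_classes=3)
probs = classify_sequence(params, frames)
print(probs)  # one probability per (hypothetical) word class
```

Using landmark coordinates instead of raw pixels keeps the input small (two numbers per landmark per frame), and the assumed normalization makes the feature insensitive to where the face sits in the image and how large it appears, which is one plausible way to reduce the speaker and background variability that the abstract identifies as the main difficulty.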

Keywords/Search Tags:

Lip-reading recognition, Long Short-Term Memory, Computer Vision

Related items

1	Acceleration Gesture Recognition Based On Long-short Term Memory Network
2	Research On Video Action Recognition Based On Improved Long Short-term Memory Network
3	Research On Group Behavior Recognition Based On Multi-stream Architecture And Long Short-term Memory Network
4	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
5	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
6	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network
7	Online Handwritten Math Expression Label Recognition Based On Long Short Term Memory Recurrent Neural Network
8	Design Of Speaker Recognition Algorithm Based On Long Short-term Memory Networks
9	Research On Sequence Recommendation Model Integrating User’s Long And Short-Term Interests
10	Research On Fall Detection Based On Long Short-term Memory Artificial Neural Network And Wrist Sensor