Research On Lip Recognition Algorithm Based On Optimized MobileNet

Posted on: 2023-09-05

Degree: Master

Type: Thesis

Country: China

Candidate: K X Li

GTID: 2568306788456444

Subject: Electronic Science and Technology

Abstract/Summary:

Lip recognition is the process by which a computer learns the changes in a sequence of dynamic images of a speaker's lips in order to recognize what the speaker is saying. Lip recognition technology is widely used in fields such as national public safety and medical correction. Most current research uses deep learning for lip recognition; however, in pursuit of high recognition rates, the network models have grown ever larger, which makes them difficult to deploy on mobile devices. To address this problem, this thesis optimizes the traditional MobileNet network and proposes a more lightweight network, FD-MobileNet, then combines FD-MobileNet with a GRU network to recognize the two-dimensional image features and the temporal features of the lips. To further improve the recognition rate, an attention mechanism is incorporated into the GRU network, and the effectiveness of the resulting model is demonstrated through extensive experiments. Finally, an application system with a user interface is designed so that lip recognition can be used in real life. The main research contents are as follows:

(1) Studying lip-movement video sequences. A semi-random frame extraction algorithm is designed to extract frames from the lip-movement video. The 68 facial key points provided by the Dlib library are then used to locate the face in the extracted images, and finally the lip region is segmented using the geometric positions of the lip feature points. This preprocessing not only facilitates subsequent feature extraction by the neural network but also greatly reduces redundant information in the images.

(2) Optimizing the MobileNet network model. An analysis and comparison of the lightweight network models MobileNet and ShuffleNet shows that the basic network module of MobileNet is simple, which gives it fast prediction speed, while the fast downsampling strategy adopted by ShuffleNet can learn more information at a lower computational cost. This thesis therefore optimizes the network structure of MobileNet and proposes a new network model, FD-MobileNet, that takes both computational cost and prediction speed into account. Experimental comparison shows that FD-MobileNet is more accurate than MobileNet and substantially faster than ShuffleNet in actual prediction time.

(3) Constructing a lip recognition model that combines the FD-MobileNet and GRU networks. FD-MobileNet extracts the two-dimensional features of the images, and the GRU network learns the changes in motion across the sequence. Building on the strengths of both networks, this thesis proposes a combination of FD-MobileNet and GRU for lip recognition. To learn the image features more accurately, an attention mechanism is incorporated into the GRU network. The model is then evaluated on six indicators: loss function, test-set accuracy, performance of common models on a self-made dataset, word confusion matrix, recall rate, and lip variability. The results show that the proposed model has strong intra-class similarity and inter-class variability and performs the video lip recognition prediction task well. In terms of performance, the fast downsampling strategy and the attention mechanism are found to reduce the redundant information in the video and to meet the requirement of noise suppression.

(4) Developing a lip recognition application system. Considering real-life needs, a lip recognition system with a user interface is designed. It contains three modules: video selection, visual display, and recognition results. The system not only brings lip recognition into practical use but also gives researchers a tool for subsequent improvement and optimization of the model.
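The abstract does not spell out how the semi-random frame extraction in item (1) works. The sketch below shows one plausible reading, given here purely as an assumption: the video is divided into equal-width segments and one frame index is drawn at random from each, so that the samples cover the whole utterance while still varying between runs. The function name `semi_random_frame_indices` and its parameters are hypothetical and are not taken from the thesis.

```python
import random


def semi_random_frame_indices(total_frames, num_samples, seed=None):
    """Return one randomly chosen frame index per equal-width segment.

    Assumption (not from the thesis): "semi-random" means the segment
    boundaries are fixed while the position inside each segment is random.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if total_frames < num_samples:
        raise ValueError("video has fewer frames than requested samples")

    rng = random.Random(seed)  # a seed makes the sampling reproducible
    indices = []
    for k in range(num_samples):
        # Segment k covers the half-open index range [start, end).
        start = k * total_frames // num_samples
        end = (k + 1) * total_frames // num_samples
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Example: pick 10 frames from a hypothetical 75-frame clip.
    print(semi_random_frame_indices(75, 10, seed=0))
```

Because the segments are disjoint and ordered, the returned indices are strictly increasing, which preserves the temporal order that a recurrent network such as a GRU relies on.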

Keywords/Search Tags:

lip recognition, fast downsampling strategy, convolutional neural network, recurrent neural network, application system

Related items

1	Recurrent Convolutional Neural Networks With Applications
2	Research On Speech Emotion Recognition Based On Convolutional Recurrent Neural Network
3	The EEG Recognition Combined With Convolution Neural Network And Recurrent Neural Network
4	Research On Key Technologies Of High Performance Accelerator For Convolution And Recurrent Neural Networks
5	Research On Two-stream Convolutional Neural Network Algorithm And Its Application In Violent Action Recognition
6	Speech Emotion Recognition Based On Improved Convolutional Recurrent Neural Network
7	Research On Sign Language Recognition Method Based On Convolutional Neural Networks And Recurrent Neural Networks
8	Research Of Speech Emotion Recognition Method Based On Convolutional Recurrent Neural Networks
9	Research On Text Detection And Recognition In Natural Scene Based On Deep Neural Network
10	Research On Activity Recognition Based On Convolutional Neural Networks And Recurrent Neural Networks