Research On Lip Recognition Technology Based On Deep Learning

Posted on: 2024-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: J W Jia
Full Text: PDF
GTID: 2568307139495814
Subject: Information and Communication Engineering
Abstract/Summary:
Lip recognition (lip reading) is a vision-based technology that recognizes what is said by analyzing the movement and shape of a person’s lips. It is used in many fields: in noisy environments, lip movements can stand in for the audio signal in speech recognition, and in security systems they can be used for authentication. With the development of deep learning, lip recognition based on deep learning has made significant progress, and this thesis focuses on that line of work.

This thesis studies how to improve lip-reading recognition accuracy, focusing on two aspects: the lip-reading dataset and the lip-reading recognition model. Deep learning algorithms are data-driven, so data quality directly affects how well a model can be trained. To improve the quality of lip-reading datasets, the thesis proposes an automatic lip labelling system based on a deep learning optical flow estimation algorithm. The system performs lip detection, lip tracking, and lip feature extraction, and it is scalable and customizable.

The main difficulty of lip-reading recognition lies in accurately capturing and extracting lip movement information. The thesis therefore proposes a lip-reading recognition model based on deep learning and an attention mechanism, which recognizes lip-reading information at the word level. On the LRW and LRW-1000 datasets, the model reaches Top-1 accuracy of 86.8% and 41.6%, respectively.

The main work and innovations of this thesis are as follows:

(1) An automatic lip-reading labelling system is proposed based on a deep learning optical flow estimation algorithm. The system addresses the difficulty of constructing a lip-reading dataset. It uses professional software to roughly segment lip motion videos, then applies face detection and lip area cropping to obtain a sequence of lip images. Finally, it tracks lip changes with the deep learning optical flow estimation algorithm to generate lip motion labels. CCUPD, a dataset of commonly used Chinese phrases, is built with this system to support Chinese lip-reading recognition tasks.

(2) A lip-reading recognition model based on a deep learning attention mechanism is proposed. The model uses Mixup for data augmentation, extracts spatiotemporal image features with spatiotemporal 3D convolution, weights image features with a residual network and a channel attention mechanism, models the temporal relationships between features with bidirectional gated recurrent units, and weights features at different time steps with a temporal attention mechanism. The model performs well on mainstream datasets, and the experiments verify the effectiveness of both the channel attention mechanism and the temporal attention mechanism.

(3) Through extensive comparative experiments on the LRW, LRW-1000, and CCUPD datasets, the thesis analyzes how different data preprocessing methods, training schemes, datasets, and input sizes influence model performance. The results show that the proposed lip-reading recognition model performs well on small-vocabulary lip-reading recognition datasets.
Keywords: Lip Recognition, Deep Learning, Attention Mechanism, Optical Flow Estimation, Bidirectional Gated Recurrent Unit
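
The abstract names Mixup augmentation and a temporal attention mechanism but gives no formulas for either. The following is a minimal pure-Python sketch of the general techniques, not the thesis's implementation. The function names, the `alpha=0.2` default, and the dot-product scoring with `score_vector` are all illustrative assumptions; in a trained model the per-frame features would come from the bidirectional GRU and the scoring parameters would be learned.

```python
import math
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels.

    The mixing weight lam is drawn from Beta(alpha, alpha); alpha=0.2 is an
    assumed default, not a value reported in the abstract.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(features, score_vector):
    """Weight per-frame features by attention and sum over time.

    features: list of T frames, each a list of D floats.
    score_vector: list of D floats (hypothetical stand-in for learned weights).
    Returns the pooled D-dimensional feature and the T attention weights.
    """
    # One scalar score per frame: dot product with the scoring vector.
    scores = [sum(f * w for f, w in zip(frame, score_vector)) for frame in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(weights[t] * features[t][d] for t in range(len(features)))
        for d in range(dim)
    ]
    return pooled, weights
```

In use, `mixup` would be applied to pairs of lip image sequences (flattened here to plain lists) before training, and `temporal_attention_pool` would turn the sequence of recurrent outputs into a single vector for the word classifier. Frames that receive higher scores contribute more to the pooled feature.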