
Research On Lip Recognition Model Compression Based On Deep Learning

Posted on: 2022-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Wen
Full Text: PDF
GTID: 2518306494470924
Subject: Electronics and Communications Engineering
Abstract/Summary:
The basic purpose of lip recognition is to recognize what a speaker is saying by analyzing the dynamics of the lips; it is a research topic in the field of human-computer interaction. Traditional lip recognition models are slow and difficult to train, and they require extensive manual design and empirical tuning for each application scenario. This thesis therefore builds its lip recognition model on deep learning. On resource-constrained devices, however, most high-performing convolutional neural networks cannot run efficiently, so the compression and acceleration of convolutional neural networks has become an active research topic. Network compression reduces the computing and storage resources a model consumes on the device. The most commonly used compression method is pruning, which removes redundant parameters according to a chosen importance criterion while limiting the loss of network performance. In this thesis, we propose a structured compression and acceleration method for convolutional neural networks, combine the pruned convolutional neural network with a recurrent neural network, and apply the result to a lip recognition system. The specific work is as follows:

1. We expanded our self-built lip data set and pre-processed the original videos with key frame extraction and lip region localization. First, semi-random key frames are extracted from the recorded videos, converting each video into static images that can be fed to the network model. Next, the MTCNN algorithm is used to detect the face region and correct abnormal face angles. Finally, the DLIB 68-point facial landmark detector is applied, and the lip region is located precisely from the landmarks of the lips.

2. Network model compression and acceleration is the main part of this thesis. Pruning is used to compress VGG16, a network with good performance but a large number of parameters. The method is a structured channel pruning: the parameters of the batch normalization layers are trained jointly with the network weights and serve as the measure of channel importance. Channels are sorted by importance and those that fall below a global threshold are pruned. Because the method relies on batch normalization layers that already exist in the network, it also benefits from their effect on convergence and network performance.

3. A recurrent neural network unfolds along the time axis of a sequence. In this thesis, a recurrent neural network is used to model the semantic relationship between the images extracted from a lip-reading video. Because the input lip data is a time series of images with contextual semantics, a recurrent neural network is suited to extracting contextual features from it. Gradient instability is a problem that traditional recurrent neural networks cannot ignore, so we selected a bidirectional long short-term memory network (BiLSTM) to learn the features of the lip movement sequence. The features extracted by the pruned convolutional neural network are fed into the BiLSTM, which learns the semantic information of the temporal feature sequence.

4. Finally, the network architecture we designed, which integrates the pruned convolutional neural network with the BiLSTM, outputs the lip-reading prediction through a Softmax layer. The performance of networks pruned at different thresholds is compared with that of the original network, and the results are analyzed.
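
The channel selection step in item 2 can be illustrated with a minimal sketch. It assumes that channel importance is the absolute value of each batch normalization scaling factor and that the global threshold is chosen so that a given fraction of all channels is pruned; the abstract does not state either detail. The function names `global_threshold` and `channel_masks`, the layer names, and the sample values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def global_threshold(gammas: Dict[str, List[float]], prune_ratio: float) -> float:
    """Return the |gamma| value below which channels are pruned network-wide."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    # Pool the absolute scaling factors of every BN layer and sort them.
    pooled = sorted(abs(g) for layer in gammas.values() for g in layer)
    if not pooled:
        raise ValueError("no scaling factors given")
    # The value at index k is the smallest factor that survives pruning.
    k = int(len(pooled) * prune_ratio)
    return pooled[k]


def channel_masks(gammas: Dict[str, List[float]], prune_ratio: float) -> Dict[str, List[bool]]:
    """Build a keep/prune mask per layer from one global threshold."""
    thr = global_threshold(gammas, prune_ratio)
    masks = {}
    for name, layer in gammas.items():
        keep = [abs(g) >= thr for g in layer]
        # Guard (an assumption of this sketch): never remove every channel
        # of a layer; keep the channel with the largest factor instead.
        if not any(keep):
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            keep[best] = True
        masks[name] = keep
    return masks


# Hypothetical scaling factors for two BN layers.
example = {
    "conv1": [0.9, 0.01, 0.5, 0.02],
    "conv2": [0.03, 0.7, 0.04, 0.8],
}
masks = channel_masks(example, prune_ratio=0.5)
print(masks)
```

In a real network the masks would then be used to copy only the kept channels of each convolution and batch normalization layer into a smaller model; that step depends on the framework and is omitted here.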
Keywords/Search Tags: lip recognition, prune, convolutional neural network, network compression