
Research And Application Of Lip Reading Recognition And Model Compression Based On Deep Learning

Posted on: 2022-12-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X B Zhang
Full Text: PDF
GTID: 1488306764959069
Subject: Computer Science and Technology
Abstract/Summary:
Chinese lipreading, which aims to recognize the corresponding text by observing a speaker's lip movements, is a challenging research topic. Owing to the richness and ambiguity of the Chinese language, as well as the absence of a publicly available Chinese lip dataset, research on this subject has developed slowly. In recent years, with the wide application of neural networks in computer vision, lipreading based on deep learning has made great progress. By collecting a Chinese lip dataset, CCTVDS, and analysing the word-formation characteristics of Chinese, this dissertation proposes an end-to-end Chinese lipreading model, Lip CH-Net, which automatically recognizes Chinese sentences from lip images. Training and optimizing Lip CH-Net relies on Graphics Processing Units (GPUs) and places high demands on computing capacity and storage space; the model therefore remains at the theoretical-research stage and is difficult to popularize at scale. To turn this research result into practice and demonstrate its usefulness in current intelligent environments, model compression can enable portable devices with limited hardware resources to carry Lip CH-Net, realizing the practical value of lipreading models in assisting communication for deaf-mute people. An effective model compression method is therefore a prerequisite for the universal application of Lip CH-Net. With this practical problem in mind, this dissertation explores how the Knowledge Distillation (KD) algorithm can achieve model compression through mutual learning and fitting between features, and realizes compression of Lip CH-Net to a considerable extent. Following the idea that the lipreading model must first be established in theory and then compressed to meet practical deployment conditions, the dissertation proceeds step by step from what Chinese lipreading is, to how to build a lipreading model, to methods for compressing that model, and obtains corresponding results at each stage. The main research contents are summarized as follows:

(1) To help deaf-mute patients communicate normally with others, the sentence-level end-to-end Chinese lipreading model Lip CH-Net is proposed. Based on Chinese pronunciation and word-formation rules, Chinese lipreading is first divided into two sub-tasks: Picture-to-Pinyin recognition and Pinyin-to-Hanzi recognition. After the two sub-tasks are pre-trained to convergence with their respective neural-network sub-models and optimization techniques, they are combined into the end-to-end model Lip CH-Net, which completes the training and recognition process from lip images to Chinese character sequences. In addition, a Chinese lipreading dataset, CCTVDS, containing 20,495 samples is collected in a semi-automatic way, providing reference material and data support for future work on Chinese lipreading.

(2) To address the problem of overly simplified feature transfer, an offline model compression algorithm, MKTN, which transfers multifarious features, is proposed. In MKTN, the teacher network is first trained on two very different but complementary tasks so that it captures diverse and multifarious features. A lightweight yet competent student network is then trained to mimic both the pixel-level and the spatial-level feature distributions of the resourceful teacher network, guided by a feature loss and an adversarial loss, respectively. This enhances the learning and utilization of the transferred features and greatly improves the student's feature extraction ability: the student is easy to deploy yet achieves performance close to the teacher's, fulfilling the purpose of model compression. Experiments on the CCTVDS dataset show that, while maintaining effective accuracy, MKTN compresses Lip CH-Net at a rate of up to 50%.

(3) To address the lack of mutual learning between intermediate features, an innovative adversarial-based online model compression algorithm, AMLN, is proposed. On one hand, AMLN uses an outcome-driven learning algorithm so that the final prediction distributions of peer models fit each other. On the other hand, AMLN introduces process-driven learning for augmented online knowledge distillation: a block-wise training module with a discriminator and an alignment container guides the learning of low- and middle-level features under the corresponding intermediate supervision as well as supervision from the last layer of the peer network, and this propagates up to the final network layer, which captures more high-level information. This improves the utilization of intermediate features and the interaction between middle-level and high-level features, thereby accelerating training convergence and effectively enhancing model robustness and performance. Comparative results show that, at the same compression rate, AMLN further improves the recognition accuracy of Lip CH-Net and its simplified model on the CCTVDS dataset.

(4) To address the problem that features around the decision boundary are ignored, an online model compression algorithm based on consistency regularization, OKDCR, is proposed. In OKDCR, each model is equipped with a pair of task-specific classifiers that share the same feature extractor. An intra-model consistency is defined for each model, measured by the distribution distance between its two classifiers, and an inter-model consistency is evaluated from the classifier distributions across models. The two types of consistency guide the update of the shared feature extractor and regularize feature learning around the decision boundary. In addition, the intra-model consistency generates adaptive weights for the mean prediction of each model in the final model ensemble, and the weighted ensemble guides all classifiers toward joint alignment of the peer models. This markedly improves the discriminative ability of the classifiers as well as the classification capacity of the peer models. Experimental results on the CCTVDS dataset show that several OKDCR-trained simplified models achieve performance similar to Lip CH-Net, providing a theoretical foundation and experimental support for embedding Lip CH-Net in mobile portable terminals for universal application.
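The Picture-to-Pinyin / Pinyin-to-Hanzi decomposition can be illustrated with a minimal pure-Python sketch. The two stages below are toy lookup stand-ins, not the dissertation's actual sub-networks, and all names and sample values are hypothetical; the point is only the composition of the two pre-trained stages into one end-to-end pipeline.

```python
def picture_to_pinyin(frames):
    """Stage 1 stand-in: in the real model, a visual sub-network maps
    lip-frame features to a pinyin token sequence."""
    lookup = {"frame_ni": "ni3", "frame_hao": "hao3"}  # toy mapping
    return [lookup[f] for f in frames]

def pinyin_to_hanzi(pinyin_seq):
    """Stage 2 stand-in: in the real model, a language sub-network maps
    pinyin tokens to Chinese characters (Hanzi)."""
    lookup = {"ni3": "你", "hao3": "好"}  # toy mapping
    return "".join(lookup[p] for p in pinyin_seq)

def lip_ch_net(frames):
    """End-to-end composition: once both stages are pre-trained to
    convergence, they are chained into a single model."""
    return pinyin_to_hanzi(picture_to_pinyin(frames))
```

Used on the toy input `["frame_ni", "frame_hao"]`, the pipeline produces the sentence "你好", mirroring how Lip CH-Net goes from lip images to a Chinese character sequence.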
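The offline distillation in MKTN combines a pixel-level feature-mimicking loss with an adversarial spatial-level loss. The adversarial term requires a trained discriminator and is omitted here; the sketch below shows only the pixel-level feature loss plus a standard temperature-softened soft-target term of the kind commonly used in knowledge distillation, implemented in pure Python with toy flattened feature vectors.

```python
import math

def feature_mimic_loss(teacher_feat, student_feat):
    """Pixel-level feature loss: mean squared error between the
    (flattened) teacher and student feature maps."""
    n = len(teacher_feat)
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)) / n

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence from the student's softened distribution to the
    teacher's: the classic soft-target distillation term."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student's features and logits exactly match the teacher's, both losses are zero; any mismatch produces a positive penalty that drives the student toward the teacher's feature distribution.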
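AMLN's outcome-driven term makes the final prediction distributions of peer models fit each other, while its process-driven term aligns intermediate features block by block. A minimal sketch of both ideas follows; a plain L2 distance stands in for the discriminator-based alignment container described above, so this is an assumption about form, not the dissertation's exact loss.

```python
import math

def kl(p, q):
    """KL divergence between two probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_learning_loss(pred_a, pred_b):
    """Outcome-driven mutual learning: each peer fits the other's
    prediction distribution, giving a symmetric KL term."""
    return kl(pred_a, pred_b) + kl(pred_b, pred_a)

def blockwise_alignment_loss(feats_a, feats_b):
    """Process-driven stand-in: align the intermediate features of the
    two peers block by block (mean squared error per block, averaged).
    In AMLN this role is played by a discriminator and an alignment
    container under intermediate supervision."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        total += sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)
    return total / len(feats_a)
```

The symmetric KL term is zero only when the peers agree exactly, and the block-wise term penalizes disagreement at every intermediate depth rather than just at the output, which is what lets the middle-level features interact during online distillation.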
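The adaptive weighting in OKDCR can be sketched as follows: each model's intra-model inconsistency (distance between its two classifiers' outputs) is turned into an ensemble weight, so more self-consistent models contribute more to the final prediction. The total-variation distance and the `exp(-d)` weighting here are illustrative assumptions, not OKDCR's exact formulation.

```python
import math

def distribution_distance(p, q):
    """Total-variation-style distance between two classifier outputs
    (a stand-in for the distribution distance used in OKDCR)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def adaptive_ensemble(models_outputs):
    """models_outputs: one (classifier1_probs, classifier2_probs) pair
    per model, both classifiers sharing that model's feature extractor.

    Returns (ensemble_probs, weights): the weighted ensemble of each
    model's mean prediction, with weights derived from intra-model
    consistency (smaller classifier disagreement -> larger weight)."""
    weights, means = [], []
    for c1, c2 in models_outputs:
        d = distribution_distance(c1, c2)   # intra-model inconsistency
        weights.append(math.exp(-d))        # consistent -> larger weight
        means.append([(a + b) / 2 for a, b in zip(c1, c2)])
    total = sum(weights)
    weights = [w / total for w in weights]
    k = len(means[0])
    ensemble = [sum(w * m[j] for w, m in zip(weights, means)) for j in range(k)]
    return ensemble, weights
```

A model whose two classifiers agree perfectly gets the maximum weight, so the ensemble leans on the peers whose feature learning around the decision boundary is most consistent.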
Keywords/Search Tags: Deep learning, Neural network, Chinese lipreading, Model compression, Knowledge distillation