
Research On Lip-reading With A Sequential Gradient Boosting Algorithm Based On CTC Loss

Posted on: 2022-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2518306311964079
Subject: Statistics
Abstract/Summary:
A labeling problem is a prediction problem in which both the input and the output are variable-length sequences. Its goal is to learn a sequence model that predicts the correct label sequence for an unknown input sequence. In traditional machine learning, such problems are usually handled with probabilistic graphical models such as the Hidden Markov Model (HMM) and the Conditional Random Field (CRF). In deep learning, models are generally built on and improved from the Recurrent Neural Network (RNN). For labeling problems with input and output sequences of unequal length, that is, where the output sequence is shorter than the input sequence, CTC-based recurrent neural networks are the most common choice. Handling labeling problems with ordinary classification models such as the Support Vector Machine (SVM) is usually more complicated, because sequential features must be constructed and processed by hand.

Boosting, as a serially generated ensemble algorithm, improves the ensemble model by combining weak learners while maintaining an order relation among these base learners. The Gradient Boosting Decision Tree (GBDT) model is generated and trained by having the base learner of each iteration fit the steepest-descent direction of the current ensemble's loss function, that is, the negative gradient of the loss, so that the gap between the model prediction and the target shrinks step by step. This training scheme closely resembles back-propagation in neural network training. Based on this similarity, this thesis studies the Gradient Boosting Decision Tree algorithm in combination with the CTC algorithm and proposes a new sequential Boosting method, SeqGBDT, which is better suited to labeling problems with sequences of unequal length.

Starting from the Boosting principle, this thesis first presents the binary and multi-class formulations of the Gradient Boosting Decision Tree algorithm in detail, together with the gradient derivations needed for model optimization, which lays the foundation for constructing the SeqGBDT model. It then explains how the CTC loss is computed and how the sequence output probability is maximized under the CTC loss: the forward-backward algorithm of CTC is derived carefully to obtain the gradient of the CTC loss, and Beam Search is used for decoding. Building on this analysis, the thesis proposes a complete and feasible SeqGBDT algorithm for sequence labeling, which allows the Boosting algorithm to be applied to sequence labeling problems, including those with unequal-length sequences.

To verify the operability and practicality of the algorithm, the thesis turns to the sequence labeling task of lip-reading and investigates the performance of the SeqGBDT model on the MIRACL-VC lip-reading dataset. Lip-reading, also known as visual speech recognition, is a labeling task that decodes text from the visual information of the speaker's lips. The lip region is first extracted from the MIRACL-VC dataset, and image features are then obtained with a pre-trained Inception model. These features are fed into the SeqGBDT model as lip-reading inputs, the CTC loss is used for model training and parameter tuning, and the corresponding text sequence is finally decoded with the Beam Search algorithm.
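For concreteness, the following is a minimal NumPy sketch of the CTC forward recursion behind the loss described above. The function name ctc_neg_log_likelihood, the unbatched (T, V) input layout, and blank=0 are illustrative assumptions, not details of the thesis's own SeqGBDT implementation.

```python
import numpy as np

NEG_INF = -np.inf

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over finite inputs."""
    xs = [x for x in xs if x > NEG_INF]
    if not xs:
        return NEG_INF
    m = max(xs)
    return m + np.log(sum(np.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC forward recursion on a (T, V) matrix of per-frame log-probabilities.

    Returns -log p(target | input). A didactic, unbatched sketch of the
    standard CTC forward algorithm, not the thesis's own code.
    """
    T, _ = log_probs.shape
    # Extended label sequence with blanks: b, y1, b, y2, ..., yL, b
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    alpha = np.full((T, S), NEG_INF)
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a = logsumexp(a, alpha[t - 1, s - 1])
            # Skipping the previous blank is allowed unless the current
            # symbol is a blank or repeats the symbol two positions back.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]

    # Valid alignments end in the last label or the trailing blank.
    tail = alpha[T - 1, S - 2] if S > 1 else NEG_INF
    return -logsumexp(alpha[T - 1, S - 1], tail)
```

In the setting the abstract describes, a loss of this form (or an equivalent library implementation) would supply the gradients with respect to the per-frame outputs that the Boosting base learners fit; the sketch above only covers the loss value itself.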
The Character Error Rate (CER) is used to evaluate the SeqGBDT model. The model is trained under two settings: speaker-dependent and speaker-independent. The results show that the SeqGBDT model can learn fairly accurate lip pronunciation rules without requiring many speakers: accuracy reaches 100% on the training set and up to 40% on the test set. A visual interpretation of the model is provided through the feature-importance ranking of the Boosting method and the class activation maps of the CNN.
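As a companion to the evaluation just described, here is a small, generic sketch of the CER metric based on Levenshtein edit distance; the function name and the normalization by reference length are common conventions rather than details taken from the thesis.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character Error Rate = Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n] / max(m, 1)
```

For example, character_error_rate("place", "plase") returns 0.2, one substitution over five reference characters.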
Keywords/Search Tags: Labeling Problem, Gradient Boosting Decision Tree, CTC Algorithm, Lip-reading