Research On Chinese Lip Reading Recognition Based On Deep Learning

Posted on: 2022-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
GTID: 2518306521479914
Subject: Computer software and theory
Abstract/Summary:
Chinese lip recognition refers to observing the patterns of human lip movement in video images and then identifying the corresponding Chinese text. This is a difficult prediction task. The range of lip movement is limited, and the richness of the Chinese language further increases the difficulty, which has slowed progress in lip reading. In terms of composition, English is made up only of letters, whereas Chinese is more complex: Pinyin has more than 1000 pronunciation combinations and there are more than 9000 Chinese characters, which makes Chinese lip recognition more difficult. However, deep learning technology is increasingly mature and its application scenarios are increasingly broad, which gives us enough confidence to take on the task of lip recognition.

This paper mainly studies the data processing and the model for lip recognition. Based on an analysis of previous network structures, the lip recognition model is divided into two language models, the Pinyin model (PM) and the Chinese model (CM), which can improve the accuracy of Chinese character prediction. Feed-forward attention is then used to reduce the sequence length and extract features of subtle lip changes. On this basis, the processed feature vectors are fed into the encoder, and the Pinyin sequence is decoded in the decoder. The CM module takes the Pinyin sequence produced by the PM module as input and converts it into the correct Chinese character sequence. On the one hand, the alignment of input and output lengths must be handled with a sequence ordering model; on the other hand, a Pinyin sequence training set that resembles the output of the PM module must be constructed. Here the LSS rule presented in this paper plays an important role. An embedding layer maps Pinyin and Chinese characters to 256-dimensional word vectors, which also reduces the complexity of the model.

In terms of data, the compressed CMLR (Chinese Mandarin Lip Reading) dataset is 38.5 GB in size. OpenCV is used to extract video frames, and the CornerNet algorithm is then used to locate the lips. Finally, the effectiveness of the method is demonstrated. Experiments on the CMLR dataset show that although the Pinyin error rate is not significantly reduced, the training time is significantly shorter. When LSS is used in the CM module, the word error rate is significantly reduced. The final error rate of Focus lip net is 28.68%, compared with 34.01% for the original model without LSS and feed-forward attention.
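
The abstract does not define how its error rates are computed. A common convention, assumed here, is the edit (Levenshtein) distance between the predicted and reference sequences divided by the reference length. The following is a minimal sketch under that assumption; the function names and the example sequences are hypothetical and are not taken from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists or strings)."""
    # prev[j] holds the distance between the first i-1 ref tokens
    # and the first j hyp tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length (assumed metric)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical Pinyin-level example: one syllable is missing in the prediction.
ref_pinyin = ["jin", "tian", "tian", "qi", "hen", "hao"]
hyp_pinyin = ["jin", "tian", "qi", "hen", "hao"]
print(f"Pinyin error rate: {error_rate(ref_pinyin, hyp_pinyin):.4f}")

# Hypothetical character-level example: one character is substituted.
ref_chars = "今天天气很好"
hyp_chars = "今天天气很号"
print(f"Character error rate: {error_rate(ref_chars, hyp_chars):.4f}")
```

Whatever the exact metric, the two reported figures (34.01% and 28.68%) correspond to an absolute reduction of 5.33 percentage points, or about 15.7% relative to the original model.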
Keywords/Search Tags:Deep learning, Sequence to sequence model, Lip recognition, FeedForward attention, Convolutional neural network