Research And Design Of Lip Reading Based On 3D Convolution

Posted on: 2020-04-03 | Degree: Master | Type: Thesis
Country: China | Candidate: D Wang
GTID: 2428330596975117 | Subject: Computer Science and Technology
Abstract/Summary:
As deep learning technology matures, it is being applied to an ever wider range of scenarios. Lip reading must extract, from the movements of a speaker's lips, the information needed to recover what the speaker is saying. However, because of the diversity of languages and the subtlety of lip movements, progress in lip reading has been slow. Unlike alphabetic languages, Chinese involves more than 1,000 Pinyin pronunciations and nearly 90,000 logographic characters (Hanzi), which makes Chinese lip reading particularly challenging. In this paper, the lip-reading network is decomposed into two parts, an image model and a language model, a decomposition that makes it easier to carry out experiments on each part. In the first part, this paper uses a modified 3D convolutional network to extract features that carry the spatiotemporal information of the input image sequence, and then feeds the output of the 3D ConvNet into a modified ResNet. At the end of this Pinyin sequence recognition network, CTC (Connectionist Temporal Classification) is used as the loss function for training. Together these components form PTP (Pictures to Pinyin), our Chinese Pinyin sequence recognition network. In the second part, this paper uses an Encoder-Decoder language model to process the Pinyin sequence output by PTP: the input Pinyin sequence is encoded by the Encoder module and then decoded by the Decoder module, which finally outputs the Chinese character sequence corresponding to the input images. This second stage is the PTC (Pinyin to Chinese Characters) module of our network.
In addition, this research used a self-made Chinese lip-reading dataset totaling 20.95 GB. The experimental results show that the 3D convolution model achieves a sentence accuracy of 47.3%, compared with 44.9% for the original 2D convolution model, improving the accuracy of the overall system by 2.4 percentage points. According to the results, our scheme not only accelerates training and reduces overfitting, but also overcomes the syntactic ambiguity of Chinese, providing a baseline for future related work.
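To make concrete why a 3D convolution captures temporal information that a per-frame 2D convolution cannot, here is a minimal single-channel sketch in NumPy. It is an illustration under stated assumptions, not the PTP network: the function name `conv3d_valid`, the toy clip, and the two-frame difference kernel are hypothetical, whereas the real network uses learned multi-channel kernels followed by a modified ResNet.

```python
import numpy as np


def conv3d_valid(clip: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation (no padding, stride 1).

    clip:   array of shape (T, H, W), i.e. frames x height x width
    kernel: array of shape (kT, kH, kW); kT > 1 lets one filter see several frames
    """
    kt, kh, kw = kernel.shape
    t, h, w = clip.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1),
                   dtype=np.result_type(clip, kernel))
    for i in range(out.shape[0]):          # slide along time
        for j in range(out.shape[1]):      # slide along height
            for k in range(out.shape[2]):  # slide along width
                window = clip[i:i + kt, j:j + kh, k:k + kw]
                out[i, j, k] = np.sum(window * kernel)
    return out


# Hypothetical toy clip: 3 frames of 4x4 pixels; a bright "lip" pixel
# appears in frame 1 and stays in frame 2.
clip = np.zeros((3, 4, 4))
clip[1:, 2, 2] = 1.0

# Hypothetical temporal-difference kernel spanning 2 frames and 1x1 pixels:
# it responds to change between consecutive frames, which a kernel that
# only ever sees a single frame cannot detect.
motion_kernel = np.array([-1.0, 1.0]).reshape(2, 1, 1)

response = conv3d_valid(clip, motion_kernel)
print(response.shape)     # (2, 4, 4)
print(response[:, 2, 2])  # [1. 0.]: change between frames 0->1, none between 1->2
```

The output keeps a time axis (2 positions for 3 input frames), so the layers that follow still receive a sequence of features, which is what a CTC-trained recognizer consumes.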
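CTC training lets the recognizer emit one label (or a special blank) per video frame, so at inference the frame-level output still has to be collapsed into a Pinyin sequence. The abstract does not say which decoder PTP uses; the sketch below assumes standard greedy (best-path) CTC decoding: take the highest-scoring label in each frame, merge consecutive repeats, then drop blanks. The vocabulary of toneless syllables, blank index 0, and the scores are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      blank: int = 0) -> List[str]:
    """Greedy (best-path) CTC decoding; assumes index `blank` is the CTC blank."""
    # 1) Best path: index of the highest-scoring label in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    tokens: List[str] = []
    prev = None
    for idx in best_path:
        # 2) Merge consecutive repeats and 3) drop blanks.
        if idx != prev and idx != blank:
            tokens.append(vocab[idx])
        prev = idx
    return tokens


# Hypothetical vocabulary (index 0 = blank) and per-frame scores for 6 frames.
vocab = ["<blank>", "ni", "hao", "ma"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # ni
    [0.20, 0.70, 0.05, 0.05],  # ni (repeat, merged)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # hao
    [0.60, 0.10, 0.20, 0.10],  # blank
    [0.10, 0.10, 0.10, 0.70],  # ma
]
print(ctc_greedy_decode(frames, vocab))  # ['ni', 'hao', 'ma']
```

The blank is what allows the same syllable to be emitted twice in a row (for example "ma ma"): without a blank between them, repeated frames would be merged into a single token. Beam search is a common alternative when a stronger decoder is needed.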
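For scale, the two reported sentence accuracies (47.3% with 3D convolution, 44.9% with 2D convolution) correspond to the following absolute and relative gains; this is simple arithmetic on those two figures only:

```latex
\Delta_{\text{abs}} = 47.3\% - 44.9\% = 2.4 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{47.3 - 44.9}{44.9} \approx 5.3\%
```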
Keywords/Search Tags: Deep Learning, Convolutional Neural Networks, 3D Convolutional Neural Networks, Lip-Reading, Recurrent Neural Network