Research On Lip Reading Technology Based On Convolutional Neural Network

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Jiang
Full Text: PDF
GTID: 2428330611480582
Subject: Electronic and communication engineering
Abstract/Summary:
Automated lip reading is a comprehensive technology that integrates machine vision, artificial intelligence, and natural language processing. It is a new mode of human-computer interaction that recognizes speech content directly from the image sequence of a speaker's lip motion. In recent years, with the rapid development of artificial intelligence, lip reading recognition has become increasingly mature, and the recognition accuracy of its network models has improved significantly.

In this work, a sentence-level lip reading system for fixed-structure sentences is built on a GPU platform using video data from the GRID corpus. Every sentence in the corpus follows the same structure: command, color, preposition, letter, number, and adverb, for example "place blue in m one soon". We use a network architecture that combines a three-dimensional convolutional neural network (3D-CNN) with a bidirectional long short-term memory network (Bi-LSTM) to extract features from an input sequence of 75 consecutive image frames. We use the connectionist temporal classification (CTC) loss function as the training loss, which avoids manually aligning each input frame with the label data. Training with CTC is therefore completely end-to-end: no pre-alignment of the data is needed, only an input sequence of consecutive mouth images and an output label sequence.

The network used here is small, and the data set used for training is not large. Compared with similar methods, lip reading accuracy is significantly improved. This work extends lip reading recognition from single words and single digits to whole sentences, a meaningful exploration toward deploying automatic lip reading technology in products. The results and application experience can be easily extended to other devices and smart home systems.
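The alignment-free property of CTC described above can be illustrated with a toy sketch in plain Python. This is not the thesis implementation: the function names (`collapse`, `ctc_prob_bruteforce`, `ctc_prob_forward`), the blank symbol `_`, and the four-frame probability table are all hypothetical, chosen only for illustration. The sketch shows that CTC scores a label by summing over every frame-level path that collapses to it, so no per-frame alignment has to be supplied.

```python
import math
from itertools import product

BLANK = "_"  # hypothetical blank symbol; real systems usually use index 0


def collapse(path):
    """CTC collapse: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def ctc_prob_bruteforce(probs, target):
    """Sum the probability of every frame-level path that collapses to target.

    probs[t][sym] is the per-frame probability of sym at frame t.
    Exponential in the number of frames, so only usable on toy inputs.
    """
    alphabet = list(probs[0].keys())
    total = 0.0
    for path in product(alphabet, repeat=len(probs)):
        if collapse(path) == target:
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[t][sym]
            total += p
    return total


def ctc_prob_forward(probs, target):
    """Same quantity via the standard CTC forward recursion."""
    # Extended label: blank, l1, blank, l2, ..., blank
    ext = [BLANK]
    for ch in target:
        ext += [ch, BLANK]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][BLANK]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]             # skip the blank between distinct labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


# Hypothetical per-frame distributions for 4 frames over {blank, "a", "b"}.
toy_probs = [
    {"_": 0.2, "a": 0.7, "b": 0.1},
    {"_": 0.3, "a": 0.5, "b": 0.2},
    {"_": 0.4, "a": 0.1, "b": 0.5},
    {"_": 0.6, "a": 0.1, "b": 0.3},
]

if __name__ == "__main__":
    p = ctc_prob_forward(toy_probs, "ab")
    print("P(ab | frames) =", p)
    print("CTC loss       =", -math.log(p))
```

The brute-force version is included only to make the definition explicit; the forward recursion computes the same sum in time proportional to the number of frames times the label length, which is what makes a 75-frame input practical.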
Keywords/Search Tags: lip reading, 3D convolutional neural network, bidirectional long short-term memory network, CTC loss function