Research On Lipreading Algorithm Based On Deep Learning

Posted on: 2021-03-14  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Bi  Full Text: PDF
GTID: 2518306308990319  Subject: Master of Engineering
Abstract/Summary:
Lip recognition technology is a branch that developed from speech recognition. Over the past few decades, researchers have increasingly used visual cues to help decode speech and to mimic the human ability to read lips, which has led to automatic lipreading systems. Compared with audio-based speech recognition systems, however, the performance of automatic lipreading systems still lags behind. One of the main challenges is visual ambiguity: homophones and other words with the same or similar lip movements are easy to confuse, which makes them harder for a trained model to tell apart. In addition, changes in head position, lighting conditions, spatiotemporal resolution, and the effective encoding of spatiotemporal information all strongly affect the robustness of lip reading recognition. This thesis proposes a Chinese lipreading algorithm. First, the lip information is recognized as a Chinese Pinyin sequence; then an attention mechanism is used to recognize the Chinese characters from the Pinyin sequence. The algorithm performs well. The details are as follows:
(1) A feature encoding structure based on dual-branch networks is proposed. Because words with the same or similar lip movements are difficult to recognize, a ResNet-34 network is used to extract features for them, while a Dense3DNet-56 network is used for other words. The input lip features are encoded by both branches, and a mask structure fuses the information from the two modules. By jointly learning patterns at multiple spatiotemporal scales and fusing this information effectively, a new network structure, NTCN, is proposed to identify the extracted lip motion feature maps. NTCN improves the internal structure of the original TCN so that it can describe associations between pixels that are far apart and capture sentence-level relationships.
(2) A lip recognition algorithm based on a self-attention mechanism is proposed. It replaces the traditional sequence recognition model, which must be built on a convolutional neural network or a recurrent neural network. The proposed scaled dot-product attention operator reduces the amount of model computation and improves the parallel efficiency of the network. The multi-head scaled dot-product self-attention mechanism combines the information from the different representation subspaces learned by the model, so that the model can attend to different positions of the lip sequence and infer Chinese syntactic and semantic structure from the Pinyin.
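The scaled dot-product self-attention described in (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names are hypothetical, and for brevity each head simply takes a slice of the feature channels instead of applying the learned query, key, and value projections that a trained multi-head model would use.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) with weights = softmax(Q K^T / sqrt(d_k))."""
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    weights = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted average of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


def multi_head_self_attention(X, num_heads):
    """Self-attention over X (sequence of feature vectors), one channel slice per head.

    Assumption: heads use channel slices rather than learned projections.
    """
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        Xh = [row[sl] for row in X]
        out, _ = scaled_dot_product_attention(Xh, Xh, Xh)
        head_outputs.append(out)
    # Concatenate the heads back into d_model features per position.
    return [
        [value for h in range(num_heads) for value in head_outputs[h][t]]
        for t in range(len(X))
    ]


if __name__ == "__main__":
    # Three sequence positions (e.g. Pinyin tokens), four features each.
    X = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.3], [1.0, 1.0, 0.0, 0.0]]
    for row in multi_head_self_attention(X, num_heads=2):
        print([round(v, 3) for v in row])
```

Every position attends to every other position in a single step, with no recurrence, which is why this operator parallelizes more easily than a recurrent sequence model.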
Keywords/Search Tags:Lip reading, Pinyin sequence recognition, Dual-branch network, Self-attention mechanism, Convolutional neural network