
Research And Implementation Of Lip Reading System Based On Finite State Automaton

Posted on: 2021-05-30  Degree: Master  Type: Thesis
Country: China  Candidate: T T Duan
GTID: 2428330632462641  Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence, the types and modes of human-computer interaction have changed dramatically. They have shifted gradually from the traditional machine-centric mode toward intelligent, user-centric interaction. Among these modes, non-contact and convenient interaction, represented by intelligent voice interaction, has been widely adopted. In quiet settings, automatic speech recognition can match human performance. In noisy settings, or where no speech signal is available at all, it struggles to deliver that advantage. This creates an urgent need for more targeted recognition technology. Lip reading is one such technology, and it is intended to make up for the shortcomings of existing interaction methods. It models lip movements during speech, extracts feature vectors from them, and uses a sequence-to-sequence model to decode those vectors into the corresponding text. Because it does not rely on the audio signal, it can effectively compensate for the weaknesses of automatic speech recognition. Lip reading therefore has great development potential and application value. However, the complexity and variability of facial signals and the large number of homophones make training a lip reading system challenging.

Building on the statistical framework of lip reading, this thesis improves the existing lip feature extractor and the existing training criterion for end-to-end acoustic models. It also builds a decoding framework based on finite state machines that combines the acoustic model and the language model. The aim is to improve lip reading performance with statistical knowledge drawn from natural language, and to reduce the time and memory the decoding search requires. The main contributions of this thesis are as follows:
(1) An improved multi-stream fusion feature extraction network. This thesis combines optical flow, lip landmarks, and convolutional feature maps into a robust lip motion feature.
(2) An improved training criterion for end-to-end acoustic models. Based on an analysis of current mainstream acoustic models, it proposes an improved multi-task loss for training end-to-end acoustic models.
(3) A static lip reading decoding space based on finite state machines. This thesis implements a lip reading system based on a finite state automaton. The system combines different knowledge constraints and improves both the accuracy and the efficiency of recognition.
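The abstract does not give the decoder itself. The following is only a minimal sketch, under stated assumptions, of how a static decoding graph expressed as a weighted finite state automaton can constrain a frame-synchronous Viterbi beam search. In it, the automaton's arcs carry language model costs (negative log probabilities), and per-frame label distributions stand in for the acoustic model output. The sketch also assumes CTC-style collapsing of blanks and repeated labels. Every name here is hypothetical and is not taken from the thesis: `build_fsa`, `viterbi_decode`, `BLANK`, the toy word grammar, and the probabilities.

```python
import math
from collections import defaultdict

BLANK = "_"  # hypothetical blank symbol, assuming a CTC-style acoustic model


def build_fsa(arcs, finals):
    """arcs: (src, label, dst, lm_cost) tuples; finals: {state: final_cost}."""
    out = defaultdict(list)
    for src, label, dst, cost in arcs:
        out[src].append((label, dst, cost))
    return out, finals


def viterbi_decode(frames, fsa, finals, start=0, lm_weight=1.0, beam=10.0):
    """Viterbi beam search over a static weighted FSA (costs = -log prob).

    frames: one {label: probability} dict per video frame.
    Returns (total_cost, word_tuple) for the best path ending in a final
    state, or None if no path through the automaton survives.
    """
    # Hypothesis key: (FSA state, label emitted on the previous frame).
    hyps = {(start, BLANK): (0.0, ())}
    for probs in frames:
        new = {}

        def relax(key, cost, out):
            # Keep only the cheapest path into each search state (Viterbi).
            if key not in new or cost < new[key][0]:
                new[key] = (cost, out)

        for (state, last), (cost, out) in hyps.items():
            for label, p in probs.items():
                if p <= 0.0:
                    continue
                ac = -math.log(p)  # acoustic (visual) cost of this frame label
                if label == BLANK or label == last:
                    # Blank or repeated label: collapsed, no FSA arc is taken.
                    relax((state, label), cost + ac, out)
                else:
                    # A new label must follow an arc of the static graph.
                    for arc_label, dst, lm_cost in fsa.get(state, ()):
                        if arc_label == label:
                            relax((dst, label),
                                  cost + ac + lm_weight * lm_cost,
                                  out + (label,))
        if not new:
            return None
        best = min(c for c, _ in new.values())
        # Beam pruning bounds the time and memory of the search.
        hyps = {k: v for k, v in new.items() if v[0] <= best + beam}
    finished = [(c + finals[s], out)
                for (s, _), (c, out) in hyps.items() if s in finals]
    return min(finished) if finished else None


# Toy grammar: "bin" and "pin" look alike on the lips, so only the LM cost
# on the FSA arcs separates them.
arcs = [(0, "bin", 1, 0.5), (0, "pin", 1, 1.5), (1, "blue", 2, 0.1)]
fsa, finals = build_fsa(arcs, {2: 0.0})
frames = [
    {"bin": 0.45, "pin": 0.45, "_": 0.1},
    {"bin": 0.4, "pin": 0.4, "_": 0.2},
    {"blue": 0.7, "_": 0.3},
]
cost, words = viterbi_decode(frames, fsa, finals)
print(words)  # ('bin', 'blue')
```

In the toy example, the visual scores cannot separate the two candidates, because /b/ and /p/ are both bilabial and look alike. The language model costs stored on the automaton's arcs break the tie. Precompiling all knowledge sources into one static graph, rather than querying them separately during search, is the kind of combination of "different knowledge constraints" the abstract describes. The thesis's actual graph construction and acoustic model are not reproduced here.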
Keywords/Search Tags: lip reading, finite state automaton, end-to-end