
Chinese Lipreading And Keyword Detection Based On Deep Learning

Posted on: 2021-04-02    Degree: Master    Type: Thesis
Country: China    Candidate: X J Chen
GTID: 2428330611462395    Subject: Computer Science and Technology
Abstract/Summary:
Lipreading is a technology that combines computer vision and natural language processing. It recognizes what a speaker says from visual information alone. Traditional lipreading methods require handcrafted features, and their classifiers are difficult to train, so progress in lipreading research has been slow. In the last few years, deep learning has achieved significant development in many fields, and lipreading based on deep learning has gradually become an active research topic. Chinese has a large number of characters and is more complicated than languages written with only a small alphabet, which makes Chinese lipreading a more difficult task. In practice, there are also scenarios where only keywords need to be identified, so the detection and identification of keywords is important in practical applications. The research work of this thesis consists of the following two parts.

(1) Chinese sentence-level lipreading method. Chinese sentence-level lipreading is divided into two stages. The first stage recognizes the lip image sequence as a pinyin sequence. This stage uses three-dimensional convolution and a two-dimensional DenseNet to extract visual information, and uses resBi-LSTM (residual bidirectional Long Short-Term Memory) to decode the visual features. It reduces the pinyin error rate on the Chinese dataset NSTDB and also reduces the word error rate on the public English dataset GRID. The second stage recognizes the pinyin sequence as a Chinese character sequence. It uses a stack of multi-head attention layers to learn context information in the pinyin sequence and thereby establishes a mapping to the Chinese character sequence. The error rate of the Chinese character sequences obtained at this stage is about 8% higher than that of the pinyin sequences.

(2) Lip keyword detection method. A sample-based (query-by-example) lip keyword detection method is proposed to determine whether a query example appears in a video. First, posterior probability features are extracted from the query example and from the video; in this thesis, the first-stage lipreading network is used to extract these features. Second, a similarity matrix is calculated from the extracted features. Finally, a convolutional neural network classifier, consisting of six convolutional layers and one fully connected layer, classifies the similarity matrix treated as an image. Lip keyword detection is studied on the GRID dataset. The experimental results show that the convolutional neural network classifier performs well in terms of precision, recall and F1-score.
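
The similarity-matrix step of the keyword detection method can be illustrated with a minimal sketch. The abstract does not state which similarity measure the thesis uses, so cosine similarity is an assumption here, and the function names `cosine` and `similarity_matrix`, as well as the toy posterior values, are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query, video):
    """Compare every query frame with every video frame.

    query, video: lists of per-frame posterior vectors.
    Returns a list of rows; rows index query frames, columns index video frames.
    The similarity measure (cosine) is an assumption, not taken from the thesis.
    """
    return [[cosine(q, v) for v in video] for q in query]


# Toy posteriors over three hypothetical output classes.
query = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
video = [
    [0.3, 0.3, 0.4],
    [0.9, 0.05, 0.05],  # matches query frame 0
    [0.1, 0.8, 0.1],    # matches query frame 1
    [0.2, 0.2, 0.6],
]
sim = similarity_matrix(query, video)
```

In this toy case the query frames match video frames 1 and 2, so the high values in `sim` form a short diagonal run. A classifier that receives the matrix as an image, as the abstract describes, would presumably learn to detect such patterns.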
Keywords/Search Tags: Deep learning, Chinese lipreading, Attention mechanism, Query-by-example, Keyword detection