Research On Lip Reading Recognition Based On Deep Learning

Posted on: 2019-04-08 | Degree: Master | Type: Thesis | Country: China | Candidate: D J Wu | GTID: 2428330623962371 | Subject: Instrumentation engineering

Abstract/Summary:

Machine lip reading is a novel technology that uses only visual information to understand speech content. Lip-reading recognition is an important research topic in artificial intelligence and computer vision. By identifying lip features, it can be applied to language function recovery, criminal investigation, identity authentication, and other fields. Artificial intelligence has been widely used across the disciplines of modern society and has achieved good results. Artificial intelligence built around deep learning removes the need for hand-crafted feature extraction required by general machine learning methods and lets the machine extract features on its own.

Lip-reading recognition can be divided into two categories: word level and sentence level. Word-level recognition can be treated as a discriminative classification problem, and sentence-level recognition as a discriminative sequence-to-sequence problem. Researchers outside China have studied lip-reading recognition in natural scenes and achieved some results, but mainly for English. Lip-reading recognition for Chinese in natural scenes has rarely been addressed. Therefore, after a thorough survey of lip-reading recognition technology, this thesis focuses on Chinese lip-reading recognition in natural scenes. The main research work is as follows:

1. Lip-reading recognition technology in China and abroad, especially work based on deep learning, was compared in depth, and the overall workflow of the research was preliminarily determined.

2. One of the main obstacles to progress in lip-reading recognition is the lack of datasets. Existing English lip-reading datasets are insufficient, and the amount of available data is far from enough to train scalable models. For Chinese, there is no publicly available dataset. To address this, the Mandarin Chinese lip-reading dataset TMLRD-20 (Tianjin University Mandarin Lip Reading Dataset, 20 hours) was first produced by automated means, and the production process is described in detail.

3. With reference to existing research results in motion recognition, several word-level lip-reading recognition models were designed and tested on the LRW (Lip Reading in the Wild) dataset, and the experimental results are reported. These designs also serve as a reference for designing the feature-extraction front end of the later sentence-level lip-reading models.

4. An improved CTC (connectionist temporal classification) Chinese sentence-level lip-reading recognition model was designed, and experimental results and analysis on TMLRD-20 are given. The recognition results show that the model is feasible for Chinese sentence-level lip-reading recognition.

5. An improved Encoder-Decoder Chinese sentence-level lip-reading recognition model, MLRN (Mandarin Lip Reading Network), was designed. The model was tested on the TMLRD-20 and Grid datasets. The experimental results show that it outperforms the improved CTC Chinese sentence-level lip-reading recognition model and that it also achieves very competitive recognition results on the Grid dataset.

Keywords/Search Tags: Lip-reading, Deep learning, Word level, Sentence level, TMLRD-20, Chinese, CTC, Encoder-Decoder