Design And Implementation Of Multimodal Language Recognition System

Posted on:2022-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:J He

Full Text:PDF

GTID:2518306605988559

Subject:Control Engineering

Abstract/Summary:

In recent years, speech recognition and image recognition have gradually become mainstream modes of human-computer interaction, and speech recognition has become a key factor in promoting the development of artificial intelligence. Research on speech recognition against noisy backgrounds is also gradually emerging. Although the recognition accuracy for isolated words has reached 99% in test environments, real speech contains not only the voice itself but also background sound from the surrounding environment, so recognition accuracy in practice is not as high as expected. A new model is urgently needed to overcome this problem. With the rapid development of deep learning, deep-learning-based Markov models have gradually become the mainstream speech recognition models, replacing the traditional Gaussian mixture model-hidden Markov model (GMM-HMM).

Against this background, and to further improve the accuracy of speech recognition, this thesis presents a multimodal language recognition system based on audio-visual fusion. Visual features from lip reading are added to traditional speech recognition, so that when the audio background is too noisy, lip movements supplement the understanding of semantics. The thesis consists of four parts. First, audio feature parameters are extracted: the required FBank and MFCC features are obtained through the MFCC parameter extraction method. Second, video image features are extracted: visual features are obtained after the video is preprocessed by framing and windowing. Third, features are fused: a GMM-HMM model is trained as the baseline, then a convolutional neural network and a deep neural network are each used for feature fusion, modeling, and training on the visual and auditory information, and the recognition accuracy of the two network models is tested. Fourth, a program interface is built and the overall system is tested.
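The abstract does not state how the audio and visual streams are combined, so the following is only a minimal sketch of two of the steps it names: framing and windowing, and feature fusion. It assumes a Hamming window and simple feature-level concatenation with nearest-frame alignment. The function names `frame_signal` and `fuse_features`, the frame sizes, and the feature dimensions are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    The Hamming window is an assumption; the thesis does not name a window.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of idx selects samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def fuse_features(audio_feats, visual_feats):
    """Feature-level fusion (assumed): align video frames to audio frames
    by nearest earlier frame, then concatenate along the feature axis."""
    n_audio = audio_feats.shape[0]
    n_visual = visual_feats.shape[0]
    # Map each audio frame index to a video frame index.
    src = np.minimum(np.arange(n_audio) * n_visual // n_audio, n_visual - 1)
    return np.concatenate([audio_feats, visual_feats[src]], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)      # 1 s of audio at 16 kHz (assumed)
    frames = frame_signal(audio, frame_len=400, hop_len=160)
    # Placeholder features standing in for MFCC and lip-region features.
    audio_feats = rng.standard_normal((frames.shape[0], 13))
    visual_feats = rng.standard_normal((25, 20))   # 25 video frames (assumed)
    fused = fuse_features(audio_feats, visual_feats)
    print(frames.shape, fused.shape)
```

In this sketch each fused row holds the audio features for one audio frame followed by the visual features of the video frame that covers it. A network such as the CNN or DNN mentioned above could then take these rows as input, although the thesis may fuse the modalities differently.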

Keywords/Search Tags:

Multimodal language recognition, Image feature extraction, Audio feature extraction, GMM-HMM, Convolutional neural network, Deep neural network

Related items

1	A Novel Non-loss Function Deep Convolutional Neural Network Based Image Feature Extraction Method
2	Static Sign Language Recognition System Based On Convolutional Neural Network
3	Effective Feature Extraction On Sound Event Recognition
4	Face Feature Extraction Based On Depth Convolution Neural Network
5	Image Feature Extraction Based On Deep Learning
6	Research On Automatic Segmentation Method Of Multimodal Image Based On Convolutional Neural Network
7	Research On Feature Extraction Method For Facial Image Based On Joint Encoding And Convolutional Network
8	Research And Application Of Deep Neural Network In Image Recognition
9	Deep Rotation Vector Feature Convolutional Neural Network And Its Application In Face Recognition
10	Research On Predominant Instrument Recognition Based On Deep Learning