Research On Speaker Three-dimensional Feature Recognition Based On Deep Learning

Posted on: 2021-03-03  Degree: Master  Type: Thesis
Country: China  Candidate: J Chen  Full Text: PDF
GTID: 2428330611950445  Subject: Information and Communication Engineering
Abstract/Summary:
Speech recognition plays an important role in Human-Computer Interaction, Artificial Intelligence (AI), Natural Language Processing (NLP) and related fields, and is an active research topic. Three-dimensional speaker feature recognition analyzes the information in a speaker's voice signal that represents gender, age and emotion, and identifies these three attributes of the speaker. It is of great practical significance for criminal case investigation, intelligent hospitals and intelligent courts. For example, identifying a driver's emotional state allows an early warning that can reduce traffic accidents, and accurately identifying visitors' emotions in psychological counseling helps the consultation proceed smoothly. Traditional classifiers such as Softmax, Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost) perform well when classifying a single attribute (gender, age or emotion), but perform poorly on joint classification of two or more dimensions (for example, gender and age). A multi-modal fusion method is used to fuse two single-modal deep learning models, BiLSTM and CNN, into a deep feature extraction model (BiLSTM-CNN). A multimodal feature fusion method is used to fuse the single-modal time-domain, frequency-domain and text features into feature data that better represents the speaker's speech information. To address the weak learning ability of deep neural networks on a small number of speech samples, this thesis proposes transferring the feature knowledge learned by the deep feature extraction model (BiLSTM-CNN) to Softmax, SVM and XGBoost for target-task learning. Experiments show that the proposed BiLSTM-CNN model, with SVM as the target-task learner, achieves good classification performance on three-dimensional recognition of gender, age and emotion.
Keywords/Search Tags: Deep learning, Multi-modal fusion, Gender identification, Age recognition, Emotion recognition
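
The abstract does not say how the time-domain, frequency-domain and text features are fused, or how the three attributes are combined into one recognition target. The following is a minimal sketch under two assumptions that are not taken from the thesis: fusion is done by L2-normalizing each modality's feature vector and concatenating them, and the three-dimensional target is encoded as one joint class per (gender, age, emotion) combination. The category names and the functions `fuse` and `joint_label` are hypothetical, and the BiLSTM-CNN extractor and the downstream SVM are not implemented here.

```python
import math
from itertools import product

# Hypothetical label sets; the abstract does not list the actual categories.
GENDERS = ["female", "male"]
AGES = ["child", "adult", "senior"]
EMOTIONS = ["neutral", "happy", "angry", "sad"]

# Assumption: one joint class index per (gender, age, emotion) combination.
JOINT_CLASSES = {
    combo: index
    for index, combo in enumerate(product(GENDERS, AGES, EMOTIONS))
}


def l2_normalize(values):
    """Scale a feature vector to unit length so modalities are comparable."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return [0.0 for _ in values]
    return [v / norm for v in values]


def fuse(time_feats, freq_feats, text_feats):
    """Feature-level fusion (assumed): normalize each modality, then concatenate."""
    return (
        l2_normalize(time_feats)
        + l2_normalize(freq_feats)
        + l2_normalize(text_feats)
    )


def joint_label(gender, age, emotion):
    """Map an attribute triple to its joint class index."""
    return JOINT_CLASSES[(gender, age, emotion)]


if __name__ == "__main__":
    fused = fuse([3.0, 4.0], [1.0, 0.0], [0.0, 2.0])
    print(len(fused), joint_label("male", "adult", "happy"))
```

In a pipeline of the kind the abstract describes, vectors like `fused` would be the input to the deep feature extractor, and the joint class index would serve as the training target for the Softmax, SVM or XGBoost classifier.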