
Multimodal Emotion Recognition From Speech And Text

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Hu
GTID: 2428330614958550
Subject: Control engineering
Abstract/Summary:
At present, social media plays an increasingly important role in daily life, and social applications such as WeChat, Weibo and QQ have become indispensable tools for everyday communication. At the same time, emotion recognition, one of the important foundations of intelligent human-computer interaction, has also made great progress. Since social communication draws on many kinds of information, including text, speech and pictures, multimodal emotion recognition on social media has become a hot research topic of great significance for realizing intelligent human-computer interaction.

Given the intrinsic relevance and complementarity between text and speech, this thesis studies multimodal emotion recognition from speech and text. Deep-learning-based feature extraction methods for speech and text were investigated first; bimodal fusion algorithms for speech and text were then proposed; and on this basis an application system for speech-text multimodal emotion recognition was designed and developed. The main contents of this thesis are as follows:

1. Feature extraction methods for speech and text based on deep learning. Considering the differences between the two modalities, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network were combined in two channels to fully learn both the local and the global acoustic emotion features of speech, while a Bi-directional Long Short-Term Memory (Bi-LSTM) network was employed to capture the textual features (a sketch of both encoders follows this abstract). Experiments on the public IEMOCAP database show that the proposed single-modal feature learning models achieve higher recognition accuracy than the compared models, which verifies their superiority.

2. A fusion model for multimodal emotion recognition based on deep learning. The single-modal features extracted from speech and text are fused at the feature level, and the fused features are learned again by a deep neural network, which extracts high-level abstract emotion features for the final emotion analysis and classification decision (see the fusion sketch below). With a regularization method further applied to optimize the fusion model, a recognition rate of 70.4% was achieved on the IEMOCAP database, which compares favorably with other published results and verifies the effectiveness of the proposed model.

3. An application system for speech and text emotion recognition. Multimodal emotion recognition from speech and text provides strong support for realizing human-computer interaction, so this thesis proposes an application framework for speech-text multimodal emotion recognition. Based on the Python user-interface toolkit PyQt, the emotion recognition system was designed, developed and tested on recorded data and the IEMOCAP dataset (a minimal interface skeleton is given below). The experimental results show that the system can effectively identify the emotional state contained in the input using the proposed fusion model, and that it is practical to use.
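The abstract gives no implementation details, but the two single-modal encoders in item 1 can be pictured as follows. This is a minimal sketch, assuming PyTorch; the layer sizes, input shapes, and the names SpeechEncoder and TextEncoder are hypothetical, not the thesis's actual configuration.

```python
# Illustrative sketch of the single-modal feature extractors from item 1.
# Framework (PyTorch), dimensions, and class names are assumptions.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Bi-channel acoustic encoder: a CNN channel for local spectral
    patterns and an LSTM channel for global temporal dynamics."""
    def __init__(self, n_mels=40, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                     # local-feature channel
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)  # global channel

    def forward(self, x):              # x: (batch, time, n_mels) spectrogram
        local = self.cnn(x.transpose(1, 2)).squeeze(-1)    # (batch, 64)
        _, (h, _) = self.lstm(x)                           # h: (1, batch, hidden)
        return torch.cat([local, h[-1]], dim=-1)           # (batch, 64 + hidden)

class TextEncoder(nn.Module):
    """Bi-LSTM over word embeddings; the last hidden states of both
    directions form the textual emotion feature."""
    def __init__(self, vocab=10000, emb=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):         # tokens: (batch, seq_len) word ids
        _, (h, _) = self.bilstm(self.emb(tokens))
        return torch.cat([h[0], h[1]], dim=-1)             # (batch, 2 * hidden)
```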
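Item 2 describes feature-level fusion followed by a deep neural network. A minimal sketch of that stage, continuing the assumptions above: dropout stands in for the unspecified regularization method, and the four-class output reflects a common IEMOCAP setup rather than a detail confirmed by the abstract.

```python
# Feature-level fusion (item 2): concatenate the single-modal features and
# learn higher-level joint features with a small DNN. Dropout is used here
# as the regularizer; the thesis's exact method is not stated in the abstract.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, speech_dim=192, text_dim=256, n_classes=4):
        super().__init__()
        self.dnn = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.5),             # regularization on the fused features
            nn.Linear(256, n_classes),   # final emotion classification decision
        )

    def forward(self, speech_feat, text_feat):
        fused = torch.cat([speech_feat, text_feat], dim=-1)  # feature-level fusion
        return self.dnn(fused)
```

Fusing at the feature level, rather than averaging per-modality decisions, lets the network model cross-modal interactions directly, which matches the abstract's point that the fused features are "learned again" before classification.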
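For the application system in item 3, the abstract names only the toolkit (PyQt) and the test data. A hypothetical skeleton of such a front end, with illustrative widget and slot names:

```python
# Minimal PyQt5 skeleton of a desktop front end like the one described in
# item 3. Widget layout, names, and the placeholder result are assumptions.
import sys
from PyQt5.QtWidgets import (QApplication, QLabel, QPushButton,
                             QVBoxLayout, QWidget)

class EmotionDemo(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Speech-Text Emotion Recognition")
        self.result = QLabel("Emotion: -")
        recognize = QPushButton("Recognize")
        recognize.clicked.connect(self.on_recognize)
        layout = QVBoxLayout(self)
        layout.addWidget(recognize)
        layout.addWidget(self.result)

    def on_recognize(self):
        # Here the recorded speech and its transcript would be encoded and
        # passed to the trained fusion model; a placeholder is shown instead.
        self.result.setText("Emotion: neutral (placeholder)")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = EmotionDemo()
    window.show()
    sys.exit(app.exec_())
```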
Keywords/Search Tags: emotion recognition, deep learning, feature extraction, multimodal emotion recognition, long short-term memory network