The progress of artificial intelligence technology has driven the rapid development of learning science and cognitive science. In this context, exploring how artificial intelligence can be applied to individual behavior, cognition, and emotion, deepening the integration of data science and educational science, and realizing data-driven mining of latent individual characteristics have become urgent issues in the field of artificial intelligence. The perception and measurement of individual emotional states has long been a focus of researchers in education, psychology, and computer science. How to accurately collect and extract individuals' explicit behavioral data and uncover their hidden psychological characteristics is a key question for future artificial intelligence research.

Through an extensive literature review, we identified the main problems in the field of multi-modal emotion recognition: research on multi-modal emotion recognition remains insufficient, and no systematic research model has been formed; the representations of different modalities differ, which makes data fusion difficult; the value density of emotional information in single-modal data is unknown; and there is a lack of systematic analysis of the effectiveness of multi-modal data fusion.

To address these problems, we constructed a multi-modal emotion recognition model named UDP-MIF (Uni-modal Data Perception and Multi-modal Information Fusion). Using advanced artificial intelligence models, we built a data perception channel for each modality, integrated text, voice, and video, and explored the information complementarity mechanism among multi-modal data. On this basis, we constructed neural-network models for uni-modal emotion classification and multi-modal emotion recognition. Our main contributions are as follows.

1. Construction of the UDP-MIF multi-modal emotion recognition model. Using the idea of model fusion, we constructed a multi-modal emotion recognition model based on "uni-modal data perception and multi-modal information fusion" to realize the joint perception and fusion of text, voice, and video data, and we exploited the information complementarity among different modalities to improve the accuracy of emotion recognition. Drawing on advanced research results in artificial intelligence, we built data perception channels for text, voice, and video to generate the text emotion vector (TSV), voice emotion vector (ASV), and video emotion vector (VSV). By splicing and fusing the uni-modal emotional feature vectors, we explored how multi-modal data influence the accuracy of emotion recognition, discussed in depth the value density of emotional information and the fusion mechanism of different modalities, and thereby formed a systematic research method for multi-modal emotion recognition.

2. A multi-modal fusion algorithm based on deep learning. We adopted advanced artificial intelligence models and selected the most suitable model for each modality to optimize uni-modal emotion recognition and thereby improve the accuracy of multi-modal emotion recognition. For the text modality, in the upstream stage we applied transfer learning based on Google's BERT model to construct text word vectors; in the downstream stage we used a TextCNN model and a BiLSTM+Attention model to extract features from the word vectors and integrated the two models to improve the accuracy of text emotion classification.
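To make the text channel concrete, the following is a minimal PyTorch sketch, not the thesis's exact implementation: BERT word vectors (upstream stage) feed a TextCNN and a BiLSTM+Attention encoder (downstream stage), and their outputs are averaged into a text emotion vector. The bert-base-uncased checkpoint, all hyper-parameters, and the four-class output are illustrative assumptions.

```python
# Sketch of the text channel: BERT word vectors -> TextCNN and BiLSTM+Attention,
# averaged into a text emotion vector (stand-in for the TSV). Assumed settings only.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class TextCNN(nn.Module):
    def __init__(self, emb_dim=768, n_filters=64, kernel_sizes=(3, 4, 5), n_classes=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, x):                       # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                   # Conv1d expects (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

class BiLSTMAttention(nn.Module):
    def __init__(self, emb_dim=768, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                     # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time steps
        ctx = (w * h).sum(dim=1)                # weighted context vector
        return self.fc(ctx)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
cnn_head, rnn_head = TextCNN(), BiLSTMAttention()

tokens = tokenizer("I am so happy to see you", return_tensors="pt")
with torch.no_grad():
    emb = bert(**tokens).last_hidden_state      # upstream: BERT word vectors
tsv = (cnn_head(emb) + rnn_head(emb)) / 2       # downstream: simple two-model ensemble
```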
For the speech modality, we constructed a speech emotion classification model based on LSTM+Attention. For the video modality, we constructed a video emotion classification model based on a CNN. Finally, the uni-modal emotion feature vectors are fused to analyze the individual's true emotional state, as sketched at the end of this section.

3. A systematic discussion, through extensive experiments, of the emotional information value density of single-modal data and the fusion mechanism of multi-modal data. On the IEMOCAP dataset developed by the SAIL laboratory at the University of Southern California, we carried out a large number of emotion recognition experiments on uni-modal and multi-modal data to verify the emotion recognition model proposed in this paper and to examine, from several dimensions, the emotional information value density of uni-modal data and the fusion mechanism of multi-modal data. The results show that the information complementarity mechanism of multi-modal data can effectively improve the accuracy of emotion recognition; on the IEMOCAP dataset used in this paper, the emotional value density of text data is higher than that of audio and video data; "text-speech" data represent the individual's emotional state to the greatest extent; and, under the same conditions, the proposed UDP-MIF multi-modal emotion recognition model achieves good performance on multi-modal emotion recognition problems.
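As an illustration of the fusion step, the sketch below splices the three uni-modal emotion feature vectors (TSV, ASV, VSV) and passes them through a small fully connected classifier. The vector dimensions, hidden size, and four emotion classes are assumptions for demonstration, not the values used in the experiments.

```python
# Sketch of late fusion: concatenate the uni-modal emotion feature vectors and
# classify the fused representation. Dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dims=(4, 4, 4), hidden=32, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sum(dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes))

    def forward(self, tsv, asv, vsv):
        fused = torch.cat([tsv, asv, vsv], dim=1)   # splice the three modal vectors
        return self.net(fused)                      # logits over emotion classes

# Dummy vectors stand in for the outputs of the three data perception channels.
tsv, asv, vsv = torch.randn(2, 4), torch.randn(2, 4), torch.randn(2, 4)
model = FusionClassifier()
print(model(tsv, asv, vsv).argmax(dim=1))           # predicted emotion index per sample
```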