Research On Emotion Recognition Based On Multimodal Physiological Signals

Posted on: 2024-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Guo
Full Text: PDF
GTID: 2530307058977679
Subject: Computer Science and Technology
Abstract/Summary:
Emotion is a psychological state that humans produce in response to external stimuli. It influences cognition, coordinates and facilitates interpersonal interaction, and can even affect physical health. As knowledge about emotion has deepened, portable acquisition devices have spread, and neural networks and emotion recognition methods have advanced, the demand for high-accuracy emotion recognition has grown in many fields. In real life, human behavioral signals, especially facial expressions, are easy to disguise, so it is difficult for an algorithm to identify a person's hidden true emotional state from them. Physiological signals, by contrast, are responses triggered by emotional changes and are difficult for a person to control or alter, which makes them better suited to accurately identifying a subject's real emotions. This thesis studies multimodal fusion methods, data augmentation techniques, and physiological signal-based emotion recognition methods, and analyzes the performance of different physiological signal combinations in emotion recognition tasks. The main research content is as follows.

(1) To address the poor accuracy of emotion recognition based on unimodal physiological signals, and the high level of expertise and experience that traditional features require, an emotion recognition method based on multimodal spatio-temporal feature fusion is proposed. The method takes as input physiological signals that can be acquired by portable devices, such as ECG, RSP, and eye movement signals. Each physiological signal is converted into a one-dimensional grayscale image that is fed to a 2D-CNN to extract spatial features, while temporal features are extracted with an LSTM. The spatial and temporal features are then fused by a multimodal compact linear pooling layer, and emotion classification is performed with an SVM classifier. The experimental results show that the fused spatio-temporal features characterize emotion better and yield higher recognition accuracy than temporal or spatial features alone.

(2) To address the small data size, insufficient network training, and poor recognition performance of physiological signal-based emotion recognition methods, a multimodal emotion recognition method based on CNN-SVM and data augmentation is proposed. The method transforms the time-domain data and the time-frequency-domain data through compression, stretching, flipping, and recombination operations to obtain augmented data. A 1D-CNN and a 2D-CNN are each trained on the augmented data, and the convolutional layers of the two networks are then frozen. Time-domain and frequency-domain features are extracted with the trained networks, the two types of features are fed into SVM classifiers to obtain classification results, and the final result is obtained by decision-level fusion. A comparison of recognition performance across different data combinations shows that using data augmentation to increase the amount of training data effectively improves emotion recognition performance.

(3) To address the small size of emotion datasets and the large differences in quality among data generated by different data augmentation techniques, a multimodal emotion recognition method based on data-level and feature-level data augmentation is proposed. Data augmentation techniques are mainly implemented at two levels. Data-level augmentation generates new underlying data for subsequent feature extraction or network training. Feature-level augmentation generates new features, based on the features extracted from the underlying data, for network training or recognition tasks. The method performs data augmentation by data transformation at the data level and by Wasserstein Generative Adversarial Networks (WGAN) at the feature level, and discusses the performance of the different methods in a multimodal emotion recognition task. The experimental results show that training the model on data generated by data-level transformations such as geometric transformation, recombination, and noise injection effectively improves the accuracy of emotion recognition, and that the features generated by WGAN at the feature level also characterize emotion well and improve the performance of emotion recognition models. However, the quality of the generated data or features varies, and poor-quality data may even harm network training. Future research will focus on how to generate and select high-quality data to improve recognition accuracy.
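
The data-level transformations named in the abstract (compression, stretching, flipping, recombination, and noise injection) can be illustrated with a short sketch. The abstract does not give the exact definitions or parameters, so everything below is an assumption made for illustration: the function names are hypothetical, compression and stretching are modeled as linear-interpolation resampling along the time axis, flipping is modeled as time reversal, recombination alternates equal-length segments of two recordings (assumed to share an emotion label), and noise injection adds zero-mean Gaussian noise.

```python
import math
import random
from typing import List, Sequence

Signal = List[float]


def resample(signal: Sequence[float], new_len: int) -> Signal:
    """Linearly interpolate `signal` onto `new_len` evenly spaced points."""
    if new_len < 2 or len(signal) < 2:
        raise ValueError("need at least two samples in and out")
    step = (len(signal) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(signal) - 2)  # left neighbour index
        frac = pos - lo                      # weight of the right neighbour
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def time_scale(signal: Sequence[float], factor: float) -> Signal:
    """Assumed compression (factor < 1) or stretching (factor > 1) in time."""
    new_len = max(2, round(len(signal) * factor))
    return resample(signal, new_len)


def flip(signal: Sequence[float]) -> Signal:
    """Assumed flipping: reverse the signal along the time axis."""
    return list(reversed(signal))


def recombine(a: Sequence[float], b: Sequence[float], n_segments: int = 4) -> Signal:
    """Assumed recombination: alternate segments taken from `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    if not 1 <= n_segments <= len(a):
        raise ValueError("n_segments must be between 1 and the signal length")
    seg = len(a) // n_segments
    out: Signal = []
    for k in range(n_segments):
        start = k * seg
        # The last segment absorbs any remainder samples.
        end = len(a) if k == n_segments - 1 else start + seg
        source = a if k % 2 == 0 else b
        out.extend(source[start:end])
    return out


def add_noise(signal: Sequence[float], sigma: float, rng: random.Random) -> Signal:
    """Noise injection: add zero-mean Gaussian noise with std `sigma`."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic stand-ins for same-label recordings (not real ECG data).
    sig_a = [math.sin(2 * math.pi * i / 50) for i in range(200)]
    sig_b = [math.cos(2 * math.pi * i / 50) for i in range(200)]
    augmented = [
        time_scale(sig_a, 0.5),        # compression
        time_scale(sig_a, 2.0),        # stretching
        flip(sig_a),                   # flipping
        recombine(sig_a, sig_b),       # recombination
        add_noise(sig_a, 0.05, rng),   # noise injection
    ]
    print([len(s) for s in augmented])
```

In this sketch the compressed and stretched signals change length, so a fixed-input network would need them cropped or padded afterwards; how the thesis handles this is not stated in the abstract.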
Keywords/Search Tags: Emotion Recognition, Affective Computing, Multimodal Technology, Physiological Signal, Neural Network