
Research On Affective Computing Methods Based On Auditory Cognition

Posted on: 2023-10-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 1528307376481054
Subject: Computer Science and Technology
Abstract/Summary:
Giving machines emotional intelligence will be the next breakthrough in artificial intelligence. Improving the ability to recognize and respond intelligently to emotion makes human-machine interaction more natural. To realize machine emotional intelligence, the fundamental step is to accurately identify human emotional states and capture changes in emotion. Most current emotion recognition methods are based on statistics or machine learning and lack guidance from brain science and cognitive science. The emerging field of affective computing focuses on exploring the neural mechanisms of emotion and developing brain-inspired computational models of emotion. Rising to the challenge of auditory affective computing, this thesis focuses on the cognitive mechanisms of auditory emotion, emotion recognition methods based on EEG signals, and brain-inspired speech emotion recognition technology. Our research aims to accurately identify human emotions from speech or EEG signals and ultimately endow machines with emotional intelligence, improving natural human-machine interaction. The main contents include the following aspects:

(1) To model the dynamic process of auditory emotion perception, this thesis proposes a dynamic modeling method based on microstate analysis. Because of the complexity of the emotional cognitive process, automatically determining the optimal number of microstates is a challenge when applying microstate analysis to emotion. This research proposes dual-threshold-based atomize and agglomerate hierarchical clustering (DTAAHC) to determine the optimal number of microstate classes automatically. The method uses two optimization criteria, global explained variance (GEV) and global map dissimilarity (GMD), to estimate the quality of candidate microstates during clustering (see the first code sketch below). After the microstate classes are identified, the original individual EEG data can be labeled as a microstate sequence by fitting the microstate classes back to the topography at each sample point. Temporal parameters of the microstate sequence are then extracted to explain the temporal dynamics of the brain's response to emotional stimuli. In the speech emotion cognition experiment, it was found that under different emotional experiences the brain exhibits similar and finite activation patterns (microstate classes). Differences in the duration, frequency of occurrence, and state transitions of each microstate can reflect differences in emotion. These findings reveal the temporal dynamics of speech emotion perception.

(2) Considering that traditional EEG feature extraction ignores the dynamics of emotional perception, this thesis proposes a multi-granularity feature extraction method based on the microstate sequence for emotion analysis. First, we use detrended fluctuation analysis and multi-scale permutation entropy to analyze the long-range correlation and multi-granularity characteristics of EEG signals. Then, we propose a multi-granularity feature extraction method based on the microstate sequence. Because the microstate sequence is composed of non-metric random variables, the method uses k-mer frequencies to extract fine-grained features of the microstate sequence (see the second sketch below); k-mer frequency measures the similarity between sequences and effectively characterizes temporal relationships at fine granularity. At the same time, the method extracts four temporal parameters as coarse-grained features. Finally, coarse-grained and fine-grained features are fused for EEG-based emotion recognition. The experimental results verify the effectiveness of the proposed method in EEG emotion recognition.
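The abstract names GEV and GMD but does not define them; both have standard forms in the microstate literature. The following is a minimal NumPy sketch of the fit-back step from contribution (1) under those standard definitions, assuming average-referenced EEG of shape (channels, samples) and candidate maps of shape (states, channels); it is an illustration of the general technique, not the thesis's DTAAHC implementation.

import numpy as np

def gfp(eeg):
    # Global field power: spatial standard deviation of the
    # average-referenced signal at each sample. eeg: (n_channels, n_samples)
    return eeg.std(axis=0)

def fit_microstates(eeg, maps):
    """Label each sample with its best-matching microstate class.

    eeg  : (n_channels, n_samples) average-referenced EEG
    maps : (n_states, n_channels) candidate microstate topographies
    Returns (labels, gev), where gev is the global explained variance.
    """
    # Normalize maps and instantaneous topographies to unit norm
    m = maps / np.linalg.norm(maps, axis=1, keepdims=True)
    x = eeg / np.linalg.norm(eeg, axis=0, keepdims=True)
    corr = np.abs(m @ x)            # polarity-invariant spatial correlation
    labels = corr.argmax(axis=0)    # winner-take-all fit-back
    best = corr.max(axis=0)
    g = gfp(eeg)
    gev = np.sum((g * best) ** 2) / np.sum(g ** 2)
    return labels, gev

def gmd(u, v):
    # Global map dissimilarity between two average-referenced maps,
    # each scaled to unit GFP; 0 = identical, 2 = polarity-inverted.
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    return np.sqrt(np.mean((u - v) ** 2))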
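For contribution (2), the k-mer frequency features and the classical microstate temporal parameters can be computed directly from the symbolic label sequence. A self-contained sketch, assuming the sequence is a list of integer class labels sampled at sfreq Hz; the exact four temporal parameters used in the thesis are not specified here, so the duration/occurrence/coverage set below is a common stand-in:

import numpy as np
from collections import Counter
from itertools import product, groupby

def kmer_features(seq, n_states, k=3):
    """Fine-grained features: normalized k-mer frequencies of a
    symbolic microstate sequence (e.g. [0, 0, 2, 1, 1, ...])."""
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    # Fixed-length vector over all n_states**k possible k-mers
    return np.array([counts[km] / total
                     for km in product(range(n_states), repeat=k)])

def temporal_parameters(seq, sfreq, n_states):
    """Coarse-grained features per microstate class: mean duration (s),
    occurrence rate (1/s), and time coverage (fraction of samples)."""
    runs = [(s, sum(1 for _ in g)) for s, g in groupby(seq)]
    feats = []
    for s in range(n_states):
        lens = [l for state, l in runs if state == s]
        duration = np.mean(lens) / sfreq if lens else 0.0
        occurrence = len(lens) / (len(seq) / sfreq)
        coverage = sum(lens) / len(seq)
        feats += [duration, occurrence, coverage]
    return np.array(feats)

Fusing the two views is then a simple concatenation of the coarse-grained and fine-grained vectors before classification.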
(3) To address the lack of effective representation in EEG time-frequency feature extraction, this thesis proposes an EEG emotion recognition method based on modified time-frequency features of intrinsic modes and SPP-net. The traditional ensemble empirical mode decomposition (EEMD) method decomposes the nonlinear, non-stationary EEG signal into restricted intrinsic mode functions (IMFs), automatically separating the signal into different time scales. Because the added noise cannot be filtered out completely, spurious modes are generated by the residual noise. It is therefore crucial to perform IMF selection to find the most valuable IMF components representing brain activity. Furthermore, the number of decomposed IMFs varies across signals, so unifying the feature dimensions requires a better solution. To solve these issues, we propose a denoising ensemble empirical mode decomposition method that effectively eliminates residual noise in the IMFs and selects the most valuable IMFs (an IMF-selection sketch follows below). It defines evaluation criteria based on various statistical indicators to select the IMF components containing important EEG information. Time-domain and frequency-domain features are then extracted from the selected IMFs. Finally, SPP-net is employed as the classifier to recognize emotions; its pyramid pooling layer effectively transforms variable-sized feature maps into fixed-sized feature vectors (also sketched below). The experimental results demonstrate that the proposed method can effectively reduce the effect of the added white noise, accurately extract EEG features, and significantly improve emotion recognition performance.

(4) To address feature extraction for speech emotion recognition, this thesis proposes a novel speech emotion recognition framework based on brain-inspired temporal features and a multi-granularity attention LSTM. The method is inspired by the multi-granularity characteristics of auditory emotional processing. First, we extract emotional features at frame granularity, segment granularity (based on vowel-like regions), and global granularity; these multi-granularity features represent the contextual information of the time series. Then, we construct a multi-granularity attention LSTM network to capture temporal relationships at the various granularities. The network uses LSTM units to model the frame-granularity and segment-granularity features respectively, and uses an attention mechanism to calculate emotional attention weights at different times. From these weights it computes an emotional semantic vector (ESV). The ESV is fused with the global-granularity feature, and a fully connected layer performs the speech emotion recognition (a network sketch follows below). The experimental results show that the proposed method better extracts emotional features with temporal relationships and, compared with existing methods, effectively improves the accuracy of speech emotion recognition.
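The thesis's statistical selection criteria for IMFs are not given in this abstract; as a stand-in, a common heuristic keeps the IMFs most correlated with the original signal. A sketch under that assumption (the IMF matrix could come from, e.g., the PyEMD package's EEMD, but any decomposition of shape (n_imfs, n_samples) works):

import numpy as np

def select_imfs(imfs, signal, corr_thresh=0.2):
    """Keep IMFs that carry signal rather than residual noise.

    Stand-in criterion: keep IMFs whose absolute Pearson correlation
    with the original signal exceeds a threshold (not the thesis's
    actual indicators). imfs: (n_imfs, n_samples); signal: (n_samples,)
    """
    keep = [i for i, imf in enumerate(imfs)
            if abs(np.corrcoef(imf, signal)[0, 1]) >= corr_thresh]
    return imfs[keep]

def imf_features(imfs, sfreq):
    """Simple time- and frequency-domain descriptors per selected IMF."""
    feats = []
    for imf in imfs:
        spectrum = np.abs(np.fft.rfft(imf))
        freqs = np.fft.rfftfreq(imf.size, d=1.0 / sfreq)
        dom_freq = freqs[spectrum.argmax()]   # dominant frequency
        energy = np.sum(imf ** 2)             # time-domain energy
        feats += [imf.mean(), imf.std(), energy, dom_freq]
    return np.array(feats)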
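The pyramid pooling layer that lets SPP-net accept variable-sized feature maps is a standard construction (He et al.'s SPP-net). A minimal PyTorch sketch, with the pyramid levels (1, 2, 4) chosen as an assumption rather than taken from the thesis:

import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Spatial pyramid pooling: pools a feature map of any height/width
    into a fixed-length vector.

    x: (batch, channels, H, W) with arbitrary H, W.
    Returns (batch, channels * sum(l * l for l in levels)).
    """
    batch = x.shape[0]
    pooled = [F.adaptive_max_pool2d(x, output_size=(l, l)).view(batch, -1)
              for l in levels]
    return torch.cat(pooled, dim=1)

# Feature maps of different sizes map to vectors of identical length
v1 = spatial_pyramid_pool(torch.randn(8, 32, 13, 17))
v2 = spatial_pyramid_pool(torch.randn(8, 32, 40, 9))
assert v1.shape == v2.shape == (8, 32 * (1 + 4 + 16))

This is exactly what makes SPP-net attractive here: the number of selected IMFs, and hence the feature-map size, can differ per trial without changing the classifier's input dimension.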
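For contribution (4), the frame/segment branches, attention weighting, ESV, and fusion with the global feature can be expressed compactly. A PyTorch sketch, assuming additive (single-layer) attention and arbitrary hidden sizes; the thesis's exact architecture and dimensions are not given in this abstract:

import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """One granularity branch: an LSTM over frame- or segment-level
    features, with attention producing an emotional semantic vector
    (ESV) as the weighted sum of hidden states."""

    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)   # attention scoring

    def forward(self, x):                   # x: (batch, time, in_dim)
        h, _ = self.lstm(x)                 # (batch, time, hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (w * h).sum(dim=1)           # ESV: (batch, hidden)

class MultiGranularitySER(nn.Module):
    """Fuse frame- and segment-level ESVs with the global feature,
    then classify emotions with a fully connected layer."""

    def __init__(self, frame_dim, seg_dim, global_dim, n_classes, hidden=128):
        super().__init__()
        self.frame_branch = AttentiveLSTM(frame_dim, hidden)
        self.seg_branch = AttentiveLSTM(seg_dim, hidden)
        self.fc = nn.Linear(2 * hidden + global_dim, n_classes)

    def forward(self, frames, segments, global_feat):
        esv = torch.cat([self.frame_branch(frames),
                         self.seg_branch(segments),
                         global_feat], dim=1)
        return self.fc(esv)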
Keywords/Search Tags: Auditory Affective Computing, Microstate Analysis, Ensemble Empirical Mode Decomposition, Multi-granularity Feature Extraction, Emotion Recognition