
Research on a Speech Emotion Recognition Algorithm Based on an Improved Attention Mechanism

Posted on: 2020-11-13    Degree: Master    Type: Thesis
Country: China    Candidate: J H Wang    Full Text: PDF
GTID: 2428330572461587    Subject: Electronics and Communications Engineering
Abstract/Summary:
As an important medium of communication and emotional expression between people, speech has long been an important direction of artificial intelligence research. In traditional emotion recognition systems, how to extract more discriminative emotion-related features has received sustained attention from the research community. At present, the selection of characteristic parameters is somewhat blind, and the computational and time complexity of such systems is high. In complex recognition scenarios in particular, such as large-scale speech datasets and complex emotion categories, traditional methods cannot effectively describe the complex spatial distribution of speech data, and their use of contextual information in speech is extremely limited, so traditional speech emotion recognition methods cannot solve these problems well. As a model capable of "self-learning", the neural network has been shown to be effective for feature extraction and classification.

Addressing the deficiencies of traditional speech emotion feature learning methods, and based on the principle of the attention mechanism, this thesis proposes a deep learning emotion recognition optimization algorithm based on an Improved Attention Mechanism (IAM). It first proposes an improved Itti attention model (AItti). Then, to address the loss of global features in the AItti model, it proposes a deep learning emotion recognition optimization algorithm based on an improved spatial weight structure, constructing a Constraint-Space-Weight Network (CSWNet) on top of the AItti model. The specific research content is as follows:

(1) A deep emotion recognition optimization algorithm based on an improved attention mechanism.

Based on spectrogram technology, this algorithm combines the IAM principle of image saliency models with acoustic characteristics and proposes a new feature extraction model for speech emotion. The method proceeds as follows: the speech signal is pre-processed to extract the spectrogram; a saliency map is then obtained through the improved attention mechanism model, in which four attention maps are extracted with a Gaussian pyramid and a multi-scale LBP algorithm and then combined through center-surround differences and auditory-sensitivity weighting to form the final saliency map; the saliency map is passed through a fine-tuned hybrid neural network to obtain the final emotional feature representation of the audio, which is trained with the supervision of the labels, and a classifier produces the final score. Seven-class emotion classification experiments were carried out on the natural database FAU-AEC; under the same conditions, the recognition rate of the emotion-related features learned by this method was significantly higher than that of traditional acoustic features and baseline models. Evaluation of the model shows that the algorithm increases the inter-class distance and improves the recognition rate of the system.
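A minimal Python sketch of this saliency extraction step is given below. It is an illustration under stated assumptions rather than the thesis implementation: the pyramid depth, the LBP parameters, and the Hanning-window stand-in for auditory-sensitivity weighting are choices made only for the example.

# Illustrative sketch (not the thesis code) of the improved attention step:
# Gaussian-pyramid + multi-scale LBP attention maps, center-surround
# differences, and a frequency weighting standing in for "auditory
# sensitivity". Pyramid depth, LBP parameters, and the Hanning weighting
# are assumptions made for this example.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import local_binary_pattern
from skimage.transform import resize


def gaussian_pyramid(spec, levels=4):
    """Blur-and-decimate pyramid of a (freq x time) spectrogram."""
    pyramid = [spec]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])      # downsample by 2 on both axes
    return pyramid


def attention_maps(spec, levels=4):
    """Four multi-scale LBP texture maps, one per pyramid level."""
    maps = []
    for level in gaussian_pyramid(spec, levels):
        img = level - level.min()
        img = (255.0 * img / (np.ptp(img) + 1e-8)).astype(np.uint8)
        lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
        maps.append(resize(lbp, spec.shape))   # bring back to base resolution
    return maps


def saliency_map(spec):
    """Center-surround differences combined with a frequency weighting."""
    maps = attention_maps(spec)
    center = maps[0]
    surround_diff = np.mean([np.abs(center - m) for m in maps[1:]], axis=0)
    # Assumed auditory-sensitivity weighting: emphasize mid-frequency rows.
    freq_weight = np.hanning(spec.shape[0])[:, None]
    sal = surround_diff * freq_weight
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)   # normalize to [0, 1]


if __name__ == "__main__":
    dummy_spec = np.abs(np.random.randn(128, 256))    # stand-in spectrogram
    print(saliency_map(dummy_spec).shape)             # (128, 256)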
(2) A deep learning emotion recognition optimization algorithm based on an improved spatial weight structure.

Compared with traditional global features, the features extracted by AItti are Strong Emotion Features (SEF), but the spectrogram processed by that model loses some global information, which may affect emotion recognition. On this basis, the thesis proposes an emotion recognition optimization algorithm based on an improved spatial weight structure. The main steps of the method are: the primary features are extracted by the first two stages of the hybrid neural network; the CSWNet structure is constructed to obtain weight features, mainly through a feature space transformation and a threshold judgment that, along the spatial dimension, assign a high weight to features similar to the SEF and a low weight to the others, yielding calibrated weight features; finally, the calibrated weight features are passed through the remaining units of the fine-tuned hybrid neural network to obtain the final deep emotional feature representation of the audio, which is trained with the supervision of the labels, and the classifier produces the final score. Experiments on the natural database FAU-AEC and the German database EMO-DB demonstrate the validity and good generalization of the model. Evaluation of the complexity and performance of the model shows that, with only a small increase in complexity, both the recognition rate and the emotional discrimination of the model are improved.
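A minimal PyTorch sketch of the constrained-space-weight idea is given below. It is an illustration rather than the thesis implementation: the learnable SEF prototype, the cosine-similarity threshold test, and the fixed high/low weights are assumptions made for the example, and the surrounding hybrid network is omitted.

# Minimal PyTorch sketch of the constrained-space-weight idea (CSWNet) as
# described above: a feature-space transformation followed by a threshold
# judgment that gives spatial positions similar to a "strong emotion feature"
# (SEF) prototype a high weight and all other positions a low weight, then
# rescales the feature map. Layer sizes, the learnable SEF prototype, the
# cosine-similarity test, and the fixed weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConstraintSpaceWeight(nn.Module):
    def __init__(self, channels, sef_dim=64, threshold=0.5,
                 w_high=1.0, w_low=0.2):
        super().__init__()
        # Feature-space transformation: project each position into SEF space.
        self.transform = nn.Linear(channels, sef_dim)
        # Learnable embedding standing in for the SEF prototype.
        self.sef_ref = nn.Parameter(torch.randn(sef_dim))
        self.threshold = threshold
        self.w_high, self.w_low = w_high, w_low

    def forward(self, x):                            # x: (B, C, F, T)
        b, c, f, t = x.shape
        feats = x.permute(0, 2, 3, 1).reshape(b, f * t, c)
        z = self.transform(feats)                    # (B, F*T, sef_dim)
        ref = self.sef_ref.view(1, 1, -1).expand(b, f * t, -1)
        sim = F.cosine_similarity(z, ref, dim=-1)    # (B, F*T)
        # Threshold judgment: SEF-like positions get the high weight, the
        # rest get the low weight. The hard threshold is a simplification;
        # it passes no gradient back to the transform or the prototype.
        w = torch.where(sim > self.threshold,
                        torch.full_like(sim, self.w_high),
                        torch.full_like(sim, self.w_low))
        return x * w.reshape(b, 1, f, t)             # calibrated features


if __name__ == "__main__":
    cswnet = ConstraintSpaceWeight(channels=32)
    feat = torch.randn(4, 32, 16, 50)   # e.g. CNN features of a spectrogram
    print(cswnet(feat).shape)           # torch.Size([4, 32, 16, 50])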
Keywords/Search Tags:emotion recognition, deep hybrid neural network, attention mechanism, squeeze-and-excitation networks, constraint-space-weight networks