Research On Speech Emotion Recognition Method Based On Hybrid Neural Network

Posted on: 2022-07-11    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Yu    Full Text: PDF
GTID: 2518306548961159    Subject: Master of Engineering
Abstract/Summary:
With the wide application of human-computer interaction in daily life, research on speech signals has become increasingly important. Speech signals carry not only basic semantic information but also implicit emotional states. Speech emotion recognition, one of the key technologies of speech signal processing, has therefore attracted growing attention from researchers. To improve the accuracy of speech emotion recognition and address the complexity of speech emotion features, this thesis improves on existing deep learning models. The main work and innovations are as follows:

(1) The long short-term memory (LSTM) network tends to lose feature information from earlier parts of the sequence when computing the current cell state. The LSTM is therefore improved: the previous cell state is linked to the forget gate and the input gate as a peephole connection, so that the previous cell state enters the gating computation and the current information state remains complete. The improved cell is further integrated with a self-attention mechanism.

(2) In the multi-head attention mechanism, after the speech features are projected into low-dimensional subspaces, the parameters computed by the independent heads differ from the joint distribution of the full attention mechanism and are difficult to make approximate it, which limits the expressive power of the subsequent model. The multi-head attention mechanism is therefore improved: the low-rank similarity maps of the individual heads are superimposed, connecting the originally independent sub-attention mechanisms, and the result is then normalized to compute the final feature representation.

(3) Since a single deep learning model cannot accurately identify the emotional categories of speech, this thesis proposes a dual-channel network model in which a convolutional neural network extracts spatial features from spectrogram images and a bidirectional LSTM extracts temporal features. To further select high-importance feature vectors and fuse the two streams, the improved multi-head attention mechanism computes attention over the feature-extraction results of the two channels; a fully connected operation follows, and a classification layer produces the output.

To verify the algorithms, the two proposed models are compared and tested on the EMO-DB and IEMOCAP data sets. Traditional neural network models and existing models with strong reported performance are selected as baselines for the comparison experiments, and ablation experiments are used to verify the effectiveness of each innovation. The experimental results show that the proposed speech emotion recognition models achieve higher accuracy than the comparison models on both data sets and that the ablation results likewise exceed the comparison models, verifying the effectiveness of the innovations and demonstrating that the proposed models achieve better results in speech emotion recognition.
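As a concrete illustration of innovation (1), the sketch below implements a peephole-style LSTM cell in PyTorch in which the previous cell state enters the forget-gate and input-gate computations. The class name, layer sizes, and exact wiring are assumptions for illustration, not the thesis code.

    # Minimal sketch: LSTM cell with the previous cell state c_{t-1}
    # wired into the forget and input gates as a peephole, so earlier
    # sequence information participates in the gating decision.
    import torch
    import torch.nn as nn

    class PeepholeLSTMCell(nn.Module):
        def __init__(self, input_size: int, hidden_size: int):
            super().__init__()
            # Gate projections over the concatenation [x_t, h_{t-1}]
            self.W_f = nn.Linear(input_size + hidden_size, hidden_size)
            self.W_i = nn.Linear(input_size + hidden_size, hidden_size)
            self.W_o = nn.Linear(input_size + hidden_size, hidden_size)
            self.W_c = nn.Linear(input_size + hidden_size, hidden_size)
            # Peephole weights: elementwise contribution of c_{t-1} to the gates
            self.p_f = nn.Parameter(torch.zeros(hidden_size))
            self.p_i = nn.Parameter(torch.zeros(hidden_size))

        def forward(self, x, state):
            h_prev, c_prev = state
            z = torch.cat([x, h_prev], dim=-1)
            # Previous cell state enters the forget and input gates (peephole)
            f = torch.sigmoid(self.W_f(z) + self.p_f * c_prev)
            i = torch.sigmoid(self.W_i(z) + self.p_i * c_prev)
            g = torch.tanh(self.W_c(z))
            c = f * c_prev + i * g          # new cell state
            o = torch.sigmoid(self.W_o(z))
            h = o * torch.tanh(c)
            return h, (h, c)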
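Innovation (2) couples the otherwise independent attention heads. One plausible reading, sketched below, mixes the per-head scaled dot-product similarity maps across the head dimension before a shared softmax normalization, in the spirit of talking-heads attention; the mixing layer head_mix and all shapes are illustrative assumptions rather than the thesis formulation.

    # Hedged sketch: the per-head (low-rank) similarity maps are
    # superimposed across heads before normalization, connecting the
    # originally independent sub-attention mechanisms.
    import torch
    import torch.nn as nn

    class CoupledMultiHeadAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            assert d_model % n_heads == 0
            self.h, self.d_k = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            self.out_proj = nn.Linear(d_model, d_model)
            # Learned mixing across heads: superimposes per-head similarities
            self.head_mix = nn.Linear(n_heads, n_heads, bias=False)

        def forward(self, x):
            B, T, _ = x.shape
            def split(t):  # (B, T, d_model) -> (B, h, T, d_k)
                return t.view(B, T, self.h, self.d_k).transpose(1, 2)
            q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
            # Per-head low-rank similarities: (B, h, T, T)
            scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
            # Couple the heads: mix similarity maps along the head axis
            scores = self.head_mix(scores.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
            attn = scores.softmax(dim=-1)   # normalize after superposition
            out = (attn @ v).transpose(1, 2).reshape(B, T, self.h * self.d_k)
            return self.out_proj(out)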
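Innovation (3), the dual-channel model, might be organized as below: a CNN channel over the spectrogram image for spatial features and a BiLSTM channel over the frame sequence for temporal features, fused by attention and classified through a fully connected layer. PyTorch's standard nn.MultiheadAttention stands in for the improved attention of innovation (2) so the sketch is self-contained; all layer sizes and the number of emotion classes (7, as in EMO-DB) are assumptions.

    # Illustrative sketch of the dual-channel architecture: CNN channel
    # for spatial features, BiLSTM channel for temporal features,
    # attention-based fusion, then a fully connected classification layer.
    import torch
    import torch.nn as nn

    class DualChannelSER(nn.Module):
        def __init__(self, n_mels: int = 64, n_classes: int = 7, d_model: int = 128):
            super().__init__()
            # Channel 1: CNN over the (1, n_mels, T) spectrogram "image"
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),  # pool out the frequency axis
            )
            self.cnn_fc = nn.Linear(64, d_model)
            # Channel 2: BiLSTM over the (T, n_mels) frame sequence
            self.bilstm = nn.LSTM(n_mels, d_model // 2, batch_first=True,
                                  bidirectional=True)
            # Fusion: attention over the joined token sequence, then classify
            self.fusion = nn.MultiheadAttention(d_model, num_heads=4,
                                                batch_first=True)
            self.classifier = nn.Linear(d_model, n_classes)

        def forward(self, spec):                        # spec: (B, 1, n_mels, T)
            spatial = self.cnn(spec)                    # (B, 64, 1, T')
            spatial = spatial.squeeze(2).transpose(1, 2)        # (B, T', 64)
            spatial = self.cnn_fc(spatial)              # (B, T', d_model)
            frames = spec.squeeze(1).transpose(1, 2)    # (B, T, n_mels)
            temporal, _ = self.bilstm(frames)           # (B, T, d_model)
            tokens = torch.cat([spatial, temporal], dim=1)  # join both channels
            fused, _ = self.fusion(tokens, tokens, tokens)  # attention fusion
            return self.classifier(fused.mean(dim=1))   # pool and classify

    # Usage: logits = DualChannelSER()(torch.randn(2, 1, 64, 200))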
Keywords/Search Tags: CNN, RNN, Speech emotion recognition, Attention mechanism, Deep learning