
Research on a Speech Dimensional Emotion Recognition Method in Social Media

Posted on: 2020-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y L Chen    Full Text: PDF
GTID: 2428330623467021    Subject: Software engineering
Abstract/Summary:
Speech emotion recognition (SER) has long been a research focus in the field of artificial intelligence. Owing to the complexity of emotion itself and the slow pace at which emotion theory is updated, a large gap remains between current research and mature applications. Given this situation and current practical demands, this thesis investigates SER at several levels, from feature fusion through recognition methods to practical applications in social media scenarios. The main research contents are as follows:

(1) The Mel Frequency Cepstral Coefficient (MFCC) neglects the correlation between the spectral features of adjacent frames because of frame-by-frame processing, so much useful information may be lost. To address this problem, this thesis proposes an improved method that extracts a time firing series feature and a firing position information feature from the spectrogram to supplement the MFCC, and applies each of them to speech emotion estimation. Based on the predicted values, the method computes the correlation coefficient of each feature along the three dimensions P (Pleasure-displeasure), A (Arousal-nonarousal), and D (Dominance-submissiveness), uses these coefficients as feature weights, obtains the final PAD values of the emotional speech by weighted fusion, and finally maps them into the PAD 3D emotion space. Experimental results show that the two added features not only detect the emotional state but also capture the correlation between adjacent frame spectral features; as a complement to MFCC features, they improve speech emotion recognition accuracy.

(2) A conventional context-based speech emotion recognition system, being limited to the feature layer, risks losing the context details of the label layer and neglecting the difference between the two levels. This thesis proposes a Bidirectional Long Short-Term Memory (BLSTM) network with an embedded attention mechanism, combined with a hierarchical context learning model. The model completes the speech emotion recognition task in three phases. The first phase extracts the feature set and applies the SVM-RFE feature-ranking algorithm to reduce the features and obtain an optimal feature subset. In the second phase, the feature subset is fed into a BLSTM network that learns feature-layer context to produce an initial emotion prediction. The third phase uses the emotion values to train another, independent BLSTM network that learns label-layer context information; based on this information, the final prediction is completed from the initial result. Experimental results show that the model outperforms the baseline model.

(3) Targeting the emotional characteristics of voice conversations in social media application scenarios, the UcanUB-Voice speech emotion database for training and testing is first constructed by exporting and editing the speech data of a debate program. The database has rich emotion types and many dialogue themes, is close to real life, and conforms to everyday expression habits; it lays a reliable and effective data foundation for training and testing the PAD prediction model. Then a dimensional speech emotion PAD prediction model is proposed by integrating the feature fusion and recognition methods above. Experimental results show that the model improves recognition accuracy without increasing the time cost and achieves better recognition results in social media scenarios.
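The correlation-weighted fusion described in contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the feature names, prediction values, and correlation coefficients are all hypothetical, and the per-dimension weights are assumed to have been computed on a development set beforehand.

```python
import numpy as np

# Hypothetical per-feature PAD predictions for two utterances.
# Rows: utterances; columns: P, A, D.
pred = {
    "mfcc":        np.array([[0.6, 0.2, 0.1], [0.4, 0.5, 0.3]]),
    "time_firing": np.array([[0.5, 0.3, 0.2], [0.3, 0.6, 0.2]]),
    "firing_pos":  np.array([[0.7, 0.1, 0.2], [0.5, 0.4, 0.4]]),
}

# Assumed per-feature, per-dimension correlation coefficients with the
# ground-truth PAD values, used directly as fusion weights.
corr = {
    "mfcc":        np.array([0.70, 0.60, 0.50]),
    "time_firing": np.array([0.50, 0.70, 0.40]),
    "firing_pos":  np.array([0.60, 0.50, 0.70]),
}

def fuse_pad(pred, corr):
    """Each PAD dimension is a correlation-weighted average of the
    per-feature predictions for that dimension."""
    total_w = sum(corr.values())                  # shape (3,)
    fused = sum(pred[k] * corr[k] for k in pred)  # shape (n, 3)
    return fused / total_w

pad = fuse_pad(pred, corr)  # one fused (P, A, D) point per utterance
```

The fused rows can then be placed directly into the PAD 3D emotion space; a higher-correlated feature simply pulls the fused value toward its own prediction on that dimension.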
Keywords/Search Tags: Speech emotion recognition, Dimensional emotion model, Attention mechanism, Label layer context