
Research On Speech Emotion Recognition Technology Based On Feature Research And Multi-Output BLSTM

Posted on: 2020-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2518305954487874
Subject: Electronic Science and Technology
Abstract/Summary:
Speech conveys not only semantic information but also emotional information. By interacting naturally with its users, a computer can help them complete their tasks efficiently and to a high standard. As speech emotion research has developed, it has found practical applications in social services, medicine, security, industrial control and other related fields. However, the uncertainty of emotion definitions, the lack of a unified standard for building databases and the ambiguity of emotional characteristics make speech emotion recognition a difficult problem, and many challenges remain. To address the low recognition rate of speech emotion recognition, the difficulty of transfer learning and the poor robustness of current systems, this paper makes the following improvements:

1. Improved MFCC features, EEMFCC and FOMFCC, are proposed, and prosodic and spectral features are combined with them to form the improved feature set. In experiments using SVM as the classifier, the combination of EEMFCC, FOMFCC and traditional features achieved a recognition rate of 85.59% on the EMODB database, 2.68% higher than MFCC features without these two improvements. The result obtained on EMODB is good, but the fused feature set is highly redundant, so its time efficiency is poor. Moreover, the recognition rate for 'happy' remained low, peaking at 54.17%, so the experiment did not effectively improve 'happy' recognition. The features selected in this experiment therefore still leave room for improvement in validity and robustness, which motivates the feature-selection experiments that follow.

2. To address the low 'happy' recognition rate and the high feature redundancy above, an appropriate combination of features is selected in the hope of improving the recognition rate. The BP algorithm is used for feature selection: to pick out the features that contribute to the network, a feature's importance is measured by the sensitivity of the network output to its input node, and only features that are effective and non-redundant are selected and sent to the classifier. With the optimized features and SVM as the recognition method, a recognition rate of 85.66% was obtained on the EMODB database. The BP feature-selection algorithm reduces feature redundancy: only 8 features are needed to achieve a result slightly better than the combined features of the previous section, which improves recognition efficiency, achieves a good recognition effect within a traditional recognition algorithm, and indicates the validity of the selected feature set. However, the recognition rate for 'happy' was still 54.17%; feature selection alone failed to improve it. The literature shows that in EMODB confusion matrices 'happy' scores low and is heavily confused with 'anger', because both are expressed with fast speech, strong intonation and high intensity. The classification algorithm used in the experiment therefore still has shortcomings, and a classification algorithm better suited to speech emotion recognition is needed; the multi-output BLSTM network model is introduced in the next chapter to obtain better recognition results, robustness and transferability.

3. The LSTM structure makes full use of the temporal information in speech, and the bidirectional LSTM (BLSTM) also extracts reverse temporal information. Inspired by the cross-layer connections of ResNet, and noting that each layer of the LSTM produces an output, the features can be exploited more fully if those outputs are combined: the multi-layer outputs of the BLSTM are added and merged, so that low-level network information supplements high-level network information and a good recognition result is achieved. On the basis of the optimized features, 111-frame, 70-frame-per-segment segment features are extracted and a three-layer, six-output BLSTM is used as the recognition method, giving a WA of 91.17% and a UA of 89.79% on the EMODB database, a good recognition effect. The recognition rate obtained on EMODB in this paper is at a fairly high level compared with current research results. The validity and robustness of the selected features and classification model are also verified on multiple databases.
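The abstract does not define EEMFCC or FOMFCC. As an illustration only, a common way to add first-order information to MFCC-style features, in the spirit of the "improved MFCC" of part 1, is the regression delta used by most MFCC toolkits, with the static and delta features then fused into one vector. A minimal NumPy sketch (the toy cepstral matrix is illustrative, not a real MFCC):

```python
import numpy as np

def delta(feat, N=2):
    """First-order regression delta along the time axis (axis 0).

    delta[t] = sum_{n=1..N} n * (feat[t+n] - feat[t-n]) / (2 * sum_{n=1..N} n^2)
    Edge frames are handled by repeating the first/last frame.
    """
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode='edge')
    out = np.empty_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        acc = np.zeros(feat.shape[1])
        for n in range(1, N + 1):
            acc += n * (padded[t + N + n] - padded[t + N - n])
        out[t] = acc / denom
    return out

# Toy "cepstral" matrix: 10 frames x 3 coefficients rising linearly in time,
# so every interior delta frame is exactly 1.
cep = np.arange(10, dtype=float).reshape(-1, 1) * np.ones((1, 3))
fused = np.hstack([cep, delta(cep)])   # static + first-order features, fused
```

A fused matrix like `fused` is what would be fed, frame by frame or pooled, to a classifier such as the SVM used in the thesis.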
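Part 2 measures a feature's importance by the sensitivity of the network output to its input node. A minimal sketch of that idea, assuming a one-hidden-layer network trained by plain backpropagation on synthetic data (the data, network size and top-k threshold are all illustrative, not the thesis's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 8 candidate features, but only features 0 and 1 carry signal.
X = rng.normal(size=(200, 8))
y = X[:, 0] + X[:, 1]

# One-hidden-layer tanh network trained with plain backpropagation (MSE loss).
W1 = rng.normal(0.0, 0.1, (8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.1, (16, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)                  # hidden activations
    pred = (H @ W2 + b2).ravel()
    g_pred = ((pred - y) / len(y))[:, None]   # gradient of mean squared error
    g_W2 = H.T @ g_pred; g_b2 = g_pred.sum(0)
    g_H = g_pred @ W2.T * (1.0 - H ** 2)
    g_W1 = X.T @ g_H;    g_b1 = g_H.sum(0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

# Sensitivity of each input node: mean |d pred / d x_j| over the data set.
H = np.tanh(X @ W1 + b1)
per_sample_grad = ((1.0 - H ** 2) * W2.ravel()) @ W1.T   # shape (200, 8)
sensitivity = np.abs(per_sample_grad).mean(axis=0)
selected = np.argsort(sensitivity)[-2:]   # keep the most sensitive features
```

On this toy task the informative features should receive the largest sensitivities, so `selected` plays the role of the small, non-redundant feature subset that is "sent to the classifier".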
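Part 3's multi-output BLSTM adds and merges the outputs of every BLSTM layer. A minimal NumPy sketch of that architecture with random, untrained weights (the layer sizes, the mean-pooling over time and the simple elementwise sum are assumptions; the thesis's exact six-output wiring is not specified beyond "added and merged"):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Single-direction LSTM with randomly initialised weights (no training)."""
    def __init__(self, d_in, d_h, rng):
        self.Wx = rng.normal(0.0, 0.1, (d_in, 4 * d_h))
        self.Wh = rng.normal(0.0, 0.1, (d_h, 4 * d_h))
        self.b = np.zeros(4 * d_h)
        self.d_h = d_h

    def run(self, xs):                       # xs: (T, d_in) -> (T, d_h)
        h = np.zeros(self.d_h); c = np.zeros(self.d_h)
        out = []
        for x in xs:
            z = x @ self.Wx + h @ self.Wh + self.b
            i, f, g, o = np.split(z, 4)      # input, forget, cell, output gates
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)
            h = o * np.tanh(c)
            out.append(h)
        return np.stack(out)

class BLSTMLayer:
    """Forward + backward LSTM; outputs are concatenated per frame."""
    def __init__(self, d_in, d_h, rng):
        self.fw = LSTMCell(d_in, d_h, rng)
        self.bw = LSTMCell(d_in, d_h, rng)

    def run(self, xs):
        fwd = self.fw.run(xs)
        bwd = self.bw.run(xs[::-1])[::-1]    # reverse time, then restore order
        return np.concatenate([fwd, bwd], axis=1)   # (T, 2 * d_h)

def multi_output_blstm(xs, n_layers=3, d_h=16, seed=0):
    """Stack BLSTM layers and sum each layer's time-pooled output, so
    low-level information supplements the high-level representation."""
    rng = np.random.default_rng(seed)
    layers, d_in = [], xs.shape[1]
    for _ in range(n_layers):
        layers.append(BLSTMLayer(d_in, d_h, rng))
        d_in = 2 * d_h
    merged, h = np.zeros(2 * d_h), xs
    for layer in layers:
        h = layer.run(h)
        merged = merged + h.mean(axis=0)     # add this layer's pooled output
    return merged

# A 70-frame segment of 8-dimensional features, as one utterance segment.
seg = np.random.default_rng(1).normal(size=(70, 8))
emb = multi_output_blstm(seg)
```

The merged vector `emb` would then go to a softmax classifier over the emotion classes; in the sketch only the architecture, not the training, is shown.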
Keywords/Search Tags: Speech emotion recognition, improved MFCC feature, BP feature selection algorithm, deep learning, multi-output BLSTM