Application Research Of Dubbing Emotion Recognition Based On Deep Learning

Posted on: 2023-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J C Li
GTID: 2545306800460134
Subject: Computer technology
Abstract/Summary:
With the rapid development of film and television animation, audiobooks, radio dramas and related industries, dubbing has gradually come to public attention. More and more dubbing enthusiasts hope to learn dubbing in their spare time, both to increase their income and to realize their "dubbing dream". Dubbing appears to have a low barrier to entry, but in fact it requires professional study and systematic training and practice. The expression of emotion is one of the key factors in judging dubbing quality. The many online crash courses in dubbing vary in teaching quality and struggle to provide real-time guidance for students' practice, and students cannot accurately evaluate the emotional expression of their own dubbing. As a result, the learning effect is poor and the final success rate is relatively low.

To address this, this thesis studies speech emotion recognition in depth, aiming to obtain an efficient method with strong generalization ability by improving both the feature extraction method and the emotion classification model. The method is then applied to a dubbing emotion evaluation system that gives learners an online practice platform, helps them check the emotional expression of their own recordings, lowers the barrier to learning dubbing, and improves the efficiency and professional level of that learning. The main research contents are as follows:

1) Speech emotion feature extraction. Mel-frequency cepstral coefficients (MFCC) extracted by conventional methods carry insufficient emotional information, which leads to a low recognition rate. To solve this problem, this thesis extracts not only the 3D Log-Mel features formed with the first-order and second-order differences of the MFCC, but also the Mel spectrum coefficients Mid-MFCC and IMFCC in the mid- and high-frequency bands. These two features are fused with 3D Log-Mel through a second-order pooling method based on the vector outer product. Experimental results on the CASIA and IEMOCAP corpora show that the improved fused MFCC feature set achieves better recognition than any single MFCC feature or any fusion of two of them.

2) Feature fusion and recognition model. The improved fused MFCC feature set from 1) is a single type of feature and loses emotional information such as the temporal factors of speech. A global feature set is therefore introduced and fused with the feature set from 1) to supplement the emotional information. Feature-level and decision-level fusion experiments on the 3D CNN-LSTM model show that fusing the two feature sets gives a higher emotion recognition rate than either feature set alone. On the model side, the 3D CNN-LSTM model cannot effectively capture the emotional information in the large volume of fused feature data. This thesis therefore successively introduces an attention mechanism and a BiLSTM optimized by a highway network, obtaining an attention-based 3D CNN + HBiLSTM model. Comparative experiments with both feature-level and decision-level fusion verify the superiority of the improved model.

3) An online dubbing emotion evaluation system. The best speech emotion recognition scheme obtained from the experiments is applied to a dubbing emotion evaluation system built with a separate front end and back end. The system supports online dubbing recording and dubbing file upload. The back end performs emotion recognition on the uploaded dubbing data and then combines the recognition results with the textual emotion of the corresponding sentences to give comprehensive evaluation feedback. The system can provide varied dubbing emotion detection and evaluation services for different users, which demonstrates the practical value of speech emotion recognition in a dubbing emotion evaluation system.
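
The outer-product fusion mentioned in item 1) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the thesis: the function name `second_order_pool`, the feature dimensions, the averaging over frames, and the signed square-root and L2 normalisation steps are all hypothetical choices. The abstract states only that the features are fused by second-order pooling based on the vector outer product.

```python
import numpy as np


def second_order_pool(feat_a, feat_b):
    """Fuse two frame-aligned feature streams by averaged outer products.

    feat_a has shape (T, Da) and feat_b has shape (T, Db), where T is the
    number of frames. Returns a fused vector of length Da * Db.
    """
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")

    # feat_a.T @ feat_b equals the sum over frames of outer(a_t, b_t);
    # dividing by T turns the sum into an average (assumed pooling choice).
    pooled = feat_a.T @ feat_b / feat_a.shape[0]
    vec = pooled.reshape(-1)

    # Signed square root and L2 normalisation: a common post-processing
    # step for second-order features, assumed here for illustration.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Random stand-ins for real features: 200 frames, 40 log-Mel bands,
# and 13 Mid-MFCC coefficients (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((200, 40))
mid_mfcc = rng.standard_normal((200, 13))

fused = second_order_pool(log_mel, mid_mfcc)
print(fused.shape)  # (520,)
```

The fused vector has one entry per pair of input dimensions, so it captures pairwise interactions between the two feature streams that plain concatenation would not. The cost is that its size grows as the product of the two dimensions.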
Keywords/Search Tags:Improved fused MFCC, Feature fusion, Attention mechanism, Dubbing emotion recognition