Application Research Of Mixed Feature Speech Emotion Analysis Based On Deep Learning

Posted on: 2024-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ge
Full Text: PDF
GTID: 2568307100989209
Subject: Electronic information
Abstract/Summary:
As the world changes, speech emotion analysis technology continues to develop. With the growth of film and television entertainment, more and more public-speaking programs have entered the public eye. A good speech can convey sound values to its audience and offer guidance in difficult times, but delivering one requires systematic learning and training, and emotional expression is among the most critical elements of an excellent speech performance. In today's highly developed online information environment, a large number of online crash courses lure enthusiasts into studying with them; the quality of these courses is uneven, and the guidance they give learners is genuinely worrying. Against this background, this thesis studies speech emotion analysis technology, improves the speech feature extraction and fusion scheme and the speech emotion analysis model, and applies them to a speech-learning system so that people can learn public speaking systematically and efficiently.

(1) Optimization of the speech emotion feature extraction and fusion scheme. Because Mel filters are sparse in the high-frequency band, high-frequency components of the speech signal can be lost (signal leakage). Gammatone filters, by contrast, perform significantly better than Mel filters in the high-frequency range. This thesis therefore fuses the features extracted by the two front ends, combining the proven effectiveness of MFCC features for emotion recognition with the noise robustness and speech-tracking ability of GFCC features (a minimal sketch of this fusion step follows the abstract).

(2) Optimization of the speech emotion classification model. This optimization has two steps. First, to address the information overload caused by the large number of parameters in the CNN-BiGRU model, an attention mechanism is introduced to focus on the key information within the massive input. Second, to address the vanishing and exploding gradient problems of CNN networks, DenseNet, a densely connected convolutional network, is introduced to reduce the number of parameters while alleviating gradient vanishing. These improvements are combined with the optimized feature extraction and fusion for speech emotion analysis, and experiments verify the advantages of the improved model (illustrative sketches of the attention-based model and a dense block follow the abstract).

(3) Building a speech emotion analysis system. The improved speech emotion analysis model is applied in a speech emotion analysis system. A visual front-end interface is built for users, while the back end processes and analyzes user data, aggregates the results, and returns them to the front-end interface (see the back-end sketch below). The system offers speech enthusiasts a learning platform with real-time analysis of and feedback on speech audio, and at the same time demonstrates the practical value of the speech emotion analysis model.
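The following is a minimal sketch of the MFCC/GFCC fusion step described in (1). It assumes librosa for MFCC extraction and the third-party spafe package for GFCC; the sample rate, frame settings, and the simple frame-wise concatenation are illustrative assumptions, not the thesis's exact configuration.

```python
# Hedged sketch: extract MFCC and GFCC from one utterance and fuse them by
# frame-wise concatenation. spafe's gfcc() is assumed as the GFCC front end.
import numpy as np
import librosa
from spafe.features.gfcc import gfcc

def extract_fused_features(wav_path, n_ceps=13):
    y, sr = librosa.load(wav_path, sr=16000)

    # MFCC: effective for emotion cues, but the Mel filter bank is sparse at high frequencies.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_ceps).T   # (frames, n_ceps)

    # GFCC: Gammatone filter bank, better high-frequency resolution and noise robustness.
    gfcc_feat = gfcc(y, fs=sr, num_ceps=n_ceps)                # (frames, n_ceps)

    # The two front ends may produce slightly different frame counts; align and concatenate.
    n = min(len(mfcc), len(gfcc_feat))
    return np.concatenate([mfcc[:n], gfcc_feat[:n]], axis=1)   # (frames, 2 * n_ceps)
```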
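Next is an illustrative PyTorch sketch of a CNN-BiGRU classifier with a simple additive-attention pooling layer, in the spirit of improvement (2). The layer sizes, the 26-dimensional fused input, and the 4-class output are placeholder assumptions rather than the thesis's actual architecture.

```python
# Hedged sketch: CNN front end -> bidirectional GRU -> attention pooling -> classifier.
import torch
import torch.nn as nn

class CNNBiGRUAttention(nn.Module):
    def __init__(self, feat_dim=26, hidden=128, num_classes=4):
        super().__init__()
        # 1-D convolution over the time axis extracts local spectral patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Bidirectional GRU models longer-range temporal context.
        self.bigru = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        # Attention assigns one weight per frame so the classifier focuses on
        # emotionally salient frames instead of the whole over-long sequence.
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        h = self.cnn(x.transpose(1, 2))         # (batch, 64, frames // 2)
        h, _ = self.bigru(h.transpose(1, 2))    # (batch, frames // 2, 2 * hidden)
        w = torch.softmax(self.attn(h), dim=1)  # (batch, frames // 2, 1)
        context = (w * h).sum(dim=1)            # attention-weighted pooling
        return self.fc(context)                 # (batch, num_classes)

# Example: a batch of 8 utterances, 300 frames of 26-dim fused MFCC+GFCC features.
logits = CNNBiGRUAttention()(torch.randn(8, 300, 26))
```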
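The dense-connectivity idea behind the DenseNet component in (2) can be sketched as a small 1-D dense block: each layer receives the concatenation of all earlier feature maps, which shortens gradient paths and keeps the per-layer parameter count small. The growth rate and depth below are arbitrary illustrative choices.

```python
# Hedged sketch: a DenseNet-style 1-D dense block with BN-ReLU-Conv layers.
import torch
import torch.nn as nn

class DenseBlock1d(nn.Module):
    def __init__(self, in_channels, growth_rate=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.Conv1d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate
        self.out_channels = channels

    def forward(self, x):                        # x: (batch, in_channels, frames)
        features = [x]
        for layer in self.layers:
            # Dense connectivity: every layer sees all preceding feature maps.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)        # (batch, out_channels, frames)

# Example: 64 input channels grow to 64 + 4 * 16 = 128 output channels.
out = DenseBlock1d(in_channels=64)(torch.randn(8, 64, 150))
```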
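Finally, a hypothetical back-end endpoint in the spirit of (3): the front end uploads a speech recording, the server extracts fused features, runs the trained classifier, and returns the predicted emotion to the interface. Flask, the route name, the request field, and the label set are illustrative assumptions, and the sketch reuses the extract_fused_features and CNNBiGRUAttention helpers sketched above.

```python
# Hedged sketch: minimal analysis endpoint returning an emotion label as JSON.
from flask import Flask, request, jsonify
import torch

app = Flask(__name__)
LABELS = ["angry", "happy", "neutral", "sad"]   # placeholder label set
model = CNNBiGRUAttention()                     # trained weights would be loaded here
model.eval()

@app.route("/analyze", methods=["POST"])
def analyze():
    # The front end is assumed to POST the recording as multipart field "audio".
    wav = request.files["audio"]
    wav.save("upload.wav")
    feats = torch.tensor(extract_fused_features("upload.wav"),
                         dtype=torch.float32).unsqueeze(0)   # (1, frames, 26)
    with torch.no_grad():
        pred = model(feats).argmax(dim=1).item()
    return jsonify({"emotion": LABELS[pred]})

if __name__ == "__main__":
    app.run()
```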
Keywords/Search Tags:Feature Fusion, Deep Learning, Attention Mechanism, Speech Emotion Recognition