
Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2023-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: M Huang
Full Text: PDF
GTID: 2568307034482694
Subject: Engineering
Abstract/Summary:
With the rapid development of artificial intelligence and social networks, more and more intelligent devices are changing the way people live and communicate. Voice is one of the most natural and effective means of human communication, and as intelligent devices become deeply integrated into daily life, it is increasingly important for machines to understand human emotions. Speech emotion recognition has therefore become an active research direction in the speech field. Deep learning models have been the mainstream approach in recent years, but they still have open problems: current systems suffer from low recognition rates and from emotion features that are not clearly characterized. The focus of research is how to design the network structure, extract effective speech emotion features, and improve existing recognition models. Designing network structures on the basis of mature deep models and selecting emotion features that raise the recognition rate is thus an important research topic. Against this background, the main work and contributions of this thesis are as follows:

1. Based on traditional acoustic features and the CRNN network, a speech emotion recognition model based on HSF-ACRNN is proposed. First, an end-to-end speech emotion recognition model is established, and then the effects of different features on speech emotion recognition are explored. The experiments show that combining traditional acoustic features with deep features extracted by a deep neural network improves recognition accuracy. On this basis, the thesis segments and maps the original audio data in different ways and designs the HSF-ACRNN emotion recognition model, which distinguishes emotion types better.

2. Speech emotion features are extracted from different angles and fused, and a multi-feature speech emotion recognition model based on time pyramid pooling is proposed. According to previous studies, features extracted from different angles perform differently on different data sets, and fusing them allows emotion categories to be identified more accurately and more stably across data sets. On this basis, the thesis extracts audio emotion features from different angles, obtains segment-level audio features through the time pyramid pooling algorithm, and fuses these features. Experiments on three speech emotion data sets show improvements of 3.8%, 4.8% and 17.41% respectively.

3. To explore the influence of prosody on emotion, a speech emotion recognition model combined with prosody cleaning is proposed. The features used for emotion classification should be non-personalized emotion features, and the classification network should be further improved to explore deeper speech emotion features in combination with cognitive theory. Screening and strengthening non-personalized features such as prosody benefits emotion recognition. The thesis uses a self-supervised learning model to extract speech representations and cleans the prosodic features with a prosody encoder network, which yields some improvement in recognizing speech emotion categories.

This thesis studies the selection of speech emotion features and the optimization of network structure. Building on existing results and analyzing experiments on different data sets, it identifies emotion features that give better recognition results and improves speech emotion recognition. Finally, it explores the influence of non-personalized features, such as the rhythm of the audio itself, on emotion recognition and obtains good recognition results.
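
The time pyramid pooling and fusion step described in contribution 2 can be illustrated with a minimal sketch. The abstract does not give the thesis's actual configuration, so everything here is an assumption made for illustration: the function names `time_pyramid_pool` and `fuse`, the pyramid levels `(1, 2, 4)`, max pooling within each bin, and fusion by plain concatenation. The point it shows is only that frame-level features of any length are reduced to a fixed-length segment-level vector before fusion.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # T frames, each a D-dimensional feature vector


def time_pyramid_pool(frames: Frames, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Pool variable-length frame features into one fixed-length vector.

    For each pyramid level n, the time axis is split into n bins and every
    bin is max-pooled per feature dimension. The output length is
    sum(levels) * D, independent of the number of frames T.
    (Levels and max pooling are illustrative assumptions.)
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("at least one frame is required")
    dim = len(frames[0])
    pooled: List[float] = []
    for n in levels:
        for b in range(n):
            # floor for the start and ceil for the end, so a bin is never
            # empty even when there are fewer frames than bins
            start = (b * num_frames) // n
            end = -((-(b + 1) * num_frames) // n)
            segment = frames[start:end]
            pooled.extend(max(frame[d] for frame in segment) for d in range(dim))
    return pooled


def fuse(*vectors: Sequence[float]) -> List[float]:
    """Fuse segment-level vectors by concatenation (an assumed fusion rule)."""
    fused: List[float] = []
    for vec in vectors:
        fused.extend(vec)
    return fused


if __name__ == "__main__":
    # Two hypothetical feature streams of different lengths, D = 2 each
    short_clip = [[1.0, 0.0], [2.0, -1.0], [3.0, 5.0], [0.0, 4.0]]
    long_clip = [[float(i), float(-i)] for i in range(37)]
    a = time_pyramid_pool(short_clip)
    b = time_pyramid_pool(long_clip)
    print(len(a), len(b))    # both vectors have the same length
    print(len(fuse(a, b)))   # fused vector length is their sum
```

Because each stream is pooled to the same length regardless of clip duration, the fused vector can feed a fixed-size classifier. A real system would more likely apply the pooling to learned network features than to raw lists.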
Keywords/Search Tags: Speech emotion recognition, Deep learning, Time pyramid pooling, Feature fusion, Rhythmic cleaning