
Research on Speech Emotion Recognition for Mobile Terminals Based on Convolutional Neural Networks

Posted on: 2024-01-29    Degree: Master    Type: Thesis
Country: China    Candidate: Y Li    Full Text: PDF
GTID: 2558307163988529    Subject: Engineering
Abstract/Summary:
Emotion is one of the basic elements of human communication, and a person's emotional state is one of its most important expressions. Emotional states play an important role in human interaction, affecting attention, problem solving, decision-making, work, life, entertainment, and the strategies people choose when interacting with computers and applications. Along the road of human-computer interaction research, speech emotion recognition has become an indispensable part of advanced speech processing systems: it extracts meaningful semantics from speech to improve the performance of speech recognition systems. This paper studies convolutional-neural-network-based speech emotion recognition for mobile terminals and applies the proposed algorithms in an online speech emotion recognition system, so that the information carried in speech can be used to judge the speaker's emotion. The main research contents and innovations of this paper are:

● An efficient and lightweight fully convolutional neural network, LightFCN, is constructed for speech emotion recognition. A hierarchical deep learning model automates feature extraction: three parallel convolutional branches extract features with different attributes from the Mel-frequency cepstral coefficient energy map, which helps to obtain high-level features from deep convolution blocks while preserving sufficient separability. The extracted features are fed to a deep convolutional neural network that classifies the emotion of each speech segment, and the final prediction is produced by the normalized exponential function (softmax); a hedged sketch of this branch-and-trunk layout appears after this abstract. Compared with existing models, LightFCN is smaller while achieving the same or higher recognition performance on multiple datasets.

● A speech emotion recognition method based on multi-scale feature representation is proposed, in which multi-scale features are learned and a global perception fusion module represents the emotional information. The multi-scale representation module uses an identity-mapping multi-layer residual network that places convolution layers with different kernel sizes in parallel at the same level, so that kernels at different scales learn multiple feature representations; a global perception fusion module then selects the globally most important information (a second sketch below shows one plausible form of this block). The model is validated on the IEMOCAP database: compared with state-of-the-art methods, the network improves the reported metrics and demonstrates the effectiveness of global perception fusion and multi-scale feature representation.

● Based on the speech emotion recognition models proposed in this paper, an interactive online speech emotion recognition web page is designed and implemented, verifying the practicality of the models. The site uses the Spring MVC framework to implement the back-end functions and HTML, CSS, and JavaScript for the front-end pages. The website can monitor a speaker's emotional state in real time and record and store the speaker's voice in a database, so that a more complete speech emotion dataset can be built in the future.
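The following is a minimal PyTorch sketch of the branch-and-trunk layout described in the first contribution: three parallel convolutional branches over a Mel-cepstral energy map, a deeper convolutional trunk, global pooling, and a softmax output. The kernel sizes, channel widths, four-class output, and all names (e.g. LightFCNSketch) are illustrative assumptions, not the thesis's actual LightFCN configuration.

```python
# Minimal sketch of a LightFCN-style fully convolutional classifier.
# Branch widths, kernel sizes, and the 4 emotion classes are assumptions.
import torch
import torch.nn as nn


class LightFCNSketch(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        # Three parallel branches read the same Mel-cepstral energy map
        # (1 x time x frequency) with different kernel shapes, so each
        # branch emphasizes a different attribute of the input.
        self.branches = nn.ModuleList(
            [self._branch(k) for k in [(3, 3), (5, 5), (7, 7)]]
        )
        # Deeper convolutional trunk over the concatenated branch outputs.
        self.trunk = nn.Sequential(
            nn.Conv2d(3 * 16, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        )
        # Fully convolutional head: 1x1 conv + global pooling, no dense layers.
        self.head = nn.Conv2d(64, n_classes, kernel_size=1)

    @staticmethod
    def _branch(kernel):
        pad = (kernel[0] // 2, kernel[1] // 2)
        return nn.Sequential(
            nn.Conv2d(1, 16, kernel, padding=pad), nn.BatchNorm2d(16), nn.ReLU(),
        )

    def forward(self, x):  # x: (batch, 1, time, freq)
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        feats = self.trunk(feats)
        logits = self.head(feats).mean(dim=(2, 3))  # global average pool
        # Normalized exponential function (softmax) gives class probabilities.
        return torch.softmax(logits, dim=1)


probs = LightFCNSketch()(torch.randn(2, 1, 128, 40))
print(probs.shape)  # torch.Size([2, 4])
```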
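Similarly, the sketch below shows one plausible reading of the second contribution's multi-scale block: parallel convolutions with different kernel sizes inside an identity-mapping residual connection, followed by channel gating driven by global pooling as a stand-in for the global perception fusion module. The gating mechanism, channel counts, and names are hypothetical, since the abstract does not specify the module's internals.

```python
# Hedged sketch of a multi-scale residual block with a global fusion step.
# The global-pooling channel gate is only one plausible interpretation of
# the "global perception fusion module"; all dimensions are assumptions.
import torch
import torch.nn as nn


class MultiScaleBlock(nn.Module):
    """Identity-mapping residual block with parallel multi-scale convolutions."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Parallel convolutions of different kernel sizes at the same level
        # learn feature representations at multiple scales.
        self.scales = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)
        # Global perception fusion (assumed form): pool the whole feature map,
        # then gate channels so globally important information is emphasized.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        multi = torch.cat([torch.relu(conv(x)) for conv in self.scales], dim=1)
        fused = self.merge(multi)
        fused = fused * self.gate(fused)   # globally informed channel weights
        return x + fused                   # identity-mapping residual connection


y = MultiScaleBlock()(torch.randn(2, 32, 64, 64))
print(y.shape)  # torch.Size([2, 32, 64, 64])
```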
Keywords/Search Tags: Speech Emotion Recognition, Speech Signal Processing, Fully Convolutional Network, Multi-scale Feature Fusion, Global Perception