
Research on Robot Speech Emotion Recognition Based on Time-Frequency Context Information

Posted on: 2020-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Wei
GTID: 2428330590995684
Subject: Engineering
Abstract/Summary:
Speech emotion recognition has long been a research hotspot in computer vision and machine learning, and the concept of "affective computing" has attracted wide attention from researchers at home and abroad in recent years. A speaker's voice carries a great deal of emotional information that helps convey meaning: when the same sentence is spoken with different emotions, the information conveyed may differ. Improving the accuracy of speech emotion recognition is therefore necessary if computers are to understand human emotion better. Speech emotion recognition is now increasingly used in human-computer interaction applications such as customer service, distance education, medical assistance, and driving.

Many recognition algorithms developed on publicly available emotional speech databases perform poorly on speech collected under unconstrained conditions, so a large gap remains between these algorithms and practical applications. Traditional emotion recognition algorithms use only the time-domain or only the frequency-domain information of speech. Emotion, however, changes dynamically, and the characteristics of that change are called emotional context information. This information is generally represented by the speech information of consecutive frames and is commonly captured with an LSTM (long short-term memory) network; context features of this kind can effectively improve emotion recognition accuracy. In view of the above, this thesis adopts a method based on time-frequency context information to improve the accuracy and robustness of speech emotion recognition.

The research content of this thesis is as follows:

(1) Commonly used speech emotion feature extraction algorithms and classification methods are surveyed. Classical speech emotion recognition methods are introduced, their recognition accuracy is compared, and their advantages and disadvantages are analyzed.

(2) A feature extraction method for robot speech emotion recognition is proposed. Because convolutional neural networks have achieved great success in recognition tasks in recent years, this thesis applies them to emotion recognition and obtains good results. In addition to the time-domain context features of speech, it introduces frequency-domain context features to improve the recognition accuracy of the whole system.

(3) A method for fusing time-frequency context information is proposed. The extracted time-domain and frequency-domain context features are merged, and experiments on well-known speech emotion datasets show good recognition performance.

Finally, the speech emotion recognition method based on time-frequency context information is applied to the speech emotion recognition module of a robot back end, giving the intelligent robot a speech emotion recognition capability.
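The abstract does not give implementation details, so the following is only a minimal sketch of the data flow behind merging time-domain and frequency-domain context. It assumes the input is a spectrogram-like matrix (rows are frames, columns are frequency bands) and substitutes a fixed neighbour difference for the learned LSTM and CNN context models the thesis mentions. The names `context_delta` and `merge_time_frequency_context`, the `width` parameter, and the toy values are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bands


def context_delta(rows: Matrix, width: int = 1) -> Matrix:
    """Difference between the row `width` steps ahead and `width` steps behind.

    Indices are clamped at the edges, so the output keeps the input shape.
    This is a stand-in for a learned context model, not the thesis method.
    """
    n = len(rows)
    out = []
    for i in range(n):
        prev_row = rows[max(i - width, 0)]
        next_row = rows[min(i + width, n - 1)]
        out.append([b - a for a, b in zip(prev_row, next_row)])
    return out


def transpose(m: Matrix) -> Matrix:
    """Swap the time and frequency axes."""
    return [list(col) for col in zip(*m)]


def merge_time_frequency_context(spec: Matrix, width: int = 1) -> Matrix:
    """Per frame, concatenate raw bands, time context, and frequency context."""
    time_ctx = context_delta(spec, width)  # change across neighbouring frames
    # Apply the same operator along the frequency axis, then swap axes back.
    freq_ctx = transpose(context_delta(transpose(spec), width))
    return [raw + t + f for raw, t, f in zip(spec, time_ctx, freq_ctx)]


# Toy "spectrogram": 3 frames x 2 bands (made-up values).
toy = [[1.0, 2.0],
       [3.0, 5.0],
       [6.0, 9.0]]
features = merge_time_frequency_context(toy)
```

Each output row has three times as many entries as the input has bands: the raw values, the context along the time axis, and the context along the frequency axis. In a learned system, the two `context_delta` calls would be replaced by trained sequence models, and the concatenated vector would feed a classifier.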
Keywords/Search Tags: speech emotion recognition, time domain context, frequency domain context