Application of the Gaussian Mixture Model in Speech Emotion Recognition Research

Posted on: 2017-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: G L Cai
Full Text: PDF
GTID: 2348330488475447
Subject: Computer application technology
Abstract/Summary:
Speech emotion recognition, which analyzes the speech signal to identify the emotional state of the speaker, is an important branch of artificial intelligence (AI). With the interaction and development of computer science and related disciplines such as physiology, psychology and statistics, speech emotion recognition technology has made remarkable progress. Because human emotions are both subjective and complex, research on emotion modeling and emotion computation has great theoretical and practical significance. As computer technology and machine intelligence develop, more and more artificial intelligence devices and products will be designed and widely used in many areas of social life, such as education, medicine, the service industry and industrial settings. This thesis attempts to systematically investigate the speech emotion recognition problem based on the Gaussian mixture model.

Firstly, an experimental corpus is established based on emotion theory, and four basic emotional states (anger, joy, fear and sadness) are selected as the recognition targets. The speech signal is then preprocessed so that emotional features can be extracted effectively.

Secondly, because the extracted features directly affect the recognition results, this thesis extracts characteristic parameters from three aspects: prosody, acoustics and spectrum. These features fall into five types: speech rate, short-term energy, pitch frequency, formant parameters and Mel Frequency Cepstral Coefficients (MFCC).
After feature extraction, calculation and analysis, 21 speech emotion features drawn from these five types are selected as the input parameters to the Gaussian mixture model (GMM).

Thirdly, experiments with different emotional features and different recognition models are carried out to find the features that effectively distinguish the emotions. The support vector machine (SVM) model is used first. After comparing the recognition results of different kernel functions, the SVM is implemented with a linear kernel, and the grid search method is used to determine its optimal parameter. The decision tree model and the hierarchical model are then also used to recognize the different emotions.

Finally, the four basic emotions are modeled with the GMM on the training set of the corpus, using the 21 selected speech features. During modeling, maximum likelihood estimation (MLE) and the expectation maximization (EM) algorithm are used to optimize the GMM parameters, and a detailed derivation of the EM iteration up to final convergence is given. The optimal GMM is then used to recognize the emotions on the test set, and the experimental results of the support vector machine, decision tree and hierarchical models are analyzed in detail. A comparative analysis of the four models shows that the Gaussian mixture model has a stronger ability to recognize all of the selected basic emotions, and also verifies that the parameters estimated for the Gaussian mixture model have a significant impact on its recognition rate.
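
The per-emotion GMM approach described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes diagonal covariances, two mixture components per emotion, equal class priors, and synthetic 2-dimensional points standing in for the 21 speech features. The names `fit_gmm`, `log_likelihood`, `classify` and `CENTERS` are hypothetical helpers introduced only for this example. One GMM is trained per emotion with EM, and a test vector is assigned to the emotion whose model gives the highest log-likelihood.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm(data, k=2, iters=30, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Initialise means from random training points, variances from the global spread.
    means = [list(p) for p in rng.sample(data, k)]
    gmean = [sum(p[d] for p in data) / n for d in range(dim)]
    gvar = [max(sum((p[d] - gmean[d]) ** 2 for p in data) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component for each point.
        resp = []
        for x in data:
            logp = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: maximum likelihood re-estimates weighted by the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against an empty component
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj, var_floor)
                            for d in range(dim)]
    return weights, means, variances

def log_likelihood(x, model):
    """Log-likelihood of x under a fitted GMM."""
    weights, means, variances = model
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])

def classify(x, models):
    """Pick the emotion whose GMM scores x highest (equal priors assumed)."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Synthetic stand-in data: one well-separated cluster per emotion (an assumption).
CENTERS = {"anger": (5.0, 5.0), "joy": (5.0, -5.0),
           "fear": (-5.0, 5.0), "sadness": (-5.0, -5.0)}
rng = random.Random(42)
train = {label: [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(60)]
         for label, (cx, cy) in CENTERS.items()}
models = {label: fit_gmm(points) for label, points in train.items()}

print(classify([4.5, 5.2], models))
```

With real data, each training point would be the 21-dimensional feature vector of one utterance, and the number of mixture components would be chosen by validation. The variance floor is a common safeguard against a component collapsing onto a single point during EM.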
Keywords/Search Tags: Speech Emotion Recognition, Feature Extraction, Support Vector Machine, Gaussian Mixture Model, Expectation Maximization