
Research And Application Of Speech Emotion Recognition

Posted on: 2010-11-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 1118360302958555
Subject: Computer Science and Technology

Abstract/Summary:
With the development of human-computer interaction technology, research on human-computer interfaces has gradually moved from the era of mechanized interfaces into the era of multimedia interfaces. As one of the key technologies in intelligent human-computer interaction, speech emotion analysis and recognition has become a research hot spot. Researchers from various fields are concerned with how to make computers automatically recognize a speaker's emotional state from speech signals and respond in a more targeted and more human way.

This paper first summarizes the research significance of speech emotion recognition and the main content of this work. It then reviews several key issues in current studies of speech emotion, including the categories of emotional states, an overview of emotional corpora, acoustic features of speech signals, feature dimensionality reduction, classification algorithms, and semi-supervised speech emotion classification.

This paper presents several models for feature selection and feature extraction. Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection is a new model structure: it focuses on discriminating between every pair of emotional states while simultaneously taking the overall distribution of samples into account, so both all-class and pairwise-class feature selection are involved. The model structure is compatible with many classification algorithms and can effectively improve the performance of the recognition system. Feature selection based on a feature projection matrix uses the projection matrix obtained from feature extraction to evaluate the importance of the initial acoustic features, and then completes feature subset selection according to these importance scores. Experimental results show that, compared with the feature extraction method that simply uses the projection matrix for data mapping, this feature selection algorithm has clear advantages.
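The abstract does not specify the exact scoring rule used to turn a projection matrix into feature importances. A minimal sketch of one common variant, assuming a PCA projection and an explained-variance-weighted norm of each feature's loadings (both of these choices are illustrative, not taken from the dissertation):

```python
import numpy as np
from sklearn.decomposition import PCA

def rank_features_by_projection(X, n_components=5, k=10):
    """Rank original features by their weight in a learned projection matrix.

    Scoring rule (illustrative, not necessarily the dissertation's):
    weight each PCA component's loadings by its explained-variance ratio,
    then score each feature by the L2 norm of its weighted loadings.
    """
    pca = PCA(n_components=n_components).fit(X)
    # components_ has shape (n_components, n_features)
    W = pca.components_ * pca.explained_variance_ratio_[:, None]
    importance = np.linalg.norm(W, axis=0)    # one score per original feature
    top_k = np.argsort(importance)[::-1][:k]  # indices of the k best features
    return top_k, importance

# Toy usage: 200 samples of 40 hypothetical acoustic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
selected, scores = rank_features_by_projection(X, n_components=5, k=10)
print(selected)
```

The selected index set can then feed a classifier directly, which is the step that distinguishes this selection scheme from simply mapping the data through the projection matrix.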
Through analysis of the data, a hierarchical framework of feature extraction for speech emotion recognition selects different dimensionality reduction algorithms to process corpora of different genders or different emotional states. This idea can be extended to other corpora: constructing a suitable recognition system based on hierarchical dimensionality reduction improves recognition performance.

The enhanced Lipschitz embedding algorithm based on manifold learning is a nonlinear dimensionality reduction algorithm. By computing geodesic distances, high-dimensional feature vectors are mapped into a low-dimensional subspace. The algorithm dramatically improves recognition accuracy in speaker-dependent and speaker-independent speech emotion recognition under controlled laboratory conditions, as well as in speaker-dependent speech emotion recognition under Gaussian white noise and sinusoidal noise.

In traditional speech emotion recognition systems, each acoustic feature is treated as one component of a simple composite feature vector that is fed to the classifiers. Speech emotion recognition based on covariance descriptors and the Riemannian manifold instead considers the correlations between different acoustic features. Experimental results show that these correlations reflect emotional information, and a recognition system built on them has high stability and noise robustness.

Given a small number of labeled samples and a large number of unlabeled samples, this paper presents an enhanced co-training algorithm that builds a classification model based on semi-supervised learning. It introduces a restriction on the label predictors to improve the standard co-training algorithm.
This algorithm reduces the introduction of classification noise and improves classifier performance.

Considering the practical application of speech emotion research, this paper proposes an AdaBoost+C4.5 classification model to analyze the emotional states of real-time speech signals. We implement a complete real-time emotion recognition model and apply it in a real-time facial animation system driven by emotional speech.
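The AdaBoost+C4.5 combination, boosting over decision-tree weak learners, can be sketched with scikit-learn. Note that scikit-learn's trees are CART rather than C4.5, so this is only a structural analogue, and the synthetic features and labels below stand in for real acoustic features and emotion classes:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for utterance-level acoustic features and emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # two synthetic "emotions"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost over shallow decision trees (CART here, as a C4.5 analogue).
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    random_state=0,
)
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(accuracy)
```

In a real-time setting, the fitted `clf.predict` call would run on each incoming feature vector extracted from the live speech stream, with its output driving the facial animation.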
Keywords/Search Tags: Speech emotion recognition, all-class and pairwise-class feature selection, feature selection based on feature projection matrix, hierarchical feature extraction, enhanced Lipschitz embedding algorithm, covariance descriptor and Riemannian manifold