
Speech Emotion Recognition Research Based on Feature Extraction and Multi-modal Combination

Posted on: 2015-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Dai
GTID: 2268330428981495
Subject: Control theory and control engineering
Abstract/Summary:
With the development of science and technology, personal computers have become part of everyday life, and intelligent human-computer interaction has become an increasingly hot topic; at the same time, people have begun to expect their interaction with computers to be more humane and harmonious. The human voice is the most natural form of communication: it carries not only semantic information but also a great deal of emotional information. In other words, intelligent human-computer interaction requires that the computer accurately identify emotions, and research on speech emotion recognition is therefore a key step toward more intelligent human-computer interaction. In current studies of speech emotion recognition there are many methods for extracting emotional features and recognizing emotions; at the same time, the emotional speech databases used in these studies follow no uniform standard, which leads to differing recognition results.

This paper first analyzes the significance of recognizing emotional information in speech and surveys the relevant background. Next, an experimental Chinese emotional speech database is recorded. The database is built by selecting well-expressed emotional utterances whose text itself carries no emotional information, each read in six different emotions: pleasure, anger, surprise, sadness, fear, and quiet. Twenty-six characteristic parameters of the speech signal, such as the fundamental frequency, short-time energy, short-time zero-crossing rate, formants, and multi-fractal dimension, are then extracted under the different emotional states; these features are analyzed and compared, and some of them are effectively improved. The traditional short-time average magnitude difference function (AMDF) often shows a falling mean trend, so the valleys it finds are not the global minimum, and it also produces pitch-halving and pitch-doubling errors in pitch tracking. To resolve this problem and enhance the valley features, an improved AMDF-based algorithm is proposed in this paper.

Finally, on the basis of an analysis of existing speech emotion recognition algorithms, a recognition algorithm combining the Gaussian mixture model (GMM) and the support vector machine (SVM) is proposed. The feature parameters are evaluated with GMM probability distribution statistics, and the resulting probability distributions are treated as feature vectors that are classified with an SVM; the experiments show satisfactory recognition results. The algorithm is also implemented in program form, and the extracted characteristic parameters are recognized with GMM-SVM in a Matlab simulation environment, where good experimental results are obtained.
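As background to the AMDF discussion above, the following is a minimal sketch of the conventional AMDF pitch estimate, not the improved variant proposed in the thesis; the sampling rate fs, the 60-500 Hz search band, and the function names are illustrative assumptions, and Python/NumPy is used only as a stand-in for the thesis' Matlab environment.

import numpy as np

def amdf(frame):
    # Average magnitude difference function of one speech frame:
    # D(k) = 1/(N-k) * sum_n |x[n+k] - x[n]|.
    # Normalising by (N-k) partly counteracts the falling mean trend of the
    # raw sum, which otherwise pulls the deepest valley away from the true
    # pitch lag (the problem the improved algorithm addresses).
    n = len(frame)
    return np.array([np.mean(np.abs(frame[k:] - frame[:n - k])) for k in range(n)])

def pitch_from_amdf(frame, fs=16000, f_lo=60.0, f_hi=500.0):
    # Pick the deepest AMDF valley inside a plausible pitch-lag range;
    # restricting the search band is one simple way to reduce the
    # pitch-halving and pitch-doubling errors mentioned above.
    d = amdf(frame)
    k_min = int(fs / f_hi)
    k_max = min(int(fs / f_lo), len(frame) - 1)
    lag = k_min + int(np.argmin(d[k_min:k_max]))
    return fs / lag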
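To make the GMM-SVM combination concrete, the following is a minimal sketch of one common way to combine the two models, in which per-emotion GMM log-likelihoods of an utterance form the feature vector given to the SVM; the eight-component GMMs, the RBF kernel, the function names, and the use of Python/scikit-learn instead of the thesis' Matlab implementation are illustrative assumptions rather than the exact statistics used in the thesis.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

EMOTIONS = ["pleasure", "anger", "surprise", "sadness", "fear", "quiet"]

def train_gmms(frames_by_emotion, n_components=8):
    # Fit one GMM per emotion on that emotion's frame-level feature vectors
    # (e.g. the 26 prosodic/spectral/fractal parameters described above).
    return {emo: GaussianMixture(n_components=n_components, random_state=0).fit(X)
            for emo, X in frames_by_emotion.items()}

def gmm_score_vector(gmms, utterance_frames):
    # Average log-likelihood of the utterance under each emotion GMM,
    # stacked into a fixed-length vector that the SVM can classify.
    return np.array([gmms[emo].score(utterance_frames) for emo in EMOTIONS])

def train_svm(gmms, utterances, labels):
    # Train the SVM classifier on the GMM score vectors of the training utterances.
    X = np.vstack([gmm_score_vector(gmms, u) for u in utterances])
    return SVC(kernel="rbf").fit(X, labels)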
Keywords/Search Tags: Speech emotional feature, Speech emotion recognition, Empirical mode decomposition, Multi-fractal, Hybrid GMM-SVM model