
Speech Emotion Recognition Research Based on Feature Extraction and Multi-modal Combination

Posted on: 2015-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Dai
GTID: 2268330428981495
Subject: Control theory and control engineering
Abstract/Summary:
With the development of science and technology, personal computers have become part of everyday life, and intelligent human-computer interaction has become an increasingly hot topic; at the same time, people have begun to expect their interaction with computers to be more humane and harmonious. The human voice is the most natural form of communication: it carries not only semantic information but also a great deal of emotional information. In other words, intelligent human-computer interaction requires that the computer accurately identify emotions, and research on speech emotion recognition is therefore a key step toward more intelligent human-computer interaction. In current studies of speech emotion recognition there are many methods for extracting emotional features and recognizing emotions; at the same time, the emotional speech databases used in these studies follow no uniform standard, which leads to differing recognition results.

This paper first analyzes the significance of recognizing emotional information in speech and surveys the relevant background. Next, an experimental Chinese emotional speech database is recorded. The database is built by selecting well-expressed emotional utterances whose text itself carries no emotional information, each read in six different emotions: pleasure, anger, surprise, sadness, fear, and quiet. Twenty-six characteristic parameters of the speech signal, such as the fundamental frequency, short-time energy, short-time zero-crossing rate, formants, and multi-fractal dimension, are then extracted under the different emotional states; these features are analyzed and compared, and some of them are effectively improved. The traditional short-time average magnitude difference function (AMDF) often shows a falling mean trend, so the valleys it finds are not the global minimum, and it also produces pitch-halving and pitch-doubling errors in pitch tracking. To resolve this problem and enhance the valley features, an improved AMDF-based algorithm is proposed in this paper.

Finally, on the basis of an analysis of existing speech emotion recognition algorithms, a recognition algorithm combining the Gaussian mixture model (GMM) and the support vector machine (SVM) is proposed. The feature parameters are evaluated with GMM probability distribution statistics, and the resulting probability distributions are treated as feature vectors that are classified with an SVM; the experiments show satisfactory recognition results. The algorithm is also implemented in program form, and the extracted characteristic parameters are recognized with GMM-SVM in a Matlab simulation environment, where good experimental results are obtained.
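As background to the AMDF discussion above, the following is a minimal sketch of the conventional AMDF pitch estimate, not the improved variant proposed in the thesis; the sampling rate fs, the 60-500 Hz search band, and the function names are illustrative assumptions, and Python/NumPy is used only as a stand-in for the thesis' Matlab environment.

import numpy as np

def amdf(frame):
    # Average magnitude difference function of one speech frame:
    # D(k) = 1/(N-k) * sum_n |x[n+k] - x[n]|.
    # Normalising by (N-k) partly counteracts the falling mean trend of the
    # raw sum, which otherwise pulls the deepest valley away from the true
    # pitch lag (the problem the improved algorithm addresses).
    n = len(frame)
    return np.array([np.mean(np.abs(frame[k:] - frame[:n - k])) for k in range(n)])

def pitch_from_amdf(frame, fs=16000, f_lo=60.0, f_hi=500.0):
    # Pick the deepest AMDF valley inside a plausible pitch-lag range;
    # restricting the search band is one simple way to reduce the
    # pitch-halving and pitch-doubling errors mentioned above.
    d = amdf(frame)
    k_min = int(fs / f_hi)
    k_max = min(int(fs / f_lo), len(frame) - 1)
    lag = k_min + int(np.argmin(d[k_min:k_max]))
    return fs / lag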
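To make the GMM-SVM combination concrete, the following is a minimal sketch of one common way to combine the two models, in which per-emotion GMM log-likelihoods of an utterance form the feature vector given to the SVM; the eight-component GMMs, the RBF kernel, the function names, and the use of Python/scikit-learn instead of the thesis' Matlab implementation are illustrative assumptions rather than the exact statistics used in the thesis.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

EMOTIONS = ["pleasure", "anger", "surprise", "sadness", "fear", "quiet"]

def train_gmms(frames_by_emotion, n_components=8):
    # Fit one GMM per emotion on that emotion's frame-level feature vectors
    # (e.g. the 26 prosodic/spectral/fractal parameters described above).
    return {emo: GaussianMixture(n_components=n_components, random_state=0).fit(X)
            for emo, X in frames_by_emotion.items()}

def gmm_score_vector(gmms, utterance_frames):
    # Average log-likelihood of the utterance under each emotion GMM,
    # stacked into a fixed-length vector that the SVM can classify.
    return np.array([gmms[emo].score(utterance_frames) for emo in EMOTIONS])

def train_svm(gmms, utterances, labels):
    # Train the SVM classifier on the GMM score vectors of the training utterances.
    X = np.vstack([gmm_score_vector(gmms, u) for u in utterances])
    return SVC(kernel="rbf").fit(X, labels)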
Keywords/Search Tags: Speech emotional feature, Speech emotion recognition, Empirical mode decomposition, Multi-fractal, Hybrid GMM-SVM model