
Research on Speech Emotion Recognition Fusing Facial Expression

Posted on: 2013-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhao
Full Text: PDF
GTID: 2248330395965664
Subject: Signal and Information Processing
Abstract/Summary:
Emotion plays an important role in human perception and decision-making. Human emotion is expressed mainly through language, facial expression, posture, and similar channels. Speech is the acoustic manifestation of language and the most common and effective means of human communication. Speech emotion recognition technology enables a computer to receive speech signals and perceive the speaker's true emotional intention. Facial expressions, in turn, convey emotional states mainly through changes in the muscles around the eyes, face, and mouth; facial expression recognition enables the computer to apply prior knowledge of human emotion to analyze and understand those states. In recent years, with the development of emotion recognition, its theoretical value and application prospects have been widely recognized in human-computer interaction, psychology, and other fields.

At present, emotion recognition methods relying on a single modality, such as speech, facial expressions, or physiological signals, are common and have achieved some success. Humans, however, express emotions through multiple channels: speech, facial expressions, touch, and so on. Recognition relying solely on single-mode information has many limitations, because it cannot exploit the complementarity of emotional signals of different natures and cannot satisfy current practical needs. Emotion recognition research therefore needs to develop in the multi-modal direction. Bimodal emotion recognition can use the complementary information of two channels to improve the robustness and recognition rate of the classifier, and it is currently a main line of emotion research.

To improve on the recognition rate of a single modality, a bimodal fusion method based on speech and facial expressions is proposed. First, we classify the emotions and establish an emotional database containing both speech and facial expressions. For the emotions calm, happy, surprise, anger, and sad, we extract prosodic feature parameters from the speech signals and select a classifier to recognize speech emotions. We then analyze bimodal emotion recognition fusing facial expression information, covering feature extraction, classification, and the decision-level fusion algorithm, and obtain the recognition results. The main contents of this thesis are as follows:

First, we select five typical emotions: calm, happy, surprise, sad, and anger. In a laboratory environment, we record Chinese speech and the speaker's facial expression for specific sample sentences and establish the emotional database.

Second, we preprocess the speech signals of the emotional database and extract prosodic feature parameters for the different emotions. This thesis selects ten prosodic feature parameters for the emotion recognition experiments: pronunciation duration, speech rate, amplitude average, amplitude range, pitch average, pitch range, pitch change rate, formant average, formant range, and formant change rate. We use the PCA method to recognize speech emotion; the average recognition rate in the experiments is 84.4%.
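A minimal sketch of this speech pipeline, assuming librosa, numpy, and scikit-learn. Only a subset of the ten prosodic parameters is computed (formant statistics would require LPC analysis and are omitted), and the nearest-centroid decision in the PCA subspace is an illustrative stand-in, since the abstract does not detail the exact PCA-based recognizer:

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

EMOTIONS = ["calm", "happy", "surprise", "sad", "anger"]

def prosodic_features(wav_path):
    """Compute a few of the prosodic parameters named in the thesis."""
    y, sr = librosa.load(wav_path, sr=None)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # frame-wise pitch (Hz)
    rms = librosa.feature.rms(y=y)[0]               # frame-wise amplitude
    return np.array([
        len(y) / sr,                       # pronunciation duration (s)
        rms.mean(),                        # amplitude average
        rms.max() - rms.min(),             # amplitude range
        np.nanmean(f0),                    # pitch average
        np.nanmax(f0) - np.nanmin(f0),     # pitch range
        np.nanmean(np.abs(np.diff(f0))),   # pitch change rate
    ])

def train(X, y_labels, n_components=4):
    """X: (n_samples, n_features) matrix; y_labels: numpy array of strings."""
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)
    centroids = {e: Z[y_labels == e].mean(axis=0) for e in EMOTIONS}
    return pca, centroids

def classify(pca, centroids, x):
    """Assign the emotion whose centroid is nearest in the PCA subspace."""
    z = pca.transform(x.reshape(1, -1))[0]
    return min(centroids, key=lambda e: np.linalg.norm(z - centroids[e]))
```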
Third, we process the facial expression information. Valid information is obtained by preprocessing: face detection and location, light compensation, normalization, gray-scaling, Gaussian smoothing, and histogram equalization. We then extract geometric features to form feature vectors, compare them against the samples in an expression template library established by training, and judge the emotional category of the expression image (see the first sketch after this summary).

Fourth, we study the fusion algorithm, build the bimodal emotion recognition system, and fuse the speech and facial expression information in the emotion recognition experiments (see the second sketch after this summary).

Finally, we compare and analyze the experimental data of the single-mode and bimodal systems. The results show that the recognition rate with bimodal fusion is about 6 percentage points higher than the rate obtained with speech prosodic features alone. Bimodal emotion recognition effectively fuses speech prosodic features and facial expression, and improves the recognition rate.
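A sketch of the expression pipeline using OpenCV, under stated assumptions: the thesis's geometric feature extraction is not detailed in the abstract, so the flattened normalized face patch stands in for the feature vector, histogram equalization doubles as a rough light-compensation step, and the template library is assumed to be a dict of per-emotion mean vectors built offline from training images:

```python
import cv2
import numpy as np

# Bundled Haar cascade used as the face detector in this sketch
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(img_bgr, size=(64, 64)):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)    # gray-scaling
    gray = cv2.equalizeHist(gray)                       # histogram equalization
    gray = cv2.GaussianBlur(gray, (5, 5), 0)            # Gaussian smoothing
    faces = detector.detectMultiScale(gray, 1.1, 5)     # face detection/location
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    face = cv2.resize(gray[y:y + h, x:x + w], size)     # size normalization
    return face.astype(np.float32).ravel() / 255.0      # stand-in feature vector

def classify_expression(vec, templates):
    """templates: {emotion: mean feature vector} from the training library."""
    return min(templates, key=lambda e: np.linalg.norm(vec - templates[e]))
```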
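The abstract does not state the exact fusion rule, so the following is a hedged sketch of a common decision-level scheme: each single-mode recognizer yields per-emotion distances, the distances are mapped to probability-like scores, and the fused decision is a weighted sum. The equal weight of 0.5 and the softmax-style scoring are assumptions for illustration:

```python
import numpy as np

EMOTIONS = ["calm", "happy", "surprise", "sad", "anger"]

def scores_from_distances(dists):
    """Map {emotion: distance} to a normalized score vector."""
    d = np.array([dists[e] for e in EMOTIONS])
    s = np.exp(-d)                 # smaller distance -> larger score
    return s / s.sum()

def fuse(speech_dists, face_dists, w_speech=0.5):
    """Weighted score-level fusion of the two single-mode recognizers."""
    p = (w_speech * scores_from_distances(speech_dists)
         + (1.0 - w_speech) * scores_from_distances(face_dists))
    return EMOTIONS[int(np.argmax(p))]
```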
Keywords/Search Tags: emotional database, speech, facial expression, emotional characteristics, emotional recognition, fusion algorithm