
Speech Based Identification And Emotion Information Extraction And Its Application In Pervasive Computing

Posted on: 2008-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2178360242966108
Subject: Signal and Information Processing
Abstract/Summary:
This thesis studies speaker recognition and emotion extraction for ubiquitous services in pervasive computing.

Because a real-time monitoring system imposes time limits, speaker recognition must balance efficiency against accuracy rather than optimize accuracy alone: the system's running speed must improve while recognition accuracy is preserved. We therefore improve both the feature-extraction and classification stages. We refine MFCC feature extraction and propose a quick MFCC algorithm that meets the real-time requirement while retaining high precision. To validate it, we compare the algorithm against LPC and FFT features under a Euclidean-distance classifier. In the experiments, the EER is 14.3% for LPC and 11.4% for FFT, but only 4.3% for quick MFCC, with a system run time of about 4.0 s, which meets the real-time requirement.

Building on the quick MFCC, we then compare differential MFCC against the other features under a classifier that fuses VQ with GMM. Here the EER is 14.4% for LPC, 12.5% for FFT, 9.4% for quick MFCC, and 6.9% for differential MFCC.

Finally, we compare all the classification methods in this thesis using differential MFCC features. The EER is 15% for Euclidean distance, 11.2% for VQ, 4.4% for GMM, and 6.9% for the VQ-GMM fusion. Although GMM achieves the best recognition accuracy, its run time of about 6.0 s is worse than that of the fusion method, which needs only about 4.5 s to produce a result.

For emotion extraction, we mainly use pitch-processing methods to decide the speaker's emotional state.
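The thesis does not spell out how the differential (delta) MFCC coefficients are formed; the standard definition is a regression over neighboring frames, d_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²). A minimal NumPy sketch of that standard formula (the window half-width `N=2` is an illustrative assumption, not a value taken from the thesis):

```python
import numpy as np

def delta(cepstra, N=2):
    """Delta (differential) coefficients of a (frames, coeffs) cepstral matrix.

    Uses the standard regression formula with edge-padding at the boundaries;
    N is the half-width of the regression window (assumed, not from the thesis).
    """
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat the first/last frame so boundary frames have full neighborhoods.
    padded = np.pad(cepstra, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(cepstra, dtype=float)
    for t in range(cepstra.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out
```

On a cepstral sequence that grows linearly across frames, the interior delta values come out as the constant slope, which is a quick sanity check for the window normalization.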
We then apply both methods to an e_Learning system, which can be seen as a ubiquitous service with the character of "Anytime, Anywhere, Invisible".
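All of the comparisons above are reported as equal error rates. The EER is the operating point where the false-acceptance rate equals the false-rejection rate; a minimal sketch of how it can be computed from genuine and impostor score lists (this is a generic implementation, not the evaluation code used in the thesis):

```python
import numpy as np

def eer(genuine, impostor):
    """Equal error rate: the point where false-accept rate ~ false-reject rate.

    genuine  -- similarity scores for true-speaker trials
    impostor -- similarity scores for impostor trials
    Sweeps every observed score as a threshold and returns the mean of FAR
    and FRR at the threshold where they are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, best_eer = float("inf"), 1.0
    for th in np.sort(np.concatenate([genuine, impostor])):
        far = float(np.mean(impostor >= th))  # impostors wrongly accepted
        frr = float(np.mean(genuine < th))    # true speakers wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

For perfectly separated score distributions the sweep finds a threshold with FAR = FRR = 0, so the EER is 0; heavily overlapping distributions push it toward 0.5.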
Keywords/Search Tags:MFCC, LPC, FFT, VQ, GMM