Spectral Energy Features Based On Human Auditory Perception For Emotional Speech Recognition

Posted on: 2013-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: J M Yin
GTID: 2248330371490438
Subject: Signal and Information Processing
Abstract/Summary:
Emotional speech recognition is an important branch of signal processing. It inherits the characteristics of traditional speech signal processing and has become a research hotspot at the intersection of psychology, phonetics, acoustics and many other disciplines. Its goal is to enable a computer to correctly identify the emotional state conveyed by input speech. With the rapid development of computer science and communication technology, emotional speech recognition has important theoretical significance and application prospects in intelligent human-computer interaction.

The main work of this study is to extract spectral energy features based on human auditory perception and, on this basis, to make several optimizations. The experiments use the TYUT emotional speech database and the Berlin emotional speech database EMO-DB, which together cover three emotional states (Happiness, Anger, Neutral) and three languages (Chinese, English, German). A support vector machine (SVM) is used as the recognition model.

First, the classical features LPC, LPCC, MFCC and ZCPA are introduced and their characteristics are compared, and emotion recognition experiments are designed with them; all results are kept as baselines for the subsequent research. Next, two basic spectral energy features, AUSEES and AUSEEG, are studied. Both divide the entire spectral range into sub-bands on a linear scale, which does not conform to human auditory perception. The Bark scale and the ERB scale, both derived from human auditory perception, are therefore introduced to divide the spectrum into sub-bands and to improve the original AUSEES and AUSEEG features. The improved features, named AUSEES-Bark, AUSEEG-Bark, AUSEES-ERB and AUSEEG-ERB, are then used in emotional speech recognition experiments.
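The abstract does not give the exact definitions of AUSEES and AUSEEG, so the following Python sketch only illustrates the idea shared by the Bark and ERB variants: split one frame's power spectrum into sub-bands whose edges are equally spaced on a linear, Bark or ERB axis, and take the log energy of each band. It uses the Zwicker and Terhardt approximation of the Bark scale and the Glasberg and Moore ERB-rate formula. The function names, the 16-band default, the naive DFT and the choice of log energy are assumptions for illustration, not the thesis's implementation.

```python
import math

def hz_to_bark(f):
    """Zwicker & Terhardt (1980) approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    """Glasberg & Moore (1990) ERB-rate scale (number of ERBs below f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

# Frequency-warping functions; all are 0 at 0 Hz and monotonically increasing.
SCALES = {"linear": lambda f: f, "bark": hz_to_bark, "erb": hz_to_erb_rate}

def _inverse(warp, target, f_max, iters=60):
    """Invert a monotonic warping function by bisection on [0, f_max]."""
    lo, hi = 0.0, f_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if warp(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def band_edges(f_max, n_bands, scale="bark"):
    """n_bands + 1 edge frequencies (Hz), equally spaced on the chosen scale."""
    warp = SCALES[scale]
    top = warp(f_max)
    edges = [_inverse(warp, top * i / n_bands, f_max) for i in range(n_bands + 1)]
    edges[0], edges[-1] = 0.0, f_max  # pin the end points exactly
    return edges

def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT to stay dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def subband_log_energies(frame, fs, n_bands=16, scale="bark"):
    """Log energy of each sub-band of one frame's power spectrum."""
    n = len(frame)
    edges = band_edges(fs / 2.0, n_bands, scale)
    energies = [0.0] * n_bands
    for k, p in enumerate(power_spectrum(frame)):
        f = k * fs / n
        for b in range(n_bands):
            # Half-open bands [lo, hi); the last band also takes the Nyquist bin
            if edges[b] <= f < edges[b + 1] or b == n_bands - 1:
                energies[b] += p
                break
    return [math.log(e + 1e-12) for e in energies]  # floor avoids log(0) in empty bands

if __name__ == "__main__":
    fs, n = 16000, 256
    frame = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]  # 1 kHz test tone
    print([round(e) for e in band_edges(fs / 2, 16, "bark")])
    print(subband_log_energies(frame, fs, n_bands=16, scale="bark"))
```

With 16 bands at a 16 kHz sampling rate, the linear edges are all 500 Hz apart, while the Bark and ERB edges are narrow at low frequencies and wide at high frequencies; that non-uniform resolution is what the auditory scales contribute. With short frames or many bands, a narrow low band can contain no DFT bin, which is why a small floor is added before the logarithm.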
The experimental results show that the improved features achieve significantly higher recognition rates than the original features and represent the characteristics of different emotional states effectively. In particular, the features based on Bark-scale sub-band division achieve the highest recognition rate and the best stability. The follow-up work therefore takes AUSEES-Bark and AUSEEG-Bark as the main objects and proposes two further improvements. First, LPCC parameters, whose main advantage is that they reflect the vocal tract response, are used to optimize the AUSEEG-Bark feature. Second, the Teager Energy Operator (TEO), which estimates the energy of the motion that generates a signal, is applied in the different frequency bands to improve the AUSEES-Bark and AUSEEG-Bark features. The experimental results show that both improvements are feasible and effective and that the new features give better emotion classification results; the TEO-based spectral energy features achieve the most satisfactory recognition performance.
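The abstract does not state how the TEO is combined with the Bark sub-bands. One common arrangement, assumed here, is to band-pass filter the signal into each sub-band, apply the discrete TEO Ψ[x(n)] = x²(n) − x(n−1)·x(n+1) to each band signal, and average the result. The second-order band-pass filter (from the RBJ Audio EQ Cookbook), the mean-TEO feature and the example band edges are all assumptions for illustration, not the thesis's Bark edges or its exact method.

```python
import math

def teager(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def bandpass(x, fs, f_lo, f_hi):
    """Second-order band-pass biquad (RBJ cookbook, 0 dB peak gain).

    Approximate: the band is set by a centre frequency and Q. f_lo must be > 0.
    """
    f0 = math.sqrt(f_lo * f_hi)          # geometric centre frequency
    q = f0 / (f_hi - f_lo)               # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 = 0 for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        # Direct form I: y[n] = b0 x[n] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]
        out = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def band_teo_features(x, fs, edges):
    """Mean TEO of each band-passed signal, one value per band [edges[i], edges[i+1]]."""
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        psi = teager(bandpass(x, fs, lo, hi))
        feats.append(sum(psi) / len(psi))
    return feats

if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]  # 1 kHz test tone
    print(band_teo_features(tone, fs, [250, 500, 1500, 3000, 6000]))
```

For a pure tone A·cos(Ωn + φ), the TEO equals A²·sin²(Ω) at every sample, so it grows with both amplitude and frequency. This is one reason TEO-based band energies may separate high-arousal emotions such as Anger from Neutral speech differently from plain spectral energy.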
Keywords/Search Tags: emotional speech recognition, spectral energy features, human auditory perception, support vector machine, TEO