Speech emotion recognition is an important and meaningful research topic in affective computing, with the potential to make computers more intelligent. Research on speech emotion recognition also helps create new interactive environments and advances speech signal processing and multimedia interaction technology. Researchers have already done considerable work in this field, ranging from feature extraction and feature selection to emotion classification. Because the performance achieved in practical use remains limited, this dissertation focuses on extracting useful features from the raw speech signal, selecting discriminative features, and further exploring speaker-independent and speaker-dependent emotion recognition methods. Its main contributions are as follows.

First, we constructed an elderly speech emotion database. Since, to the best of our knowledge, no speech emotion database exists for elderly Chinese speakers, we built such a database and describe its construction process in detail. We then carried out speech emotion recognition experiments on this database.

Second, we presented a Fourier parameter model based on the Fourier series and applied it to speech emotion recognition. The effectiveness of the Fourier parameter features is evaluated under two schemes: speaker-independent and speaker-dependent. Our experimental results show that the Fourier parameter features, especially the dynamic ones, characterize speech signals well and help raise the recognition rate. Specifically, compared with MFCC and FEZ features, the Fourier parameter features achieve better speech emotion recognition performance in the speaker-independent scheme, and fusing the Fourier parameter features with MFCC further improves the recognition rate.
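The abstract describes the Fourier parameter features only at a high level. As a rough, non-authoritative illustration of the idea, per-frame harmonic magnitudes from a truncated Fourier expansion can serve as static features, with frame-to-frame differences as the dynamic features; the function names, the number of harmonics, and the frame length below are all illustrative assumptions, not the dissertation's actual implementation:

```python
import numpy as np

def fourier_parameter_features(frame, num_harmonics=8):
    """Sketch: take the magnitudes of the first few Fourier
    components of one speech frame as static features.
    (Illustrative only; the dissertation's model may differ.)"""
    n = len(frame)
    spectrum = np.fft.rfft(frame)                     # one-sided spectrum
    return np.abs(spectrum)[1:num_harmonics + 1] / n  # skip the DC bin

def delta(features):
    """Dynamic features: first-order frame-to-frame differences."""
    return np.diff(features, axis=0)

# Toy usage: 10 frames of 256 samples each (random stand-in data).
frames = np.random.randn(10, 256)
static = np.array([fourier_parameter_features(f) for f in frames])
dynamic = delta(static)   # one fewer row than the static features
```

In this sketch, concatenating `static` and `dynamic` per frame would give the combined static-plus-dynamic feature vector the abstract alludes to.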
These results demonstrate that the Fourier parameter model is useful for speaker-independent emotion recognition and show the power of the Fourier parameter features.

Then, we propose a feature extraction method based on the wavelet packet transform and apply it to speaker-independent emotion recognition. Compared with traditional features, the wavelet packet coefficient features achieve a better recognition rate on the speaker-independent task, and our analysis verifies the effectiveness of the proposed wavelet packet coefficient model for speech emotion recognition.

In speech emotion recognition, identifying discriminative features and eliminating irrelevant and redundant ones from the original speech signals largely determines recognition performance. In this study, we use orthogonal experiment design and explore the sequential floating forward selection and harmony search methods in speaker-dependent and speaker-independent emotion recognition. Experimental results show that the proposed feature selection methods not only greatly reduce the feature dimensionality but also achieve comparable recognition rates.

Finally, we used an artificial neural network, a Gaussian mixture model, and a support vector machine to build speech emotion recognition models. Extensive experimental results show that the support vector machine outperforms the other two classification models in terms of recognition rate.
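The classifier comparison above can be sketched with a standard pipeline: utterance-level feature vectors fed to an RBF-kernel SVM, evaluated by cross-validation. The data below is random stand-in data and the hyperparameters are illustrative defaults, not the dissertation's actual setup:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Toy stand-in data: 200 utterance-level feature vectors, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 24))
y = rng.integers(0, 4, size=200)

# Feature scaling followed by an RBF-kernel SVM (illustrative settings).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation accuracy
```

Swapping `SVC` for `sklearn.neural_network.MLPClassifier` in the same pipeline would give the analogous neural-network baseline for such a comparison.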