Research On Several Issues In Speech Emotion Recognition Based On Spectrum Image Features

Posted on: 2018-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Tao
GTID: 1318330542951425
Subject: Information and Communication Engineering
Abstract/Summary:
To make human-computer interaction more natural and intelligent, a growing number of researchers have turned to speech emotion recognition. In recent decades the field has made great progress, and the performance of speech emotion recognition systems has improved considerably. However, existing systems still lack features that can accurately distinguish different types of speech emotion, so extracting emotion-related features remains a focus of research. The emotional content of speech is closely related to the distribution of spectral energy. Some researchers have therefore used the frequency coefficients of speech to construct spectrum images and applied image descriptors to extract emotion-related features from them, with some success. Because this approach has only recently emerged in the emotion recognition field, many problems remain to be studied. One important problem is what kind of information in the spectrum image is related to the type of speech emotion. Another is how to extract that information effectively. To address these problems, this dissertation studies emotion recognition based on spectrum image features, building on the close relationship between emotional information and both the texture information and the local energy distribution of the spectrum image. The main work is as follows:

1. Based on the close relationship between the emotion type of speech and the texture distribution of the speech spectrum image, a new feature extraction algorithm, Local Binary Pattern of Gabor Gray Image Spectrograms (GGSLBP), is proposed. First, the spectrogram gray image is constructed. Second, Gabor wavelets are applied to highlight the local texture information of the spectrogram gray image, yielding Gabor gray image spectrograms. Finally, LBP (Local Binary Pattern) is used to extract local texture information from the Gabor gray image spectrograms, giving the GGSLBP features. Simulation experiments show that GGSLBP achieves better recognition performance than traditional acoustic features.

2. Because LBP ignores the amplitude information in the spectrum image and GGSLBP has relatively high complexity, a new feature extraction algorithm, improved discriminative completed local binary pattern for speech emotion recognition (IDisCLBP_SER), is proposed. First, the spectrogram gray image is obtained. Second, CLBP_M and CLBP_S are computed with the CLBP (Completed Local Binary Pattern) algorithm. Third, unlike the traditional DisCLBP (Discriminative Completed Local Binary Pattern) algorithm, IDisCLBP_SER omits the rotation-invariant mapping of CLBP_S and CLBP_M and applies the discriminative feature learning model directly to compute the global dominant pattern sets of CLBP_S and CLBP_M. Finally, the global dominant pattern sets are used to process the CLBP_S and CLBP_M features, and all processed features are combined to form the IDisCLBP_SER features. Experimental results show that the proposed features improve the recognition performance of a speech emotion recognition system when fused with existing acoustic features.

3. To explore whether rotational invariance is suitable for Mel logarithmic energy spectrum image features, new spectrum image features based on local normalized center moments (LNCMSIF) are proposed. First, 2nd-order normalized center moments are used to describe the local energy distribution of the Mel logarithmic energy spectrum, producing normalized center moment spectra. Second, the DCT (Discrete Cosine Transform) is applied to decorrelate the normalized center moment spectra, producing their cepstral coefficients. Finally, LNCMSIF is formed by combining the normalized center moment spectra with their cepstral coefficients. The rotational invariance test shows that rotational invariance is not entirely suitable for features extracted from the Mel logarithmic energy spectrum, and the recognition experiment shows that the proposed LNCMSIF achieves better recognition results.

4. Traditional image descriptors have limited representation ability and cannot fully describe the emotional information in the Mel logarithmic energy spectrum. Two new features are therefore proposed: spectrum image features based on local Hu moments of Gabor spectrograms (GSLHuM), and spectrum image features based on local normalized center moments of Gabor spectrograms (GSLNCM). In GSLHuM, Gabor wavelets are applied to the Mel logarithmic energy spectrum to obtain Gabor spectrograms, the 1st-order Hu moment is used to describe the local energy distribution of the Gabor spectrograms, and the DCT is then applied to remove correlation, yielding the GSLHuM features. GSLNCM is extracted in the same way, except that it uses 2nd-order normalized center moments to describe the local energy distribution of the Gabor spectrograms. Simulation experiments verify the effectiveness of the proposed GSLHuM and GSLNCM features. In addition, compared with Mel logarithmic energy spectrum image features, Gabor spectrum image features are only weakly affected by rotation invariance.
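The two descriptor families the abstract relies on, LBP texture codes and local 2nd-order normalized moments followed by a DCT, can be illustrated with a minimal plain-Python sketch. It is not the dissertation's implementation: the Gabor filtering stage is omitted, and the function names, the 3x3 window, the LBP bit order, the unnormalized DCT-II, and the choice to apply the DCT along each row of the moment map are all assumptions made for illustration. The moment formula is the standard normalized central moment (called "normalized center moment" in the text) and assumes non-negative patch values with a positive sum; the input must be at least 3x3.

```python
import math

# Clockwise 8-neighbourhood starting at the top-left pixel (assumed bit order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(img):
    """Normalised 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # A bit is set when the neighbour is at least as large as the centre.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]

def normalized_central_moment(patch, p, q):
    """eta_pq = mu_pq / m00 ** (1 + (p + q) / 2) for a 2-D patch (list of rows)."""
    m00 = sum(sum(row) for row in patch)
    x_bar = sum(x * v for row in patch for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(patch) for v in row) / m00
    mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v
             for y, row in enumerate(patch) for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2)

def local_moment_map(spec, p, q, size=3):
    """Slide a size x size window over spec and compute eta_pq at each position."""
    rows, cols = len(spec), len(spec[0])
    return [[normalized_central_moment([row[c:c + size] for row in spec[r:r + size]], p, q)
             for c in range(cols - size + 1)]
            for r in range(rows - size + 1)]

def dct_ii(seq):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(seq)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(seq))
            for k in range(n)]

# Toy 4x4 "spectrum image" (hypothetical values, not real speech data).
spec = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 3.0, 4.0, 5.0],
        [3.0, 4.0, 5.0, 6.0],
        [4.0, 5.0, 6.0, 7.0]]
texture = lbp_histogram(spec)              # LBP-style texture feature
eta20 = local_moment_map(spec, 2, 0)       # local moment "spectrum"
cepstra = [dct_ii(row) for row in eta20]   # decorrelated coefficients per row
# Moment map plus its DCT coefficients, flattened into one feature vector.
features = [v for row in eta20 for v in row] + [v for row in cepstra for v in row]
```

Plain lists keep the sketch dependency-free; with real spectrograms an array library would be the more practical choice.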
Keywords/Search Tags: Speech Emotion Recognition, Spectrum Image Features, Gabor Wavelet, Local Binary Pattern, Discriminative Feature Learning Model, Hu Invariant Moments, Normalized Center Moments