Research On Texture Feature Extraction Of Spectrogram Image For Speech Emotion Recognition

Posted on: 2019-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
Full Text: PDF
GTID: 2428330548485902
Subject: Electronic and communication engineering
Abstract/Summary:
At present, the commonly used speech emotion features fall mainly into voice quality features, prosodic features, and spectral features. All of these focus on either the time domain or the frequency domain of speech and rarely consider features that relate the two, so the extracted features are incomplete. A spectrogram links the time and frequency domains, which makes it possible to study the time-frequency correlation of speech. On this basis, this thesis studies methods for extracting texture features from speech spectrograms, in the following two parts.

1) The original Complete Local Binary Pattern (CLBP) feature has a high dimension and relies too heavily on the central pixel. To address this, the thesis builds the Uniform CLBP_Sign (UCLBP_S) and the Improved CLBP_Magnitude (ICLBP_M) features. It also proposes a power exponential weighted fusion method, because the classic decision-level weighted voting fusion method is ineffective when the classifiers' recognition performances are roughly the same. First, each speech sample is converted into a spectrogram, which is then processed with a multi-scale, multi-directional log-Gabor filter to enhance the detail information of the image. Next, block histograms of the UCLBP_S and ICLBP_M features are extracted and concatenated into a new fused feature called ICLBP_S_M. Finally, with SVM as the classifier, these three features are fused by decision-level weighting to perform speech emotion recognition.

2) The More Direction Weber Local Descriptor (MDWLD) operator is constructed to address the weakness of the Weber Local Descriptor (WLD) operator in representing gradient changes in the diagonal directions. In addition, the Complete Gradient Center-Symmetric Local Directional Pattern (CGCS-LDP) is constructed because the Gradient Center-Symmetric Local Directional Pattern (GCS-LDP) cannot represent the magnitude of changes in the edge response values of the image gradient. Because a single texture feature cannot fully represent image texture information, the thesis extracts the ICLBP_S_M feature, the MDWLD feature, and the fused CGCS-LDP feature from the log-Gabor spectrogram, and combines the three by decision-level weighted fusion to perform speech emotion recognition.

The experimental results show that this method effectively improves speech emotion recognition performance.
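The spectrogram-to-block-histogram step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that the sign component of CLBP coincides with the classical LBP code, and it uses the standard 8-neighbour, radius-1 uniform mapping (58 uniform patterns plus one shared bin) as a stand-in for UCLBP_S. The log-Gabor filtering, the ICLBP_M feature, and the SVM fusion are omitted, and all function names, the frame length, the hop size, and the 4 x 4 block grid are hypothetical choices made for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT spectrogram with shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

def build_uniform_table():
    """Map each 8-bit LBP code to one of 58 uniform labels, or to bin 58."""
    table = np.full(256, 58, dtype=np.int64)
    label = 0
    for code in range(256):
        bits = [(code >> k) & 1 for k in range(8)]
        # Count 0/1 transitions around the circular neighbourhood.
        transitions = sum(bits[k] != bits[(k + 1) % 8] for k in range(8))
        if transitions <= 2:
            table[code] = label
            label += 1
    return table

UNIFORM_TABLE = build_uniform_table()

def uniform_lbp_labels(image):
    """Uniform LBP label for every interior pixel (assumed UCLBP_S stand-in)."""
    h, w = image.shape
    center = image[1:-1, 1:-1]
    # Neighbours visited in circular order around the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Sign component: 1 where the neighbour is not smaller than the centre.
        codes += (neighbour >= center).astype(np.int64) << k
    return UNIFORM_TABLE[codes]

def block_histogram(labels, grid=(4, 4), n_bins=59):
    """Concatenate normalised label histograms over a grid of blocks."""
    feats = []
    for row_block in np.array_split(labels, grid[0], axis=0):
        for block in np.array_split(row_block, grid[1], axis=1):
            hist = np.bincount(block.ravel(), minlength=n_bins).astype(float)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)

# Usage on a synthetic one-second signal (8000 samples of noise, not real speech).
rng = np.random.default_rng(0)
spec = spectrogram(rng.standard_normal(8000))
features = block_histogram(uniform_lbp_labels(spec))
print(features.shape)  # 4 * 4 blocks * 59 bins = 944 values
```

In a full pipeline, a vector like `features` would be computed per utterance and passed to a classifier. The uniform mapping is what reduces each block histogram from 256 bins to 59, which is one common way to lower the feature dimension.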
Keywords/Search Tags:Speech Emotion Recognition, Improved Complete Local Binary Pattern, Power Exponential Class Weighted Fusion, More Direction Weber Local Descriptor, Complete Gradient Center-Symmetric Local Directional Pattern