
Local Feature Analysis, Extraction, and Model Validation for Speech Emotion Recognition

Posted on: 2019-01-13    Degree: Master    Type: Thesis
Country: China    Candidate: H T Guan    Full Text: PDF
GTID: 2428330593951048    Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence and human-computer interaction technology in recent years, increasing attention has been directed to the study of affective computing. As one of the most direct means of human communication, speech plays a major role in the expression of emotion. Speech emotion recognition aims to identify emotional states from human speech with the help of speech signal processing, pattern recognition, machine learning, and related techniques; it deepens our understanding of how human emotion is produced and perceived, and improves computers' capacity for harmonious human-computer interaction.

Speech emotional information is usually characterized by its dynamic changes. In traditional research on speech emotion recognition, however, global acoustic features of an utterance are usually adopted to eliminate content differences and reduce the number of features, which may discard local dynamic emotional information in the speech. In addition, non-sequential models such as SVMs and DNNs are commonly used, and these fail to model sequential information directly.

Pitch, as a prosodic feature, conveys significant emotion-related information and has been found to be discriminative across different emotions to some extent, while a histogram can reflect the distribution of values to a certain degree. Therefore, a novel pitch histogram feature is proposed as a local dynamic prosodic feature, combining pitch with the histogram in order to capture the distribution of the pitch. Bidirectional LSTM (BLSTM), a sequential model, can exploit information from both the past and the future, and can therefore improve classification accuracy.

In this thesis, speech emotion recognition based on dynamic segmentation and dynamic models is studied. At the feature level, time- and energy-based segmentation and the pitch histogram are utilized to capture the temporal information of the emotional speech. Several comparative experiments have been conducted to validate the effectiveness of the proposed method. At the model level, k-means- and BLSTM-based approaches are respectively proposed by improving the existing one. The experimental results suggest that the dynamic model improves recognition, which also confirms the existence and importance of dynamic information in emotional speech.
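The abstract does not give implementation details for the pitch histogram, but the idea of summarizing the distribution of F0 values over a segment can be sketched as follows. The bin count, the F0 range, and the convention that 0 Hz marks an unvoiced frame are all illustrative assumptions, not the thesis's configuration:

```python
import numpy as np

def pitch_histogram(f0, n_bins=20, f0_range=(50.0, 500.0)):
    """Histogram of voiced-frame F0 values, normalised to a distribution.

    f0 : per-frame pitch estimates in Hz; 0.0 is assumed to mark
         unvoiced frames (convention, not from the thesis).
    """
    voiced = f0[f0 > 0]                                   # drop unvoiced frames
    hist, _ = np.histogram(voiced, bins=n_bins, range=f0_range)
    total = hist.sum()
    # normalise so the feature describes the *shape* of the pitch
    # distribution, independent of segment length
    return hist / total if total > 0 else hist.astype(float)
```

Computed per segment rather than per utterance, such a vector retains local pitch-distribution information that a single global pitch statistic would average away.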
Keywords/Search Tags:Speech emotion recognition, Local features, Dynamic model, Prosodic features, Speech segmentation
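A minimal numpy sketch of the bidirectional recurrence the abstract relies on: each frame's representation combines a forward pass (past context) and a backward pass (future context). The single layer, the random weights, and the sizes are illustrative assumptions; a real system would use a trained deep-learning implementation.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W (4H,D), U (4H,H), b (4H,) stack the four gates."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def blstm(frames, params_fw, params_bw, H):
    """Run an LSTM forward and backward over the frames and
    concatenate the two hidden states at every time step."""
    def run(seq, params):
        h, c = np.zeros(H), np.zeros(H)
        out = []
        for x in seq:
            h, c = lstm_cell(x, h, c, *params)
            out.append(h)
        return out
    fw = run(frames, params_fw)
    bw = run(frames[::-1], params_bw)[::-1]  # re-align backward outputs
    return [np.concatenate([a, b]) for a, b in zip(fw, bw)]
```

Because every output vector depends on the whole sequence in both directions, a classifier on top of it can exploit the temporal dynamics that a non-sequential model such as an SVM or plain DNN cannot model directly.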