
Local Feature Analysis, Extraction, and Model Validation for Speech Emotion Recognition

Posted on: 2019-01-13    Degree: Master    Type: Thesis
Country: China    Candidate: H T Guan    Full Text: PDF
GTID: 2428330593951048    Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence and human-computer interaction technology in recent years, increasing attention has been directed to the study of affective computing. As one of the most direct means of human communication, speech plays a major role in the expression of emotion. Speech emotion recognition aims to identify emotional states from human speech with the help of speech signal processing, pattern recognition, machine learning, and related techniques; it deepens our understanding of how human emotion is produced and perceived, and improves computers' capacity for harmonious human-computer interaction.

Speech emotional information is usually characterized by its dynamic changes. In traditional research on speech emotion recognition, however, global acoustic features of an utterance are usually adopted to eliminate content differences and reduce the number of features, which may discard local dynamic emotional information in the speech. In addition, non-sequential models such as SVMs and DNNs are commonly used, and these fail to model sequential information directly.

Pitch, as a prosodic feature, conveys significant emotion-related information and has been found to be discriminative across different emotions to some extent, while a histogram can reflect the distribution of values to a certain degree. Therefore, a novel pitch histogram feature is proposed as a local dynamic prosodic feature, combining pitch with the histogram in order to capture the distribution of the pitch. Bidirectional LSTM (BLSTM), a sequential model, can exploit information from both the past and the future, and can therefore improve classification accuracy.

In this thesis, speech emotion recognition based on dynamic segmentation and dynamic models is studied. At the feature level, time- and energy-based segmentation and the pitch histogram are utilized to capture the temporal information of the emotional speech. Several comparative experiments have been conducted to validate the effectiveness of the proposed method. At the model level, k-means- and BLSTM-based approaches are respectively proposed by improving the existing one. The experimental results suggest that the dynamic model improves recognition, which also confirms the existence and importance of dynamic information in emotional speech.
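The abstract does not give implementation details for the pitch histogram, but the idea of summarizing the distribution of F0 values over a segment can be sketched as follows. The bin count, the F0 range, and the convention that 0 Hz marks an unvoiced frame are all illustrative assumptions, not the thesis's configuration:

```python
import numpy as np

def pitch_histogram(f0, n_bins=20, f0_range=(50.0, 500.0)):
    """Histogram of voiced-frame F0 values, normalised to a distribution.

    f0 : per-frame pitch estimates in Hz; 0.0 is assumed to mark
         unvoiced frames (convention, not from the thesis).
    """
    voiced = f0[f0 > 0]                                   # drop unvoiced frames
    hist, _ = np.histogram(voiced, bins=n_bins, range=f0_range)
    total = hist.sum()
    # normalise so the feature describes the *shape* of the pitch
    # distribution, independent of segment length
    return hist / total if total > 0 else hist.astype(float)
```

Computed per segment rather than per utterance, such a vector retains local pitch-distribution information that a single global pitch statistic would average away.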
Keywords/Search Tags:Speech emotion recognition, Local features, Dynamic model, Prosodic features, Speech segmentation
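A minimal numpy sketch of the bidirectional recurrence the abstract relies on: each frame's representation combines a forward pass (past context) and a backward pass (future context). The single layer, the random weights, and the sizes are illustrative assumptions; a real system would use a trained deep-learning implementation.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W (4H,D), U (4H,H), b (4H,) stack the four gates."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def blstm(frames, params_fw, params_bw, H):
    """Run an LSTM forward and backward over the frames and
    concatenate the two hidden states at every time step."""
    def run(seq, params):
        h, c = np.zeros(H), np.zeros(H)
        out = []
        for x in seq:
            h, c = lstm_cell(x, h, c, *params)
            out.append(h)
        return out
    fw = run(frames, params_fw)
    bw = run(frames[::-1], params_bw)[::-1]  # re-align backward outputs
    return [np.concatenate([a, b]) for a, b in zip(fw, bw)]
```

Because every output vector depends on the whole sequence in both directions, a classifier on top of it can exploit the temporal dynamics that a non-sequential model such as an SVM or plain DNN cannot model directly.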