
A Comparative Study Of Speech And Text In Emotion Recognition

Posted on: 2020-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: T T Hu
Full Text: PDF
GTID: 2438330578977077
Subject: Education Technology
Abstract/Summary:
Emotion plays an important role in human communication. Emotion recognition, which enables machines to perceive and recognize emotions, is therefore an important research topic in human-computer interaction. Researchers in this field use different modalities of information, different features, and different classifiers, and obtain different recognition results. Everyday experience suggests that both voice and text content carry rich emotional information. Speech and text are the two most commonly used modalities in emotion recognition, but previous research and analysis indicate that the two modalities perform differently. This study therefore analyzes how speech and text differ in emotion recognition.

The study has two main research goals:
1) Perform emotion recognition with speech and with text, compare the performance of the two modalities, and analyze which specific emotions each modality tends to recognize well.
2) Analyze the two modalities at the feature level to explain why speech and text perform differently in emotion recognition, and select the important emotional features contained in each modality to provide feature information for subsequent research.

The experiments are divided into two parts:
1) Speech and text emotion recognition experiments. Emotion recognition models are trained on speech features and on text features separately. The recognition results for each emotion category are visualized with confusion matrices, and the effects of speech and text on each category are compared and analyzed.
2) Speech and text emotional feature analysis experiments. An attention-based LSTM is used as the feature selection method, and features are selected according to the attention matrix. The important acoustic features of speech and the important emotional keywords of text are selected, the emotional information contained in the two modalities is analyzed at the feature level, and the causes of the recognition results in the first experiment are examined.

Experiment 1 compares the speech and text emotion recognition results from three perspectives:
1) In the discrete emotion model, speech achieves high accuracy for anger and sadness, while text performs better for neutral and happiness.
2) In the dimensional emotion model, speech performs better on the activation dimension, and text performs better on the valence dimension.
3) When the samples are divided into natural and non-natural groups, the emotion recognition rate of speech is higher in the natural state, and the emotion recognition rate of text is higher in the non-natural state.

The feature analysis in Experiment 2 finds the following:
1) Ranking by attention weight identifies the important acoustic features of speech. Acoustic features such as F0 (fundamental frequency), F2 bandwidth, and MFCC play an important role in emotion recognition.
2) Ranking the text keyword features by attention weight shows that words that carry emotion, adjectives that modify emotions, and interjections and modal particles that appear in emotional states play an important role in text emotion recognition.

The study concludes that speech and text both contain a large amount of emotional information that can be used to recognize emotion effectively. The emotional information in speech and in text differs in form and function, so the two modalities show different tendencies in emotion recognition. These conclusions can explain why the emotion recognition rate increases when speech and text information are combined. In addition, the important acoustic features of speech and keyword features of text identified here provide a useful reference for feature selection in subsequent research.

The innovations of this thesis are as follows:
1) Confusion matrices are used to visualize the recognition results and to compare the differences between speech and text in emotion recognition.
2) The performance of speech and text emotion recognition is analyzed in detail from several perspectives: the discrete emotion model, the dimensional emotion model, and the natural versus non-natural state.
3) An attention-based LSTM is used as the feature selection method. Attention mechanisms are commonly used to pick out the important part of a segment. This thesis instead uses attention for feature selection: features are selected and ranked according to the attention matrix, and the selection results are analyzed together with the emotion recognition performance from the earlier experiment.
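The ranking step described in the abstract (select and sort features according to attention weights) can be illustrated with a minimal sketch. The abstract does not say how the attention matrix is aggregated across samples, so this sketch assumes one raw attention score per feature per sample, normalizes each sample with a softmax, and averages the weights over samples. It does not train an LSTM. The function names, the feature labels, and the numeric scores are hypothetical toy values, not data from the thesis.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_features(
    names: Sequence[str],
    score_rows: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Average per-sample attention weights and return the top_k features.

    Assumption: each row holds one raw attention score per feature for a
    single sample. Averaging over samples is an illustrative choice.
    """
    if not score_rows:
        raise ValueError("need at least one row of attention scores")
    sums = [0.0] * len(names)
    for row in score_rows:
        if len(row) != len(names):
            raise ValueError("each row needs one score per feature")
        for i, weight in enumerate(softmax(row)):
            sums[i] += weight
    means = [s / len(score_rows) for s in sums]
    ranked = sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical feature labels and made-up scores, for illustration only.
    feature_names = ["F0", "F2_bandwidth", "MFCC", "placeholder_feature"]
    toy_scores = [
        [2.1, 1.4, 1.8, 0.2],
        [1.9, 1.6, 1.7, 0.1],
    ]
    for name, weight in rank_features(feature_names, toy_scores, top_k=3):
        print(f"{name}: {weight:.3f}")
```

The same function could be applied to text by treating each keyword as a feature, again under the assumption that a per-keyword attention score is available for each sample.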
Keywords/Search Tags: speech emotion recognition, text emotion recognition, feature selection, deep learning, attention mechanism