Sentiment is one of the most fundamental characteristics distinguishing intelligent life from other life forms, and it is an integral part of daily conversation. In this thesis, sentiment analysis is applied to English teaching to help English learners read English aloud more expressively. Sentiment analysis models can be broadly divided into two categories: unimodal and multimodal. Unimodal research uses only raw audio signals or text, whereas multimodal research leverages both audio signals and lexical information, and in some cases visual information as well.

Speech emotion analysis is a difficult task owing to the complexity of emotions, and its performance depends heavily on the effectiveness of the emotional features extracted from speech. This thesis proposes dual attention-based bidirectional long short-term memory networks (DABLSTM), which exploit the strengths of raw audio signals by extracting log mel-spectrograms and MFCCs from the audio simultaneously. Experiments on the IEMOCAP database demonstrate the advantage of the proposed approach: the average recognition accuracy is 70.29% in unweighted accuracy (UA), an improvement of 1.06% over the best baseline methods, and 70.98% in weighted accuracy (WA), 2.88% higher than existing methods.

In multimodal sentiment analysis, existing models usually perform forced word alignment before neural network training to handle unaligned multimodal sequential data. This thesis instead designs the Cross-modal Attention Mechanism with Sentiment Prediction Auxiliary Task (CAM-SPAT) model, which requires no forced word alignment. The core of CAM-SPAT is a weighted cross-modal attention mechanism, which not only captures the temporal correlation and spatial dependence information of each modality but also dynamically adjusts the weights of the text modality and the other modalities to better recognize different emotional expressions. Our model sets a new state-of-the-art record on the CMU-MOSI dataset, with noticeable performance improvements on all metrics. On the CMU-MOSEI dataset, our model achieves the best results among all models on the 7-class classification and regression tasks, and falls below only the DISRFN model (which uses aligned data) in the accuracy and F1 score of binary classification, demonstrating the strong performance of the proposed method. In addition, the overall performance of the model is evaluated on a dataset of English learners' reading pronunciation collected from our school, with satisfactory results.
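The weighted cross-modal attention at the heart of CAM-SPAT can be illustrated with a minimal numpy sketch: text-modality queries attend over unaligned audio-frame keys and values, and a modality weight blends the attended audio context back into the text representation. The shapes, projection matrices, and the blending weight `alpha` below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text, audio, w_q, w_k, w_v, alpha=0.7):
    """Text queries attend over audio keys/values; no word alignment needed.

    alpha is a hypothetical modality weight blending the text residual
    against the attended audio context.
    """
    q = text @ w_q                              # (T_text, d) queries
    k = audio @ w_k                             # (T_audio, d) keys
    v = audio @ w_v                             # (T_audio, d) values
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (T_text, T_audio)
    context = softmax(scores) @ v               # audio summarized per text token
    return alpha * text + (1.0 - alpha) * context

rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))    # 5 text tokens
audio = rng.standard_normal((12, d))  # 12 audio frames (unaligned with text)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
fused = cross_modal_attention(text, audio, w_q, w_k, w_v)
print(fused.shape)  # (5, 8): one fused vector per text token
```

Because the attention weights are computed from the data, each text token draws on whichever audio frames are most relevant to it, which is what lets the model skip the forced word-alignment step.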