In recent years, emotion recognition in dialogue text has gradually attracted researchers' attention as a key task in the study of emotional intelligence. The task aims to identify the emotion label of each utterance in a multi-turn conversation based on the historical context, information about the dialogue participants, and external commonsense knowledge. To improve the performance of dialogue emotion recognition models, this paper studies the problem in depth from three perspectives: more effective modeling of the dialogue historical context, enhancement with discourse structure and commonsense knowledge, and modeling of emotion dynamics, using techniques such as graph neural networks, knowledge graphs, and multi-task joint learning. The main research content and contributions are as follows.

First, to model the dialogue historical context more effectively, we propose a method based on temporal and relational graph attention networks. It combines the advantages of the two mainstream approaches to dialogue context modeling, temporal modeling and graph-structure modeling, by representing the dialogue as a temporal graph. This better captures the internal structure and information flow of the conversation, aggregates more meaningful historical context representations for each utterance, and thus achieves better emotion recognition. Experimental results show that the proposed temporal graph modeling method outperforms both purely temporal modeling and traditional static graph modeling.

Second, the discourse structure of a dialogue reveals the adjacent and long-distance dependencies between utterances, providing prior knowledge for semantic interaction between them. In addition, to model deeper emotional interactions, we introduce the commonsense knowledge about the speaker and listener implied by each utterance as clues for emotion inference, to enhance the representation
of dialogue utterances. Specifically, we construct a graph over the dialogue, enhanced with discourse structure and commonsense knowledge, and use graph convolutional networks to aggregate historical context and commonsense knowledge for each utterance. Experimental results show that incorporating discourse structure and utterance-level commonsense knowledge effectively improves model performance.

Finally, the emotions conveyed in a dialogue are dynamic. On the one hand, a speaker's emotions have a certain inertia: to some extent, a speaker is unlikely to change emotions easily over the course of a conversation. On the other hand, emotions are influenced by both the content of the utterances and the other speakers; as the topics and the speakers' states change, the emotions expressed in the dialogue change as well. We therefore propose an emotion-enhanced encoder-decoder framework. In the encoding stage, we explicitly track the emotional flow of each individual speaker and of the conversation as a whole. In the decoding stage, we propose a novel emotion decoding approach that considers not only the emotions of preceding utterances but also the emotion transition probabilities predicted by auxiliary tasks to guide emotion recognition. We instantiate the framework with several mainstream dialogue context modeling methods to demonstrate its high scalability and flexibility.