Research On Data Augmentation Method For Intention Identification

Posted on: 2022-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhao
GTID: 2518306572460254
Subject: Software engineering
Abstract/Summary:
Intention recognition aims to determine the intention behind a user's sentence. It requires all possible intention categories in the current question-answering domain to be defined in advance, and then assigns each question to one of those categories with a classification method. It can therefore be treated as a classification task, and it is a key technology in natural language understanding. Although the intention recognition task was proposed long ago, it has been hard to apply in real scenarios because annotated data is scarce, especially in domain-specific settings. Deep learning methods perform well on many natural language processing tasks, but they need a large amount of annotated data. Focusing on the research status and current problems of intention recognition, this paper presents a solution that uses data augmentation to improve the model's intention recognition accuracy, so that training on only a small part of the data achieves accuracy close to that of training on the full training set. The paper takes intention recognition for question-and-answer customer service in the game domain as its entry point. After the text data is annotated, deep learning models are trained to perform intention recognition in the game domain, and a variety of data augmentation methods are then adopted to improve model accuracy. The main research work of this paper is as follows:

(1) Based on the characteristics of the domain, a method for building an intention system from the entity information in questions is proposed. It enables automatic annotation of the raw corpus and reduces the cost of corpus annotation. With this method a relatively sufficient data set is obtained, and several deep learning models, including TextCNN, FastText, BERT, and BERT-WWM, are trained on it to obtain preliminary results and baseline accuracy. When only a small part of the training data is used, general-domain data augmentation methods bring accuracy close to that obtained with the whole training set.

(2) A text data augmentation method based on mixing and crossing is proposed. Given the characteristics of short-text questions, the entity that most strongly influences the sentence pattern of a question is masked, and the sentence vectors of the entity-free sentences are crossed and mixed, so that the model learns the sentence patterns of questions and captures the syntactic features of each kind of question. Because each encoder layer of the BERT model emphasizes different information, the sentence encodings from different layers are used for training. Mixing the hidden-layer vectors of sentences is more flexible and also improves model performance.

(3) A data augmentation method based on knowledge fusion with a knowledge graph is adopted. Its main feature is that a domain knowledge graph is constructed and the entities in the questions are replaced with their adjacent entities in the graph. The augmented data produced this way is of higher quality and yields the most noticeable improvement in model performance. Adjacent entities can be found both by rules and by inference over the graph.
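The two augmentation ideas in items (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The toy graph `TOY_GRAPH`, the function names, the `[MASK]` placeholder, and the mixup-style linear interpolation used for "mixing" are all assumptions made for illustration. The abstract does not specify the mixing formula or the graph schema.

```python
import random

# Hypothetical toy knowledge graph: entity -> adjacent entities in the graph.
TOY_GRAPH = {
    "fire sword": ["ice sword", "thunder bow"],
    "ice sword": ["fire sword"],
}


def replace_entity(question, entity, graph, rng):
    """Item (3): swap the entity for one of its graph neighbours."""
    neighbours = graph.get(entity, [])
    if entity not in question or not neighbours:
        return question  # nothing to augment, return the input unchanged
    return question.replace(entity, rng.choice(neighbours))


def mask_entity(question, entity, mask="[MASK]"):
    """Item (2), step 1: hide the entity so only the sentence pattern remains."""
    return question.replace(entity, mask)


def mix_vectors(vec_a, vec_b, lam):
    """Item (2), step 2: interpolate two sentence vectors (assumed mixup-style).

    lam is the weight of vec_a; (1 - lam) is the weight of vec_b.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("sentence vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]


if __name__ == "__main__":
    rng = random.Random(0)
    question = "how do I get the fire sword"
    print(replace_entity(question, "fire sword", TOY_GRAPH, rng))
    print(mask_entity(question, "fire sword"))
    print(mix_vectors([1.0, 0.0], [0.0, 1.0], 0.25))
```

In practice the sentence vectors would come from a chosen encoder layer of a model such as BERT rather than from hand-written lists, and the labels of the two mixed sentences would typically be interpolated with the same weight. Both points are assumptions beyond what the abstract states.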
Keywords/Search Tags: intention recognition, text annotation, data augmentation, deep learning, knowledge graph