
Research On Multi-label Text Classification Based On Text And Label Representation Optimization

Posted on: 2021-01-19; Degree: Master; Type: Thesis
Country: China; Candidate: X Y Luo
GTID: 2428330647450745; Subject: Computer technology
Abstract/Summary:
Multi-label text classification has long been a research hotspot in text processing, with applications in many areas such as text retrieval systems, recommendation systems, sentiment analysis, and dialogue systems. Compared with traditional text classification, multi-label text classification in these settings has two distinguishing characteristics: (1) label information strongly influences how text features should be represented, so taking label information into account is essential for extracting useful text features; and (2) labels are interdependent, and modeling the correlations between labels is a major difficulty. Compared with traditional machine learning methods, deep learning is more effective at automatically extracting and representing the semantic features of text, and deep learning models are better able to represent both label information and the relationships between labels. To address the problems above, this thesis therefore proposes two multi-label text classification models based on deep learning. The main work of the thesis is as follows.

(1) A brief overview of the related work. The thesis first introduces traditional text representation models, then word embedding models from deep learning, and then several classic traditional machine learning classification methods as well as deep learning methods for multi-label classification and their variants. Finally, it analyzes the strengths and weaknesses of these models and the advantages of deep learning models over traditional machine learning models.

(2) Because traditional multi-label classification methods do not fully consider label information or the correlations between labels, this thesis proposes LSABN, a multi-label text classification model based on an attention mechanism over label information. LSABN uses either inner-product attention or concatenation-based ("parameter splicing") attention driven by label information to optimize the text feature representation, learning a different text feature representation for each label. In addition, the model iteratively optimizes the label embeddings over a directed label-relation graph and introduces regularization terms that account for the correlations between labels, which improves classification performance. Experimental results show that the model outperforms the benchmark models.

(3) To address the weak semantic representation of sentence structure and the insufficient learning of label embeddings, this thesis proposes HSGAT, a multi-label classification model based on mixed semantics and a graph attention mechanism. By introducing capsule networks into multi-label text classification, HSGAT addresses the insensitivity of traditional neural networks to the position of words within sentences. The model also uses a graph attention mechanism to iteratively optimize the label embeddings and performs classification with these optimized embeddings. This greatly alleviates the inconsistency in which labels that co-occur in the label relationship graph fail to co-occur in the prediction results. Experimental results show that the model outperforms the benchmark models.
Keywords/Search Tags: multi-label text classification, attention mechanism, capsule network, graph attention mechanism
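The abstract only outlines the label-based attention in LSABN. The sketch below illustrates, under stated assumptions, how a label embedding can drive attention over word representations so that each label receives its own text vector, using either an inner-product score or a concatenation-based score. It assumes the word vectors (for example, encoder hidden states) and the label embeddings share one dimension; the functions `label_specific_vectors`, `inner_product_scores`, and `concat_scores`, and the parameters `W` and `v` of the concatenation scorer, are hypothetical names for illustration, not the thesis's implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inner_product_scores(words: List[Vector], label: Vector) -> Vector:
    # Inner-product attention: score each word by its dot product with the label embedding.
    return [dot(h, label) for h in words]


def concat_scores(words: List[Vector], label: Vector, W: List[Vector], v: Vector) -> Vector:
    # Concatenation-based attention (hypothetical parameters W, v): score = v . tanh(W [h; c]).
    scores = []
    for h in words:
        hc = h + label  # list concatenation gives [h; c]
        hidden = [math.tanh(dot(row, hc)) for row in W]
        scores.append(dot(v, hidden))
    return scores


def label_specific_vectors(
    words: List[Vector],
    labels: List[Vector],
    score_fn: Callable[[List[Vector], Vector], Vector] = inner_product_scores,
) -> List[Vector]:
    # One attention distribution over the words, and hence one text vector, per label.
    dim = len(words[0])
    out = []
    for c in labels:
        alpha = softmax(score_fn(words, c))
        out.append([sum(a * h[k] for a, h in zip(alpha, words)) for k in range(dim)])
    return out
```

For example, with `words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and two labels `[[2.0, 0.0], [0.0, 2.0]]`, the first label's text vector leans toward the first dimension and the second label's toward the second. To use the concatenation scorer, pass `lambda ws, c: concat_scores(ws, c, W, v)` as `score_fn`. In a trained model the label embeddings, `W`, and `v` would be learned; here they are plain lists so the attention arithmetic stays easy to follow.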
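The abstract states that HSGAT refines label embeddings with a graph attention mechanism over the label relationship graph. Assuming the standard single-head graph attention layer of Veličković et al. (GAT), one refinement step could look like the sketch below. `gat_layer`, its arguments, and the adjacency format (a dict from a label index to the indices of its neighbours, which can also encode a directed graph) are hypothetical; in practice the projection `W` and the attention vector `a` would be learned rather than fixed.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(
    labels: List[Vector], neighbors: Dict[int, List[int]], W: Matrix, a: Vector
) -> List[Vector]:
    """One single-head graph attention step over label embeddings (hypothetical sketch)."""
    # Project every label embedding once: z_i = W c_i.
    z = [matvec(W, c) for c in labels]
    out = []
    for i in range(len(labels)):
        # Attend over the node itself (self-loop) plus its graph neighbours.
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # Unnormalized attention e_ij = LeakyReLU(a . [z_i ; z_j]).
        e = [leaky_relu(sum(ak * val for ak, val in zip(a, z[i] + z[j]))) for j in nbrs]
        # Softmax over the neighbourhood gives the attention weights alpha_ij.
        m = max(e)
        w = [math.exp(x - m) for x in e]
        total = sum(w)
        alpha = [x / total for x in w]
        # Weighted sum of projected neighbour embeddings, then an ELU nonlinearity.
        agg = [sum(alpha[k] * z[j][d] for k, j in enumerate(nbrs)) for d in range(len(z[i]))]
        out.append([elu(x) for x in agg])
    return out
```

Calling `gat_layer` repeatedly, feeding each output back in as `labels`, is one way to realize the iterative optimization the abstract describes; the thesis's exact update rule, number of heads, and number of iterations are not stated here. With an all-zero `a`, attention is uniform, so each label embedding becomes the mean of its neighbourhood's projected embeddings (before the ELU).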
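HSGAT introduces capsule networks so that the model is sensitive to word position within sentences. The abstract does not specify the routing scheme, so the sketch below assumes the routing-by-agreement ("dynamic routing") procedure of Sabour et al. (2017): each lower capsule `i` sends a prediction vector `u_hat[i][j]` to each upper capsule `j`, and the coupling coefficients are refined toward the upper capsules whose outputs agree with those predictions. The function names and the nested-list input layout are hypothetical and are not taken from the thesis.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    # Keep the direction of s but map its length into [0, 1):
    # v = |s|^2 / (1 + |s|^2) * s / |s|  (eps avoids dividing by zero).
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in s]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is lower capsule i's prediction vector for upper capsule j."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: softmax of each lower capsule's logits over upper capsules.
        c = []
        for i in range(n_in):
            m = max(b[i])
            exps = [math.exp(x - m) for x in b[i]]
            total = sum(exps)
            c.append([e / total for e in exps])
        # Each upper capsule: coupling-weighted sum of predictions, then squash.
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Increase a logit when the prediction agrees (positive dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

The length of each output vector, which `squash` keeps below 1, can be read as the strength of the corresponding upper capsule: when two lower capsules make agreeing predictions for one upper capsule and contradictory predictions for another, routing gives the first a long output vector and the second a vanishing one.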