
Research on Multi-label Text Classification by Integrating Label Information

Posted on: 2024-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Tian
Full Text: PDF
GTID: 2568307130472684
Subject: Computer Science and Technology
Abstract/Summary:
Multi-label text refers to a text instance that carries more than one topic label. The task of multi-label text classification is to extract label-related semantic information from unstructured text data and then assign the corresponding labels. As a foundational task in text mining, multi-label text classification supports many downstream tasks, such as sentiment analysis and information retrieval. However, because each instance is associated with several labels at once, the semantics of multi-label text are complex and the label space is very large. This leads to difficult feature extraction, imbalanced categories and sparse samples, all of which degrade the quality of the learned features. In addition, when multi-label text data is annotated, annotators may omit labels they are unfamiliar with or not interested in, or may rely on automatic tagging algorithms, which causes the problem of missing labels. This thesis therefore makes full use of label semantic information, label co-occurrence relations and other correlation information, and on this basis proposes three multi-label text classification methods that incorporate label semantic information, label co-occurrence relations and label dependency constraints, respectively. The main research work and achievements are as follows.

(1) To address the complex semantics and difficult feature extraction of multi-label data, this thesis uses label semantic information to assist textual feature extraction and proposes a deep modular label attention network composed of label attention networks. Within each label attention network, a bidirectional label attention unit and a self-attention unit establish the semantic connection between text and labels, yielding a bidirectional dependency representation of labels and text. The model is evaluated against existing algorithms on publicly available data.

(2) To address the limits that imbalanced categories and sparse samples place on feature learning, this thesis further incorporates the co-occurrence relations between labels, building on the label-related semantic information extracted from the text with the help of label semantics. The co-occurrence weights between labels are learned adaptively with a graph attention mechanism, which enables feature interaction between labels and yields deeper semantic features. Experimental results show that this method further improves classification performance.

(3) To address missing labels, feature representation matrices of instances and labels are obtained through matrix factorization. In addition to regularization constraints built from label correlation and instance correlation, the correlation between instance and label feature representations is established for the first time and used as a further regularization constraint, so that missing labels can be recovered. Experiments on three datasets show that the proposed algorithm outperforms existing methods.
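The bidirectional label attention described in (1) can be illustrated with a minimal sketch. It is not the thesis's architecture: it assumes plain scaled dot-product attention with no learned projections, and the names `attend` and `bidirectional_label_attention`, along with the toy vectors, are hypothetical and chosen only for illustration. It shows the two directions of dependence: each label attends over the token vectors, and each token attends over the label vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys):
    """For each query, return the softmax-weighted average of the key vectors.

    Assumption: scaled dot-product scoring, keys double as values.
    """
    dim = len(keys[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
        )
    return outputs


def bidirectional_label_attention(tokens, labels):
    """Return (label-aware text summaries, text-aware label summaries)."""
    label_aware_text = attend(labels, tokens)   # one text summary per label
    text_aware_labels = attend(tokens, labels)  # one label summary per token
    return label_aware_text, text_aware_labels


# Toy example: three token embeddings and two label embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
per_label, per_token = bidirectional_label_attention(tokens, labels)
print(per_label)
print(per_token)
```

In a trained model the token and label vectors would come from learned embeddings, and the attention would typically use learned query, key and value projections; the sketch only shows how the two attention directions fit together.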
Keywords/Search Tags: Label attention, Label semantic information, Label co-occurrence relation, Matrix decomposition