
Research On Multi-label Text Classification Based On Deep Learning

Posted on: 2022-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Teng
GTID: 2518306536496704
Subject: Master of Engineering

Abstract/Summary:
Text classification has become a fundamental task in natural language processing because it can quickly and accurately extract the core content of textual information. However, traditional single-label text classification struggles with the semantic diversity of text in real-world scenarios, so multi-label text classification has gradually become an active research direction within text classification. This thesis studies the multi-label text classification task. To fully capture the semantic information of text, a hierarchical Transformer-CNN model is constructed. To learn long-distance and discontinuous semantic features, the text is modeled as a graph and a hierarchical Transformer is used for feature extraction. To alleviate the imbalanced distribution of label categories, the traditional loss function is modified to improve classification performance.

First, following the expressive characteristics of natural language, a hierarchical Transformer-CNN model is constructed that captures semantic information at the word level and at the sentence level, and a sentence-level convolutional neural network extracts the key semantic features. Second, to capture long-distance and discontinuous semantic features, a graph-based text modeling method is proposed, and a hierarchical graph Transformer model captures semantic features at the word level and at the subgraph level. Third, the traditional loss function cannot capture the correlation between labels and suffers from insufficient training caused by the uneven label distribution. To address this, label vector embeddings are constructed by fusing semantic and structural features, the similarity between labels is computed, and this similarity is introduced into the loss function used to optimize the model. Finally, extensive experiments on the RCV1 and AAPD datasets verify the effectiveness of the hierarchical Transformer-CNN model. For the hierarchical graph Transformer model, the graph-based text modeling method and the label-similarity-based loss function are applied on the RCV1 dataset, and comparisons with traditional multi-label text classification models validate the proposed model and algorithm.
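The abstract does not give the exact form of the label-similarity loss, so the following is only a minimal sketch of one plausible reading, not the thesis's method. It assumes label embeddings are already available as plain vectors (the thesis builds them by fusing semantic and structural features; that step is not reproduced here), uses cosine similarity between them, softens the target of each negative label toward the positive labels it resembles, and counters label imbalance with an inverse-frequency positive weight similar in spirit to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. All function names and the `alpha` and `cap` parameters are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(emb: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between label embedding vectors."""
    # A zero vector gets norm 1.0 to avoid division by zero.
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in emb]
    n = len(emb)
    return [[sum(a * b for a, b in zip(emb[i], emb[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def similarity_smoothed_targets(y: Sequence[int], sim: Sequence[Sequence[float]],
                                alpha: float = 0.3) -> List[float]:
    """Hypothetical smoothing: a negative label j gets the soft target
    alpha * (max similarity to any positive label), clipped to [0, 1]."""
    positives = [k for k, v in enumerate(y) if v == 1]
    out = []
    for j, v in enumerate(y):
        if v == 1 or not positives:
            out.append(float(v))
        else:
            s = max(sim[j][k] for k in positives)
            out.append(min(1.0, max(0.0, alpha * s)))
    return out


def inverse_frequency_pos_weight(label_counts: Sequence[int], num_docs: int,
                                 cap: float = 10.0) -> List[float]:
    """Up-weight rare labels with (negatives / positives), capped at `cap`."""
    return [min(cap, (num_docs - c) / max(c, 1)) for c in label_counts]


def label_similarity_bce(logits: Sequence[float], y: Sequence[int],
                         sim: Sequence[Sequence[float]], pos_weight: Sequence[float],
                         alpha: float = 0.3, eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy against similarity-smoothed targets,
    averaged over labels for a single document."""
    targets = similarity_smoothed_targets(y, sim, alpha)
    total = 0.0
    for z, t, w in zip(logits, targets, pos_weight):
        p = 1.0 / (1.0 + math.exp(-z))          # sigmoid probability
        p = min(max(p, eps), 1.0 - eps)         # keep log() finite
        total += -(w * t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)
```

With `alpha=0.0` the loss reduces to ordinary weighted binary cross-entropy, so the smoothing term can be switched off for comparison. The effect is that a model predicting a label closely related to a true label is penalized less than one predicting an unrelated label, which is one way to let label correlation enter training.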
Keywords/Search Tags: Multi-label text classification, Deep learning, Transformer, Self-attention, Graph embedding