
Research On Label-aware Text Classification Methods

Posted on: 2022-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Huang
Full Text: PDF
GTID: 2518306563980219
Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a classical problem in natural language processing (NLP) that aims to assign labels or tags to textual units such as sentences, queries, and documents. The task can be divided into multi-class text classification and multi-label text classification, where the latter allows more than one label to co-exist for a single document. Text classification has a wide range of applications, including question answering, spam detection, sentiment analysis, and news categorization.

The heart of the matter lies in the quality of the representation learned from the input document. A good representation should capture global contextual information as well as local discriminative features: the former provides general information for coarse-grained matching, while the latter offers specific clues for fine-grained recognition, and both are important for classification. However, most existing methods learn a single representation for each input document, which can hardly preserve the essential content sufficiently or benefit the subsequent learning task. In addition, the multiple labels of a document are usually semantically correlated, and exploiting the correlation among labels benefits the multi-label learning process.

In this thesis, we propose two methods that improve text classification by exploiting both document content and label correlation. The main work and contributions are as follows:

Firstly, we propose an explicit label-aware representation for each document with a hybrid attention deep neural network model (LAHA). LAHA consists of three parts. The first part adopts a multi-label self-attention mechanism to detect the contribution of each word to the labels. The second part exploits the label structure and the document content to determine the semantic connection between words and labels in the same latent space. The third part designs an adaptive fusion strategy to obtain the final label-aware document representation, so that the outputs of the previous two parts are sufficiently integrated. Extensive experiments on six benchmark datasets, compared against state-of-the-art methods, show the superiority of the proposed LAHA method.

Secondly, we propose a label-aware comprehensive representation learning method (LaCRL). As argued above, both coarse-grained global information and fine-grained local features are important, yet most existing methods determine only a single representation per document. LaCRL aims to capture coarse-grained and fine-grained information simultaneously with a well-designed joint optimization strategy: the global and local representation learning are jointly optimized so that each can benefit the other. Extensive experimental results on well-known benchmark datasets, compared against state-of-the-art methods, show the efficacy of our approach.
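The following is a minimal PyTorch-style sketch of the label-aware hybrid attention idea described for LAHA, assuming a BiLSTM encoder, learnable label embeddings shared by the two attention branches, and a gated fusion layer. The module names, dimensions, and the fusion formula are illustrative assumptions, not the exact architecture of the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAwareEncoder(nn.Module):
    """Hypothetical sketch of a label-aware hybrid attention encoder."""

    def __init__(self, vocab_size, num_labels, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Learnable label embeddings used by both attention branches
        self.label_embed = nn.Parameter(torch.randn(num_labels, 2 * hidden_dim))
        # Gate that adaptively fuses the two label-aware views (assumption)
        self.gate = nn.Linear(4 * hidden_dim, 1)
        # One score per label from its label-aware document representation
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))                     # (B, T, 2H)
        # Branch 1: self-attention scoring each word's contribution to each label
        attn1 = torch.softmax(h @ self.label_embed.T, dim=1)        # (B, T, L)
        view1 = attn1.transpose(1, 2) @ h                           # (B, L, 2H)
        # Branch 2: word-label interaction in a shared latent space
        sim = F.normalize(h, dim=-1) @ F.normalize(self.label_embed, dim=-1).T
        attn2 = torch.softmax(sim, dim=1)                           # (B, T, L)
        view2 = attn2.transpose(1, 2) @ h                           # (B, L, 2H)
        # Adaptive fusion: a learned gate mixes the two label-aware views
        alpha = torch.sigmoid(self.gate(torch.cat([view1, view2], dim=-1)))
        doc_repr = alpha * view1 + (1 - alpha) * view2              # (B, L, 2H)
        return self.classifier(doc_repr).squeeze(-1)                # (B, L) logits
```

Likewise, a tiny sketch of the joint optimization idea in LaCRL, assuming the global (coarse-grained) and local (fine-grained) branches each produce per-label logits that are trained with a weighted sum of two multi-label losses; the weighting scheme `lam` is an assumption for illustration.

```python
def joint_loss(global_logits, local_logits, targets, lam=0.5):
    # Jointly optimize the global and local branches so each can inform the other
    loss_global = F.binary_cross_entropy_with_logits(global_logits, targets)
    loss_local = F.binary_cross_entropy_with_logits(local_logits, targets)
    return lam * loss_global + (1.0 - lam) * loss_local
```

Because both losses share the same encoder parameters during backpropagation, the coarse-grained and fine-grained representations are learned together rather than in isolation, which is the intent of the joint optimization strategy described above.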
Keywords/Search Tags:Text Classification, Deep Neural Network, Attention Mechanism, Text Representation