Fine-grained Text Classification

Posted on: 2022-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhao
Full Text: PDF
GTID: 2518306725481534
Subject: Computer technology
Abstract/Summary:
Text classification is essential for document archiving, retrieval, analysis, and mining. Traditional text classification aims to predict the label of a document, and most existing work addresses the single-text, single-label setting. Although these methods achieve remarkable performance, ever-increasing user needs and ever-changing application scenarios expose two limitations.

First, from the perspective of text granularity, traditional text classification labels the whole document, which makes it a coarse-grained task. As application scenarios change, coarse-grained classification can no longer meet people's needs. Take a real scenario as an example: an internet platform hosts a large number of product reviews. Consumers want an analysis of the attributes of the target product that concern them, while manufacturers want to learn about product quality and customer tastes from the reviews. These needs have driven the rapid development of fine-grained, aspect-level text classification tasks.

Second, from the perspective of label granularity, traditional text classification mostly predicts one label per document, which makes it a single-label task. However, as the number of documents and labels grows, label granularity becomes finer and finer, and labels exhibit both explicit hierarchical relationships and implicit internal associations. These needs have driven the rapid development of fine-grained, hierarchical multi-label text classification tasks.

From the perspectives of text granularity and label granularity, this thesis focuses on two fine-grained text classification tasks: Aspect-based Sentiment Analysis (ABSA) and Hierarchical Text Classification (HTC). The former aims to detect the sentiment polarity of a given aspect term (also known as the opinion target) in a review sentence, while the latter aims to predict the category of a document from top to bottom in a given label hierarchy. Recently, researchers have proposed many supervised learning methods for these two tasks. For the ABSA task, many state-of-the-art works employ an attention mechanism to capture the sentiment words that correspond to the opinion target, then aggregate them as evidence to infer the sentiment of the target. Though these methods achieve promising performance, we argue that they fail to reach their full potential because labeled ABSA data is limited. For the HTC task, existing methods mainly leverage parent-level label information to guide child-level classification and achieve promising results. However, they focus primarily on the performance improvement brought by the parent-level label and ignore the error propagation that arises when the parent-level label is incorrect in real-world scenarios.

To solve the above problems, this thesis proposes the following solutions:

1. Insufficient labeled data limits the effectiveness of attention-based models for the ABSA task. In contrast, a large number of reviews labeled for document-level sentiment classification are available on the Internet, and these reviews contain substantial sentiment knowledge and semantic patterns. Therefore, this thesis proposes a novel attention transfer framework (ATN), which exploits attention knowledge from a resource-rich document-level sentiment classification corpus to enhance the attention process of resource-poor aspect-based sentiment analysis, and thereby improves ABSA performance.

2. Existing HTC methods ignore the error propagation caused by incorrect parent-level labels in real scenarios, so this thesis proposes a Label-correction Capsule Network (LCN) for the hierarchical text classification task. Specifically, we take the hierarchical capsule network as the base model. On this basis, we design two novel methods to increase the model's tolerance to incorrect parent-level labels, and thereby improve the robustness of HTC.

This thesis focuses on the fine-grained text classification task. The proposed models are grounded in well-motivated research questions and achieve good results on benchmark datasets, which demonstrates their effectiveness.
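
The error propagation problem described for top-down HTC can be illustrated with a small sketch. This is not the LCN model from the thesis. The label hierarchy, the scores, and the function names below are hypothetical, chosen only to show how a wrong parent-level decision rules out the correct child label even when the child-level scores favor it.

```python
# Hypothetical two-level label hierarchy (an assumption, not from the thesis).
HIERARCHY = {
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}


def argmax_label(scores, candidates):
    """Return the candidate label with the highest score."""
    return max(candidates, key=lambda label: scores[label])


def top_down_predict(parent_scores, child_scores):
    """Pick a parent first, then choose only among that parent's children."""
    parent = argmax_label(parent_scores, list(HIERARCHY))
    child = argmax_label(child_scores, HIERARCHY[parent])
    return parent, child


def flat_predict(child_scores):
    """Ignore the hierarchy and choose among all child labels."""
    all_children = [c for children in HIERARCHY.values() for c in children]
    return argmax_label(child_scores, all_children)


# Toy scores for a physics document: the parent-level scores narrowly favor
# the wrong parent, while the child-level scores clearly favor "Physics".
parent_scores = {"Science": 0.48, "Sports": 0.52}
child_scores = {"Physics": 0.70, "Biology": 0.10, "Football": 0.15, "Tennis": 0.05}

print(top_down_predict(parent_scores, child_scores))  # ('Sports', 'Football')
print(flat_predict(child_scores))                     # Physics
```

In this toy case the parent-level mistake forces the child-level choice into the wrong subtree, which is the kind of failure that a tolerance to incorrect parent-level labels is meant to reduce.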
Keywords/Search Tags: Text Classification, Natural Language Processing, Neural Network, Attention Mechanism, Capsule Network