Research On Extreme Multi-label Text Classification Based On Label Knowledge

Posted on:2023-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:T Xu

Full Text:PDF

GTID:2558307058463694

Subject:Control engineering

Abstract/Summary:

Multi-label text classification is an important task in natural language processing. The explosive growth of text data and high computational cost are well-recognized challenges in the field. The number of class labels in multi-label text classification has grown to thousands or even tens of thousands; a multi-label text classification task with more than 1,000 class labels is called an extreme multi-label text classification (XMTC) task. The key problem in XMTC is the long tail problem. Label knowledge, an important source of external knowledge for the task, is a potential means of alleviating it. Existing techniques do not scale easily to XMTC datasets whose labels follow a severe power-law distribution. They focus on label cluster structure knowledge and balance predictions through the co-occurrence of head labels and tail labels in the same label clusters. However, these methods fix the label cluster structure, so the information gained from label knowledge does not carry over to dynamic, semantically rich real-world settings, and the classification results fall short of the ideal. To address these problems, this thesis explores feasible ways of using label knowledge to mitigate the long tail problem in XMTC. The research is as follows.

1) To address the limitations of using label cluster structure knowledge to alleviate the long tail problem, an XMTC promotion strategy based on label knowledge is presented to improve the poor performance that results from fixed label cluster structure knowledge. Teacher knowledge generated by text modeling optimizes the text representation and improves prediction performance on tail labels. Experimental results show that the promotion strategy effectively improves the prediction performance of existing XMTC methods on both the tail labels and the full label set.

2) To address the problem that the methods introducing the teacher knowledge strategy in 1) have a simple structure and insufficient capacity for network expression and feature extraction, TReader XML, an XMTC algorithm based on the teacher knowledge strategy, is proposed. TReader XML uses a dual-stream collaborative network framework in which teacher knowledge and text features are embedded into a shared semantic space to achieve feature interaction. Experimental results show that TReader XML achieves state-of-the-art results on both the full label set and the tail labels.

3) To address the cost and risk of deploying academic XMTC results in enterprises, LKRoad, an XMTC toolkit based on label knowledge, is proposed, building on the results of 1) and 2). LKRoad defines a data standard and implements tools for data analysis, data preprocessing, label-knowledge-based classification algorithms, and result evaluation. Experimental results support the soundness of the framework design and its value for bringing academic methods into industrial use.
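
The long tail problem mentioned in the abstract can be illustrated with a minimal sketch that ranks labels by frequency and separates head labels from tail labels. The function name `split_head_tail`, the 20% head cutoff, and the toy documents are assumptions made for illustration only; none of them comes from the thesis, which does not state how it separates head and tail labels.

```python
from collections import Counter


def split_head_tail(label_sets, head_fraction=0.2):
    """Rank labels by frequency and split them into head and tail groups.

    head_fraction is an illustrative cutoff (an assumption, not a value
    taken from the thesis): the most frequent fraction of labels is "head".
    """
    counts = Counter(label for labels in label_sets for label in labels)
    ranked = [label for label, _ in counts.most_common()]
    n_head = max(1, round(len(ranked) * head_fraction))
    return set(ranked[:n_head]), set(ranked[n_head:]), counts


# Toy corpus: each document carries a set of labels (hypothetical data).
docs = [
    {"sports"}, {"sports", "football"}, {"sports"}, {"politics"},
    {"politics", "sports"}, {"curling"}, {"sports", "chess"},
]

head, tail, counts = split_head_tail(docs, head_fraction=0.2)

# Share of all label assignments that belong to head labels.
head_share = sum(counts[label] for label in head) / sum(counts.values())
print(sorted(head), sorted(tail), head_share)
```

In this toy corpus a single head label accounts for half of all label assignments, while the other four labels share the rest. That imbalance is what leaves tail labels with few training examples.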

Keywords/Search Tags:

Extreme multi-label text classification, long tail problem, label knowledge, dual stream collaboration network, algorithm framework

Related items

1	Research On Feature Extraction Of Multi-label Text Classification
2	Label Structure Based Deep Learning For Long-tail Distributed Classification
3	Research And Implementation Of Multi-label Text Classification Method For Threat Extraction
4	Multi-label Text Classification Based On Long Short-Term Memory
5	Research On Multi-label Data Stream Classification Method Based On Kernel Extreme Learning Machine
6	Research On Extreme Multi-label Classification Based On Parallel Label Trees
7	Research On Text Classification Technology For Asymmetric And Multi-label Problem
8	Research On Multi-label Text Classification By Integrating Label Information
9	Research On The Improvement Of Multi-label Text Classification Algorithm For Offensive Language In Social Media
10	Research On Label Coding Algorithms For Multi-label Classification