
Research On Multi-label Text Classification Algorithm Based On Deep Reinforcement Learning

Posted on: 2021-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Hou
Full Text: PDF
GTID: 2518306110987649
Subject: Software engineering
Abstract/Summary:
Text classification is one of the core research directions in natural language processing, and multi-label text classification (MLTC) is among its most important and most challenging tasks. MLTC is widely applied in information retrieval, recommendation systems, user profiling, and related fields. The characteristics of the data often differ from one scenario to another, which further increases the difficulty of the task. In an MLTC problem, each sample corresponds to multiple labels, and these labels are usually related to one another.

Early text classification work was usually based on traditional machine learning models, which tend to ignore the relationships between labels. With the development of deep neural networks, deep learning-based sequence-to-sequence (Seq2Seq) and sequence-to-set (Seq2Set) models have been applied to MLTC and have shown excellent performance. However, a Seq2Seq model introduces label order as an interfering factor: in practice, the labels form an unordered set, not an ordered sequence. A Seq2Set model, in turn, produces predictions that lack interpretability; that is, the model cannot explain which sentences or words in the sample correspond to each predicted label.

To address these shortcomings, this thesis proposes a novel algorithm framework named TC-SRM, which models MLTC as a process of sequentially reading the text. The framework has three core parts: a text feature extraction module, a deep reinforcement learning module, and a module that learns the relationships between labels. For text feature extraction, the thesis explores several text vectorization methods and adopts the one with the best experimental results. In the deep reinforcement learning module, the DQN algorithm implements the sequential reading and decision-making process over the text. To learn the relationships between labels, the agent's historical actions are encoded into the state of the environment.

The framework is applied to a legal case retrieval system for private lending cases and achieves good results. The thesis also carries out exploratory research aimed at applying the framework to more fields. The framework has been released as an open-source tool.
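
The sequential reading process described in the abstract can be illustrated with a minimal sketch. It is not the TC-SRM implementation: the names `build_state`, `q_values`, `select_action`, and `read_document`, the fixed label count, the extra "skip" action, and the toy linear Q-function (standing in for a trained DQN) are all assumptions made for illustration. The sketch shows only the general idea of adding the agent's past label decisions to the state, and of recording which sentence triggered each label.

```python
import random

NUM_LABELS = 3      # hypothetical number of labels
SKIP = NUM_LABELS   # hypothetical extra action: assign no label to this sentence


def build_state(sentence_vec, assigned):
    """Concatenate sentence features with a multi-hot vector of labels chosen so far."""
    history = [1.0 if k in assigned else 0.0 for k in range(NUM_LABELS)]
    return list(sentence_vec) + history


def q_values(state, weights, biases):
    """Toy linear Q-function, one row per action (a stand-in for a trained DQN)."""
    return [b + sum(w * s for w, s in zip(row, state))
            for row, b in zip(weights, biases)]


def select_action(state, weights, biases, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else take the best Q."""
    if rng.random() < epsilon:
        return rng.randrange(NUM_LABELS + 1)
    q = q_values(state, weights, biases)
    return max(range(len(q)), key=q.__getitem__)


def read_document(sentence_vecs, weights, biases, epsilon=0.0, rng=None):
    """Read sentences in order; return {label: index of the sentence that triggered it}."""
    rng = rng or random.Random(0)
    assigned = {}
    for i, vec in enumerate(sentence_vecs):
        state = build_state(vec, assigned)
        action = select_action(state, weights, biases, epsilon, rng)
        if action != SKIP and action not in assigned:
            assigned[action] = i
    return assigned
```

Because the label history is part of the state, the Q-value of one label can depend on which labels were already chosen, and the returned mapping ties each label to a specific sentence. A real agent would learn the Q-function from rewards instead of using hand-set weights.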
Keywords/Search Tags: Multi-label Text Classification, Deep Reinforcement Learning, Text Sequential Reading, Legal Case Search