Multi-Label Text Classification Algorithm Based On Seq2Seq Model

Posted on:2022-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:W Xu

GTID:2518306575965769

Subject:Computer Science and Technology

Abstract/Summary:

Multi-label text classification assigns multiple labels to each text by analyzing its semantics. Following the success of the seq2seq model in machine translation, style transfer, abstractive summarization, and other fields, the model can be used to generate the positive labels corresponding to each text. However, two problems arise when the seq2seq model is applied to multi-label classification: exposure bias and the need to predefine the label order. Multi-label learning also needs to account for the correlation among labels and the relationship between the instance and the label. To address these problems, this thesis proposes a seq2seq model based on graph embedding and a region attention mechanism. In addition, a seq2seq model based on joint embedding and multi-output is proposed to address exposure bias and improve the overall performance of the model. The main research achievements of this thesis are as follows:

1. The seq2seq model consists of an encoder, an attention mechanism, and a decoder. This thesis makes the following improvements to adapt the seq2seq model to multi-label text classification. First, graph embedding, which can mine the associations among labels, is applied to generate the label vector for the encoder. Second, unlike in machine translation, the model needs to attend to multiple regions of the text when predicting the current label in multi-label text classification. To address this, the thesis improves the dot attention mechanism commonly used in machine translation and designs a region attention mechanism that focuses on text segments. Finally, policy gradient is used to address exposure bias and the predefined label order.

2. This thesis proposes a seq2seq model based on joint embedding and multi-output. First, the joint embedding strategy uses the label information of the K maximum probabilities at the previous timestep to predict the current label, and is combined with a scheduled sampling strategy to address exposure bias effectively. Second, the generalization performance of the model is improved by training multiple decoders in multiple ways and merging the results of the decoders at the inference stage.

Experiments on two multi-label text datasets show that graph embedding and the region attention mechanism improve the performance of the model in chapter three, and that joint embedding and merged multi-output improve the generalization performance of the model in chapter four. Furthermore, the proposed models outperform the state-of-the-art methods on two main metrics.
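The joint embedding and scheduled sampling ideas mentioned in the abstract can be illustrated with a small sketch. This is only one possible reading, not the thesis's implementation: it assumes the joint embedding is a probability-weighted average of the embeddings of the K most probable labels from the previous timestep, and that scheduled sampling picks between the gold label and that blended embedding. The function names, the `teacher_prob` parameter, and the use of NumPy are all hypothetical choices made for illustration.

```python
import numpy as np

def top_k_joint_embedding(probs, label_embeddings, k):
    """Blend the embeddings of the k most probable labels (assumed formulation)."""
    probs = np.asarray(probs, dtype=float)
    # Indices of the k largest probabilities from the previous timestep.
    top_idx = np.argsort(probs)[-k:]
    # Renormalise the selected probabilities so the weights sum to 1.
    weights = probs[top_idx] / probs[top_idx].sum()
    # Probability-weighted average of the selected label embeddings.
    return weights @ label_embeddings[top_idx]

def next_decoder_input(probs, gold_label, label_embeddings, k, teacher_prob, rng):
    """Scheduled sampling: feed the gold label embedding with probability
    teacher_prob, otherwise feed the model's own top-k joint embedding."""
    if rng.random() < teacher_prob:
        return label_embeddings[gold_label]
    return top_k_joint_embedding(probs, label_embeddings, k)

# Hypothetical toy setup: 4 labels with one-hot "embeddings".
label_embeddings = np.eye(4)
prev_probs = [0.1, 0.6, 0.2, 0.1]
rng = np.random.default_rng(0)

blended = top_k_joint_embedding(prev_probs, label_embeddings, k=2)
teacher_forced = next_decoder_input(prev_probs, 3, label_embeddings, 2, 1.0, rng)
model_driven = next_decoder_input(prev_probs, 3, label_embeddings, 2, 0.0, rng)
```

In this toy setup, lowering `teacher_prob` over the course of training would gradually expose the decoder to its own predictions, which is the general intent of scheduled sampling; the actual schedule and embedding combination used in the thesis are not stated in the abstract.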

Keywords/Search Tags:

multi-label learning, seq2seq model, graph embedding, region attention mechanism, joint embedding

Related items

1	Research On Automatic Code Generation Approach Based On Tag-graph Embedding
2	Text Classification Based On Label Embedding And Attention Mechanism
3	History-based Attention In Seq2seq Model For Multi-label Text Classification
4	Advertising Recommendation Model Based On Graph Embedding
5	Research Of Knowledge Graph Embedding Adversarial Learning Method Based On Attention Mechanism
6	Research On Label Embedding In Ambiguous Machine Learning
7	Follower Recommendation Based On Time-aware Hybrid Graph Embedding
8	Research On Graph Neural Network With Graph Embedding Model For Session-based Recommendations
9	Research And Implementation Of Product Recommendation Algorithm Based On Graph Embedding And Deep Learning
10	Multi-Effects Embedding Based Personalized POI Recommendation Method