Research On Multi-label Text Classification Based On Improved Seq2seq Model

Posted on:2021-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:X H Liu

GTID:2428330602989139

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of artificial intelligence, neural networks have been widely applied to natural language processing tasks and have driven major progress. Text classification is an important task in natural language processing. As Internet information grows more complex and varied, data content becomes richer and classification granularity becomes finer, which has given rise to research on multi-label text classification. Multi-label text classification is one of the main research areas in natural language processing; it supports information retrieval, recommendation systems, dialogue systems, and similar applications, and it has strong research prospects and application value.

This thesis focuses on multi-label text classification. Text feature extraction, the vector representation of words, and the correlation between labels are the core technologies of the field. To improve multi-label text classification, we propose a novel sequence-to-sequence (seq2seq) model. The main contributions are as follows.

First, to highlight key word-level information and to address the insufficient use of local and global text features, we build a joint model that extracts text features comprehensively. The model uses a multi-head attention mechanism to distinguish the importance of each word and to capture the vital information carried by keywords. It applies capsule networks to extract local features and a BiLSTM to extract global features from the text. A fusion layer integrates the local and global features to produce more comprehensive and detailed text features.

Second, traditional word vectors cannot resolve word ambiguity or adapt to context, and they neither capture multi-level text features nor model the correlation between labels. To address these problems, we propose an improved seq2seq model built on the traditional seq2seq model. The improved model obtains a rich semantic representation and also captures the correlation between labels. Representations from the ELMo pre-trained language model and GloVe word vectors are combined to form the vector representation of the text, which yields richer semantic information. The encoder is the joint model described above and is used to obtain multi-level text features. The decoder captures the correlation among categories, which effectively improves classification performance. Experimental results show that the improved seq2seq model achieves state-of-the-art performance on the multi-label text classification task.
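As an illustration of how a seq2seq decoder can exploit label correlation, the following is a minimal sketch and not the thesis's implementation. It treats multi-label prediction as emitting labels one at a time, with each step conditioned on the labels already produced. The scoring functions, the label names, the `<eos>` marker, the correlation bonus, and the masking of already emitted labels are all assumptions made for the example; in a real model the scores would come from the decoder network.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"  # hypothetical end-of-sequence marker


def greedy_decode_labels(
    score_next: Callable[[Sequence[str]], Dict[str, float]],
    max_labels: int = 5,
) -> List[str]:
    """Greedily emit labels one at a time until EOS wins or max_labels is hit."""
    emitted: List[str] = []
    for _ in range(max_labels):
        scores = score_next(emitted)
        # Mask labels already emitted so no label is predicted twice
        # (an assumption of this sketch, not stated in the abstract).
        candidates = {lab: s for lab, s in scores.items() if lab not in emitted}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == EOS:
            break
        emitted.append(best)
    return emitted


# Invented per-label scores standing in for what the encoder sees in the text.
BASE_SCORES = {"sports": 2.0, "football": 0.5, "politics": 0.2, EOS: 1.0}


def independent_scorer(prev: Sequence[str]) -> Dict[str, float]:
    """Toy scorer that ignores previously emitted labels."""
    return dict(BASE_SCORES)


def correlated_scorer(prev: Sequence[str]) -> Dict[str, float]:
    """Toy scorer that adds a bonus to labels correlated with earlier ones."""
    bonus = {("sports", "football"): 1.5}  # invented pairwise correlation
    scores = dict(BASE_SCORES)
    for earlier in prev:
        for (src, dst), extra in bonus.items():
            if earlier == src:
                scores[dst] += extra
    return scores


print(greedy_decode_labels(independent_scorer))  # ['sports']
print(greedy_decode_labels(correlated_scorer))   # ['sports', 'football']
```

With the independent scorer, "football" never outscores the end marker and is dropped. Once the score depends on the earlier prediction of "sports", the correlated label is recovered, which is the kind of effect a decoder conditioned on previous labels is meant to provide.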

Keywords/Search Tags:

Multi-label Text Classification, Feature Fusion, Joint Model, Text Representation, Neural Networks

Related items

1	Research On Multi-label Text Classification Based On Hybrid Neural Network
2	Research On Multi-Label Text Classification Methods Based On Topic Feature
3	Research And Implementation On Text Classification In Vertical Domain
4	Research On Multi-Label Text Classification Based On Deep Learning
5	Research On Text Classification Algorithm Fusion Label Information And Capsule Network
6	Research On The Essential Technology Of Multi-Label Chinese Text Classification
7	Research On Text Multi-label Classification Algorithm Based On Label Correlation
8	Research On Label-aware Text Classification Methods
9	Research On Multi-label Classification Method Of Chinese Short Text Based On Multi-dimensional Feature Fusion
10	Research On Multi-label Text Classification Based On Text And Label Representation Optimization