Research On Multi-label Text Classification Based On Improved Seq2seq Model

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
GTID: 2428330602989139
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence, neural networks have been widely applied to natural language processing tasks and have brought substantial progress. Text classification is an important task in natural language processing. As Internet information grows more complex and varied, data content becomes richer and classification granularity becomes finer, which has motivated research on multi-label text classification. Multi-label text classification is one of the main research areas in natural language processing and supports applications such as information retrieval, recommendation systems, and dialogue systems, so it has strong research prospects and application value. This thesis focuses on multi-label text classification. Feature extraction from text, vector representation of words, and the correlation between labels are the core technologies in this field. To improve multi-label text classification, we propose a novel sequence-to-sequence (seq2seq) model. The major contributions are as follows.

First, to highlight key word-level information and to address the insufficient use of local and global text features, we build a joint model that extracts text features comprehensively. The model uses a multi-head attention mechanism to weigh the importance of each word and to capture the vital information carried by keywords. It applies capsule networks to extract local features and a BiLSTM to extract global features from the text. A fusion layer then integrates the local and global features to produce more comprehensive and detailed text features.

Second, traditional word vectors cannot resolve word ambiguity because they do not change with context, and existing approaches neither capture multi-level text features nor model the correlation between labels. To address these problems, we propose an improved seq2seq model based on the traditional seq2seq model. The improved model obtains a rich semantic representation and also captures the correlation between labels. ELMo pre-trained language model representations and GloVe word vectors are combined into the vector representation of the text to obtain richer semantic information. The encoder is the joint model described above, which extracts multi-level text features. The decoder captures the correlation among categories, which effectively improves classification performance. Experimental results show that the improved seq2seq model achieves state-of-the-art performance on the multi-label text classification task.
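The idea of letting a decoder capture label correlation can be illustrated with a toy example. The following is a minimal sketch, not the thesis's implementation: it treats multi-label prediction as generating a label sequence, where each step is conditioned on the previously emitted label. The label names, the `TRANSITION` score table, and the `greedy_decode` helper are all hypothetical; in a real model the scores would come from a decoder network conditioned on the encoder's text features.

```python
# Toy sketch of label-sequence decoding for multi-label classification.
# All names and numbers here are hypothetical and are not taken from the thesis.

BOS = "<bos>"  # start-of-sequence marker
EOS = "<eos>"  # end-of-sequence marker: stop emitting labels

# Hypothetical scores for the next label given the previous one.
# A real decoder would compute these from the encoder output and its own state.
TRANSITION = {
    BOS: {"sports": 2.0, "politics": 0.5, "finance": 1.0, EOS: 0.1},
    "sports": {"sports": 3.0, "politics": 0.2, "finance": 1.5, EOS: 1.0},
    "finance": {"sports": 2.5, "politics": 0.3, "finance": 2.0, EOS: 1.2},
    "politics": {"sports": 0.1, "politics": 1.0, "finance": 0.4, EOS: 2.0},
}


def greedy_decode(transition, max_len=5):
    """Emit labels one at a time until EOS wins or max_len is reached."""
    labels = []
    prev = BOS
    for _ in range(max_len):
        # Mask labels that were already emitted so no label is predicted twice.
        scores = {
            label: score
            for label, score in transition[prev].items()
            if label not in labels
        }
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        labels.append(best)
        prev = best  # the next step depends on the label just emitted
    return labels


predicted = greedy_decode(TRANSITION)
print(predicted)
```

Because each step depends on the previous label, a label that tends to co-occur with an already predicted one can receive a higher score, which is one way a sequential decoder can exploit label correlation. Greedy search is used here only for brevity; beam search is a common alternative.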
Keywords/Search Tags:Multi-label Text Classification, Feature Fusion, Joint Model, Text Representation, Neural Networks