
Research And Application Of Text Classification Algorithm Based On Label Embedding And Self-Interaction Attention

Posted on: 2021-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Dong
GTID: 2428330602464594
Subject: Engineering
Abstract/Summary:
In today's Internet era, the volume of text data generated online is growing explosively. With so much text available, it is important to organize and classify it quickly and to extract the value it contains. Text classification technology emerged for this purpose. A text classification method interprets the semantics of a text, outlines its subject content, and assigns it to the category it belongs to. This helps users find the information they need quickly and accurately. Most current text classification methods are based on neural networks such as CNN, RNN and LSTM. Although they achieve good classification results, they still face several challenges. First, obtaining word embeddings that capture more comprehensive semantics from text data is very important, and efficient word embedding methods have long been needed in natural language processing. Second, in text representation, current methods consider only the preceding context. They ignore both the following context and the interaction with the text as a whole, so part of the semantics is lost. Third, in feature extraction, labels play a central role in the final classification, yet their role has not been fully exploited. To address these problems, this thesis carries out work in three areas:

(1) A text classification method based on a bidirectional long short-term memory network (BiLSTM) and label embedding. This thesis proposes an improved text classification model that targets the three problems above. First, a pre-trained BERT model is used to obtain word embeddings with more comprehensive semantics. Then, a BiLSTM captures both the preceding and the succeeding context to produce better text representations. Next, words and labels are learned in a joint embedding space, and the learned attention is used to weight the final text representation and the label representation. This captures features that are more relevant to the downstream classification task, while the labels in turn learn to be more relevant to the labeled content. Finally, the classifier classifies the input text according to the weighted label representations. Extensive experimental results demonstrate the effectiveness of this method.

(2) A fusion model for text classification based on label embedding and self-interaction attention. The text representation obtained by the first method does not incorporate the interactive semantics of the full text, which causes some semantic loss. To address this, the thesis introduces self-interaction attention and proposes an improved text classification algorithm based on label embedding and self-interaction attention. Self-interaction attention treats the full text as context, so the resulting text representation carries interaction information from the whole text. In addition, the joint embedding of words and labels learns attention over the entire text sequence. This attention is used to weight the final text representation, capturing the interactive representation that is most relevant to the downstream classification task. Extensive experimental results demonstrate the effectiveness of the algorithm.

(3) Design and implementation of a text classification system based on label embedding and self-interaction attention. Building on the proposed framework, the thesis designs and implements a text classification system with a good interaction design and high classification accuracy. Extensive functional testing shows that the system reduces the costs (manual labeling effort and financial resources) of completing text classification tasks. It also improves the efficiency of classifying unlabeled text and helps users obtain the information they need quickly and accurately.
Keywords/Search Tags: text classification, text representation, label embedding, self-interaction attention
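The abstract does not give the exact formulas for the joint word-label attention. The sketch below assumes a simplified scheme in the style of LEAM (Wang et al., 2018). Each word vector is compared with every label embedding by cosine similarity. A word's score is its best match over all labels, and a softmax over those scores gives attention weights that pool the words into one text vector. The function names (`cosine`, `label_attention`, `classify`) and the dot-product label scorer are hypothetical illustrations, not the thesis's implementation. LEAM itself also applies a convolution over the compatibility matrix before pooling, which is omitted here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(words: List[Vector], labels: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical LEAM-style attention over words using label embeddings.

    words:  L token vectors (assumed to be e.g. BiLSTM outputs), dimension d
    labels: K label embeddings living in the same d-dimensional space
    Returns (beta, z): attention weights over the L words and the
    attention-weighted text vector z.
    """
    # Compatibility matrix G[k][l]: cosine between label k and word l.
    g = [[cosine(c, v) for v in words] for c in labels]
    # Score each word by its best-matching label (max over labels).
    scores = [max(g[k][l] for k in range(len(labels))) for l in range(len(words))]
    beta = softmax(scores)
    dim = len(words[0])
    # Weighted sum of word vectors gives the text representation.
    z = [sum(beta[l] * words[l][i] for l in range(len(words))) for i in range(dim)]
    return beta, z


def classify(z: Vector, labels: List[Vector]) -> int:
    """Hypothetical scorer: pick the label whose embedding has the largest dot product with z."""
    logits = [sum(a * b for a, b in zip(z, c)) for c in labels]
    return max(range(len(labels)), key=lambda k: logits[k])


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.7, 0.7], [-1.0, 0.0]]
    labels = [[1.0, 0.0], [0.0, 1.0]]
    beta, z = label_attention(words, labels)
    print(beta, z, classify(z, labels))
```

In this toy input, the third word matches no label and receives the smallest weight, so it contributes least to `z`. Taking the maximum over labels lets a word count as important if it is strongly related to any one class. This is why label information can steer which parts of the text the classifier attends to.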
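The abstract describes self-interaction attention as viewing the full text as context for each token. A minimal way to illustrate that idea is scaled dot-product self-attention in which every token's query is compared with every token in the sequence. The sketch below assumes queries, keys and values are all the raw token vectors, with no learned projection matrices. The function name `self_interaction_attention` is hypothetical, and the thesis's actual formulation may differ.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_interaction_attention(h: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values = h.

    Assumption: no learned Q/K/V projections. Each output vector is a convex
    combination of all L input vectors, so it draws on the whole text rather
    than only the preceding context.
    """
    d = len(h[0])
    scale = math.sqrt(d)
    out = []
    for q in h:
        # Similarity of this token with every token in the text, itself included.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in h])
        # Mix all token vectors according to those weights.
        out.append([sum(w * v[i] for w, v in zip(weights, h)) for i in range(d)])
    return out


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_interaction_attention(h):
        print(row)
```

Unlike a left-to-right recurrent pass, every output row here depends on all tokens at once, including those that come later in the text. A trained model would normally add learned projection matrices and possibly several heads. Those are left out to keep the interaction pattern visible.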