
Research On Capsule Network Text Classification Algorithm Based On Label Embedding

Posted on: 2022-06-18    Degree: Master    Type: Thesis
Country: China    Candidate: C F Wang    Full Text: PDF
GTID: 2518306551970649    Subject: Master of Engineering
Abstract/Summary:
Text classification is a fundamental and important subtask in the field of natural language processing (NLP). The task aims to assign a label to a piece of text, for example, categorizing news articles by topic or assigning star ratings to user comments on e-commerce platforms. With the explosive growth of Internet data, classifying such complex text data has become an important research topic. At present, the most typical text classification methods are based on deep learning models, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Capsule Networks (CapsNets).

From an analysis and summary of existing CapsNets-based research methods, we identify several remaining problems. Firstly, current methods treat the extracted n-gram features of a text as equally important, ignoring the fact that the importance of each n-gram feature for a word should be determined by its specific context; this directly affects semantic understanding of the whole text. Secondly, most existing research uses text labels only as supervision signals in the prediction stage, ignoring the rich semantic information contained in the labels, which could reinforce the model during training. Thirdly, a complete (fully connected) routing algorithm is adopted between the high-level and low-level capsules of CapsNets, so redundant syntactic and semantic information from the low level is transmitted to the high-level capsules and interferes with the classification results.

To address these problems, this thesis improves the current CapsNets-based text classification model. Firstly, Partially-connected Routing CapsNets with Multi-scale Feature Attention (MulPart-CapsNets) is constructed, which incorporates multi-scale feature attention into CapsNets. Multi-scale feature attention automatically selects n-gram features at different scales and accurately captures rich n-gram features for each word via weighted-sum rules. At the same time, because of the introduction of multi-scale feature attention, multiple similar complete capsule-network layers are no longer needed to capture syntactic features at different scales, which greatly reduces the number of parameters. In addition, a partially-connected routing algorithm is proposed so that each high-level capsule keeps connections only with some adjacent low-level capsules. That is, the smaller routing weights between the low-level and high-level capsules are removed, and the remaining weights are renormalized so that they sum to 1. This reduces the transmission of redundant syntactic and semantic information from low-level to high-level capsules, so the text features extracted by the high-level capsules are more accurate.

Secondly, the label information of the text is introduced into capsule text-feature modeling. Building on MulPart-CapsNets, label information is incorporated to construct MulPartLab-CapsNets. By computing the correlation between each word and every label in the label set, the weight of the most correlated label is assigned to the word as supplementary information for text-representation learning. In this way, the model can strengthen the role of classification-relevant words during training according to the semantic information in the labels, and weaken the influence of classification-irrelevant words.

To verify the effectiveness of the proposed models, experiments are conducted on seven well-known text classification datasets. The experimental results show that the classification accuracy of the proposed method improves by about 1 percentage point on each public dataset, and by up to 2.4 percentage points. Model parameter size is also discussed: with the introduction of multi-scale feature attention, parameters are reduced by about 3/4 compared with the current CapsNets-based text classification model, which further verifies the validity of the proposed model.
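The multi-scale feature-attention fusion described above can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not specify the scoring function, so a simple dot-product score with a hypothetical projection vector `attn_proj` is used, with a softmax over scales followed by a weighted sum per word.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention(scale_feats, attn_proj):
    """Fuse n-gram features of several scales into one vector per word
    via attention-weighted summation.

    scale_feats: (num_scales, seq_len, dim) -- one feature map per n-gram size
    attn_proj:   (dim,) scoring vector (hypothetical; the real model's
                 scoring function is not given in the abstract)
    """
    scores = scale_feats @ attn_proj                      # (num_scales, seq_len)
    alphas = softmax(scores, axis=0)                      # weights over scales, per word
    return (alphas[..., None] * scale_feats).sum(axis=0)  # (seq_len, dim)

rng = np.random.default_rng(1)
feats = rng.normal(size=(3, 4, 6))   # 3 n-gram scales, 4 words, 6-dim features
proj = rng.normal(size=6)
fused = multi_scale_attention(feats, proj)
print(fused.shape)  # (4, 6)
```

Because the attention weights for each word sum to 1 across scales, the fused vector is a convex combination of that word's per-scale features.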
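The partially-connected routing step (remove the smaller routing weights, then renormalize the survivors to sum to 1) can be sketched as below. The abstract does not state the exact pruning rule, so keeping a fixed top-k per high-level capsule is an assumption for illustration.

```python
import numpy as np

def prune_and_renormalize(routing_weights, k):
    """Keep only the k largest routing weights per high-level capsule,
    zero out the rest, and rescale each row to sum to 1.

    routing_weights: (num_high, num_low) -- each row holds the coupling
    weights from all low-level capsules to one high-level capsule.
    The top-k rule is hypothetical; the thesis only specifies removing
    the smaller weights and re-averaging the remainder.
    """
    w = np.asarray(routing_weights, dtype=float)
    pruned = np.zeros_like(w)
    for i, row in enumerate(w):
        keep = np.argsort(row)[-k:]      # indices of the k largest weights
        pruned[i, keep] = row[keep]
    return pruned / pruned.sum(axis=1, keepdims=True)  # rows sum to 1 again

weights = np.array([[0.40, 0.30, 0.20, 0.10],
                    [0.05, 0.45, 0.35, 0.15]])
print(prune_and_renormalize(weights, k=2))
```

With `k=2`, each high-level capsule retains only its two strongest low-level connections, so the weakest connections can no longer pass redundant information upward.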
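The label-embedding weighting can be sketched as follows. This is an illustrative assumption: the abstract says each word receives the weight of its most correlated label, so cosine similarity is used here as the correlation measure, though the thesis may define it differently.

```python
import numpy as np

def label_attention_weights(word_embs, label_embs):
    """For each word, compute its correlation (here: cosine similarity)
    with every label embedding and return the maximum as that word's
    weight for text-representation learning.

    word_embs:  (num_words, dim) word embedding matrix
    label_embs: (num_labels, dim) label embedding matrix
    """
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    l = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = w @ l.T          # (num_words, num_labels) cosine scores
    return sims.max(axis=1)  # weight of the most correlated label per word

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))   # 5 words, 8-dim embeddings (toy data)
labels = rng.normal(size=(3, 8))  # 3 labels in the label set
weights = label_attention_weights(words, labels)
print(weights.shape)  # (5,)
```

Words that align closely with some label in the label set receive a large weight (near 1), while words unrelated to every label receive a small one, which matches the abstract's goal of strengthening classification-relevant words and weakening irrelevant ones.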
Keywords/Search Tags: Text Classification, Capsule Network, Label Embedding, Routing Algorithm, Multi-scale Feature Attention