
Research on Text Classification by Combining Global and Local Features

Posted on: 2021-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Wu
Full Text: PDF
GTID: 1368330629983548
Subject: Computer application technology
Abstract/Summary:
Text classification is an important processing step in data mining and natural language processing. It is a prerequisite for many downstream applications and has become a research frontier of great interest in both academia and industry. Most deep-learning-based text classification methods use words as text features and refine these features layer by layer to obtain highly expressive local features, on which the final classification result entirely depends. Although such features describe individual words and their context well, they struggle to capture the semantic information of document categories from a global perspective. Extracting global features that represent the global semantics of a document has therefore become a key and active topic for further improving the accuracy and adaptability of text classification methods.

Further improving the accuracy of text classification remains challenging. First, local feature distributions differ considerably across datasets and the input data is not refined, which leads to interference from neutral words. Second, for imbalanced datasets with little labeled data, the generalization ability of the classification model may be insufficient. Third, existing deep learning methods build a single unified model framework across different subjects and fail to consider differences between subject documents. Finally, existing methods ignore global features that represent the spatial distance between samples of different categories, so document category distance information is missing. Selecting only local features as model input therefore misses the detailed global description of each category and cannot accurately represent the global semantic information the categories contain. These four challenges critically restrict further performance improvement, mainly because existing methods ignore the global features of the dataset, and considering local features alone makes it difficult to keep the algorithm adaptable. To address these problems, this dissertation conducts in-depth research on text classification methods that combine global and local features. The main contributions are as follows.

(1) A text classification method based on a High Utility Neural Network is proposed. To address the considerable variability of local feature distributions across datasets and the unrefined input data that causes neutral-word interference, a High Utility Itemset mining algorithm is used to fully mine text features with high utility value, and convolution is then applied to further extract local features of the text, yielding a neural network with stronger classification ability (a rough sketch of this idea is given below).

(2) A text classification method based on a Words in Pairs Neural Network is proposed. For datasets with uneven class distributions and a lack of labeled data, the generalization ability of the text classification model is insufficient. To solve this problem, implicit word pairs are fully mined as a supplement to the explicit word pairs with strong expressive ability, and all explicit and implicit word pairs are used as input to the neural network, training a text classifier with stronger class expression ability (also sketched below).
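As a rough illustration of contribution (1), the following Python sketch keeps only terms whose total utility (in-document frequency multiplied by an assumed external weight, e.g. a TF-IDF or class-relevance score) exceeds a threshold, and feeds the refined token sequence to a small convolutional classifier. The function names, the simplification of itemset mining to single terms, and the PyTorch network are illustrative assumptions, not the High Utility Itemset algorithm or architecture used in the thesis.

# Hypothetical sketch: utility-based term filtering followed by a small text CNN.
from collections import Counter

import torch
import torch.nn as nn
import torch.nn.functional as F

def high_utility_terms(docs, term_weights, min_utility=5.0):
    # Total utility of a term = sum over documents of (frequency x external weight).
    # This simplifies high utility itemset mining to single items (1-itemsets).
    utility = Counter()
    for tokens in docs:
        for term, freq in Counter(tokens).items():
            utility[term] += freq * term_weights.get(term, 1.0)
    return {t for t, u in utility.items() if u >= min_utility}

def refine(tokens, keep):
    # Drop neutral (low-utility) words so only informative tokens reach the network.
    return [t for t in tokens if t in keep]

class SmallTextCNN(nn.Module):
    # Stand-in for the convolutional local-feature extractor described in the abstract.
    def __init__(self, vocab_size, num_classes, embed_dim=64, num_filters=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, ids):                  # ids: (batch, seq_len) token indices
        x = self.embed(ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = F.relu(self.conv(x))             # local n-gram features
        x = x.max(dim=2).values              # max-pool over positions
        return self.fc(x)                    # class logits

In such a setup, high_utility_terms would be computed once on the training corpus, and each document would pass through refine before being converted to token ids for the network.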
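For contribution (2), the sketch below shows one plausible way to build the word-pair input: explicit pairs are frequent within-document co-occurrences, and implicit pairs are added by pairing a word with the nearest embedding-space neighbours of its explicit partners. The co-occurrence threshold, the use of cosine similarity, and the helper names are assumptions made for illustration; the thesis's actual mining procedure may differ.

# Hypothetical sketch: explicit and implicit word-pair mining.
from collections import Counter
from itertools import combinations

import numpy as np

def explicit_pairs(docs, min_count=3):
    # Word pairs that co-occur in at least min_count documents.
    counts = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return {p for p, c in counts.items() if c >= min_count}

def implicit_pairs(pairs, embeddings, top_k=2):
    # For each explicit pair (a, b), also pair a with the words whose embeddings
    # are closest to b, supplementing sparse or weakly labeled data.
    words = list(embeddings)
    vecs = np.stack([embeddings[w] for w in words])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    extra = set()
    for a, b in pairs:
        if b not in embeddings:
            continue
        sims = vecs @ (embeddings[b] / np.linalg.norm(embeddings[b]))
        for i in np.argsort(-sims)[:top_k]:
            if words[i] not in (a, b):
                extra.add((a, words[i]))
    return extra

The union of explicit and implicit pairs would then be encoded and used as the network input in place of single words.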
(3) A text classification method based on a Siamese Capsule Network is proposed. Existing methods fail to consider the differences between subject documents and ignore global semantic information. First, a capsule network learns local features that encode spatial position relationships. Then a global memory mechanism extracts and stores the subject features of each category, and Singular Value Decomposition is performed on all the capsule features stored in the global memory to obtain the subject center capsule of each category, which serves as the global feature (sketched below). Finally, the extracted local and global features together form the basis for classification, so that the model considers both local and global features.

(4) A text classification method based on a Triplet Capsule Network is proposed. Existing methods ignore global features that represent the spatial distance between samples of different categories, so document category distance information is missing. First, the capsule network trained in the first stage provides local features of spatial position relationships. Then three capsule networks with shared parameters are combined, and a Triplet Loss function is used in a second training stage to learn global features that represent the spatial distance between different categories (also sketched below). A network framework that integrates global and local features is constructed and fine-tuned through a routing mechanism to fully obtain the global and local features of the text.
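The following sketch illustrates the subject-center idea from contribution (3): capsule vectors of the training documents of one category are accumulated in a per-class memory, and the leading right singular vector of an SVD over that memory is taken as the class's global subject capsule. The class names, shapes, and the choice of the first singular vector are assumptions for illustration, not the exact Siamese Capsule Network of the thesis.

# Hypothetical sketch: per-class capsule memory and SVD-based subject centers.
import numpy as np

class GlobalCapsuleMemory:
    def __init__(self):
        self.store = {}                        # class label -> list of capsule vectors

    def add(self, label, capsule_vec):
        self.store.setdefault(label, []).append(np.asarray(capsule_vec, dtype=float))

    def subject_center(self, label):
        # SVD over all stored capsules of one class; the first right singular
        # vector summarizes the class's dominant (topic-level) direction.
        mat = np.stack(self.store[label])      # (num_docs, capsule_dim)
        _, _, vt = np.linalg.svd(mat, full_matrices=False)
        return vt[0]                           # (capsule_dim,)

def classify_with_global(local_capsule, centers):
    # Combine local evidence with the global subject centers by cosine similarity.
    v = np.asarray(local_capsule, dtype=float)
    v = v / np.linalg.norm(v)
    scores = {c: float(v @ (u / np.linalg.norm(u))) for c, u in centers.items()}
    return max(scores, key=scores.get)

In a full model, the similarity to the subject centers would supplement, rather than replace, the capsule network's own class scores.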
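For the second training stage in contribution (4), the sketch below applies one shared-parameter encoder to an anchor document, a positive (same class) document, and a negative (different class) document, and trains it with a triplet margin loss so that inter-class distances are enlarged. The placeholder MLP encoder, input dimensions, and margin value are assumptions; in the thesis the encoder is a capsule network fine-tuned through a routing mechanism.

# Hypothetical sketch: triplet training with a shared-parameter encoder.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    # Stand-in for the capsule network that maps a document vector to a feature vector.
    def __init__(self, in_dim=300, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

encoder = SharedEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
triplet = nn.TripletMarginLoss(margin=1.0)

def triplet_step(anchor, positive, negative):
    # The same encoder (shared weights) embeds all three documents; the loss
    # pushes d(anchor, positive) + margin below d(anchor, negative).
    optimizer.zero_grad()
    loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    return loss.item()

After this stage, the distance-aware global features would be combined with the first-stage local capsule features for the final classification.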
Keywords/Search Tags:Natural Language Processing, High Utility Itemset, Deep Learning, Text Classification, Capsule Network