
Research on Feature Dimensionality Reduction and Text Classification Method Based on Multi-label Learning

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2428330614958185
Subject: Information and Communication Engineering
Abstract/Summary:
With the explosive growth of online information in the internet era, multi-label text classification has become an effective way to process text data, because it allows documents to be assigned to their categories quickly and accurately. Multi-label text classification algorithms have consequently become a research focus in natural language processing. However, existing multi-label classification algorithms still suffer from class imbalance and high computational complexity, and multi-label data generated on the web is high-dimensional and complex. Using such data directly in classification tasks tends to reduce both classification efficiency and accuracy. This thesis therefore studies feature dimensionality reduction and multi-label text classification, and proposes a multi-label dimensionality reduction algorithm based on Kullback-Leibler divergence dependency maximization and a multi-label text classification model based on the gravitational model.

The high-dimensional features of multi-label data make computation difficult and classification inefficient. To address this, the thesis improves the dimensionality reduction method based on dependency maximization and proposes a multi-label dimensionality reduction method based on Kullback-Leibler divergence dependency maximization. In the feature dimensionality reduction stage of the classification process, the original feature matrix is mapped into a low-dimensional space, and the dependency between the original feature description and the class labels is maximized through the Kullback-Leibler divergence. Because eigenvalue decomposition is omitted, the amount of computation is greatly reduced. The experimental results show that the proposed method effectively reduces the dimensionality of multi-label data and improves the efficiency of multi-label classification.

To address the class imbalance and high computational complexity of existing multi-label classification algorithms, a gravitation-based multi-label text classification model is proposed by improving the gravitational model. In the training phase, a quality factor is used to represent the centroid of each category, and the similarity interval between the documents and each class centroid is calculated. In the test phase, multi-label classification is performed by checking whether the similarity between an unlabeled document and each class centroid falls within that class's similarity interval. The experimental results indicate that this method outperforms some existing multi-label text classification methods on the reported performance metrics, which demonstrates its effectiveness and feasibility for multi-label text classification.
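
The training and test phases described above can be illustrated with a minimal Python sketch. It is not the thesis's model: the abstract does not give the quality factor or the gravitational formula, so this sketch assumes a plain mean vector as the class centroid, cosine similarity as the similarity measure, and the minimum and maximum training similarity as the interval bounds. The function names (`cosine`, `train`, `predict`) and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def train(docs, labels):
    """Build {label: (centroid, low, high)} from feature vectors and label sets.

    Assumption: the centroid is the unweighted mean of the class's documents;
    the thesis's quality factor is not modeled here.
    """
    model = {}
    for label in sorted(set().union(*labels)):
        members = [d for d, ls in zip(docs, labels) if label in ls]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sims = [cosine(d, centroid) for d in members]
        # Assumption: the similarity interval spans the observed training similarities.
        model[label] = (centroid, min(sims), max(sims))
    return model

def predict(model, doc):
    """Return every label whose similarity interval contains the document's similarity."""
    return {
        label
        for label, (centroid, low, high) in model.items()
        if low <= cosine(doc, centroid) <= high
    }

# Toy data: three-dimensional feature vectors with one or two labels each.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.5, 0.5, 0.0],
]
labels = [{"sports"}, {"sports"}, {"tech"}, {"tech"}, {"sports", "tech"}]

model = train(docs, labels)
print(predict(model, [0.6, 0.4, 0.0]))
```

Because each class is tested independently against its own interval, a document can receive zero, one, or several labels. With these assumed min/max bounds, a document that is more similar to a centroid than every training document falls outside the interval, so a practical variant would likely need a different rule for the upper bound.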
Keywords/Search Tags: multi-label, text classification, dimensionality reduction, Kullback-Leibler divergence, gravity model