Research On Text Classification And Clustering Algorithm Based On Manifold Learning And Sparse Constraints

Posted on: 2020-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2518305882975799
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, the volume of text data is growing explosively, and text classification and clustering have become active research topics. The K-nearest neighbor (KNN) algorithm and non-negative matrix factorization (NMF) are two commonly used methods for these tasks. However, owing to the inherent limitations of the algorithms, the diversity of text data structures, and the complexity of Chinese semantics, the basic algorithms no longer give satisfactory results, and optimized variants continue to be proposed.

Choosing a reasonable K value in the traditional KNN algorithm has long been a research focus. Originally, K was set mainly from the researcher's experience, which is largely arbitrary and blind; it usually degrades performance and prevents the algorithm from reaching the optimal classification result. To address this problem, this thesis optimizes the sample space with a least-squares reconstruction model that incorporates the row-sparse L2,1/2 norm, so that the transformation matrix is well sparsified. At the same time, the locality preserving projection (LPP) algorithm from manifold theory is introduced to keep the nearest-neighbor structure of the samples unchanged during the projection. On this basis, an adaptive algorithm, LAKNN, based on manifold LPP and sparse constraints, is proposed to determine the K value in KNN automatically. We apply the improved algorithm to text classification; the experimental results show that the proposed method improves classification accuracy to some extent compared with the baseline algorithms.

Traditional NMF is affected during model optimization by the internal correlation among latent features, and the data contain substantial noise and outliers, both of which hurt clustering performance. To address this, this thesis proposes a sparse NMF text clustering algorithm with independent feature learning. During factorization, graph regularization is combined with sparse representation theory, and a sparsity constraint is added to the objective function so that the factor matrices have controllable sparsity. Cosine similarity is also introduced into the NMF model: minimizing the cosine between latent features reduces their correlation and strengthens the algorithm's ability to learn independent features. Finally, the proposed algorithm is compared with existing improved algorithms, and the results demonstrate its effectiveness.
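The abstract gives no formulas, so the following is only a rough sketch of the two ideas it describes, not the thesis's actual models. The first part picks K per test sample from a sparse least-squares reconstruction; it assumes a plain L1 penalty solved with ISTA in place of the thesis's norm and omits the LPP term. The second part evaluates an NMF-style objective with an L1 sparsity term and a pairwise cosine penalty; it omits the graph regularizer and assumes the latent features are the columns of the basis matrix `W`. All function names and the parameters `rho`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np


def sparse_reconstruction_weights(X_train, X_test, rho=0.1, n_iter=500):
    """ISTA for min_W 0.5*||X_test - W @ X_train||_F^2 + rho*||W||_1.

    Row i of W rebuilds test sample i from the training samples.
    Assumption: L1 stands in for the thesis's norm; no LPP term.
    """
    W = np.zeros((X_test.shape[0], X_train.shape[0]))
    step = 1.0 / (np.linalg.norm(X_train, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = (W @ X_train - X_test) @ X_train.T
        Z = W - step * grad
        W = np.sign(Z) * np.maximum(np.abs(Z) - rho * step, 0.0)  # soft threshold
    return W


def adaptive_knn_predict(X_train, y_train, X_test, rho=0.1):
    """Weighted vote over the training samples with nonzero weight.

    The number of nonzero weights plays the role of K for each test sample.
    """
    W = sparse_reconstruction_weights(X_train, X_test, rho)
    classes = np.unique(y_train)
    preds, ks = [], []
    for w, x in zip(W, X_test):
        support = np.flatnonzero(np.abs(w) > 1e-8)
        if support.size == 0:  # fall back to the single nearest neighbour
            support = np.array([np.argmin(np.linalg.norm(X_train - x, axis=1))])
            vote_weights = np.ones(1)
        else:
            vote_weights = np.abs(w[support])
        scores = [vote_weights[y_train[support] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
        ks.append(support.size)
    return np.array(preds), np.array(ks)


def cosine_correlation_penalty(W):
    """Sum of cosines between every ordered pair of distinct columns of W."""
    Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
    cos = Wn.T @ Wn
    return cos.sum() - np.trace(cos)


def sparse_decorrelated_nmf_objective(X, W, H, alpha=0.1, beta=0.1):
    """Reconstruction error + L1 sparsity on H + cosine penalty on W.

    Assumption: the graph regularizer mentioned in the abstract is omitted.
    """
    reconstruction = np.linalg.norm(X - W @ H, "fro") ** 2
    sparsity = alpha * np.abs(H).sum()
    return reconstruction + sparsity + beta * cosine_correlation_penalty(W)
```

In this sketch, a larger `rho` zeroes more reconstruction weights and so yields a smaller K per sample, while `beta` trades reconstruction quality against how far apart the basis columns are pushed. An optimizer for the NMF objective (for example, multiplicative updates) is deliberately left out.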
Keywords/Search Tags: manifold learning, sparse constraint, K-nearest neighbor classification, non-negative matrix factorization