
A Non-negative Matrix Factorization Clustering Algorithm Based On L2,1/2 Sparse Constraint And Cosine Similarity

Posted on: 2019-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: W J Xie
Full Text: PDF
GTID: 2428330566959491
Subject: Software engineering
Abstract/Summary:
Text clustering has long been an active research topic in the era of big data. A popular approach first applies a topic model or a spectral clustering algorithm and then uses the traditional k-means method for further clustering. Some models instead use non-negative matrix factorization (NMF) to decompose the high-dimensional, sparse text feature matrix and then cluster the vectors obtained after dimensionality reduction. During optimization, however, such models are often constrained by inherent relationships among the latent features, which makes the loss function harder to minimize. Many sparse NMF variants built on general NMF try to find a suitable solution through additional constraints and transformations, which increases the computational complexity.

This thesis studies the models and ideas of traditional NMF algorithms, including sparsity-constrained NMF and other regularized NMF variants. Building on existing algorithms and introducing cosine similarity, we propose INMF, a non-negative matrix factorization clustering algorithm based on an L2,1/2 sparse constraint and cosine similarity. When factorizing the document term-frequency matrix, the model uses cosine similarity to reduce the correlation between latent features, prevent their co-adaptation, and improve the independent feature learning ability of NMF. On this basis, it uses the L2,1/2 sparse norm to obtain a sparse representation of the data and simplify computation, and to enhance the local learning ability and robustness of the algorithm. As a result, the semantic information in the latent features is more evident and the representation of the latent space is more discriminative.

The experiments and analysis are based on an open dataset. The results show that, on a dataset with high sparsity, the proposed INMF algorithm is significantly better than the traditional NMF algorithm across a series of evaluation metrics.
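The two regularizers named in the abstract can be illustrated with a short NumPy sketch. This is not the thesis's formulation: the abstract gives no formulas, so the exact form of the L2,1/2 term (here, the sum of square roots of row L2 norms), the choice of which factor each penalty applies to, and the names `l2_half_penalty`, `cosine_penalty`, `objective`, `alpha`, and `beta` are all assumptions made for illustration.

```python
import numpy as np


def l2_half_penalty(M):
    """Assumed L2,1/2 sparsity term: sum over rows of sqrt(row L2 norm)."""
    row_norms = np.linalg.norm(M, axis=1)
    return float(np.sum(np.sqrt(row_norms)))


def cosine_penalty(V, eps=1e-12):
    """Sum of cosine similarities over all distinct pairs of rows of V.

    For non-negative V every pairwise cosine lies in [0, 1], so the
    penalty is zero only when the rows are mutually orthogonal.
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    Vn = V / np.maximum(norms, eps)       # unit-length rows (zero rows stay zero)
    C = Vn @ Vn.T                         # pairwise cosine similarity matrix
    return float((np.sum(C) - np.trace(C)) / 2.0)


def objective(X, W, H, alpha=0.1, beta=0.1):
    """Hypothetical regularized NMF loss for X ~= W @ H.

    Assumption: rows of H are the latent features (decorrelated via cosine
    similarity) and rows of W are the document representations (sparsified).
    """
    reconstruction = 0.5 * np.linalg.norm(X - W @ H, "fro") ** 2
    return reconstruction + alpha * cosine_penalty(H) + beta * l2_half_penalty(W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 8))                # toy document-term matrix
    W = rng.random((6, 3))
    H = rng.random((3, 8))
    print(objective(X, W, H))
```

Under these assumptions, lowering `cosine_penalty(H)` pushes the latent features apart in direction, while `l2_half_penalty(W)` favors representations in which few rows carry large norm. How INMF actually minimizes its loss (for example, the update rules) is not stated in the abstract and is not reproduced here.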
Keywords/Search Tags: non-negative matrix factorization, L2,1/2 sparse, independent feature learning, cosine similarity