
Research And Application On Nonnegative Matrix Factorization Algorithm For Clustering

Posted on: 2020-02-12 | Degree: Master | Type: Thesis
Country: China | Candidate: B C Fan
GTID: 2428330575493579 | Subject: Computer Science and Technology
Abstract/Summary:
As an important data analysis method, clustering has been widely applied in many fields. A clustering algorithm assigns samples to different clusters according to their features and the similarities between them: samples in the same cluster should be as similar as possible, while samples in different clusters should be as dissimilar as possible. Clustering lets us quickly understand the structure of the data and discover the underlying pattern of each cluster for further analysis. Many clustering methods have been developed for different problems. Among them, Nonnegative Matrix Factorization (NMF) is a popular method that decomposes a nonnegative input matrix into the product of a nonnegative basis matrix and a nonnegative representation matrix. Because only nonnegative elements are involved, the NMF model is more interpretable and matches the requirements of practical applications such as image, gene expression, and spectral data. The nonnegativity constraint also guarantees that each data sample is represented as a nonnegative linear combination of nonnegative basis vectors, which reflects the idea of representing a whole by its parts. However, NMF still has some issues: 1. the basis matrix is generated purely algebraically; 2. under nonnegativity constraints alone, the basis vectors may be correlated; 3. NMF neglects the correlations in local information. To overcome these issues, we propose three algorithms based on nonnegative matrix factorization. The main research work and achievements are as follows:

(1) A density peaks based multiple centroids nonnegative matrix factorization algorithm is proposed to capture the manifold or structural information of the original data. Standard NMF may not yield optimal clustering results on complex data. To capture the local structure of the original data, the algorithm first selects multiple density peaks from the data points. Then, linear combinations of these density peaks are set as the initial cluster centroids to obtain the relationship between the data points and the centroids. Finally, each data point is assigned to the centroid with which it has the maximum similarity. Experiments were carried out on several synthetic and real datasets. The promising experimental results demonstrate that the method can effectively incorporate local structure and improve clustering performance.

(2) A Dropout based deep semi-nonnegative matrix factorization model is proposed. NMF factorizes the input matrix into the product of a basis matrix and a representation matrix, but the nonnegativity constraints cannot guarantee that the latent features in the basis matrix are orthogonal and non-overlapping. To break the co-occurrences between latent features and reduce redundancy, a Dropout based semi-nonnegative matrix factorization model is proposed. By incorporating a deep model, it is further extended to a Dropout based deep semi-nonnegative matrix factorization model.

(3) A nonnegative matrix tri-factorization model for biclustering is proposed. To identify the important features of high-dimensional data while clustering, the data are clustered according to both the correlation between features and the correlation between samples. Following the manifold assumption, a feature neighbor graph and a sample neighbor graph are constructed, and manifold regularization terms are built on them. Because the Frobenius norm is known to be sensitive to noise and outliers, the L2,p norm is used in the cost function instead. Experimental results on real high-dimensional gene expression data demonstrate the superiority of the model.
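The basic factorization described in the abstract can be illustrated with a small, dependency-free Python sketch. This is not one of the thesis algorithms: it is plain NMF with the multiplicative updates of Lee and Seung for the squared Frobenius loss, with X stored as features x samples, W as the basis matrix and H as the representation matrix. Labelling each sample by the largest entry in its column of H is one common way to read clusters off the factorization and is an assumption here, as are the helper names, the random initialization and the toy matrix.

```python
import random

def matmul(A, B):
    """Product of A (n x k) and B (k x m), matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, rank, iters=1000, eps=1e-12, seed=0):
    """Approximate nonnegative X (features x samples) by W @ H.

    W (features x rank) is the basis matrix, H (rank x samples) the
    representation matrix. The multiplicative updates for the squared
    Frobenius loss only multiply by nonnegative ratios, so W and H stay
    nonnegative when initialized with positive values.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)   # W^T X, W^T W H
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))   # X H^T, W H H^T
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H

def cluster_labels(H):
    """Label each sample (column of H) by its largest representation coefficient."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Hypothetical toy data: samples 0-1 use features 0-1, samples 2-3 use features 2-3.
X = [[2.0, 4.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 6.0],
     [0.0, 0.0, 1.0, 2.0]]
W, H = nmf(X, rank=2)
labels = cluster_labels(H)
print(labels)
```

On this block-structured toy matrix the two groups of samples should receive different labels; in general the multiplicative updates only reach a local optimum, so results can depend on the initialization.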
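The peak-selection step of contribution (1) builds on the density peaks criterion of Rodriguez and Laio: each point gets a local density ρ (number of neighbours within a cutoff distance dc) and a distance δ to the nearest point of higher density, and points where both are large are taken as peaks. The sketch below is a simplified, hypothetical illustration only: it ranks points by γ = ρ·δ, keeps the top n_peaks, and assigns every point to the single most similar peak under a Gaussian similarity. The thesis instead uses linear combinations of several peaks as centroids within an NMF model, which is not reproduced here; dc, sigma, the tie-breaking rule and the function names are assumptions.

```python
import math

def density_peaks(points, dc, n_peaks):
    """Return indices of the n_peaks points with the largest rho * delta."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Rank by density (ties broken by index) so "denser point" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])
    # Peaks combine high density with a large distance to any denser point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    return sorted(range(n), key=lambda i: -gamma[i])[:n_peaks]

def assign_to_peaks(points, peaks, sigma=1.0):
    """Assign each point to the peak with maximal Gaussian similarity."""
    def sim(p, q):
        return math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2))
    return [max(range(len(peaks)), key=lambda k: sim(p, points[peaks[k]]))
            for p in points]

# Hypothetical toy data: two tight groups of five points each.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
peaks = density_peaks(pts, dc=0.5, n_peaks=2)
labels = assign_to_peaks(pts, peaks)
print(peaks, labels)
```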
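Contribution (3) adds manifold regularization terms built from neighbour graphs. A common form of such a term, used for example in graph-regularized NMF, is Tr(H L H^T) with the graph Laplacian L = D - A, which equals ½ Σ_ij A_ij ||h_i - h_j||² and therefore penalizes neighbouring samples whose representations h_i and h_j differ. The sketch assumes a symmetric 0/1 k-nearest-neighbour graph; the thesis's actual graph weights, choice of k and full cost function are not given in the abstract, and the function names are hypothetical. The same functions apply to the feature side by building the graph over features instead of samples.

```python
import math

def knn_graph(points, k):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest neighbours."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            A[i][j] = A[j][i] = 1.0
    return A

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A (A has a zero diagonal)."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def manifold_penalty(H, L):
    """Tr(H L H^T) for H of shape rank x n_samples (columns are representations)."""
    r, n = len(H), len(H[0])
    return sum(H[a][i] * L[i][j] * H[a][j]
               for a in range(r) for i in range(n) for j in range(n))

def pairwise_form(H, A):
    """Equivalent form 1/2 * sum_ij A_ij * ||h_i - h_j||^2."""
    n = len(A)
    return 0.5 * sum(A[i][j] * sum((row[i] - row[j]) ** 2 for row in H)
                     for i in range(n) for j in range(n))

# Hypothetical samples and representation matrix (rank 2 x 6 samples).
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
A = knn_graph(pts, k=2)
L = laplacian(A)
H = [[0.2, 0.5, 0.1, 0.9, 0.3, 0.7],
     [1.0, 0.0, 0.4, 0.6, 0.8, 0.2]]
print(manifold_penalty(H, L), pairwise_form(H, A))
```

Representations that are constant over connected neighbours give a zero penalty, which is why minimizing this term pushes graph neighbours toward similar representations.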
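The robustness argument for the L2,p norm in contribution (3) can be checked numerically. Writing the residual as E with one column e_j per sample, a common definition is ||E||_{2,p}^p = Σ_j ||e_j||_2^p; with p = 2 this is the squared Frobenius norm, while p < 2 grows more slowly for large residuals. Treating columns as samples, using the p-th power rather than the p-th root, and the toy residual below are assumptions for illustration, not the thesis's exact cost function. With three residuals of norm 1 and one outlier of norm 10, the outlier accounts for 100/103 of the squared Frobenius loss but only 10/13 of the loss at p = 1.

```python
import math

def column_norms(E):
    """l2 norm of each column (one residual vector per sample)."""
    return [math.sqrt(sum(row[j] ** 2 for row in E)) for j in range(len(E[0]))]

def l2p_loss(E, p):
    """sum_j ||e_j||_2 ** p, i.e. ||E||_{2,p} ** p; p = 2 gives the squared Frobenius norm."""
    return sum(v ** p for v in column_norms(E))

def outlier_share(E, p, col):
    """Fraction of the total loss contributed by column `col`."""
    contrib = [v ** p for v in column_norms(E)]
    return contrib[col] / sum(contrib)

# Hypothetical residual matrix (features x samples): three small residuals, one outlier.
E = [[1.0, 0.0, 1.0, 6.0],
     [0.0, 1.0, 0.0, 8.0]]
for p in (2.0, 1.0, 0.5):
    print(p, l2p_loss(E, p), round(outlier_share(E, p, col=3), 3))
```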
Keywords/Search Tags: clustering, NMF, density peak, deep model, graph regularization