Model Selection Of Non-negative Matrix Factorization And Its Application In Biological Data Mining

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Zheng
Full Text: PDF
GTID: 2370330548466863
Subject: Computer application technology
Abstract/Summary:
Model selection is a step in many important machine learning methods and is widely used in data clustering, community detection in complex networks, and data dimensionality reduction. Selecting a model accurately, so as to choose a reasonable target dimension that can guide interpretable analysis and uncover the information hidden in the data, remains a challenge in machine learning.

Low-rank matrix decomposition is a widely used method for dimensionality reduction and data representation, and non-negative matrix factorization (NMF) is its most representative form. NMF is a low-rank approximation method in which both the matrix being decomposed and the resulting factor matrices are non-negative. It reduces high-dimensional data to a low-dimensional representation. A reasonable dimension leads to a better decomposition, so that the low-dimensional matrices retain the characteristics of the original data as much as possible.

Focusing on dimension selection for NMF, that is, model selection, this thesis makes the following contributions. First, a model selection method, Tendency Drive Nonnegative Matrix Factorization (TDNMF), is proposed. Unlike methods that perform model selection during the decomposition process, TDNMF starts from the preservation of structure before and after the data decomposition. Based on the correlations between data points, the concept of sample homomorphism is proposed, and resampling is used so that sample correlations can be compared when the sample sizes are inconsistent. Thanks to these two data processing techniques, TDNMF, which is built on this notion of shared tendency, has a lower time complexity.

Second, an information-equalized dimension selection method, Entropy Balanced Nonnegative Matrix Factorization (EBNMF), is proposed. This method combines the scalable decomposition properties of non-negative matrices with an efficient and stable unsupervised dimension selection criterion, and it demonstrated good performance. It was further validated on real biological data sets, including Drosophila gene expression data and human microbiome data, demonstrating the stability and interpretability of EBNMF. EBNMF makes good model selections during the decomposition process and can effectively extract features from noisy biological data.

NMF is widely used in many fields because it captures how an overall pattern is formed from local regularities. However, model selection remains a difficult problem. This thesis presents two NMF model selection methods, which have certain advantages in computational complexity and accuracy and can be applied to data sets at different levels.
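
The rank-selection problem described in the abstract can be illustrated with a minimal sketch. This is not the TDNMF or EBNMF method from the thesis, whose details are not given here. It is a generic NMF using the standard multiplicative updates for the Frobenius loss, run on synthetic data over several candidate ranks. The function names `nmf` and `relative_error`, the data sizes, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Approximate non-negative V (m x n) as W (m x rank) @ H (rank x n).

    Uses multiplicative updates for the Frobenius loss; eps guards
    against division by zero. Hypothetical helper, for illustration only.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius norm of the residual, relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Synthetic non-negative data built from 3 latent factors (assumed setup).
rng = np.random.default_rng(42)
V = rng.random((40, 3)) @ rng.random((3, 30))

# Scan candidate ranks and record the reconstruction error for each.
errors = {k: relative_error(V, *nmf(V, k)) for k in range(1, 7)}
for k, err in errors.items():
    print(f"rank={k}  relative error={err:.4f}")
```

Reconstruction error alone tends to keep shrinking as the rank grows, which is why a separate selection criterion is needed. The thesis proposes structure preservation (TDNMF) and entropy balance (EBNMF) for that purpose.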
Keywords/Search Tags:Non-negative matrix factorization, Bioinformatics, Data mining, Model selection, Unsupervised learning