
Tumor Gene Expression Data Analysis Based On Nonnegative Matrix Factorization

Posted on: 2016-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Tan
GTID: 2308330461992192
Subject: Signal and Information Processing
Abstract/Summary:
Analysis of gene expression profile data has gradually become a routine procedure for disease diagnosis and classification. Gene expression data are high-dimensional but contain few samples, so a key difficulty in this research is removing redundant genes from the large number of measured genes and extracting the characteristic genes that capture the attributes of the samples. In practice, this problem can be addressed with effective data representation methods, namely, by using a low-dimensional representation of the data to uncover the inner structure and essential information of the original data. Because several common data processing methods give unsatisfactory classification results on gene datasets, this paper uses the Nonnegative Matrix Factorization (NMF) algorithm to obtain a low-dimensional representation of the original gene data. Compared with other matrix decomposition methods, NMF not only reflects local feature information but also enables effective classification.

The main contents of this paper are as follows:

1. The paper first introduces the basic theory of the traditional NMF algorithm, then briefly introduces several NMF variants built on it, and finally presents the objective functions and iterative update rules of these modified algorithms.
2. The Bi-orthogonal Nonnegative Matrix Tri-factorization (BONMTF) algorithm is applied to mining gene expression data. The paper first analyzes the BONMTF algorithm systematically, then uses it to obtain the matrix that characterizes the properties of the samples, and finally applies BONMTF to tumor classification, which improves the recognition rate of the samples. Tests on four representative gene expression datasets show that, across these different datasets, the method achieves a higher recognition rate than conventional methods, suggesting that it is feasible and has application potential.
3. The Symmetric Three-factor Nonnegative Matrix Factorization (STFNMF) algorithm is applied to tumor classification. First, the genes are ranked by a scoring criterion to reduce noise interference; second, the tumor samples are mapped to points in a high-dimensional space and a weight matrix is used to construct a similarity matrix, from which the STFNMF algorithm extracts features; finally, an SVM classifies the tumor samples. Extensive experiments on four representative gene expression datasets show that the proposed algorithm performs better than other traditional algorithms.
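Item 1 refers to the basic theory of traditional NMF. As an illustration only (this is not the thesis's code), the following NumPy sketch assumes the common squared-Frobenius objective, minimizing ||X − WH||² subject to W, H ≥ 0, solved with the Lee-Seung multiplicative update rules. The function name `nmf`, its parameters, and the genes-by-samples orientation of X are hypothetical choices made for the example.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Sketch: factorize nonnegative X (genes x samples) as X ~ W @ H.

    W (genes x k) holds basis vectors ("metagenes"); H (k x samples) is the
    low-dimensional representation of each sample. Uses Lee-Seung
    multiplicative updates for the squared Frobenius loss (assumed objective).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative; eps avoids 0/0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors
```

On a genes-by-samples matrix, `W, H, errors = nmf(X, k=3)` followed by `H.argmax(axis=0)` assigns each sample to its dominant metagene; the columns of H are the low-dimensional sample representation that a classifier could consume.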
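For item 2, bi-orthogonal tri-factorization approximates X ≈ F S Gᵀ with nonnegative factors and FᵀF ≈ I, GᵀG ≈ I. The sketch below assumes the square-root multiplicative updates proposed for orthogonal tri-factorization by Ding et al. (2006); the abstract does not give the thesis's exact update rules, initialization, or stopping criterion, and the function name `bonmtf` is hypothetical. With X oriented as genes by samples, G (samples x k2) plays the role of the matrix characterizing the samples.

```python
import numpy as np


def bonmtf(X, k1, k2, n_iter=300, seed=0, eps=1e-10):
    """Sketch of bi-orthogonal tri-factorization X ~ F @ S @ G.T.

    X: nonnegative (genes x samples); F: genes x k1 (gene clusters);
    S: k1 x k2 (cluster association); G: samples x k2 (sample clusters).
    Orthogonality of F and G is encouraged by the updates, not enforced exactly.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((n, k2)) + eps
    for _ in range(n_iter):
        # G <- G * sqrt(X^T F S / (G G^T X^T F S)); parenthesized to avoid n x n.
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        # F <- F * sqrt(X G S^T / (F F^T X G S^T)); avoids a genes x genes matrix.
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        # S <- S * sqrt(F^T X G / (F^T F S G^T G)).
        S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G
```

After fitting, `G.argmax(axis=1)` gives one cluster index per sample; for supervised tumor classification, the rows of G could instead be used as sample features. Computing `F @ (F.T @ ...)` rather than `(F @ F.T) @ ...` keeps memory proportional to genes x k instead of genes x genes, which matters when thousands of genes are measured.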
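The STFNMF pipeline in item 3 can be sketched only under several assumptions, because the abstract names neither the gene-scoring criterion, nor the weight function, nor the update rules. This sketch uses a Fisher-style score (hypothetical `fisher_scores`), a Gaussian heat-kernel weight to build the sample similarity matrix A, and multiplicative updates for the symmetric tri-factorization A ≈ H S Hᵀ of the kind described by Ding et al. (2006). All function names are hypothetical, and X is oriented here as samples by genes.

```python
import numpy as np


def fisher_scores(X, y):
    """Hypothetical gene-scoring criterion: between/within class variance ratio.

    X: samples x genes, y: integer class labels. Higher = more discriminative.
    """
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)


def heat_kernel_similarity(X, t=None):
    """Assumed weight: A_ij = exp(-||x_i - x_j||^2 / t) between samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    if t is None:
        t = np.mean(d2) + 1e-12  # assumed bandwidth heuristic
    return np.exp(-d2 / t)


def symmetric_tri_nmf(A, k, n_iter=300, seed=0, eps=1e-10):
    """Symmetric tri-factorization A ~ H @ S @ H.T with H, S >= 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k)) + eps
    S = rng.random((k, k)) + eps
    S = (S + S.T) / 2  # start symmetric; the updates preserve symmetry
    for _ in range(n_iter):
        AHS = A @ H @ S
        # H <- H * (A H S / (H H^T A H S))^(1/4)
        H *= (AHS / (H @ (H.T @ AHS) + eps)) ** 0.25
        HtH = H.T @ H
        # S <- S * sqrt(H^T A H / (H^T H S H^T H))
        S *= np.sqrt((H.T @ A @ H) / (HtH @ S @ HtH + eps))
    return H, S


def stfnmf_features(X, y, n_genes=50, k=4):
    """Rank genes, build the similarity matrix, return per-sample features."""
    top = np.argsort(fisher_scores(X, y))[::-1][:n_genes]
    A = heat_kernel_similarity(X[:, top])
    H, _ = symmetric_tri_nmf(A, k)
    return H  # one row of nonnegative features per sample
```

The rows of H could then be passed to an SVM, for example scikit-learn's `sklearn.svm.SVC`. In a real evaluation, the gene ranking should be computed from training labels only, so that test labels do not leak into feature selection.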
Keywords/Search Tags: Cancer Classification, Gene Expression Profile Data, Bi-orthogonal Nonnegative Matrix Tri-factorization, Symmetric Three-factor Nonnegative Matrix Factorization