
Non-Negative Matrix Decomposition and Its Application in Gene Expression Data Analysis

Posted on: 2016-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Ma
GTID: 2270330464463538 | Subject: Communication and Information System
Abstract/Summary:
Bioinformatics is an emerging interdisciplinary field that combines computer science, statistics and applied mathematics. It is widely used to study and interpret hidden biological information in order to uncover the mechanisms behind it. With rapid technological progress, biological databases receive thousands of new records every day. A key problem in bioinformatics is how to accurately and efficiently identify the potential functions of genes, and the corresponding gene expression levels, from large volumes of gene expression data. The emergence of DNA microarrays has addressed this problem well, and the gene expression data generated by DNA microarrays have attracted the attention of researchers. A gene expression data matrix is a typical ‘ultra-high-dimensional, small-sample’ problem, which poses challenges for the analysis and processing of gene expression data. It is therefore crucial to choose appropriate and effective clustering and dimensionality reduction methods for gene expression data analysis.
In this thesis, the theory of non-negative matrix factorization (NMF) is used to cluster cancer samples and to extract characteristic genes. To make it more efficient, we propose two improved non-negative matrix factorization methods and apply them to the extraction of characteristic genes. The experiments demonstrate the feasibility and effectiveness of the two improved methods. The main work of this thesis is as follows:
(1) Non-negative matrix factorization for clustering analysis of gene expression data. First, non-negative matrix factorization (NMF) is systematically reviewed. Second, graph regularized non-negative matrix factorization (GNMF) is used for clustering analysis of tumor samples. Finally, the NMF, SNMF and GNMF clustering methods are studied and compared on tumor samples.
(2) An L0-norm constrained graph regularized non-negative matrix factorization algorithm.
We combine the principles of GNMF and L0-NMF into an L0-norm constrained graph regularized non-negative matrix factorization algorithm (GL0NMF) and, for further study, apply it to characteristic gene extraction from gene expression data. The experimental results are then analyzed with the Gene Ontology (GO). Compared with the PMD, SPCA and GNMF algorithms, the experimental results demonstrate that our algorithm is, to a certain extent, feasible and effective for gene extraction.
(3) A class-information-based sparse non-negative matrix factorization algorithm. To improve the efficiency of NMF in gene expression data analysis, we combine class (category) information with sparse non-negative matrix factorization to obtain the class-information-based sparse non-negative matrix factorization algorithm (CISNMF). For further study, we apply it to characteristic gene extraction from gene expression data, and the results are analyzed qualitatively with GO. Compared with the PMD, SPCA, SNMF and SVM-RFE algorithms, the experimental results demonstrate that our algorithm is, to a certain extent, feasible and effective for gene extraction.
The main innovation of this thesis is two improved non-negative matrix factorization algorithms: the L0-norm constrained graph regularized non-negative matrix factorization algorithm (GL0NMF) and the class-information-based sparse non-negative matrix factorization algorithm (CISNMF). Both algorithms are applied to characteristic gene extraction from gene expression data, and the experimental results verify that they are, to a certain extent, feasible and effective for gene extraction.
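The abstract describes NMF and GNMF only at a high level. As a rough illustration, the following is a minimal Python/NumPy sketch of graph regularized NMF using the widely used multiplicative update rules, applied to sample clustering and a simple gene ranking; setting `lam=0` reduces it to plain NMF. It is not the thesis's GL0NMF or CISNMF: the L0-norm constraint and the class information are not reproduced. The function names `knn_graph` and `gnmf`, the 0-1 p-nearest-neighbour sample graph, all parameter values, the argmax clustering rule and the gene-scoring heuristic are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def knn_graph(X, p=3):
    """Symmetric 0-1 p-nearest-neighbour graph over the columns (samples) of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Squared Euclidean distances between sample columns.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    A = np.zeros((n, n))
    nbrs = np.argsort(dist, axis=1)[:, :p]
    for i in range(n):
        A[i, nbrs[i]] = 1.0
    return np.maximum(A, A.T)  # symmetrise the neighbour relation


def gnmf(X, k, lam=1.0, p=3, n_iter=500, seed=0, eps=1e-10):
    """Graph regularised NMF via multiplicative updates (lam=0 gives plain NMF).

    Minimises ||X - W H||_F^2 + lam * Tr(H L H^T) with L = D - A, where
    X is genes x samples, W is genes x k (metagenes), H is k x samples,
    A is the sample affinity graph and D its diagonal degree matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    A = knn_graph(X, p) if lam > 0 else np.zeros((n, n))
    D = np.diag(A.sum(axis=1))
    for _ in range(n_iter):
        # eps avoids division by zero; updates keep W and H non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H


# Synthetic toy data (not from the thesis): 50 genes x 20 samples; samples 0-9
# over-express genes 0-9 and samples 10-19 over-express genes 10-19.
rng = np.random.default_rng(1)
X = 0.1 * rng.random((50, 20))
X[:10, :10] += 5.0
X[10:20, 10:] += 5.0

W, H = gnmf(X, k=2, lam=1.0)
labels = H.argmax(axis=0)  # cluster each sample by its dominant metagene
# Heuristic gene score: loading in W rescaled by the norm of the matching row
# of H, which removes the W/H scaling ambiguity of NMF.
scores = (W * np.linalg.norm(H, axis=1)).max(axis=1)
top_genes = np.argsort(-scores)[:20]
print("sample clusters:", labels)
print("top-ranked genes:", sorted(top_genes.tolist()))
```

Assigning each sample to the metagene with the largest coefficient in its column of H is a common convention for NMF-based sample clustering. Ranking genes by rescaled loadings in W is only one simple heuristic; the thesis's own characteristic-gene selection criterion may differ.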
Keywords/Search Tags: Gene expression data analysis, Cluster analysis, Characteristic gene extraction, NMF, Gene Ontology