
Non-negative Matrix Factorization Based Clustering Research For Cancer Gene Expression Data

Posted on: 2016-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Peng
GTID: 2404330473464896
Subject: Computer technology
Abstract/Summary:
Cancer has long been a major threat to human health and is often called "the number one killer". A given tumor type often comprises several subtypes. Different subtypes usually show different genetic variations and gene expression patterns, and respond differently to the same treatment. Cancer subtype information is therefore critically important for effective treatment. Gene expression data has been widely used to identify cancer subtypes, and a variety of machine learning algorithms have been proposed for this purpose. Non-negative matrix factorization (NMF) is a prominent example; it has developed rapidly in recent years and has given rise to a series of practical algorithms. This thesis proposes a clustering algorithm and a bi-clustering algorithm based on NMF and applies each of them to the identification of tumor subtypes. The main work is as follows:

(1) After reviewing the theoretical basis of feature selection and its importance to data mining, the thesis proposes a novel feature selection method designed for cancer subtype identification. By selecting informative genes and removing irrelevant and redundant ones, the method improves the efficiency of the algorithm and, to a certain extent, its performance.

(2) After reviewing existing models, algorithms, and applications of NMF, the thesis proposes a weighted NMF algorithm. Gene weights are embedded into the NMF objective function and iteration rules, so that the importance weights of the selected genes are fully taken into account in the subsequent clustering.

(3) The thesis also proposes a bi-clustering algorithm based on NMF. Bi-clustering performs cluster analysis on the gene dimension and the sample dimension simultaneously. Experiments showed that the proposed bi-clustering algorithm improves identification accuracy and robustness to some extent compared with previous single-clustering methods.
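Item (2) describes embedding gene weights into the NMF objective and iteration rules, but the abstract does not state either. The following is only a minimal sketch under assumptions, not the thesis's algorithm: it assumes a genes-by-samples matrix X, one non-negative weight per gene, the weighted Frobenius objective sum_i d_i * ||X_i - (WH)_i||^2, and the standard multiplicative updates for that objective. The names `weighted_nmf`, `weighted_error`, and `sample_clusters` are hypothetical.

```python
import numpy as np


def weighted_nmf(X, gene_weights, k, n_iter=300, eps=1e-9, seed=0):
    """Factor X (genes x samples) as W @ H with per-gene weights.

    Assumed objective (not taken from the thesis):
        sum_i d_i * ||X[i, :] - (W @ H)[i, :]||^2
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(gene_weights, dtype=float)[:, None]  # shape (n_genes, 1)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Row weights cancel in the W update, so it matches plain NMF.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # The H update sees the weights through W^T D.
        WtD = (W * d).T  # shape (k, n_genes)
        H *= (WtD @ X) / (WtD @ W @ H + eps)
    return W, H


def weighted_error(X, W, H, gene_weights):
    """Weighted squared reconstruction error under the assumed objective."""
    d = np.asarray(gene_weights, dtype=float)[:, None]
    return float(np.sum(d * (X - W @ H) ** 2))


def sample_clusters(W, H):
    """Assign each sample to its dominant factor.

    H is rescaled as if every column of W summed to one, so that the
    coefficients of different factors are comparable.
    """
    scale = W.sum(axis=0)  # shape (k,)
    return np.argmax(H * scale[:, None], axis=0)
```

Calling `weighted_nmf(X, weights, k=2)` and then `sample_clusters(W, H)` gives one cluster label per sample. How the gene weights are obtained is left open here; in the thesis they come from the feature selection step.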
Keywords/Search Tags: nonnegative matrix factorization, clustering algorithms, bi-clustering algorithms, cancer subtype identification, feature selection, gene weight