
Applying K-means With PCA To Identify Genes Associated With Alzheimer’s Disease

Posted on: 2014-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: D X Zhang
GTID: 2268330425957406
Subject: Computational Mathematics
Abstract/Summary:
Alzheimer’s disease (AD) is the most common form of dementia. At present, the only genes confirmed to be related to AD are amyloid precursor protein (APP), apolipoprotein E (ApoE4), presenilin 1 (PS1), presenilin 2 (PS2) and tau protein. To understand the mechanism by which AD develops, analysis of DNA microarray expression data is very important. Many analysis methods are in common use, such as principal component analysis (PCA), a statistical method widely used for unsupervised dimensionality reduction, and the K-means clustering algorithm (K-means), a common data clustering method for unsupervised learning tasks. The drawback of K-means, however, is that its results depend strongly on the chosen number of classes and on the centroids selected at the first iteration. This thesis proposes an improved method based on the K-means clustering algorithm. First, PCA is applied to reduce the dimension of the given gene data, and the number of gene classes and the corresponding centers are determined with a one-dimensional classification method proposed in this thesis. Second, the results obtained from PCA are fed into the K-means clustering algorithm to identify candidate pathogenic genes of AD. In total, 38 candidate pathogenic genes were identified, eight of which are supported by the findings of other research groups. In addition, by taking the known AD genes as the initial centers and combining the PCA and K-means algorithms, the thesis finds candidate pathogenic genes related to the known AD genes.
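The pipeline described above (PCA, then a one-dimensional classification that fixes the number of classes and their centers, then K-means started from those centers) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not specify the one-dimensional classification rule, so the hypothetical `one_dim_groups` simply sorts the first principal component scores and starts a new group wherever the gap to the next score exceeds a fraction (`gap_frac`, an assumed parameter) of the score range. The number of retained components and the synthetic data are also assumptions.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project each gene (row of X, genes x samples) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center every sample column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # genes x n_components

def one_dim_groups(s, gap_frac=0.2):
    """Hypothetical 1-D classification (assumed rule): sort the PC1 scores and
    start a new group wherever the gap to the next score exceeds
    gap_frac * (max - min)."""
    order = np.argsort(s)
    gaps = np.diff(s[order])
    cuts = np.where(gaps > gap_frac * (s.max() - s.min()))[0]
    labels = np.empty(len(s), dtype=int)
    start = 0
    for k, end in enumerate(list(cuts) + [len(s) - 1]):
        labels[order[start:end + 1]] = k         # sorted positions start..end form group k
        start = end + 1
    return labels

def kmeans(Z, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)             # nearest center per gene
        new = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(len(centers))])
        if np.allclose(new, centers):            # centers stopped moving
            break
        centers = new
    return labels, centers

def pca_kmeans(X, n_components=2):
    """PCA, then 1-D grouping on PC1 to choose k and initial centers, then K-means."""
    Z = pca_scores(X, n_components)
    init = one_dim_groups(Z[:, 0])               # k and initial groups come from PC1 alone
    k = init.max() + 1
    centers = np.array([Z[init == j].mean(axis=0) for j in range(k)])
    return kmeans(Z, centers)

# Synthetic demo: 3 groups of 20 "genes" x 10 samples with different mean levels
rng = np.random.default_rng(0)
X = np.vstack([level + rng.normal(0, 0.5, (20, 10)) for level in (0.0, 10.0, 20.0)])
labels, centers = pca_kmeans(X)
print(len(centers), labels)
```

The point of the design is that K-means no longer needs the class number or the first-iteration centroids as user input: both are derived deterministically from the PCA scores, which addresses the initialization sensitivity mentioned above.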
The main contents are as follows:
First, the thesis reviews Alzheimer’s disease, gene chip (DNA microarray) technology, and the current state of research on AD pathogenic genes.
Second, a one-dimensional classification method is put forward and combined with the principal component analysis and K-means clustering algorithms to design a new algorithm.
Third, using this algorithm, co-expressed genes and isolated genes associated with the known AD genes are found and taken as candidate pathogenic genes of AD.
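The third step, seeding K-means with the known AD genes and reading off the genes that end up in each known gene's cluster, might look roughly like the sketch below. It is illustrative only: the expression matrix is synthetic, the names `APP` and `MAPT` are merely labels on synthetic rows, `co_clustered_candidates` is a hypothetical helper, and clustering in a 2-component PCA space is an assumption. The abstract does not define how "isolated genes" are detected, so only co-expressed (co-clustered) genes are modeled here.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project each gene (row of X, genes x samples) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center every sample column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def seeded_kmeans(Z, seed_idx, n_iter=100):
    """Lloyd iterations whose initial centers are the rows Z[seed_idx] (the known genes)."""
    centers = Z[seed_idx].copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)             # nearest center per gene
        new = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def co_clustered_candidates(X, names, known, n_components=2):
    """Hypothetical helper: for each known gene, list the other genes in its final cluster."""
    Z = pca_scores(X, n_components)
    labels = seeded_kmeans(Z, [names.index(g) for g in known])
    return {g: [n for n, lab in zip(names, labels) if lab == k and n not in known]
            for k, g in enumerate(known)}

# Synthetic demo: one group of genes rises across 8 samples, the other falls
rng = np.random.default_rng(1)
rising, falling = np.linspace(0, 5, 8), np.linspace(5, 0, 8)
X = np.vstack([rising + rng.normal(0, 0.3, (10, 8)),
               falling + rng.normal(0, 0.3, (10, 8))])
names = (["APP"] + [f"geneA{i}" for i in range(1, 10)]
         + ["MAPT"] + [f"geneB{i}" for i in range(1, 10)])
candidates = co_clustered_candidates(X, names, ["APP", "MAPT"])
print(candidates)
```

Because each cluster is anchored to a known gene from the first iteration, every resulting cluster can be read directly as "genes co-expressed with this known AD gene", which is how the candidate list is interpreted here.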
Keywords/Search Tags: Alzheimer’s disease, DNA Microarray Data, Gene, Principal Component Analysis, K-means