
Study And Implementation On The Unsupervised Phenotypes Discriminating Algorithms Based On Projected Clustering

Posted on: 2011-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Wang  Full Text: PDF
GTID: 2248330395458448  Subject: Computer system architecture
Abstract/Summary:
The successful completion of the Human Genome Project, the exponentially increasing volume of biological data, and advances in information technology pose new challenges for bioinformatics in the post-genome era. In recent years, rapid advances in microarray technology, which is widely applied in functional genomics research, have enabled researchers to measure the expression levels of thousands or tens of thousands of genes simultaneously in a single experiment (gene expression data obtained through microarray technology is called microarray gene expression data). This high-throughput capability offers great opportunities for collecting gene expression data but also poses great challenges for mining it. The mismatch between high-throughput microarray technology and the manual calibration of phenotypes has led to an imbalance between the acquisition of gene expression data and the determination of phenotypes. Moreover, most existing phenotype discrimination methods are supervised; they select correlated genes according to the individual discriminative score of each gene in order to partition the sample phenotypes, and they usually ignore the widespread regulatory relationships among genes. This thesis proposes two unsupervised phenotype discrimination algorithms, USPD1 and USPD2, which take a new perspective: they are based on projected clustering and take the correlations among genes into account. Transforming gene expression data into sequence data with a negative gap constraint emphasizes the relationships among genes. Guided by specially designed quality functions, USPD1 and USPD2 perform a fast depth-first traversal of the sample enumeration tree and generate the sample phenotype partitions in an unsupervised manner, while several novel, efficient pruning strategies further improve the performance of the algorithms.
Compared with HARP, a classic projected clustering algorithm for gene expression data, the proposed algorithms USPD1 and USPD2 are shown to be more efficient. Moreover, experiments conducted on five real microarray datasets demonstrate the effectiveness of USPD1 and USPD2. This thesis studies sample phenotype discrimination in gene expression data. The algorithms use projected clustering, a data mining method, to partition the sample phenotypes without supervision and to derive the diagnostic genes from the corresponding p-signature of each sample phenotype. The thesis thus provides a new perspective on the diagnosis of diseases and the study of their causes.
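
The abstract mentions a depth-first traversal of a sample enumeration tree guided by a quality function and pruning. The following Python sketch illustrates only that general idea and is not the USPD1 or USPD2 algorithm. The function names (`search_sample_sets`, `quality`, `can_extend`), the toy expression values, and the spread threshold are all assumptions made for illustration; the thesis's actual quality functions and pruning strategies are not given in the abstract.

```python
from typing import Callable, Tuple

SampleSet = Tuple[int, ...]


def search_sample_sets(
    n_samples: int,
    quality: Callable[[SampleSet], float],
    can_extend: Callable[[SampleSet], bool],
) -> Tuple[SampleSet, float]:
    """Depth-first search over the set-enumeration tree of sample indices.

    Each node is a set of sample indices; its children add one sample with
    a larger index, so every subset is reached at most once.
    """
    best_set: SampleSet = ()
    best_score = float("-inf")

    def dfs(current: SampleSet, next_index: int) -> None:
        nonlocal best_set, best_score
        if current:
            score = quality(current)
            if score > best_score:
                best_set, best_score = current, score
            # Pruning: if this set cannot be extended, skip its whole subtree.
            if not can_extend(current):
                return
        for i in range(next_index, n_samples):
            dfs(current + (i,), i + 1)

    dfs((), 0)
    return best_set, best_score


# Toy data (hypothetical): one gene's expression level in five samples.
expression = [1.0, 1.2, 5.0, 1.1, 5.3]
MAX_SPREAD = 0.5  # assumed coherence threshold


def spread(samples: SampleSet) -> float:
    values = [expression[i] for i in samples]
    return max(values) - min(values)


def can_extend(samples: SampleSet) -> bool:
    # Anti-monotone: once the spread is too large, no superset can fix it.
    return spread(samples) <= MAX_SPREAD


def quality(samples: SampleSet) -> float:
    # Placeholder quality: prefer the largest coherent group of samples.
    return len(samples) if can_extend(samples) else float("-inf")


best_set, best_score = search_sample_sets(len(expression), quality, can_extend)
print(best_set, best_score)
```

Because the spread test is anti-monotone, pruning an incoherent sample set never discards a coherent superset, which is the kind of property a pruning strategy needs in order to be safe.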
Keywords/Search Tags:Projected clustering, gene expression data, sequence data, sample enumeration tree, phenotype partition