Research On Microarray Data Mining Techniques

Posted on: 2005-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y Wang
Full Text: PDF
GTID: 1100360125969624
Subject: Bioinformatics
Abstract/Summary:
Microarrays, a new molecular biology technology, make it feasible to obtain quantitative measurements of the expression of thousands of genes in a biological sample simultaneously. The genome-wide expression data generated by this technology promise to uncover implicit, previously unknown and potentially useful biological knowledge. A major challenge in this area is to develop bioinformatics tools for data collection and analysis.

In this dissertation, several problems in microarray data mining are investigated, including gene selection, tissue classification and genetic network construction from gene expression data. The main contributions of this dissertation are summarized below.

A gene set of interest selected from microarray data by the usual ranking methods typically contains many highly correlated genes, which degrades the performance of classifiers. To filter out these redundant genes (features), an unsupervised feature selection algorithm was proposed. The algorithm involves two steps: partitioning the original feature set into a number of homogeneous subsets (clusters), and selecting a representative feature from each cluster. The features are partitioned according to the k-NN (k nearest neighbor) principle, using pairwise feature correlation measures. This method does not require the optimal number of clusters to be specified in advance, and its computational complexity is low. Experiments on real biological data have shown that this algorithm significantly increases the classification accuracy of existing classifiers.

Accurate supervised classification of tissue samples using large-scale gene expression data presents major challenges because the number of genes far exceeds the number of samples. To address this, a classification method using artificial neural network ensembles was proposed. In this method, genes significant for classification were selected by the Wilcoxon test. Each member of the neural network ensemble is trained on a different dataset generated by the convex pseudo-data method. The predictions of the individual networks were combined by simple averaging. Experiments on real biological data have shown that this classification method outperformed single neural networks, 1-nearest-neighbor classifiers and decision trees.

A Bayesian network is a graphical model of a joint multivariate probability distribution that captures properties of conditional independence between variables. Such models are attractive for their ability to describe the complex stochastic processes of gene expression. We compared the hill-climbing method and the Markov chain Monte Carlo (MCMC) method for learning Bayesian networks from simulated microarray data. Our analysis suggests that MCMC performed better than hill-climbing. However, we find that the Bayesian network performs at chance level in determining whether a regulatory connection exists between a pair of genes.

There is great potential for mining microarray databases to discover causal relationships in the gene-regulation pathway. A constraint-based causal discovery method was presented to search for the underlying causal relationships between genes. The search uses the published data set of Hughes et al., comprising 300 expression profiles for yeast. Using this method, a number of causal relationships were found. A cursory analysis shows that some of these relationships are biologically sensible, while others suggest new hypotheses that may deserve further investigation. The results indicate that the proposed approach is both computationally feasible and successful in identifying interesting causal structures.
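
The two-step redundancy filter described in the abstract (partition the genes using k-NN on pairwise correlations, then keep one representative per group) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the dissertation's algorithm: the function name `select_representatives`, the dissimilarity `1 - |Pearson correlation|`, the greedy "most compact neighbourhood first" rule and the `max_dissimilarity` stopping threshold are all choices made here for illustration.

```python
import numpy as np


def select_representatives(X, k=2, max_dissimilarity=0.3):
    """Greedy k-NN redundancy filter (illustrative sketch only).

    X is a (samples x genes) expression matrix. Returns the sorted
    column indices of the genes that are retained.
    """
    # Assumed dissimilarity between genes: 1 - |Pearson correlation|.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, np.inf)  # a gene is never its own neighbour

    remaining = list(range(X.shape[1]))
    kept = []
    # Each pass needs at least k other genes to act as neighbours.
    while len(remaining) > k:
        sub = dist[np.ix_(remaining, remaining)]
        order = np.argsort(sub, axis=1)
        # Distance from every remaining gene to its k-th nearest neighbour.
        kth = sub[np.arange(len(remaining)), order[:, k - 1]]
        best = int(np.argmin(kth))  # gene with the most compact neighbourhood
        if kth[best] > max_dissimilarity:
            break  # no sufficiently homogeneous group is left
        neighbours = {remaining[j] for j in order[best, :k]}
        kept.append(remaining[best])  # representative of this group
        remaining = [g for i, g in enumerate(remaining)
                     if i != best and g not in neighbours]
    # Genes that were never absorbed into a group are retained as they are.
    return sorted(kept + remaining)
```

In this sketch the number of groups is not fixed in advance; it follows from `k` and the threshold, which mirrors the property claimed in the abstract. Each pass also recomputes the neighbour ordering, so it favours clarity over speed.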
Keywords/Search Tags: microarrays, data mining, unsupervised learning, gene selection, neural network ensembles, supervised classification, genetic network, Bayesian networks, causal discovery