
Research On Relevant Problems Of DNA Microarray Expression Data Analysis

Posted on: 2008-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L B Qiu
Full Text: PDF
GTID: 1118360242498896
Subject: Control Science and Engineering
Abstract/Summary:
This dissertation studies preprocessing techniques for DNA microarray expression data, classification and class discovery algorithms for cancer research, and a modeling method for gene regulatory networks. The main contents and contributions are summarized as follows:

1) Research on a method to normalize systematic bias in high-density oligonucleotide array gene expression data

In multi-array experiments, the measurements are contaminated by systematic bias arising from experimental factors such as spot location (often referred to as a print-tip effect), arrays, dyes, and various interactions of these effects. To make arrays comparable with each other, the raw expression profile data must be normalized. Normalization is the key step in low-level processing, and many normalization methods have been developed, e.g. scaling normalization, nonlinear normalization and quantile normalization. A new baseline normalization method is presented. First, the subset of probes with the smallest rank range is selected; second, a pseudo-baseline is computed with the Tukey biweight algorithm; finally, nonlinear normalization is performed against the pseudo-baseline. An iterative strategy weakens the sensitivity of the baseline method to the choice of baseline. The method is compared with other methods on a standard test dataset, and the results show that it performs better than the others in several respects.

2) Research on algorithms for missing value estimation in microarray expression data

Missing values do occur in microarray experiments, and they affect the stability and precision of expression data analysis. Compared with repeating experiments, missing value estimation is the preferred way to reduce the influence of missing values on subsequent processing. A new method based on weighted regression is presented, in which kernel weights based on the similarity between the target gene and the sample genes localize the missing value estimation.
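The idea of kernel weights that localize the estimate can be illustrated with a minimal sketch. It uses a Gaussian kernel-weighted average, the simplest (locally constant) form of weighted regression, and is not the dissertation's algorithm, whose details the abstract does not give. The function name `kernel_weighted_impute`, the Gaussian kernel, the Euclidean distance and the `bandwidth` parameter are all assumptions made for illustration.

```python
import math


def kernel_weighted_impute(target, others, missing_idx, bandwidth=1.0):
    """Estimate target[missing_idx] from complete gene profiles in `others`.

    Hypothetical helper: genes whose profiles are close to the target
    (on the columns where the target is observed) get larger weights.
    """
    observed = [j for j in range(len(target)) if j != missing_idx]
    weighted_sum = 0.0
    weight_total = 0.0
    for gene in others:
        # Euclidean distance over the columns where the target is observed
        dist = math.sqrt(sum((target[j] - gene[j]) ** 2 for j in observed))
        # Gaussian kernel: similar genes dominate, distant genes fade out
        w = math.exp(-((dist / bandwidth) ** 2))
        weighted_sum += w * gene[missing_idx]
        weight_total += w
    if weight_total == 0.0:
        # All weights underflowed to zero: fall back to the plain mean
        return sum(g[missing_idx] for g in others) / len(others)
    return weighted_sum / weight_total


# Toy data (made up): the third value of the target gene is missing
target = [1.0, 2.0, None, 4.0]
others = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
estimate = kernel_weighted_impute(target, others, missing_idx=2)
```

In this toy case the first sample gene matches the target on every observed column, so its value dominates the estimate. A smaller `bandwidth` makes the estimate more local, a larger one moves it toward the plain mean.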
The proposed weighted regression method is compared with several existing methods on two real microarray expression datasets. The experimental results show that it has better stability and precision than the existing methods tested.

3) Research on algorithms for cancer microarray expression classification

DNA microarray technology can measure the expression levels of thousands of genes simultaneously and has become an important tool in cancer biology. In combination with classification methods, microarray technology can support clinical management decisions for individual patients. Cancer microarray expression classification is a typical problem with high dimensionality and small sample size. A gene expression dataset contains many genes that are redundant for classification, so selecting the most relevant genes is an important issue. A robust two-step approach is presented. First, to reduce computational complexity, a gene pre-selection procedure based on ReliefF reduces the large number of genes under consideration. Second, the relevance vector machine and the support vector machine optimized by an immune clonal algorithm are each applied to the gene subset for classification. The new approach is compared with several existing methods on four real cancer microarray datasets. The experimental results show that the proposed approach achieves high classification accuracy and is more robust.

4) Research on methods for class discovery in cancer microarray expression data

Cancer is a highly heterogeneous disease, and different causes can lead to the same phenotype. It is therefore very difficult to distinguish the different classes of a cancer on the basis of clinical pathology. DNA microarray technology provides a high-throughput tool for probing the occurrence and evolution of cancer at the molecular level.
The different classes of a cancer can be discovered accurately from microarray expression profiles, and many clustering methods have been widely used for this purpose. Support vector clustering is a boundary-based clustering method that handles irregularly shaped classes well and can automatically find the true classes. An algorithm for tumor class discovery based on support vector clustering is presented. Because gene expression profiles contain much redundancy for class discovery, variance filtering first selects a small number of genes with the largest variance as features. Second, support vector clustering is used to discover the classes. On two cancer microarray datasets, with the parameter sequence generated automatically, the presented method partitions the cancer samples at different levels of granularity. The results show that the method discovers the classes of cancer samples more accurately and automatically finds the true number of classes.

5) Research on modeling methods for gene regulatory networks

A gene regulatory network comprises not only the interactions between genes but also the interactions of various regulatory factors, such as regulatory proteins and siRNA, which cannot be measured directly. The state-space model is a special type of dynamic Bayesian network, built on the assumption that the observed variables depend on state variables with Markov dynamics. The state-space model can therefore accurately describe the complex mechanism of gene regulatory networks. However, because of their computational complexity, model-based methods are difficult to apply directly to larger gene regulatory networks.
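For reference, a state-space model of this kind is often written in the linear-Gaussian form below. This is a sketch under that assumption: the abstract does not state the exact form used in the dissertation, and the symbols $x_t$, $y_t$, $A$, $C$, $Q$, $R$, $w_t$ and $v_t$ are introduced here only for illustration.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
y_t     &= C\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R)
\end{aligned}
```

Under this reading, $y_t$ holds the observed expression levels at time $t$ and $x_t$ is the hidden state, which can stand for regulatory factors that are not measured directly. The first equation gives the state its Markov dynamics, and the second makes the observations depend on the state.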
Gene regulatory networks are typically sparse: the expression of one gene is controlled by only a very small number of genes and regulatory factors, and the corresponding expression profiles over time are strongly correlated. In view of this sparsity, genes are first clustered by correlation clustering, and the mutual regulation of the genes within each cluster is then modeled with the state-space model. To obtain a sparse network, the interactions between genes that are conserved across different numbers of clusters are integrated. On human T-cell cycle expression data, the dissertation analyzes how well the model reconstructs dynamic behavior. The results show that decomposition-based modeling reconstructs the network better as the number of clusters increases. The dissertation also establishes several sparse regulatory networks with different levels of sparsity.
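The clustering step that precedes the per-cluster modeling can be illustrated with a small sketch. It groups gene profiles whose Pearson correlation exceeds a threshold, merging groups transitively. This is one simple way to cluster by correlation and is not claimed to be the dissertation's procedure. The function names `pearson` and `correlation_groups` and the `threshold` value are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_groups(profiles, threshold=0.9):
    """Group profile indices linked by correlation >= threshold (union-find)."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if pearson(profiles[i], profiles[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy time-course profiles (made up): two rising genes, two falling genes
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
groups = correlation_groups(profiles)
```

Each resulting group could then be modeled separately, which keeps every model small. Lowering `threshold` yields fewer, larger groups, and raising it yields more, smaller ones.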
Keywords/Search Tags: Microarray, Gene Expression, System Bias Correction, Missing Value Estimation, Clustering, Gene Regulatory Networks