Research On Several Critical Issues Of DNA Microarray Design And Data Analysis

Posted on: 2010-09-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Wu
GTID: 1118360308985665
Subject: Control Science and Engineering
Abstract/Summary:
DNA microarray technology has revolutionized research in modern biology, and its development is the product of collaboration across many disciplines. Several critical issues in DNA microarray work, such as array design and data analysis, depend on support from bioinformatics research. This dissertation focuses on two of them: the design of 16S rRNA-based oligonucleotide arrays for large-scale bacteria detection, and the analysis of microarray gene expression datasets.

Revealing biodiversity in microbial communities is essential in metagenomics research. Biotechniques for large-scale phylogenetic identification of bacteria are important for analyzing unknown environmental samples, and they can be applied to the analysis of microbial communities and to environmental surveillance of biological threats. With thousands of 16S rRNA gene sequences available and with advances in oligonucleotide microarray technology, it may be possible to design a 16S rRNA-based oligonucleotide array for large-scale bacteria detection that analyzes the microorganisms in an unknown environmental sample containing hundreds of species. The critical issue in designing such an array is finding optimized probes.

First, optimizing probe design for array-based experiments requires better prediction of oligonucleotide hybridization behavior. The thermodynamic properties of nucleic acid duplex formation and dissociation in solution are well established. However, duplex formation with surface-immobilized DNA oligonucleotides is less well understood, presumably because of the complex factors affecting the kinetics and thermodynamics of target capture. Statistical analysis of large sets of hybridization data reveals that the negative effect of surface immobilization can be reduced by taking the difference between the hybridization free energies of the PM (perfect match) and MM (mismatch) oligo-target duplexes.
Designing positive controls for each probe helps to discriminate specific from non-specific hybridization, and this can be implemented on the basis of hybridization behavior prediction.

In the design of a 16S rRNA-based oligonucleotide array for large-scale bacteria detection, all target sequences are clustered into groups based on taxonomy. A taxonomic unit contains multiple copies of the 16S rRNA gene, so the concept of a cluster- or group-specific probe should be introduced. Many existing strategies for group-specific oligonucleotide probe design depend on the result of a global multiple sequence alignment, which is time-consuming. We present a novel program named OligoSampling that uses a Markov chain Monte Carlo (MCMC) method to design group-specific oligonucleotide probes. Our method does not need to globally align the target sequences, and OligoSampling provides more flexibility and higher speed than software based on global multiple sequence alignment.

Designing group-specific probes for bacterial taxonomic units is not by itself enough to design the array. When groups of target sequences are assembled based on taxonomy, the sequences within each group are homologous but not identical, so finding a single unique group-specific probe that specifically detects every target sequence in a group is often difficult. Hence, a reasonable trade-off is to design non-unique probes, each of which specifically detects the target sequences of a different subgroup. Combining these probes by disjunctive inference (the target group is identified if any one of the probes gives a positive signal) can achieve higher coverage. However, evaluating all possible combinations is time-consuming. We present a feasible algorithm that uses relative entropy and a genetic algorithm (GA) to design group-specific non-unique probes.
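The combination problem behind disjunctive inference can be illustrated with a minimal sketch. It is not the relative entropy and GA algorithm of the dissertation: it scores a probe set by coverage and cross-hybridization and, as a plainly named substitute, searches small subsets exhaustively, which is only practical for toy inputs. The function names, the scoring order, and the hit table are all hypothetical.

```python
from itertools import combinations

def disjunctive_hits(probe_set, hits):
    """Sequences detected when ANY probe in the set gives a positive signal."""
    detected = set()
    for probe in probe_set:
        detected |= hits[probe]
    return detected

def score(probe_set, hits, targets):
    """Return (coverage of the target group, number of non-target hits)."""
    detected = disjunctive_hits(probe_set, hits)
    coverage = len(detected & targets) / len(targets)
    cross = len(detected - targets)
    return coverage, cross

def best_probe_set(hits, targets, max_size):
    """Exhaustive search (assumption: stands in for the GA on toy data).

    Prefers higher coverage, then fewer cross-hybridizing sequences,
    then fewer probes.
    """
    best = None
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(hits), k):
            coverage, cross = score(combo, hits, targets)
            key = (coverage, -cross, -k)
            if best is None or key > best[0]:
                best = (key, combo)
    return best[1]

# Hypothetical hit table: probe -> sequences it hybridizes to.
# "n1" is a non-target sequence (cross-hybridization).
targets = {"t1", "t2", "t3", "t4"}
hits = {
    "p1": {"t1", "t2"},
    "p2": {"t3"},
    "p3": {"t3", "t4"},
    "p4": {"t1", "t2", "t3", "t4", "n1"},
}
chosen = best_probe_set(hits, targets, max_size=2)
print(chosen, score(chosen, hits, targets))
```

In this toy case no single probe covers the whole group without also hitting a non-target sequence, while two non-unique probes together do. The number of subsets grows combinatorially with the number of candidate probes, which is why a heuristic search such as a GA is needed at realistic scale.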
The relative entropy and GA scheme has been applied to the design of a 16S rRNA-based oligonucleotide array for large-scale bacteria detection. The results demonstrate that the designed 16S rRNA-based probe sets have high coverage and low cross-hybridization.

We found a considerable risk that 'false' identities occur within 16S rRNA gene copies of unrelated microorganisms, resulting from multiple 16S rRNA gene mutations during the course of evolution. At the same time, we believe it is highly unlikely that 'false' identities evolved at multiple 16S rRNA sites in phylogenetically distant microorganisms. On this basis, we also propose an identification scheme based on conjunctive inference (the target taxon is identified only if all probes give a positive signal). We applied OligoSampling, developed in this dissertation, to design group-specific probe candidates and combined multiple candidates by conjunctive inference to form an identification unit. Multiple identification units were then combined by disjunctive inference to identify the target group. The results demonstrate that combining multiple probes in this way can improve coverage and specificity.

The analysis of microarray gene expression data has two main steps: normalization and identification of differentially expressed genes. Differentially expressed genes have a negative impact on normalization, especially when the numbers of over-expressed and under-expressed genes differ greatly. Conversely, imprecise normalization can cause the identification of differentially expressed genes to fail. In a two-step statistical procedure, errors in normalization and in the identification of differentially expressed genes therefore compound each other. We propose a new iterative reselection algorithm for outlier removal and apply it to the normalization of microarray gene expression data. Simulated and real datasets were analyzed.
The results demonstrate that the iterative reselection approach can eliminate the impact of outliers and significantly improve the precision of normalization. As a result, candidates for differential expression can be identified efficiently at the same time. In particular, normalizing with our method yielded new biological explanations that differ from Gasch's original analysis of the same cDNA microarray datasets, which were obtained in a study of the transcriptional response of Saccharomyces cerevisiae to amino acid starvation. For genes involved in carbohydrate metabolism, we found that the induction of synthetic enzymes precedes the induction of catabolic enzymes, rather than occurring simultaneously.
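The abstract does not give the details of the iterative reselection algorithm, so the following is only a sketch of the general idea under stated assumptions: normalization is a single additive offset on log-ratios, the offset is the mean of the currently selected genes, and a gene is reselected when it lies within k standard deviations of that offset. The function name, the estimator, and the threshold k are hypothetical choices.

```python
from statistics import fmean, pstdev

def iterative_reselection(log_ratios, k=2.0, max_iter=50):
    """Estimate a normalization offset while excluding outlier genes.

    Assumed scheme: refit the offset on the selected genes, then
    reselect from ALL genes using the new fit, until the selection
    stops changing.
    """
    selected = list(range(len(log_ratios)))
    offset = 0.0
    for _ in range(max_iter):
        values = [log_ratios[i] for i in selected]
        offset = fmean(values)      # normalization factor from selected genes
        spread = pstdev(values)     # scale used for the outlier threshold
        reselected = [
            i for i, v in enumerate(log_ratios)
            if abs(v - offset) <= k * spread
        ]
        if not reselected or reselected == selected:
            break
        selected = reselected
    kept = set(selected)
    outliers = [i for i in range(len(log_ratios)) if i not in kept]
    normalized = [v - offset for v in log_ratios]
    return offset, normalized, outliers

# Hypothetical log-ratios: eight unchanged genes carrying a systematic
# bias near 0.5, plus two strongly over-expressed genes.
data = [0.4, 0.5, 0.6, 0.5, 0.45, 0.55, 0.5, 0.5, 3.0, 3.2]
offset, normalized, outliers = iterative_reselection(data)
print(round(offset, 3), outliers)
```

On this toy input the first fit is pulled upward by the two over-expressed genes. Each pass drops the genes furthest from the current fit, and the offset settles on the bias shared by the unchanged genes. The genes left outside the final selection are the candidates for differential expression, which matches the point that normalization and identification are carried out together.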
Keywords/Search Tags: DNA microarray, 16S rRNA, probe design, gene expression, normalization, outlier removal