Microarray Data Analysis

Posted on: 2006-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X P Fu
GTID: 1110360212484485
Subject: Genetics
Abstract/Summary:
Microarray analysis can monitor the expression levels of thousands to tens of thousands of genes in a single assay, making it the most widely used technique for studying gene expression patterns on a genomic scale. As more and more researchers jump on the microarray bandwagon, however, it has become increasingly clear that simply generating the data is not enough; one must be able to extract from it meaningful information about the system being studied. Microarray data analysis can be summarized in three steps: data pre-processing, data analysis, and further analysis. Information on the representative genes is then integrated, and the relation between gene expression pattern and biological function is identified.

In this thesis, the three steps are each described briefly. We focus on three novel approaches that we developed: a threshold-determination approach for weak signals based on the accumulated distribution, a clustering approach based on finding dominant sets, and a dimension-reduction approach based on local tangent space alignment (LTSA). We also present a meta-analysis of microarray data from different labs on diverse stresses in Saccharomyces cerevisiae, which identifies stress unchangeably expressed genes (SUEG), and an analysis of their regulatory elements with the AlignACE tool.

In microarray experiments, many spots with low signal intensities are vulnerable to background and noise biases. It is therefore important to determine an effective threshold with which low-abundance genes can be clearly distinguished from background. This thesis proposes a new threshold-determination method for gene expression intensity based on the accumulated distribution. Compared with previous methods, it takes both the overall signal intensity and the background into consideration.
Using this method, the reproducibility and reliability of microarray experiments are greatly increased, and more valuable genes with significant biological function are preserved for further analysis.

To overcome the pitfalls of commonly used linear dimension-reduction methods, we introduce a nonlinear dimension-reduction method, LTSA, to address the difficulty of analyzing high-dimensional, nonlinear microarray data. We analyze the applicability and the construction error of LTSA. The experiments show good visualization performance, and clustering correctness does not decline after dimension reduction. The method also has an advantage over the PCA algorithm in determining the reduced dimension.

We propose a novel, iterative clustering approach that addresses three issues that have bedeviled clustering: determining dominant sets statistically at a given significance level, removing the need to predefine the cluster structure, and ensuring the quality of each dominant set. The approach sorts the original data by dominant set so that genes with high similarity are arranged together, and then finds a cluster according to a criterion. The new clustering approach is evaluated in several respects. Both theoretical analysis and experimental results confirm that it is widely applicable, efficient, and robust to noise. We have also applied the approach to published yeast cell cycle gene expression data and found some biologically meaningful gene groups. Furthermore, the approach is a potentially good tool for searching for putative regulatory signals.

In this thesis, a meta-statistical model is built to analyze microarray data from different labs on diverse stresses in Saccharomyces cerevisiae, in order to find stress unchangeably expressed genes (SUEG) with very low false positive and false negative rates.
The unchanging expression of the SUEGs is confirmed in two ways: with SAGE data and with the intensities in these microarray data. The biological meaning of the SUEGs is analyzed in terms of biological process, gene function, and cellular localization, and some regulatory elements are identified with the AlignACE tool. We conclude that the meta-analysis approach obtains good results and offers a new way to integrate microarray data from different sources; moreover, the SUEGs and the regulatory elements may provide clues for research on steadily expressed genes.
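
The abstract does not spell out the iterative, significance-tested clustering procedure, so the following is only a generic illustration of the dominant-set idea it builds on. It is a minimal sketch assuming the common replicator-dynamics formulation of dominant sets, not the thesis's own algorithm. The function names `dominant_set` and `peel_clusters`, the `cutoff` and `min_size` parameters, and the toy similarity matrix are all hypothetical; in practice the similarities would come from gene expression profiles (for example, a correlation-based measure).

```python
def dominant_set(similarity, iterations=1000, tol=1e-9, cutoff=1e-4):
    """Return the indices of one dominant set.

    Assumes `similarity` is a symmetric, non-negative matrix (list of
    lists) with a zero diagonal. Uses discrete replicator dynamics,
    a generic technique and not the thesis's own procedure.
    """
    n = len(similarity)
    x = [1.0 / n] * n  # start with equal weight on every item
    for _ in range(iterations):
        # payoff[i] = (A x)_i: weighted similarity of item i to the rest
        payoff = [sum(similarity[i][j] * x[j] for j in range(n))
                  for i in range(n)]
        mean_payoff = sum(x[i] * payoff[i] for i in range(n))  # x^T A x
        if mean_payoff == 0.0:
            break  # no pairwise similarity left among weighted items
        new_x = [x[i] * payoff[i] / mean_payoff for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    # items that keep non-negligible weight form the dominant set
    return [i for i in range(n) if x[i] > cutoff]


def peel_clusters(similarity, min_size=2):
    """Repeatedly extract a dominant set and remove it from the data.

    No number of clusters is specified in advance; extraction stops
    when the next set has fewer than `min_size` members (hypothetical
    stopping rule, standing in for a statistical criterion).
    """
    remaining = list(range(len(similarity)))
    clusters = []
    while len(remaining) >= min_size:
        sub = [[similarity[i][j] for j in remaining] for i in remaining]
        members = dominant_set(sub)
        if len(members) < min_size:
            break
        clusters.append([remaining[k] for k in members])
        remaining = [g for k, g in enumerate(remaining) if k not in members]
    return clusters, remaining


# Toy similarity matrix: items 0-2 are mutually similar, as are items 3-4.
toy = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.0],
]
clusters, leftover = peel_clusters(toy)
print(clusters, leftover)  # -> [[0, 1, 2], [3, 4]] []
```

The sketch extracts one cohesive group at a time and removes it before looking for the next, so the cluster count is never fixed beforehand. A statistical test at a chosen significance level, as described in the abstract, would replace the simple size-based stopping rule used here.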
Keywords/Search Tags: microarray, weak signal, accumulation distribution, clustering, dominant set, nonlinear dimension reduction, Local Tangent Space Alignment (LTSA), regulatory element, meta analysis, Stress Unchangeably Expressed Genes (SUEG)