Research On Differential Gene Expression Analysis Algorithms

Posted on: 2013-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
Full Text: PDF
GTID: 2218330371983554
Subject: Computer application technology
Abstract/Summary:
With the completion of the Human Genome Project, genomic research entered the post-genome era. During this period, microarray technology emerged. One of the main objectives in the analysis of microarray experiments is the identification of genes that are differentially expressed under two experimental conditions. As microarray technology has matured, more and more microarray data have become available, and many analysis methods for identifying differentially expressed genes have been developed.

Microarray technology makes it possible to study the expression of hundreds of thousands of genes at the genomic level. Differential gene expression algorithms not only identify differentially expressed genes, but also quantify the differences in expression, narrow the scope of study, improve efficiency, and provide accurate data for the next step of biological analysis. They can also be applied to cancer diagnosis, the analysis of gene expression profiles, research on gene expression biomarkers and drug targets, and other areas.

The earliest method for identifying differential gene expression is Fold, which computes the fold change between the average log-transformed expression levels of a gene under two conditions. Fold is simple and its results are intuitive, so it is the most widely used method for identifying differential gene expression. Because Fold provides no measure of statistical significance and uses a fixed threshold, biologists proposed an algorithm based on the statistical t model, the T-test method. The t model, however, has an inherently high false-positive rate caused by genes with small variance.
To avoid this problem, Tusher et al. at Stanford University improved the T-test into the SAM (Significance Analysis of Microarrays) algorithm by adding a constant to the variance term, and proposed a random permutation procedure to further improve the algorithm's accuracy.

With the growth of gene expression data, algorithms designed for a single dataset can no longer meet the needs of multi-dataset analysis, and the era of meta-analysis has arrived. Choi et al. proposed TM (T-based Meta-analysis). At around the same time, Hong introduced RPM (Rank Products Meta-analysis).

First, we compared three analysis methods: SAM, TM, and RPM. We identified the strengths and weaknesses of the three methods with respect to three measures, which we refer to as efficiency, stringency, and ability to handle heterogeneity. TM is more efficient than SAM and RPM, while RPM is the best of the three on the other two measures. We concluded that meta-analysis is a powerful tool for identifying differentially expressed genes. Second, we implemented TM on several datasets from different studies and presented a meta-analysis tool for identifying differentially expressed genes. Experimental results showed that TM identifies differentially expressed genes accurately and handles heterogeneity between studies effectively. Third, we modified the rank products meta-analysis approach to obtain an improved model for identifying differential gene expression. The new model, the grouping rank product approach, adds a competitive classification of samples to group the datasets before the fold changes are computed. We applied the grouping rank product approach to two simulated datasets and two breast datasets; the results showed that it is not only as accurate as the rank products meta-analysis approach but also more computationally efficient in identifying differentially expressed genes.
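
As an illustration of the rank product idea behind RPM and the grouping rank product approach, here is a minimal Python sketch. It is not the thesis implementation: the function name `rank_product`, the toy data, and the convention that rank 1 goes to the most up-regulated gene are assumptions made for this example, and the grouping step is not shown.

```python
import math

def rank_product(log_ratios):
    """Geometric mean of each gene's rank across comparisons.

    log_ratios[g][j] is the log fold change of gene g in comparison j
    (hypothetical input layout). Within each comparison, rank 1 is given
    to the largest log fold change. Ties are broken by input order.
    """
    n_genes = len(log_ratios)
    n_comparisons = len(log_ratios[0])
    log_rank_sums = [0.0] * n_genes
    for j in range(n_comparisons):
        column = [row[j] for row in log_ratios]
        # Gene indices sorted from most to least up-regulated.
        order = sorted(range(n_genes), key=lambda g: column[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            # Sum logs of ranks; the geometric mean is taken at the end.
            log_rank_sums[gene] += math.log(rank)
    return [math.exp(s / n_comparisons) for s in log_rank_sums]

# Toy data (assumed): 4 genes measured in 3 comparisons.
toy = [
    [2.0, 1.8, 2.2],     # consistently the most up-regulated
    [0.1, -0.2, 0.3],
    [-1.5, -1.0, -1.2],  # consistently the most down-regulated
    [0.5, 0.9, 0.4],
]
rp = rank_product(toy)
```

A small rank product marks a gene that ranks near the top in every comparison. In practice, significance is usually assessed by comparing the observed values against rank products from permuted data, which this sketch omits.
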
Fourth, we describe fold-based meta-analysis, a simple yet practical new method for identifying differentially expressed genes in microarray data across different datasets from the same or different platforms. The method is implemented in VBA (Visual Basic for Applications) as an add-in for Microsoft Excel 2007, named FMT (Fold-based Meta-analysis Tool). We applied FMT to two published Arabidopsis datasets from a single platform and three mouse datasets from three different platforms, and then analyzed the Gene Ontology terms or metabolic pathways of the most differentially expressed genes in each case. The results show that the agreement among the differentially expressed genes is high, and many of them are crucial and functional. In summary, our method is fast and practical.

Innovations: we propose the concept of IDE based on the implementation of TM; propose the GRP algorithm by improving RPM; and propose FM.

The analysis algorithms above are all based on statistics. Can a new non-statistical algorithm be developed? The next objective of this study is to develop a non-statistical analysis algorithm that improves these indexes and provides better analysis of microarray data.
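
The single-dataset statistics contrasted in this abstract (fold change, the t statistic, and a SAM-style statistic) can be compared on one gene with a short Python sketch. This is an illustration under stated assumptions, not the code of FMT or of SAM: the function names, the toy values, and the fixed `s0 = 0.1` are hypothetical, and SAM itself chooses its constant from the data.

```python
import math
from statistics import mean

def log_fold_change(a, b):
    """Difference of mean log-expression between conditions a and b."""
    return mean(a) - mean(b)

def pooled_scatter(a, b):
    """Pooled standard error of the difference in means (two-sample t)."""
    mean_a, mean_b = mean(a), mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    scale = (1 / len(a) + 1 / len(b)) / (len(a) + len(b) - 2)
    return math.sqrt(scale * ss)

def t_statistic(a, b):
    """Classical two-sample t statistic with pooled variance."""
    return log_fold_change(a, b) / pooled_scatter(a, b)

def sam_like_statistic(a, b, s0):
    """t-like statistic with a constant s0 added to the denominator,
    which damps genes whose variance is very small."""
    return log_fold_change(a, b) / (pooled_scatter(a, b) + s0)

# Toy gene (assumed values): tiny difference, even tinier variance.
cond_a = [1.00, 1.01, 0.99]
cond_b = [0.90, 0.91, 0.89]

fold = log_fold_change(cond_a, cond_b)
t = t_statistic(cond_a, cond_b)
d = sam_like_statistic(cond_a, cond_b, s0=0.1)  # s0 is a hypothetical choice
```

Here the log fold change is only about 0.1, yet the t statistic is large because the within-group variance is almost zero. Adding the constant shrinks the statistic for this gene, which is the small-variance false-positive problem described above.
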
Keywords/Search Tags: Microarray technology, Meta-analysis, Differential gene expression