
Research On Gene Expression Data Analysis Method And Its Application

Posted on: 2014-02-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J N Wu
Full Text: PDF
GTID: 1220330395996608
Subject: Computer application technology
Abstract/Summary:
Microarray technology is a major breakthrough in molecular biology: it can simultaneously measure the expression levels of thousands of genes in different samples under different environments and conditions. Gene expression data generated by DNA microarray technology reflect the abundance of mRNA, the product of gene transcription. A large amount of information is implicit in these data, and they play an important role in biomedical research: from them one can infer the physiological state of the cell, gene expression regulatory information, and gene function information. Analyzing the gene activity implied in these data, and thereby enhancing our understanding of the phenomena of life, is the ultimate goal of bioinformatics research.

To mine the potential biological significance of gene expression data, we need to combine computational intelligence algorithms, data modeling, and mathematical statistics. Different analysis methods produce different results, so selecting an appropriate method is very important. Our research on gene expression data analysis methods is carried out in the context of bioinformatics and computer science.

(1) We propose a workflow for analyzing microarray data based on statistical learning theory and clustering methods. The workflow consists of data preprocessing and normalization, detection of differentially expressed genes, gene clustering, and functional enrichment analysis. Experimental analysis shows that the results are biologically meaningful, which verifies the effectiveness and feasibility of the proposed method.

(2) Collecting sufficient and effective gene expression data by experiment has two drawbacks in some research settings: it is expensive, and it is operationally complex. We propose an algorithm based on K-medoids for generating simulated genetic data. The algorithm introduces the concept of a Cluster Channel, which is used to generate the simulated data. The proposed method can eliminate the noise in the original data. The experimental results show that the simulated genetic data are reliable. SAM is used to analyze both the simulated data and the original data, and we conclude that the simulated data can effectively validate algorithms for detecting differentially expressed genes.

(3) One of the main objectives of microarray data analysis is to identify genes that are differentially expressed under different experimental conditions. A main approach is to calculate a statistic for each gene and rank the genes by the calculated values; a large ranking value is evidence of differential expression. Here we present a novel technique, Matrix Rank Product (MRP), for identifying differentially expressed genes, which originates from a simple statistical ranking model. The algorithm works directly on the raw microarray data and therefore avoids the interference introduced by different data preprocessing algorithms. The method is also designed for accurate gene ranking, which it achieves by computing an overall ordering of the microarray data matrix.

(4) Existing methods for analyzing differentially expressed genes (DEGs) cannot handle heterogeneous data sets, and their results are often inconsistent. We propose a new method, Rank Standard Deviation Meta-analysis (RSDM), for detecting DEGs. The method is based on meta-analysis and rank standard deviation filtering. It can detect true differentially expressed genes and filter out pseudo differentially expressed genes from experimental data sets. The experimental results show the high efficiency of the proposed method.

(5) We developed an efficient algorithm for identifying network modules in protein-protein interaction (PPI) networks based on gene expression profiles. A module found in a given PPI network is suggestive of a complex or a distinct functional pathway. We use a seed-expand method to search for network modules and evaluate them using GO annotations and KEGG pathways. The results show that the identified modules are statistically significant in terms of GO annotation and KEGG pathway enrichment.

Research on algorithms for identifying differentially expressed genes and their application to PPI networks has important academic and practical value. Furthermore, it provides significant methods and strategies for the analysis of gene expression data.
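
As a hedged illustration of the rank-based approach described in item (3), the sketch below computes the classic rank product statistic, that is, the geometric mean of a gene's ranks across replicates. It is not the MRP method itself, whose details are not given in this abstract. The function name `rank_product`, the input layout (one row of log fold changes per gene, one column per replicate), and the example values are all assumptions made for illustration. Ranks here run from 1 for the largest value in a replicate, so under this convention a small product points to consistent up-regulation; the direction is only a convention.

```python
import math


def rank_product(log_ratios):
    """Classic rank product: geometric mean of per-replicate ranks.

    log_ratios: list of rows, one row per gene, one value per replicate
    (hypothetical layout chosen for this sketch).
    Returns one score per gene; smaller means more consistently
    up-regulated under the rank-1-is-largest convention used here.
    """
    n_genes = len(log_ratios)
    n_reps = len(log_ratios[0])
    log_sums = [0.0] * n_genes
    for j in range(n_reps):
        # Gene indices ordered from largest to smallest value in replicate j.
        # Ties keep their input order because sorted() is stable.
        order = sorted(range(n_genes),
                       key=lambda g: log_ratios[g][j],
                       reverse=True)
        for rank, g in enumerate(order, start=1):
            # Summing logs avoids overflow when many ranks are multiplied.
            log_sums[g] += math.log(rank)
    return [math.exp(s / n_reps) for s in log_sums]


# Made-up example values, not data from the dissertation.
example = [
    [2.1, 1.8, 2.5],     # gene 0: high in every replicate
    [0.1, -0.2, 0.3],    # gene 1
    [-1.5, -1.1, -1.9],  # gene 2: low in every replicate
    [0.9, 1.0, 0.2],     # gene 3
]
scores = rank_product(example)
# Gene indices from most to least consistently up-regulated.
ranking = sorted(range(len(scores)), key=lambda g: scores[g])
```

In practice the significance of a rank product is usually assessed by permuting the values within each replicate; that step is omitted here to keep the sketch short.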
Keywords/Search Tags:Bioinformatics, gene expression data, microarray data analysis, meta-analysis, identification of differentially expressed genes, gene clustering, enrichment analysis, data normalization, simulation data, module identification