
Study on Drought-Resistant Related Gene Mining in Arabidopsis thaliana Based on Microarray Data

Posted on: 2018-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: N Tang
GTID: 2310330566463714
Subject: Agricultural Extension
Abstract/Summary:
Water deficit during plant growth can cause large-scale wilting and death. Under arid conditions it is therefore important to identify key drought-resistance genes that improve plant survival. The rapid development of gene chip (microarray) technology provides a new technical platform for drought-resistance research. However, genes interact in complex ways, so mining drought-resistance genes with traditional statistical methods and feature selection alone has limitations. Analyzing and mining high-throughput gene data on the basis of molecular biological networks has therefore become a research hotspot in bioinformatics. Given the characteristics of Arabidopsis gene expression data, this thesis constructs a weighted gene co-expression network, mines Arabidopsis drought-resistance genes, and analyzes their functions. The main results are as follows:
(1) Extraction of differentially expressed genes based on the maximal information coefficient (MIC). The dimensionality of Arabidopsis gene expression profile data reaches 20,000, so preprocessing to filter differentially expressed genes is a key step in constructing the gene co-expression network. This thesis adopts MIC, a measure of the correlation between two variables that has both generality and equitability. First, the MIC between each gene's expression values and the sample phenotype is computed. Next, all genes are ranked by their MIC value. Finally, the top N genes are extracted as differentially expressed genes, with N chosen according to the intended size of the co-expression network. Using an SVM, 10-fold cross-validation was performed on the data set restricted to the top i ranked genes (i = 1, 2, …, s). The results show that the differentially expressed gene subset can still achieve high classification accuracy even when redundancy exists. This supports the soundness of the selected genes from a machine-learning perspective.
(2) Mining drought-resistance-related gene modules based on weighted gene co-expression network analysis (WGCNA). A gene co-expression network is a molecular biological network built on the similarity of gene expression. In such a network, densely connected subgraphs often have specific biological functions. Using the WGCNA algorithm, we built a co-expression network of the differentially expressed Arabidopsis genes and obtained gene modules with different functions by hierarchical clustering. Drought-resistance-related modules were then selected according to two criteria: the correlation coefficient between each module eigengene and the sample phenotype, and module significance. Analysis of the Arabidopsis microarray data sets GSE27548 and GSE10670 yielded three drought-resistance-related gene modules from each data set.
(3) Mining Arabidopsis drought-resistance genes. Using the protein interaction network analysis tool STRING, this thesis analyzed the biological functions of the drought-resistance gene modules obtained above. In the drought-resistance-related modules from the two data sets, 20 and 13 genes, respectively, were found to be related to the water-stress response. The results show that the method can effectively mine biologically meaningful gene modules and key genes. It also provides a new perspective for the study of plant drought resistance.
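Step (1) ranks genes by the MIC between each gene's expression and the sample phenotype. The sketch below illustrates that ranking in plain Python under two stated assumptions. First, the phenotype is treated as categorical (for example control vs. drought). Second, the full MINE search over optimized grids is replaced by a simpler search that tries only equal-frequency partitions of the expression axis. The score is therefore a lower-bound approximation of MIC, not the estimator used in the thesis. The names `approx_mic` and `rank_genes_by_mic` are hypothetical. A full MINE implementation is available in packages such as minepy.

```python
import math
from collections import Counter


def _rank_bins(values, k):
    """Split samples into k equal-frequency bins by rank (ties broken by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def _mutual_information(a, b):
    """Mutual information in bits between two discrete label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def approx_mic(expr, labels, alpha=0.6):
    """Approximate MIC between one gene's expression and a categorical phenotype.

    Assumption: only equal-frequency partitions of the expression axis are
    tried, so this is a lower bound on the true MIC, not the MINE estimator.
    """
    n = len(expr)
    n_classes = len(set(labels))
    if n_classes < 2:
        return 0.0
    # Grid-size bound B(n) = n^alpha, raised so at least a 2 x n_classes grid is tried.
    budget = max(n ** alpha, 2 * n_classes)
    best, k = 0.0, 2
    while k * n_classes <= budget and k <= n:
        mi = _mutual_information(_rank_bins(expr, k), labels)
        # Normalise by log of the smaller grid dimension, as in the MIC definition.
        best = max(best, mi / math.log2(min(k, n_classes)))
        k += 1
    return best


def rank_genes_by_mic(expression, labels, top_n):
    """Score every gene, sort by MIC (descending) and keep the first top_n."""
    scores = {gene: approx_mic(values, labels) for gene, values in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores
```

Here `expression` maps each gene to its values across samples, and `top_n` plays the role of N in step (1). Setting `alpha = 0.6` matches the commonly used B(n) = n^0.6 bound.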
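Step (1) then checks the ranking by cross-validating a classifier on the top i genes for i = 1, 2, …, s. The abstract does not give the SVM kernel, its parameters or the fold assignment, so this sketch makes several assumptions:

- The classifier is a linear SVM trained with the Pegasos stochastic sub-gradient method.
- Labels are coded as -1/+1.
- The 10 folds are assigned by sample index.
- The input is a list of genes already sorted by MIC.

The names `train_linear_svm`, `cv_accuracy` and `incremental_cv` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss).

    y must contain -1/+1 labels. A constant 1.0 feature is appended as a
    bias term (and, for simplicity, regularised like the other weights).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def cv_accuracy(X, y, k=10):
    """k-fold cross-validated accuracy; sample i goes to fold i % k."""
    n, correct = len(X), 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        if not test:
            continue
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in test)
    return correct / n


def incremental_cv(expression, ranked_genes, y, k=10):
    """Cross-validated accuracy using the top-i genes for i = 1..len(ranked_genes)."""
    accuracies = []
    for i in range(1, len(ranked_genes) + 1):
        genes = ranked_genes[:i]
        X = [[expression[g][j] for g in genes] for j in range(len(y))]
        accuracies.append(cv_accuracy(X, y, k))
    return accuracies
```

Plotting the returned accuracies against i shows whether accuracy stays high as redundant genes are added, which is the check the abstract describes. In practice a library SVM, such as scikit-learn's `SVC` with `cross_val_score`, would replace the hand-written trainer.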
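Step (2) builds the co-expression network with WGCNA and splits it into modules by hierarchical clustering. The sketch below shows the core computations in plain Python:

- **Adjacency.** An unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta.
- **Topological overlap.** TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum_u a_iu a_uj and k_i = sum_u a_iu.
- **Clustering.** Average-linkage clustering on 1 - TOM.

The value beta = 6 is an assumption; WGCNA usually chooses beta from a scale-free topology fit. A fixed-height cut also stands in for the Dynamic Tree Cut that WGCNA typically uses. The names `tom_dissimilarity` and `average_linkage_modules` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant vector."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def tom_dissimilarity(profiles, beta=6):
    """1 - TOM for an unsigned soft-thresholded co-expression network.

    profiles: one expression vector (over samples) per gene.
    Adjacency a_ij = |cor(x_i, x_j)|^beta, with a_ii set to 0.
    """
    m = len(profiles)
    adj = [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
            for j in range(m)] for i in range(m)]
    k = [sum(row) for row in adj]  # connectivity k_i
    diss = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                # l_ij: terms with u == i or u == j vanish because a_ii = a_jj = 0
                shared = sum(adj[i][u] * adj[u][j] for u in range(m))
                tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
                diss[i][j] = 1.0 - tom
    return diss


def average_linkage_modules(diss, cut_height):
    """Average-linkage agglomerative clustering, cut at a fixed height."""
    clusters = [[i] for i in range(len(diss))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(diss[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:  # the next merge would lie above the cut
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Average-linkage merge heights never decrease. Stopping at the first merge above `cut_height` therefore gives the same modules as cutting the full dendrogram at that height.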
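Step (2) selects drought-related modules using two measures:

- **Eigengene-trait correlation.** In WGCNA the module eigengene is the first principal component of the module's standardized expression, and it is correlated with the phenotype.
- **Module significance.** This can be taken as the mean absolute gene significance |cor(gene, trait)| over the module's genes.

This sketch computes both in plain Python, finding the eigengene by power iteration. It assumes the phenotype is coded numerically (for example 0 = control, 1 = drought) and computes no p-values. The 0.5 cut-off and the names `module_eigengene` and `select_modules` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant vector."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def _standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1)) or 1.0
    return [(x - mean) / sd for x in values]


def module_eigengene(module_profiles, iters=200, seed=0):
    """First principal component (over samples) of the standardized module."""
    z = [_standardize(p) for p in module_profiles]  # genes x samples
    n = len(z[0])
    # S = Z^T Z (samples x samples); its leading eigenvector is the PC1 score direction.
    s = [[sum(row[a] * row[b] for row in z) for b in range(n)] for a in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start for power iteration
    for _ in range(iters):
        w = [sum(s[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Fix the sign so the eigengene agrees with the module's average expression.
    avg = [sum(row[t] for row in z) / len(z) for t in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


def select_modules(modules, profiles, trait, min_abs_r=0.5):
    """Rank modules by |cor(eigengene, trait)| and keep those above min_abs_r.

    Returns (module_index, eigengene-trait r, module significance) tuples,
    where module significance = mean |cor(gene, trait)| over the module.
    """
    results = []
    for idx, genes in enumerate(modules):
        me = module_eigengene([profiles[g] for g in genes])
        r = pearson(me, trait)
        ms = sum(abs(pearson(profiles[g], trait)) for g in genes) / len(genes)
        results.append((idx, r, ms))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return [item for item in results if abs(item[1]) >= min_abs_r]
```

The `modules` argument is a list of gene-index lists, for example the output of a hierarchical clustering of the network.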
Keywords/Search Tags: Arabidopsis thaliana, microarray data, maximal information coefficient, weighted gene co-expression network analysis, drought-resistant gene