Research On Prediction Of Drought-Resistant Genes In Arabidopsis Thaliana Based On Microarray Data

Posted on: 2013-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2248330371983351
Subject: Computer Science and Technology
Abstract/Summary:
Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but typically these methods have not been converted into publicly available software tools and cannot be applied to plants for identifying genes with agronomic traits.

Prediction of drought-resistant genes in Arabidopsis thaliana is a differential expression problem, which can be addressed with many feature selection methods. The traditional approaches are the t-test and fold change. More recently, Mean Absolute Deviation Variation (MADV) has appeared, which judges the significance of a differentially expressed gene by the amplitude of its changes. SVM-RFE is another classical algorithm.

In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied the t-test, fold change, MADV, improved MADV (IMADV), and a modified SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for prediction. We then analyzed and discussed each algorithm and the experimental results.

After the experiments, we analyzed the drawbacks of the t-test and fold change. Given the noise and outliers in high-throughput microarray data, MADV and IMADV can find genes with large variation regardless of interfering values. The experiments showed that IMADV was more effective than the traditional methods.

To address small sample sizes, we developed a modified approach for SVM-RFE by using leave-one-out cross-validation, which overcame the difficulty of applying a complex machine learning approach in this setting.
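As a rough illustration of the two traditional scores named above, the t-test and fold change, the sketch below ranks genes by each score on a small toy data set. It is not the thesis implementation: the function names, the gene labels, and the expression values are all hypothetical, Welch's form of the t statistic is an assumption (the abstract does not say which variant was used), and intensities are assumed to be positive and not log-transformed.

```python
import math
from statistics import mean, variance


def welch_t(stress, control):
    """Welch's t statistic for two independent samples (assumed variant)."""
    se = math.sqrt(variance(stress) / len(stress) + variance(control) / len(control))
    return (mean(stress) - mean(control)) / se


def log2_fold_change(stress, control):
    """log2 ratio of mean expression; assumes positive, non-log intensities."""
    return math.log2(mean(stress) / mean(control))


def rank_genes(expr, score):
    """Return gene names sorted by decreasing absolute score."""
    return sorted(expr, key=lambda g: abs(score(*expr[g])), reverse=True)


# Hypothetical toy data: gene -> (stress samples, control samples)
toy = {
    "gene_A": ([8.0, 8.4, 7.6], [2.0, 2.2, 1.8]),  # large, consistent change
    "gene_B": ([5.5, 5.6, 5.4], [5.0, 5.1, 4.9]),  # small, consistent change
    "gene_C": ([3.0, 9.0, 6.0], [3.0, 3.5, 2.5]),  # large but noisy change
}

by_t = rank_genes(toy, welch_t)            # gene_B outranks gene_C
by_fc = rank_genes(toy, log2_fold_change)  # gene_C outranks gene_B
print(by_t, by_fc)
```

The two rankings disagree on the toy data: the t statistic favors the small but consistent change in gene_B, while fold change favors the large but noisy change in gene_C. With only a few samples per condition, neither score alone is reliable, which is one way to read the drawbacks discussed above.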
We predicted the top 10 genes as the drought-resistant ones with the modified SVM-RFE and verified them against biological evidence.

Comparing the above methods, the modified SVM-RFE algorithm performs best with small sample sizes. Therefore, we developed a tool for identifying genes associated with agronomic traits, based mainly on the modified SVM-RFE and the t-test. The analytical procedure integrates the whole process of microarray data analysis and establishes a bridge between genotype and phenotype. The software can be applied not only to microarray data but also to RNA sequencing data. It is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.

In addition, we expanded our study to predict genes involved in water susceptibility, the non-resistance genotype. Including the non-resistance data enables a comparison analysis across multiple sets, through which we can find the tuning genes, whose adaptation to water deficit (hydropenia) results from stress responses to environmental changes. By discarding the tuning genes from the 10 drought-resistant genes already screened, we ultimately found 7 real drought-resistant genes. The comparison analysis compensates for a shortcoming of the traditional analysis process for microarray differential expression and makes the prediction more accurate. We also analyzed the top 100 genes using the comparison analysis. From the top 10 to the top 100 genes, the experimental results again showed the effectiveness of the modified SVM-RFE. Our study shows that software incorporating the modified SVM-RFE is a highly promising method for analyzing plant microarray data to study genotype-phenotype relationships.
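The filtering step in the comparison analysis can be pictured with a minimal sketch. It assumes, purely for illustration, that a tuning gene is one that appears among the top-ranked genes of both the resistant and the susceptible analyses. The abstract does not give the exact criterion, and the function name and gene labels below are hypothetical.

```python
def drop_tuning_genes(resistant_ranked, susceptible_ranked, top_n=10):
    """Split the resistant top-N into kept genes and assumed tuning genes.

    Assumption: a gene is 'tuning' if it is in the top-N of both rankings.
    """
    top_resistant = resistant_ranked[:top_n]
    tuning = set(top_resistant) & set(susceptible_ranked[:top_n])
    # Preserve the original ranking order of the genes that remain.
    kept = [g for g in top_resistant if g not in tuning]
    return kept, tuning


# Hypothetical rankings (best first) from the two analyses
resistant = ["g1", "g2", "g3", "g4", "g5", "g6"]
susceptible = ["g9", "g2", "g7", "g5", "g8", "g1"]

kept, tuning = drop_tuning_genes(resistant, susceptible, top_n=5)
print(kept)            # ['g1', 'g3', 'g4']
print(sorted(tuning))  # ['g2', 'g5']
```

Note that g1 is kept here because it falls outside the susceptible top 5, so the result depends on the chosen cutoff.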
Keywords/Search Tags: Microarray, gene expression data, feature selection, SVM-RFE, differential expression genes, drought-resistant