
Research On Disease-Resistant Genes Prediction In Rice Based On Hybrid Feature Selection

Posted on: 2018-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Xiang
Full Text: PDF
GTID: 2393330566463718
Subject: Agricultural Extension
Abstract/Summary:
Rice is one of the most important crops in China, and diseases and pests seriously reduce its quality and yield, so research on rice disease resistance is significant. With the rapid development of gene chip technology, using machine learning methods to analyze rice gene expression data and mine disease resistance genes has become a new research field in rice disease resistance. However, rice gene chip data related to pests and diseases have few samples, high dimensionality, heavy noise, and high redundancy, which makes accurate prediction of disease resistance genes difficult. Based on the characteristics of gene expression data, this thesis studies the choice of feature selection model, the prediction of rice disease resistance genes, and the functional analysis of those genes. The main results are as follows:

(1) An mRMR-Relief-SVM hybrid feature selection model was established, with the mRMR algorithm and an improved Relief algorithm as feature preselection filters and SVM classification accuracy as the evaluation function. First, genes were ranked by importance using the minimum-redundancy-maximum-relevance (mRMR) feature selection approach. Feature gene subset A was obtained by introducing the ranked genes one at a time into a support vector machine (SVM) classifier and using cross-validated accuracy on the training set to eliminate redundant genes. In parallel, the improved Relief feature selection approach was used to obtain the ranked genes above a designated weight, and feature gene subset B was obtained from them by the same sequential SVM procedure on the training set. Finally, the final feature gene subset C was obtained by combining subsets A and B.

(2) Prediction of rice resistance genes based on the mRMR-Relief-SVM hybrid model. From the GEO database, the rice tungro gene chip dataset GSE16142 and the rice stripe gene chip dataset GSE11025 were selected for analysis. For the sake of model robustness, each original two-class dataset was divided into a training set and a test set at a 2:1 ratio, with balanced positive and negative samples. Three random selections per dataset gave three groups of training and test sets. When the mRMR-Relief-SVM hybrid model was applied to the independent test sets, the feature gene subsets selected by the new model achieved higher prediction accuracy across multiple classifiers.

(3) Biological significance analysis of the feature genes. The DAVID bioinformatics database was used to analyze the biological significance of the selected feature genes. Six of the rice tungro feature genes and eight of the rice stripe feature genes were related to disease resistance. In addition, the molecular interaction network analysis software Cytoscape was used to construct gene interaction network diagrams, based on Pearson's correlation coefficient, for the different feature gene subsets obtained from different training sets of the same disease. The results showed a strong correlation between the different groups of feature genes; that is, genes in different groups were co-expressed.
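The selection procedure in (1) can be illustrated with a small, dependency-free sketch. It is not the thesis's implementation, and several pieces are stand-ins chosen for brevity: the mRMR-style ranking uses absolute Pearson correlation for both relevance and redundancy (classic mRMR uses mutual information), the Relief routine is the basic two-class version rather than the improved variant, and leave-one-out 1-nearest-neighbour accuracy replaces cross-validated SVM accuracy. All function names, the threshold default, and the toy data are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mrmr_rank(X, y):
    """Greedy mRMR-style ranking: relevance minus mean redundancy (both |Pearson r|)."""
    cols = [list(c) for c in zip(*X)]
    relevance = [abs(pearson(c, y)) for c in cols]
    ranked, remaining = [], list(range(len(cols)))
    while remaining:
        def gain(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(abs(pearson(cols[j], cols[s])) for s in ranked) / len(ranked)
            return relevance[j] - redundancy
        best = max(remaining, key=gain)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def relief_weights(X, y):
    """Basic two-class Relief (a stand-in for the thesis's improved variant)."""
    n, d = len(X), len(X[0])
    cols = [list(c) for c in zip(*X)]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # scale differences to [0, 1]

    def diff(a, b, j):
        return abs(X[a][j] - X[b][j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):  # visit every sample once instead of sampling at random
        hit = min((k for k in range(n) if k != i and y[k] == y[i]), key=lambda k: dist(i, k))
        miss = min((k for k in range(n) if y[k] != y[i]), key=lambda k: dist(i, k))
        for j in range(d):
            # Reward features that separate the nearest miss, penalise those that split the nearest hit.
            w[j] += (diff(i, miss, j) - diff(i, hit, j)) / n
    return w


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out 1-NN accuracy on the chosen features (stand-in for SVM cross-validation)."""
    correct = 0
    for i in range(len(X)):
        others = [k for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def sequential_select(X, y, ranked, score=loo_1nn_accuracy):
    """Introduce ranked features one at a time; keep one only if it raises the score."""
    chosen, best = [], 0.0
    for j in ranked:
        s = score(X, y, chosen + [j])
        if s > best:
            chosen, best = chosen + [j], s
    return chosen


def hybrid_select(X, y, relief_threshold=0.0):
    """Subset A from the mRMR-style ranking, subset B from Relief, subset C as their union."""
    subset_a = sequential_select(X, y, mrmr_rank(X, y))
    w = relief_weights(X, y)
    relief_ranked = sorted((j for j in range(len(w)) if w[j] > relief_threshold), key=lambda j: -w[j])
    subset_b = sequential_select(X, y, relief_ranked)
    return sorted(set(subset_a) | set(subset_b))


# Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 duplicates feature 0.
X_toy = [[0.1, 0.5, 0.1], [0.2, 0.1, 0.2], [0.0, 0.9, 0.0],
         [0.9, 0.4, 0.9], [1.0, 0.8, 1.0], [0.8, 0.2, 0.8]]
y_toy = [0, 0, 0, 1, 1, 1]
print(hybrid_select(X_toy, y_toy))
```

On the toy data the duplicated feature is dropped because adding it does not raise the score, which is the redundancy-elimination behaviour that the sequential step is meant to provide.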
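The data partition in (2), a class-balanced 2:1 split repeated three times, might look like the sketch below. The function name, the seeds, and the toy labels are assumptions for illustration; the abstract does not say how the random selections were drawn.

```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices so each class contributes about train_fraction to the training set."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


# Hypothetical labels: 6 negative and 6 positive samples.
labels_toy = [0] * 6 + [1] * 6

# Three random selections give three (training set, test set) groups.
splits = [stratified_split(labels_toy, seed=s) for s in range(3)]
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the positive-to-negative ratio the same in the training and test sets, which matters when a dataset has only a handful of samples.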
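The co-expression network in (3) is built from Pearson's correlation coefficient between genes. A minimal sketch of producing an edge list is given below. The gene names, the expression values, and the 0.8 cutoff are all hypothetical, since the abstract does not state a threshold. An edge list like this could be written to a delimited text file, which Cytoscape can import as a network.

```python
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def coexpression_edges(expr, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles across four samples.
expr_toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # rises with geneA
    "geneC": [1.0, -1.0, 1.0, -1.0],  # only weakly related to the others
}
edges_toy = coexpression_edges(expr_toy)
for g1, g2, r in edges_toy:
    print(f"{g1}\t{g2}\t{r}")
```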
Keywords/Search Tags: Rice, gene chip, resistance gene prediction, feature selection, support vector machine