Research On Data Mining Methods Of Gene Expression Profile

Posted on: 2017-02-23  Degree: Doctor  Type: Dissertation
Country: China  Candidate: T Che  Full Text: PDF
GTID: 1310330566955694  Subject: Unknown
Abstract/Summary:
DNA microarray (gene chip) technology was a major technological breakthrough in molecular biology in the last century. It can assay thousands of genes in cells in a single experiment, moving research from the study of individual genes into the genomics era. Analyzing and mining gene expression profile data from microarray experiments helps researchers understand cell growth and gene expression at different stages, measure differences between normal and tumor tissue, measure changes before and after treatment, discover drugs, diagnose genetic diseases, predict disease, and explore the mysteries of human biology. This work is important for biological and biomedical research and development, and is currently one of the key hotspots of bioinformatics research. However, gene expression data are high-dimensional, small-sample, noisy, highly redundant, and continuous-valued, which poses a challenge for traditional data mining methods. Building on a review, analysis, and summary of existing data mining methods, this dissertation studies feature gene selection and data classification. The main contents and results are as follows:

(1) A hybrid feature gene selection method based on optimal neighborhood mutual information is proposed. First, all genes are ranked with the ReliefF algorithm and the top k genes are selected as the preliminary gene subset, which eliminates noisy and other inactive genes, reduces dimensionality, and improves data quality. Second, because the neighborhood radius affects the performance of the neighborhood mutual information model, a differential evolution algorithm is used to optimize the neighborhood radius. Finally, an improved neighborhood mutual information model, based on the neighborhood information at the optimal radius and a forward greedy search strategy, is designed to perform the final gene selection, eliminating noisy and redundant genes. Simulation results show that the proposed method is superior to ReliefF, Kruskal-Wallis, Gini Index, MI, and NMI in recognition accuracy and in the number of genes selected.

(2) A feature gene selection method based on an improved harmony search algorithm is proposed. First, the Kruskal-Wallis algorithm is used to pre-select genes, which reduces the dimension of the harmony search space and safeguards the optimization precision and convergence speed of the harmony search algorithm. Then, to address the deficiencies of the harmony search algorithm, the best and worst harmonies are used in the evolutionary operation. In addition, the update scheme of the teaching-learning-based optimization algorithm is integrated, yielding an improved harmony search algorithm for feature gene selection. Simulation results show that the proposed method outperforms HS and its improved variants, such as IHS, EHS, and GHS, in classification accuracy, time efficiency, and stability.

(3) An ensemble classification method based on an improved rotation forest algorithm is proposed. First, an improved information index to classification method is proposed to filter genes, which eliminates noisy genes, reduces the data dimension, and improves data quality. Then, to address the deficiencies of the rotation forest algorithm, an improved rotation forest algorithm is proposed that takes into account the diversity and accuracy of the base classifiers and combines heterogeneous ensembles with sample perturbation; it is then used to classify the samples. Simulation results show that the proposed method outperforms rotation forest, improved rotation forest, bagging, and AdaBoost in classification accuracy, stability, and running time.

(4) A selective ensemble classification method based on improved Teaching-Learning-Based Optimization (TLBO) is proposed. First, multiple sample subsets with large differences are generated using the bootstrap technique. Then, a double perturbation based on Kruskal-Wallis and neighborhood mutual information is applied to each sample subset to improve the diversity and precision of the subsets. Finally, an improved TLBO is designed to realize the selective ensemble through both the "teaching" and "self-learning" processes. Simulation results show that the proposed method is superior to ensemble methods (Bagging, AdaBoost, and Rotation Forest) and to selective ensemble methods based on TLBO and MTLBO in classification accuracy, ensemble size, stability, and reliability.
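The Kruskal-Wallis pre-filtering step mentioned in contributions (2) and (4) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the textbook H statistic is used without tie correction, the top k genes by H are kept, and the function names (`average_ranks`, `kruskal_wallis_h`, `top_k_genes`) and the data layout (one row per sample, one column per gene) are hypothetical. It is not the dissertation's actual code.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H for one gene (assumption: no tie correction applied)."""
    n = len(values)
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(ranks, labels):
        rank_sums[label] += rank
        counts[label] += 1
    total = sum(rank_sums[c] ** 2 / counts[c] for c in counts)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def top_k_genes(expression, labels, k):
    """Return indices of the k genes with the largest H statistic.

    expression: list of samples, each a list of per-gene expression values.
    labels: class label of each sample.
    """
    n_genes = len(expression[0])
    scores = [
        kruskal_wallis_h([sample[g] for sample in expression], labels)
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]
```

Calling `top_k_genes(expression, labels, k)` on a samples-by-genes matrix returns the column indices that a later search stage (for example harmony search) would then operate on. A larger H means the gene's ranks differ more strongly between classes.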
Keywords/Search Tags:Gene expression profiling, Data mining, Feature gene, Ensemble classification, Intelligent optimization algorithm