
A Method Study Of Classification And Feature Selection Based On Gene Expression Data

Posted on: 2017-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: S L Wang
GTID: 2310330488496089
Subject: Computer application technology
Abstract/Summary:
In bioinformatics, DNA microarray technology is a landmark technical breakthrough. As research has deepened, it has been widely applied in fields such as pharmaceutical research and gene sequencing, and it has high application value and broad development prospects. In practice, however, the genomes studied with DNA microarrays keep growing in scale and in feature dimension. As a result, gene expression data is not only high-dimensional with few samples, but also contains many redundant and noisy genes that have little or no effect on sample classification. These characteristics increase the time and space complexity of machine learning and lower classification accuracy, which can ultimately raise the cost of disease diagnosis and reduce the accuracy of disease prediction. To improve classification accuracy, this thesis therefore works on two fronts: improving the classification algorithm, and proposing an effective feature selection method that selects key genes, eliminates redundant and noisy genes, reduces the gene feature dimension, and improves machine learning efficiency. The main contents are as follows:

(1) The regularized extreme learning machine (RELM) was proposed on the basis of the extreme learning machine (ELM) and has many advantages, such as ease of use, high classification accuracy, and good generalization ability. However, the input-layer weights and hidden-layer biases of RELM are assigned randomly, which can affect its stability. In addition, RELM needs many hidden-layer nodes to reach relatively good classification accuracy. To address these problems, this thesis proposes an improved particle swarm RELM (PSO-RELM), which encodes the initial input-layer weights and hidden-layer biases of RELM as particles in particle swarm optimization (PSO) and optimizes them according to PSO theory. Simulation results on UCI datasets show that PSO-RELM achieves better classification accuracy and stability than a BP neural network, a support vector machine (SVM), and RELM.

(2) Combining mutual information maximization (MIM) with an adaptive genetic algorithm (AGA), this thesis proposes a feature selection method (MIMAGA-Selection) that uses ELM as the classifier for computing classification accuracy. The method first filters and groups the source dataset into a primary gene subset according to the mutual information between each gene and the class labels. It then uses AGA, with sample classification accuracy as the fitness function, to optimize the primary gene subset and obtain an optimal gene subset. Experiments on three standard gene expression datasets show that the method effectively eliminates redundant and noisy genes and significantly improves classification accuracy.
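
The MIM filtering stage described in part (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how expression values are discretized or how genes are grouped, so the median-based `binarize` step, the `mim_rank` helper, and the toy data below are all assumptions made for illustration. The sketch only ranks genes by their mutual information with the class label and keeps the top-scoring ones; the AGA optimization stage is not shown.

```python
import math
from collections import Counter
from statistics import median


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def binarize(values):
    """Assumed discretization: 1 if above the gene's median expression, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def mim_rank(expression, labels, top_k):
    """Hypothetical helper: keep the top_k genes by MI with the class label.

    expression: list of samples, each a list of per-gene expression values.
    Returns the selected gene indices, highest score first.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(binarize(column), labels), g))
    # Sort by descending score; break ties by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]


# Toy data (made up): 6 samples, 3 genes. Gene 0 tracks the label closely.
expression = [
    [0.1, 5.0, 2.0],
    [0.2, 1.0, 2.1],
    [0.3, 5.1, 1.9],
    [0.9, 1.1, 2.0],
    [1.0, 5.2, 2.2],
    [1.1, 0.9, 1.8],
]
labels = [0, 0, 0, 1, 1, 1]
selected = mim_rank(expression, labels, top_k=1)
```

In a pipeline like the one the abstract outlines, the indices in `selected` would form the primary gene subset that a later wrapper stage searches, with classifier accuracy as the fitness value.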
Keywords/Search Tags: gene expression data, regularized extreme learning machine, particle swarm optimization, mutual information maximization, adaptive genetic algorithm