A Method Study Of Classification And Feature Selection Based On Gene Expression Data

Posted on:2017-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:S L Wang

GTID:2310330488496089

Subject:Computer application technology

Abstract/Summary:

DNA microarray technology is a landmark breakthrough in bioinformatics. As research has deepened, it has come to be widely used in fields such as pharmaceutical research and gene sequencing, and it has high application value and broad prospects. In practice, however, the genomes studied with DNA microarrays keep growing in scale and in feature dimension. As a result, gene expression data are high-dimensional with few samples, and they contain many redundant and noisy genes that have little or no effect on sample classification. These characteristics increase the time and space complexity of machine learning and lower classification accuracy, which in turn raises the cost of disease diagnosis and reduces the accuracy of disease prediction. To improve classification accuracy, this thesis therefore works on two fronts: improving the classification algorithm, and proposing an effective feature selection method that picks out the key genes, eliminates redundant and noisy genes, reduces the feature dimension, and improves learning efficiency. The main contents are as follows:

(1) The regularized extreme learning machine (RELM) builds on the extreme learning machine (ELM) and has several advantages: it is easy to use, classifies accurately, and generalizes well. However, the input-layer weights and hidden-layer biases of RELM are assigned randomly, which can affect its stability. In addition, RELM needs a large number of hidden-layer nodes to reach a satisfactory classification accuracy. To address these problems, this thesis proposes an improved particle swarm RELM (PSO-RELM), which encodes the initial input-layer weights and hidden-layer biases of RELM as particles and optimizes them with particle swarm optimization (PSO). Simulation results on UCI datasets show that PSO-RELM achieves better classification accuracy and stability than a BP neural network, a support vector machine (SVM), and RELM.

(2) Combining mutual information maximization (MIM) with an adaptive genetic algorithm (AGA), this thesis proposes a feature selection method (MIMAGA-Selection) that uses ELM as the classifier for computing classification accuracy. The method first filters and groups the source dataset according to the mutual information between each gene and the class labels, forming a primary gene subset. It then uses AGA, with sample classification accuracy as the fitness function, to optimize the primary gene subset and obtain an optimal gene subset. Experiments on three standard gene expression datasets show that the method effectively eliminates redundant and noisy genes and significantly improves classification accuracy.
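
The filtering stage of the feature selection method ranks genes by their mutual information with the class labels. The following Python sketch illustrates only that idea and is not the thesis's implementation. The median-based binarization, the function names (mutual_information, binarize, rank_genes), the top_k parameter, and the toy data are all assumptions made for illustration. The AGA stage and the ELM classifier are not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def binarize(values):
    """Assumed discretization: 1 if above the gene's median, else 0."""
    s = sorted(values)
    m = len(s)
    median = s[m // 2] if m % 2 else (s[m // 2 - 1] + s[m // 2]) / 2
    return [1 if v > median else 0 for v in values]


def rank_genes(expression, labels, top_k):
    """Return indices of the top_k genes by mutual information with labels.

    expression: list of samples, each a list of per-gene expression values.
    labels: one class label per sample.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(binarize(column), labels), g))
    # Highest score first; ties broken by gene index for determinism
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:top_k]]


# Toy data (hypothetical): 4 samples, 3 genes; only gene 0 tracks the class
expression = [
    [0.1, 5.0, 2.0],
    [0.2, 1.0, 2.1],
    [0.9, 4.0, 1.9],
    [1.0, 2.0, 2.2],
]
labels = [0, 0, 1, 1]
selected = rank_genes(expression, labels, top_k=1)
print(selected)
```

In a pipeline like the one described, the retained genes would form the primary subset that a genetic algorithm then refines using classifier accuracy as the fitness value.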

Keywords/Search Tags:

gene expression data, regularized extreme learning machine, particle swarm optimization, mutual information maximization, adaptive genetic algorithm

Related items

1	Gene Data Classification Research Based On The Improved Particle Swarm Optimization And Extreme Learning Machine
2	Key LncRNA Prediction In Gene Expression Data Based On Machine Learning
3	Study On Feature Selection And Classification Algorithm For Gene Expression Data
4	A Study And Implementation Of Processing Gene Expression Profile Based On Prior Information And Binary Particle Swarm Optimization
5	Study On Smart Prediction Method Of Slope Stability Based On Hybrid Kernel Extreme Learning Machine Trained And Optimized By Particle Swarm Optimization
6	Study On Precipitation Prediction Model Of Extreme Learning Machine Based On Intelligent Optimization Algorithm
7	Research On Particle Swarm Optimization Extreme Learning Machine And Its Application In Precipitation Forecast
8	Research On Gene Expression Data Clustering Algorithm Based On Particle Swarm Optimization
9	A Study Of Multi-swarm Particle Swarm Optimization Based On Chaotic Optimization And Its Applications
10	Research On Multi-Objective Optimization Algorithm For Biclustering In Microarray Gene Expression Data