Research On Feature Gene Selection Method Based On Information Fusion

Posted on: 2013-12-07

Degree: Master

Type: Thesis

Country: China

Candidate: X F Gou

GTID: 2230330395984903

Subject: Computer Science and Technology

Abstract/Summary:

Typically, feature gene selection aims to find a compact feature subset for constructing a pattern classifier of reduced complexity, in order to improve classification performance. It not only helps identify disease-related genes and improve tumor classification, but also reduces the cost of clinically diagnosing tumor type. An effective feature gene selection method should not only produce a solution with good classification performance, but should also be robust. Gene expression microarray data have very few samples and very high dimensionality, and related studies have confirmed that such datasets more easily lead to poor robustness in feature selection methods. However, existing feature selection methods mostly focus on classification performance and tend to overlook robustness.

The main research work is as follows:

A feature gene selection method based on prior information fusion. This method assumes that classification performance is high when only a small number of features is extracted, but declines once the number of features exceeds a certain threshold. Based on this assumption, we first remove noise genes and irrelevant genes, and then use a heuristic breadth-first search algorithm to select feature genes. In addition, we propose using multiple testing procedures (MTP) to fuse prior information, so as to make full use of reliable clinical information and further improve the accuracy of tumor subtype classification. Experimental results show that our method selects a more compact feature gene subset and achieves better classification performance.

A feature gene selection method based on multicriterion fusion. Because gene expression data are high-dimensional with small sample sizes, feature gene selection algorithms are likely to show poor robustness on them.
If a feature gene selection algorithm lacks robustness, it may produce unrepeatable results even when only a few samples are added to or removed from the training dataset. Even without perturbing the training data, different feature selection algorithms usually produce different selection results. Such inconsistent gene selection results can confuse biological researchers and erode confidence in clinical diagnosis. In this thesis, we exploit the property that different basic filter criteria each emphasize different aspects of the sample data distribution. At the same time, because multi-feature fusion alone has difficulty capturing the complexity of the sample data distribution, which lowers classification accuracy, we propose integrating multiple criteria with prior-information scoring, and then select feature genes with a forward-backward algorithm that eliminates half of the features at each iteration. Experiments show that this method effectively preserves genes that a single criterion would wrongly eliminate because of its bias, achieves classification performance similar to other methods, and is more robust.
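The abstract says multiple testing procedures (MTP) are used to fuse prior information, but it does not name the procedure. As one possible reading, offered only as a sketch, the weighted Benjamini-Hochberg procedure lets a per-gene prior weight relax or tighten each gene's false discovery rate threshold when filtering genes. The function name `weighted_bh` and the idea of deriving weights from clinical knowledge are assumptions, not the thesis's method.

```python
from typing import Sequence


def weighted_bh(p_values: Sequence[float], weights: Sequence[float], alpha: float = 0.05) -> list[int]:
    """Return indices of genes kept by the weighted Benjamini-Hochberg procedure.

    Hypothetical illustration: `weights` encode prior information (e.g. genes
    already reported as clinically relevant get a weight above 1). Weights are
    rescaled to average 1, as the weighted BH procedure requires.
    """
    m = len(p_values)
    if m == 0 or len(weights) != m:
        raise ValueError("p_values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    mean_w = sum(weights) / m
    norm_w = [w / mean_w for w in weights]
    # Weighted p-value q_i = p_i / w_i; a zero weight means the gene is never kept.
    q = [p / w if w > 0 else float("inf") for p, w in zip(p_values, norm_w)]
    order = sorted(range(m), key=lambda i: q[i])
    # Step-up rule: find the largest rank k with q_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])
```

With equal weights, `weighted_bh([0.001, 0.02, 0.04, 0.3], [1, 1, 1, 1])` keeps genes 0 and 1; giving gene 2 a prior weight of 2 (and gene 3 a weight of 0) also keeps gene 2, which is how prior knowledge can rescue a borderline gene.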
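The heuristic breadth-first search in the prior-information fusion method is not described further in the abstract. The sketch below assumes a beam-style variant: subsets grow by one gene per level, only the best `beam_width` subsets at each level are expanded, and the search stops when a level no longer improves the score, mirroring the stated assumption that accuracy peaks at a small number of genes. The `score` callable (for instance, cross-validated accuracy of some classifier) and every name here are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable

Subset = FrozenSet[int]


def beam_breadth_first_search(
    candidates: Iterable[int],
    score: Callable[[Subset], float],
    beam_width: int = 3,
    max_size: int = 10,
) -> tuple[Subset, float]:
    """Level-by-level (breadth-first) subset search pruned to a beam.

    Hypothetical sketch: each level holds subsets of equal size, and only the
    `beam_width` best-scoring subsets are expanded to the next level.
    """
    pool = sorted(set(candidates))
    frontier: list[Subset] = [frozenset()]
    best_subset: Subset = frozenset()
    best_score = float("-inf")
    for _ in range(max_size):
        # Expand every subset in the frontier by one gene it does not yet contain.
        children = {s | {g} for s in frontier for g in pool if g not in s}
        if not children:
            break
        # Best score first; ties broken by gene indices so the result is deterministic.
        ranked = sorted(children, key=lambda s: (-score(s), sorted(s)))
        frontier = ranked[:beam_width]
        level_best = score(frontier[0])
        if level_best <= best_score:
            break  # no improvement at this depth: stop adding genes
        best_subset, best_score = frontier[0], level_best
    return best_subset, best_score
```

For example, with a toy score that rewards genes 1 and 3 and penalizes every other gene by 0.5, the search returns `frozenset({1, 3})` with score 2.0 and stops at the next level because adding any third gene lowers the score.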
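For the multicriterion fusion method, one way to realize integrating multiple criteria with prior-information scoring is rank averaging: each criterion ranks the surviving genes, the ranks are averaged, and the worse half is dropped before the criteria are re-applied. This sketch covers only the backward halving part of the forward-backward algorithm; rank averaging as the fusion rule and the names `fused_ranks` and `halving_selection` are assumptions. A prior-information score can be supplied as one more criterion.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical interface: a criterion scores the currently surviving genes (larger = more relevant).
Criterion = Callable[[Sequence[str]], Mapping[str, float]]


def fused_ranks(genes: Sequence[str], criteria: Sequence[Criterion]) -> dict[str, float]:
    """Average each gene's rank (1 = best) across all criteria."""
    fused = {g: 0.0 for g in genes}
    for criterion in criteria:
        scores = criterion(genes)
        ordered = sorted(genes, key=lambda g: (-scores[g], g))
        for rank, g in enumerate(ordered, start=1):
            fused[g] += rank / len(criteria)
    return fused


def halving_selection(genes: Sequence[str], criteria: Sequence[Criterion], target: int) -> list[str]:
    """Backward elimination: re-fuse ranks on the survivors, keep the better half, repeat until `target` remain."""
    kept = list(genes)
    while len(kept) > target:
        fused = fused_ranks(kept, criteria)
        kept = sorted(kept, key=lambda g: (fused[g], g))[: max(target, len(kept) // 2)]
    return kept
```

For example, if criterion 1 ranks genes a, b, c, d in that order while criteria 2 and 3 both put c first, keeping two genes yields `["c", "a"]`; criterion 1 alone would have discarded c, which illustrates how fusion can protect genes from the bias of a single criterion.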
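The abstract does not say how robustness is measured. A common choice, assumed here, is to rerun gene selection on perturbed training sets (for example, with a few samples added or removed) and average the pairwise Jaccard similarity of the selected gene subsets. The function names are hypothetical.

```python
from collections.abc import Sequence, Set
from itertools import combinations


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets: Sequence[Set]) -> float:
    """Average pairwise Jaccard similarity of gene subsets from perturbed runs.

    1.0 means every run selected the same genes; values near 0 mean the
    selection is not repeatable (assumed robustness measure).
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Three runs returning {a, b}, {a, b} and {a, c} give a stability of 5/9, about 0.56, while identical selections give 1.0.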

Keywords/Search Tags:

Gene chip, Gene expression profile, Gene selection, Robustness

Related items

1	Research On Hybrid Gene Selection Method Based On Clustering
2	The Research On Feature Selection And Classification Method Using Gene Expression Profile Data
3	The Study Of Gene Expression Profile And The Preparation Of DNA Microarray Of K562 Cells
4	Sample Class Discovery And Sample Class Prediction Based On Gene Expression Profile
5	Study Of Feature Gene Analysis Based On The Network Module
6	Methods And Programs For Analyzing Microarray Data And Detecting Horizontal Gene Transfer: Phylogenetic Approaches
7	Research On 2D Spatial Gene Selection Algorithm Based On Unbalanced Gene Data
8	Preparation Of SH-SY5Y Cell CDNA Microarray And Study On SH-SY5Y Cell Gene Expression Profile
9	Fusion Gene Chip And DNA Sequencing Data To Research Differences In Gene Expression
10	Research And Applications Of Gene Chip On Detecting Genes Involved In Aflatoxin Biosynthesis