Research On Feature Selection Algorithm Based On Gene Expression Data

Posted on:2012-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhao

Full Text:PDF

GTID:2178330338490892

Subject:Biomedical engineering

Abstract/Summary:

Bioinformatics data, which arise from life science research, are characterized by high dimensionality and small sample sizes, so feature selection and dimensionality reduction are essential steps in analyzing them effectively. Gene expression data in particular have more dimensions than other data types, grow faster, and carry more biological information and knowledge, and earlier feature selection algorithms cannot meet the demands of processing them. More efficient feature selection algorithms have therefore remained a research focus. Starting from the requirements of gene expression data analysis, and with close attention to the high dimensionality and small sample size of such data, this thesis proposes new feature selection algorithms for this kind of data.

First, a discriminant operator is defined based on the relevance between each feature and the class label and on the relevance among the features themselves. This operator expresses both relevance and redundancy well. A score is computed for each feature with the operator, and the features are then ranked by score. This is the maximum relevance minimum redundancy feature selection algorithm.

Second, to account for the difference between feature redundancy and feature relevance, a weighted maximum relevance minimum redundancy feature selection algorithm is proposed. This improved algorithm further increases the effectiveness of the original algorithm.

Finally, because conventional feature selection algorithms have difficulty determining the size of the optimal feature subset, a neuro-fuzzy feature subset evaluation criterion is proposed, which determines the best number of features well.

The experiments use preprocessed mouse gene expression data from Leiden University in the Netherlands, together with the classic leukemia and colon cancer datasets. In terms of both algorithmic complexity and accuracy, the experimental results show that the algorithms perform well.
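The maximum relevance minimum redundancy ranking described above can be sketched as follows. The abstract does not give the thesis's discriminant operator, so this sketch assumes the common mutual-information difference form (relevance minus mean redundancy), assumes the expression values have already been discretized, and uses a greedy selection order. The names `mutual_information`, `mrmr_rank`, and the `weight` parameter are hypothetical and chosen only for illustration; `weight` stands in for the weighted variant and is not the thesis's actual weighting scheme.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    _, x_idx = np.unique(np.ravel(x), return_inverse=True)
    _, y_idx = np.unique(np.ravel(y), return_inverse=True)
    # Joint probability table estimated from co-occurrence counts.
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, b)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mrmr_rank(X, y, k, weight=1.0):
    """Greedily select up to k feature indices from X (samples x features).

    Assumed score: relevance(f) - weight * mean redundancy(f, selected).
    """
    n_features = X.shape[1]
    if k < 1 or n_features == 0:
        return []
    # Relevance: mutual information between each feature and the label.
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Accumulate redundancy with the most recently selected feature.
        redundancy_sum += np.array(
            [mutual_information(X[:, j], X[:, last]) for j in range(n_features)]
        )
        score = relevance - weight * redundancy_sum / len(selected)
        score[selected] = -np.inf           # never pick a feature twice
        selected.append(int(np.argmax(score)))
    return selected
```

Calling `mrmr_rank(X, y, k)` on a discretized expression matrix `X` and label vector `y` returns feature indices in selection order. With `weight=0` the ranking reduces to ordering by relevance alone, and larger values penalize redundancy more heavily.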

Keywords/Search Tags:

Feature selection, Mutual information, Feature subset evaluation, Bioinformatics, Gene expression data

Related items

1	Research On Several Key Technologies Of Gene Expression Data Mining
2	Study On SVMs-based Classification Of Gene Expression Data
3	Feature Selection Algorithms For High-throughput Data
4	Integrated feature subset selection/extraction with applications in bioinformatics
5	Research On Information Gene Selection Algorithm Based On Mutual Information
6	Research On Gene Selection Based On Max-Relevance And Min-Redundancy Feature Selection Algorithm
7	Design Of Stream Feature Selection Algorithm And Its Application In Gene Expression Data
8	SVM Based Research On Feature Selection Method For Gene Expression Data
9	Study On Feature Selection Method For Classification Of Gene Expression Data
10	Research On Feature Selection Algorithm Based On Mutual Information