
Research on Feature Selection Algorithms Based on Gene Expression Data

Posted on: 2012-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhao
Full Text: PDF
GTID: 2178330338490892
Subject: Biomedical engineering
Abstract/Summary:
Bioinformatics data, which come from life science research, are characterized by high dimensionality and small sample sizes, among other notable features. To analyze such data effectively, feature selection and dimensionality reduction are both essential steps. Gene expression data in particular have more dimensions than other types of data, grow faster, and carry more biological information and knowledge, so earlier feature selection algorithms cannot meet the need. More efficient feature selection algorithms have therefore remained a research focus. Based on the requirements of gene expression data analysis, and closely tied to the high dimensionality and small sample sizes of such data, this thesis proposes new feature selection algorithms for this special kind of data.

First, a discriminant operator is defined based on the relevance between each feature and the class label, as well as the relevance among the features themselves. This operator expresses both relevance and redundancy well. A score is then calculated for each feature, and finally the features are ranked by score. This is the maximum relevance minimum redundancy feature selection algorithm.

Second, to account for the difference in degree between feature redundancy and feature relevance, a weighted maximum relevance minimum redundancy feature selection algorithm is proposed. This variant further improves the effectiveness of the original algorithm.

Finally, because conventional feature selection algorithms have difficulty determining the size of the optimal feature subset, a neuro-fuzzy feature subset evaluation criterion is proposed, which identifies the best number of features well.

The experimental data are preprocessed mouse gene expression data from Leiden University in the Netherlands, together with the classic leukemia and colon cancer datasets. The experimental results show that the algorithms perform well in terms of both algorithmic complexity and accuracy.
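The scoring and ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the discriminant operator, so this example assumes the commonly used difference form of maximum relevance minimum redundancy: the score of a candidate feature is its mutual information with the label minus its mean mutual information with the features already selected. The `weight` parameter is a hypothetical stand-in for the weighted variant, and the function names, gene names, and toy data are all assumptions made for illustration, not the thesis's implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_rank(features, labels, k, weight=1.0):
    """Greedy ranking by relevance - weight * mean redundancy (assumed form).

    features: dict mapping a feature name to a list of discretized values.
    weight:   hypothetical trade-off factor; 1.0 gives the unweighted score.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - weight * redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical by name)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, already discretized expression levels for 8 samples (made-up data).
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "gene_a":      [0, 0, 0, 0, 1, 1, 1, 1],  # informative
    "gene_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # informative but redundant
    "gene_b":      [0, 0, 1, 1, 0, 0, 1, 1],  # informative, complements gene_a
    "gene_noise":  [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
}

print(mrmr_rank(features, labels, k=3))
print(mrmr_rank(features, labels, k=3, weight=3.0))
```

On this toy data the duplicated gene is ranked behind the complementary one even though both are equally relevant to the label, and raising `weight` penalizes redundancy more strongly. Real gene expression values are continuous, so they would need to be discretized (or a continuous mutual information estimator used) before a score of this kind could be computed.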
Keywords/Search Tags: Feature selection, Mutual information, Feature subset evaluation, Bioinformatics, Gene expression data