Extended Rough Set Models Based On The Neighborhood And Their Applications In Gene Selection

Posted on: 2013-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: L J Zhang
GTID: 2210330374460444  Subject: Computer software and theory
Abstract/Summary:
Rough set theory is an effective data analysis tool for handling uncertain, imprecise, incomplete, and inconsistent data. In practical applications, however, classical rough set theory, which is based on a strict equivalence relation, has some limitations. Extended rough set theory is a current research hot spot because it is better suited to incomplete, symbolic, and numeric data, as well as mixtures of these types. In this thesis, a series of extended rough set models based on the neighborhood relation are proposed, and attribute reduction algorithms based on these models are constructed. Theoretical and experimental analyses verify the rationality of the algorithms. Taking into account the characteristics of gene expression datasets, the attribute reduction algorithms of the extended rough set models are combined with gene primary selection methods and applied to feature gene selection. Comparative experiments show that the feature gene selection method based on the proposed extended rough set model is effective.
The main contents of this thesis are organized as follows.
Several problems with the neighborhood relation need to be addressed: the choice of the neighborhood parameter lacks a theoretical basis, and using one uniform parameter for every attribute tends to introduce errors. To solve these problems, the concept of quantitative series from quantification theory is introduced to choose the neighborhood parameters. A relative neighborhood relation is proposed according to the quantitative series and the ranges of the different genes, and the relative neighborhood rough set model is then constructed. For incomplete hybrid decision systems, a new generalized neighborhood relation is constructed by combining the relative neighborhood relation with the tolerance relation.
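The generalized neighborhood relation and the positive region it induces can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the thesis's model: the per-attribute radius `delta * range(a)` is a hypothetical stand-in for the relative neighborhood (the thesis derives its parameters from quantitative series, which are not reproduced here), a missing value (`None`) is treated with the tolerance rule that it matches anything, and symbolic values must match exactly. The names `attribute_ranges`, `neighborhood`, `positive_region`, and `dependency`, and the toy table, are all hypothetical.

```python
from typing import List, Sequence, Set, Union

# One cell of a hybrid decision table: numeric, symbolic, or None (missing).
Value = Union[float, str, None]
Table = Sequence[Sequence[Value]]


def attribute_ranges(data: Table, numeric: Sequence[bool]) -> List[float]:
    """Observed range (max - min) of each numeric attribute; 0.0 for symbolic ones."""
    ranges = []
    for a, is_num in enumerate(numeric):
        vals = [row[a] for row in data if row[a] is not None]
        ranges.append(max(vals) - min(vals) if is_num and vals else 0.0)
    return ranges


def _close(xa: Value, ya: Value, is_num: bool, radius: float) -> bool:
    """Per-attribute test of the hypothetical generalized neighborhood relation."""
    if xa is None or ya is None:
        return True  # tolerance rule: a missing value is compatible with anything
    if is_num:
        return abs(xa - ya) <= radius  # relative radius: delta scaled by range (assumption)
    return xa == ya  # symbolic values must be equal


def neighborhood(data: Table, i: int, attrs: Sequence[int], numeric: Sequence[bool],
                 ranges: Sequence[float], delta: float) -> Set[int]:
    """Indices of samples that are close to sample i on every attribute in attrs."""
    return {j for j, y in enumerate(data)
            if all(_close(data[i][a], y[a], numeric[a], delta * ranges[a]) for a in attrs)}


def positive_region(data: Table, labels: Sequence[str], attrs: Sequence[int],
                    numeric: Sequence[bool], delta: float) -> Set[int]:
    """Samples whose entire neighborhood shares their decision label."""
    ranges = attribute_ranges(data, numeric)
    return {i for i in range(len(data))
            if all(labels[j] == labels[i]
                   for j in neighborhood(data, i, attrs, numeric, ranges, delta))}


def dependency(data: Table, labels: Sequence[str], attrs: Sequence[int],
               numeric: Sequence[bool], delta: float) -> float:
    """Dependency degree |POS_B(D)| / |U|."""
    return len(positive_region(data, labels, attrs, numeric, delta)) / len(data)


# Toy hybrid table: attributes 0 and 2 are numeric, attribute 1 is symbolic.
data = [
    [0.10, "A", 1.0],
    [0.12, "A", None],
    [0.50, "B", 3.0],
    [0.55, None, 3.2],
    [0.90, "B", 5.0],
]
numeric = [True, False, True]
labels = ["normal", "normal", "tumor", "normal", "tumor"]

print(positive_region(data, labels, [0, 1, 2], numeric, delta=0.1))  # {0, 1, 4}
print(dependency(data, labels, [0, 1, 2], numeric, delta=0.1))       # 0.6
print(dependency(data, labels, [1], numeric, delta=0.1))             # 0.4
```

In this toy table, samples 2 and 3 fall in each other's neighborhood but carry different labels, so both are excluded from the positive region; the missing values in samples 1 and 3 make their neighborhoods wider, which is the usual effect of the tolerance rule.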
The inconsistent samples in the covering granules induced by the generalized neighborhood relation are discussed, and a mutex relation and its properties are defined. By decomposition, the mutex covering granules are made reflexive, symmetric, and transitive. Under the generalized neighborhood relation, a conditional entropy for incomplete hybrid decision systems is defined on the basis of information entropy, and an attribute significance measure based on this conditional entropy is given. The attribute significance based on the positive region and the attribute significance based on conditional entropy are analyzed and compared, and it is proved that the significance based on conditional entropy subsumes the significance based on the positive region. A reduction algorithm based on conditional entropy is then constructed for incomplete hybrid decision systems.
Gene expression data are incomplete, symbolic, numeric, or a mixture of these types. The proposed attribute reduction algorithms based on the extended rough set models are applied to feature gene selection for gene expression datasets. Combined with a gene primary selection strategy, redundant attributes are removed by exploiting the fact that rough set attribute reduction requires no prior knowledge, and a feature gene subset is finally obtained. Experiments on several publicly available gene expression datasets compare the proposed method with similar methods in terms of time complexity and the number of selected feature genes; the results show that the feature gene selection method based on the proposed extended rough set model is effective.
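The conditional-entropy reduction described above can be sketched in the same spirit. This sketch assumes a commonly used neighborhood conditional entropy, H(D|B) = -(1/|U|) * sum over x of log2(|n_B(x) ∩ [x]_D| / |n_B(x)|), an attribute significance sig(a, B) = H(D|B) - H(D|B ∪ {a}), and greedy forward selection that stops once H(D|red) reaches H(D|C); the thesis's exact entropy definition and stopping rule may differ. The neighborhood helper repeats the hypothetical tolerance-plus-relative-radius rule, and all function names, parameters, and data are illustrative.

```python
import math
from typing import List, Sequence, Set, Union

Value = Union[float, str, None]  # numeric, symbolic, or None (missing)
Table = Sequence[Sequence[Value]]


def attribute_ranges(data: Table, numeric: Sequence[bool]) -> List[float]:
    """Observed range of each numeric attribute (0.0 for symbolic ones)."""
    ranges = []
    for a, is_num in enumerate(numeric):
        vals = [row[a] for row in data if row[a] is not None]
        ranges.append(max(vals) - min(vals) if is_num and vals else 0.0)
    return ranges


def _close(xa: Value, ya: Value, is_num: bool, radius: float) -> bool:
    if xa is None or ya is None:
        return True  # tolerance rule for missing values (assumption)
    return abs(xa - ya) <= radius if is_num else xa == ya


def neighborhood(data: Table, i: int, attrs: Sequence[int], numeric: Sequence[bool],
                 ranges: Sequence[float], delta: float) -> Set[int]:
    """Hypothetical generalized neighborhood; an empty attrs list yields all of U."""
    return {j for j, y in enumerate(data)
            if all(_close(data[i][a], y[a], numeric[a], delta * ranges[a]) for a in attrs)}


def conditional_entropy(data: Table, labels: Sequence[str], attrs: Sequence[int],
                        numeric: Sequence[bool], ranges: Sequence[float], delta: float) -> float:
    """H(D|B) = -(1/|U|) * sum_x log2(|n_B(x) & [x]_D| / |n_B(x)|)."""
    total = 0.0
    for i in range(len(data)):
        nb = neighborhood(data, i, attrs, numeric, ranges, delta)  # always contains i
        same = sum(1 for j in nb if labels[j] == labels[i])       # |n_B(x) & [x]_D|
        total += math.log2(same / len(nb))
    return -total / len(data)


def greedy_reduct(data: Table, labels: Sequence[str], numeric: Sequence[bool],
                  delta: float, eps: float = 1e-12) -> List[int]:
    """Forward selection by sig(a, B) = H(D|B) - H(D|B + {a})."""
    ranges = attribute_ranges(data, numeric)
    all_attrs = list(range(len(numeric)))
    target = conditional_entropy(data, labels, all_attrs, numeric, ranges, delta)
    red: List[int] = []
    current = conditional_entropy(data, labels, red, numeric, ranges, delta)
    while current > target + eps:
        candidates = [a for a in all_attrs if a not in red]
        if not candidates:
            break
        sig, best = max(
            (current - conditional_entropy(data, labels, red + [a], numeric, ranges, delta), a)
            for a in candidates
        )
        if sig <= eps:
            break  # no remaining attribute lowers the entropy any further
        red.append(best)
        current = conditional_entropy(data, labels, red, numeric, ranges, delta)
    return red


data = [
    [0.10, "A", 1.0],
    [0.12, "A", None],
    [0.50, "B", 3.0],
    [0.55, None, 3.2],
    [0.90, "B", 5.0],
]
numeric = [True, False, True]
labels = ["normal", "normal", "tumor", "tumor", "normal"]
ranges = attribute_ranges(data, numeric)

print(round(conditional_entropy(data, labels, [], numeric, ranges, 0.1), 4))  # 0.971
print(greedy_reduct(data, labels, numeric, delta=0.1))                       # [0]
```

On this toy table, attribute 0 alone already makes every neighborhood label-consistent, so the search stops after one step. This particular entropy is not guaranteed to decrease monotonically as attributes are added, which is why the loop also stops when no candidate has positive significance.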
Keywords/Search Tags: rough set theory, neighborhood relation, conditional entropy, gene expression datasets, feature gene selection