Study And Implication On A Feature Extraction Model Of Data Mining

Posted on: 2012-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2178330335968847
Subject: Management Science and Engineering
Abstract/Summary:
Feature extraction is an important step in pattern recognition and is now widely applied in data mining. For high-dimensional data, feature extraction can effectively reduce both the dimensionality of the data and the scale of the algorithm. Compared with Principal Component Analysis, rough sets, and other dimensionality reduction methods, feature extraction is more systematic and more closely tied to the specific application, so its results have greater reference value. Applying support vector machines to feature extraction combines the strengths of both and can optimize the extraction process; this is an exploratory direction in the field.

This paper investigates the problem of optimal feature subset selection. Drawing on the concept of the quartile from classical statistics, we establish a quartile-based feature selection model. We also establish a second feature selection model based on the concept of relative entropy from information theory. Candidate sets of optimal features are selected according to the quartile model and the relative entropy model, and a discriminant function is established. Next, a floating forward algorithm is used to find the order of the attributes within each candidate subset of best features. Finally, a support vector machine is used to determine how many attributes each candidate feature subset should contain: the classifier is trained, and the candidate feature subset is chosen according to the recognition error rate. The feature extraction model is validated by an experiment on diseased and normal samples from a colon cancer gene expression data set.

This paper brings feature extraction into the field of data mining. The feature extraction model is built on the statistical properties of the samples and on information theory. Because the model does not depend on any specific application domain, it is more general.
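
The abstract does not give the formulas behind the quartile model or the relative entropy model, so the following is only a minimal sketch of what per-feature scoring of this kind could look like for two classes. The function names (`quartile_score`, `relative_entropy_score`, `rank_features`), the median-gap-over-interquartile-range statistic, the equal-width histogram with add-one smoothing, and the toy data are all assumptions made for illustration, not the thesis's method. The floating forward search and the support vector machine step are left out.

```python
import math
import statistics


def quartile_score(pos, neg):
    """Assumed statistic: gap between class medians over the summed interquartile ranges."""
    q1p, q2p, q3p = statistics.quantiles(pos, n=4)
    q1n, q2n, q3n = statistics.quantiles(neg, n=4)
    spread = (q3p - q1p) + (q3n - q1n)
    return abs(q2p - q2n) / spread if spread > 0 else 0.0


def relative_entropy_score(pos, neg, bins=4):
    """Symmetric relative entropy, KL(p||q) + KL(q||p), between class histograms."""
    values = list(pos) + list(neg)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return 0.0  # constant feature: the two distributions are identical

    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Add-one smoothing keeps every bin probability strictly positive.
        return [(c + 1) / (len(xs) + bins) for c in counts]

    p, q = histogram(pos), histogram(neg)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_features(samples, labels, score):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
samples = [
    [1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [1.1, 6.0],  # label 0
    [3.0, 4.0], [3.2, 6.0], [2.9, 5.0], [3.1, 3.0],  # label 1
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(rank_features(samples, labels, quartile_score))
print(rank_features(samples, labels, relative_entropy_score))
```

In a fuller pipeline, a ranking like this would only supply the candidate features; the subset size would then be chosen by training a classifier on each candidate subset and comparing error rates, as the abstract describes.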
Keywords/Search Tags: Quartile Model, Relative-Entropy Model, Support Vector Machine, Feature Extraction, Data Mining