
Study And Implementation On Feature Selection Algorithms In Large Data Sets

Posted on: 2006-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Liu
GTID: 2168360155458071
Subject: Computer application technology
Abstract/Summary:
Data mining is a database technique that has emerged in recent years alongside advances in databases and artificial intelligence. It processes large volumes of ordinary business data in order to extract valuable knowledge or information from them. Data mining algorithms generally place certain requirements on their datasets, such as good completeness, low redundancy, and low correlation between attributes (features). However, data in real-world systems are usually incomplete, redundant, and ambiguous, and they seldom satisfy these requirements directly. Moreover, massive real-world datasets contain many insignificant components that seriously reduce the efficiency of data mining algorithms, and noisy data can lead to invalid inductive results. Data preprocessing has therefore become a key issue in implementing data mining systems.

Data preprocessing is an important and indispensable part of data mining. As one of its main steps, feature selection has become a very active research topic. It matters even more for large datasets that contain many records and many features irrelevant to the data mining task at hand.

Rough set theory is a mathematical tool for characterizing imprecise, uncertain, and incomplete information. It can efficiently analyze and process such information, whether it is imprecise, inconsistent, or incomplete, and it can uncover underlying knowledge and discover potential rules. In recent years, research on rough set theory and its algorithms in data mining has become a hot topic. Reduction algorithms are one of its key problems.
Therefore, many studies have investigated reduction algorithms.

In this thesis, we briefly introduce the feature selection problem and the rough set model, and we study feature selection algorithms based on the rough set model. The traditional rough set model is not integrated with relational database systems. All of its computationally intensive operations are performed on flat files rather than taking advantage of the...
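Attribute reduction can be made concrete with one common rough-set heuristic, greedy QuickReduct, which is driven by the dependency degree γ. The Python sketch below is a minimal illustration and is not necessarily the algorithm developed in the thesis. The function names (`partition`, `dependency`, `quick_reduct`) and the toy table `SAMPLE_TABLE` are hypothetical. The sketch keeps the whole decision table in memory, which is the flat-file style of computation the abstract criticizes. It does not use a database-integrated approach.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# A decision table row: condition-attribute values followed by a decision value.
Row = Tuple[Hashable, ...]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> List[List[int]]:
    """Split row indices into equivalence classes of IND(attrs):
    rows that agree on every attribute listed in attrs."""
    classes: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], attrs: Sequence[int], decision: int) -> float:
    """Dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|.

    A row lies in the positive region when every row in its
    IND(attrs) class has the same decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], conditions: Sequence[int], decision: int) -> List[int]:
    """Greedy QuickReduct: repeatedly add the attribute that most increases
    gamma until it reaches the gamma of the full condition set."""
    target = dependency(rows, conditions, decision)
    reduct: List[int] = []
    current = dependency(rows, reduct, decision)
    while current < target:
        best: Optional[int] = None
        best_gamma = current
        for a in conditions:
            if a in reduct:
                continue
            gamma = dependency(rows, reduct + [a], decision)
            if gamma > best_gamma:
                best, best_gamma = a, gamma
        if best is None:  # no single attribute raises gamma; stop instead of looping forever
            break
        reduct.append(best)
        current = best_gamma
    return reduct


# Hypothetical toy decision table: (outlook, temperature, humidity, decision)
SAMPLE_TABLE: List[Row] = [
    ("sunny", "hot", "high", "no"),
    ("sunny", "hot", "normal", "yes"),
    ("rainy", "mild", "high", "no"),
    ("rainy", "cool", "normal", "yes"),
]

if __name__ == "__main__":
    # humidity (index 2) alone determines the decision here, so this prints [2]
    print(quick_reduct(SAMPLE_TABLE, [0, 1, 2], 3))
```

Because the search is greedy, the returned subset is not guaranteed to be a minimal reduct. On XOR-like data, where no single attribute raises γ, the search stops early with an empty result.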
Keywords: data mining, feature selection, rough set, genetic algorithm