
Research On Models And Algorithms For Feature Selection On Dynamic Incomplete Data

Posted on: 2016-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Shu
GTID: 1228330470955922
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and network technology, data have been growing in scale and volume at an unprecedented rate, bringing us into an era of data expansion and information explosion. On one hand, the rapid increase in data scale, from the terabyte (TB) level of the past to the petabyte (PB) level today, puts knowledge acquisition beyond the reach of our current abilities. On the other hand, although the explosive growth in data volume leaves traditional knowledge-acquisition methods inadequate, the abundant information contained in the data holds much valuable knowledge. Thus, how to effectively preprocess massive, high-dimensional data in real-life applications and extract potentially useful knowledge from it has become one of the important research topics in data mining, computational intelligence and machine learning.
Feature selection is an important data preprocessing technique in data mining. In complex real-world environments, data are often dynamic, incomplete and inaccurate, so finding an efficient and feasible way to perform feature selection on such data is one of the main challenges in feature selection research. As a mathematical tool for characterizing uncertain and imprecise information, rough set theory has been widely applied in data mining, knowledge discovery and machine learning. Its main advantage is that it requires no prior information beyond the given data set to handle uncertainty, so the ways it describes and deals with problems are more objective. Therefore, research on rough set theory based feature selection for dynamic incomplete data has important theoretical and practical significance.
In the context of dynamic incomplete data, this thesis gives methods for effectively obtaining feature subsets and extracting dynamic knowledge according to actual knowledge-driven needs.
Within the framework of rough set based feature selection, and following three different dynamic scenarios of incomplete data sets as the main line, this thesis makes a systematic study of incrementally updating feature subsets for dynamic incomplete data sets. The purpose is to explore effective methods for performing feature selection on dynamic incomplete data sets in an incremental manner, and the research results will provide a new theoretical foundation and a set of implementation methods for knowledge discovery on dynamic data. The main research results are as follows.
1) An incremental updating mechanism of the positive region is provided for the case where an object set is added to or deleted from an incomplete data set. Based on this mechanism, the significance of candidate features is defined, and incremental positive-region-based feature selection algorithms for a varying object set are designed. Experimental results validate the efficiency and effectiveness of the proposed feature selection algorithms. (Chapter 2)
2) For the case where a feature set is added to or deleted from an incomplete data set, the dynamic changes of tolerance granularity are analyzed, an incremental updating mechanism of the positive region under a varying feature set is established, and incremental feature selection algorithms are designed for feature-set addition and deletion, respectively. Theoretical analysis and experimental results verify the efficiency and effectiveness of the proposed feature selection algorithms. (Chapter 3)
3) An incremental updating mechanism of the positive region is provided for the case where feature values in an incomplete data set change dynamically.
On this basis, incremental feature selection algorithms for feature values that vary over time are designed. In particular, when the feature values of multiple objects change, the proposed algorithm obtains the resulting feature subset in one pass, without repeatedly running the incremental algorithm designed for a value change of a single object. Theoretical analysis and experimental results verify the efficiency and effectiveness of the proposed feature selection algorithms. (Chapter 4)
4) To overcome the drawback of relying on a single evaluation function in existing feature selection algorithms for incomplete data sets, a new evaluation function for entropy-based feature selection, combined with a granularity measure criterion, is provided to measure the information power of candidate features from different viewpoints. On this basis, a greedy forward feature selection algorithm for incomplete data sets is designed. In addition, a multi-criteria evaluation function is provided to measure the quality of candidate features from different perspectives in cost-sensitive incomplete data sets. To accelerate the feature selection process, a strategy that continuously reduces the search space is presented, and a cost-sensitive feature selection algorithm based on this strategy is designed. The algorithm obtains a feature subset that has the same information power as the whole feature set with the minimum test cost and misclassification cost. Experimental results verify the efficiency and effectiveness of the proposed feature selection algorithms. (Chapter 5)...
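A minimal Python sketch of the idea behind result 1), assuming the standard tolerance relation for incomplete data (two objects are tolerant when they agree on every feature where both values are known) and representing the missing value "*" as None. The function names and the toy table are hypothetical and do not reproduce the thesis's algorithms. The sketch only illustrates why adding objects can only push old objects out of the positive region and deleting objects can only let remaining objects in, so just a subset of objects needs retesting.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set, Tuple

# A row of condition-feature values; None stands in for a missing ("*") value.
Row = Tuple[Optional[Hashable], ...]


def tolerant(x: Row, y: Row, features: Sequence[int]) -> bool:
    """Tolerance relation: x and y agree on every feature, a missing value matching anything."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in features)


def consistent(oid: Hashable, table: Dict, dec: Dict, features: Sequence[int]) -> bool:
    """True if every object in the tolerance class of `oid` has the same decision as `oid`."""
    row = table[oid]
    return all(dec[o] == dec[oid] for o, other in table.items() if tolerant(row, other, features))


def positive_region(table: Dict, dec: Dict, features: Sequence[int]) -> Set:
    """Non-incremental baseline: test every object from scratch."""
    return {o for o in table if consistent(o, table, dec, features)}


def pos_after_adding(pos: Set, table: Dict, dec: Dict, features: Sequence[int],
                     new_rows: Dict, new_dec: Dict) -> Set:
    """Update POS after inserting `new_rows` (id -> row); mutates `table` and `dec`.

    Old objects outside POS stay outside, because their tolerance classes only grow.
    An old object inside POS drops out only if a new object is tolerant to it and
    has a different decision. Each new object is tested against the enlarged table.
    """
    table.update(new_rows)
    dec.update(new_dec)
    kept = {o for o in pos
            if not any(dec[n] != dec[o] and tolerant(table[o], row, features)
                       for n, row in new_rows.items())}
    return kept | {n for n in new_rows if consistent(n, table, dec, features)}


def pos_after_deleting(pos: Set, table: Dict, dec: Dict, features: Sequence[int],
                       removed_ids: Iterable) -> Set:
    """Update POS after deleting `removed_ids`; mutates `table` and `dec`.

    Objects inside POS stay inside, because their tolerance classes only shrink.
    Only objects outside POS that were tolerant to a deleted object are retested.
    """
    removed = {o: table.pop(o) for o in removed_ids}
    for o in removed:
        del dec[o]
    candidates = {o for o in table
                  if o not in pos and any(tolerant(table[o], row, features)
                                          for row in removed.values())}
    return (pos - set(removed)) | {o for o in candidates if consistent(o, table, dec, features)}


# Toy incomplete decision table (hypothetical data).
table = {1: (1, 0, None), 2: (1, None, 1), 3: (0, 0, 1), 4: (0, 1, 0)}
dec = {1: "yes", 2: "yes", 3: "no", 4: "no"}
feats = [0, 1, 2]
pos = positive_region(table, dec, feats)                                   # {1, 2, 3, 4}
pos = pos_after_adding(pos, table, dec, feats, {5: (1, 0, 1)}, {5: "no"})  # {3, 4}
pos = pos_after_deleting(pos, table, dec, feats, [5])                      # {1, 2, 3, 4}
```

The from-scratch `positive_region` is kept as a baseline so that each incremental result can be checked against it.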
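For result 2), the observation about tolerance granularity can be sketched under the same assumptions (None for "*", the standard tolerance relation, hypothetical function names and toy data): adding features can only shrink each tolerance class, so the positive region can only grow, while removing features can only enlarge the classes, so the positive region can only shrink. This is an illustration, not the thesis's algorithm; in particular, the removal side here recomputes the coarser classes from scratch and only limits which objects are retested.

```python
from typing import Hashable, Iterable, List, Optional, Sequence, Set, Tuple

# None stands in for a missing ("*") value.
Row = Tuple[Optional[Hashable], ...]


def tolerant(x: Row, y: Row, features: Iterable[int]) -> bool:
    """x and y agree on every listed feature, a missing value matching anything."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in features)


def tolerance_classes(rows: Sequence[Row], features: Sequence[int]) -> List[Set[int]]:
    """Tolerance class T_B(x) of every object, computed from scratch."""
    return [{j for j, y in enumerate(rows) if tolerant(x, y, features)} for x in rows]


def pos_from_classes(classes: List[Set[int]], decisions: Sequence,
                     candidates: Iterable[int]) -> Set[int]:
    """Candidates whose tolerance class contains a single decision value."""
    return {i for i in candidates if all(decisions[j] == decisions[i] for j in classes[i])}


def add_features(rows, decisions, classes, pos, added):
    """Adding features refines the granules: T_{B+A}(x) = T_B(x) & T_A(x).

    Only the added features are compared, and only objects outside POS are
    retested, since a consistent class stays consistent when it shrinks.
    """
    refined = [{j for j in cls if tolerant(rows[i], rows[j], added)}
               for i, cls in enumerate(classes)]
    outside = set(range(len(rows))) - pos
    return refined, pos | pos_from_classes(refined, decisions, outside)


def remove_features(rows, decisions, pos, remaining):
    """Removing features coarsens the granules, so POS can only shrink.

    The coarser classes are recomputed from scratch here; objects outside POS
    stay outside, so only current POS members are retested.
    """
    coarser = tolerance_classes(rows, remaining)
    return coarser, pos_from_classes(coarser, decisions, pos)


# Toy incomplete decision table (hypothetical data).
rows = [(1, 0, None), (1, None, 1), (0, 0, 1), (0, 1, 0)]
decisions = ["yes", "yes", "no", "no"]
classes = tolerance_classes(rows, [1, 2])
pos = pos_from_classes(classes, decisions, range(len(rows)))      # {3}
classes, pos = add_features(rows, decisions, classes, pos, [0])   # {0, 1, 2, 3}
classes, pos = remove_features(rows, decisions, pos, [1, 2])      # {3}
```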
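Result 3) can be illustrated in the same way. In this hypothetical sketch (None for "*", the standard tolerance relation, decision values assumed unchanged, illustrative names and data), the condition values of several objects are replaced in one batch. Only the changed objects, plus the unchanged objects whose tolerance to some changed object flipped, are retested; every other object keeps exactly the same tolerance class. This mirrors the claim that multiple value changes can be handled at once rather than object by object, but it is not the thesis's algorithm.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Row = Tuple[Optional[Hashable], ...]  # None stands in for a missing ("*") value


def tolerant(x: Row, y: Row, features: Sequence[int]) -> bool:
    """x and y agree on every listed feature, a missing value matching anything."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in features)


def consistent(rows: List[Row], decisions: Sequence, i: int, features: Sequence[int]) -> bool:
    """True if the tolerance class of object i carries a single decision value."""
    return all(decisions[j] == decisions[i]
               for j, y in enumerate(rows) if tolerant(rows[i], y, features))


def positive_region(rows: List[Row], decisions: Sequence, features: Sequence[int]) -> Set[int]:
    """Non-incremental baseline: test every object from scratch."""
    return {i for i in range(len(rows)) if consistent(rows, decisions, i, features)}


def pos_after_value_changes(pos: Set[int], rows: List[Row], decisions: Sequence,
                            features: Sequence[int], changes: Dict[int, Row]) -> Set[int]:
    """Update POS after replacing the rows of several objects at once.

    `changes` maps object index -> new row; `rows` is updated in place.
    An unchanged object is retested only if its tolerance with some changed
    object flipped; otherwise its tolerance class, and so its status, is the same.
    """
    old = {i: rows[i] for i in changes}
    for i, new_row in changes.items():
        rows[i] = new_row
    affected = set(changes)
    for y, row_y in enumerate(rows):
        if y not in changes and any(
                tolerant(old[i], row_y, features) != tolerant(rows[i], row_y, features)
                for i in changes):
            affected.add(y)
    return (pos - affected) | {i for i in affected if consistent(rows, decisions, i, features)}


# Toy incomplete decision table (hypothetical data).
rows = [(1, 0, None), (1, None, 1), (0, 0, 1), (0, 1, 0)]
decisions = ["yes", "yes", "no", "no"]
feats = [0, 1, 2]
pos = positive_region(rows, decisions, feats)                     # {0, 1, 2, 3}
pos = pos_after_value_changes(pos, rows, decisions, feats,
                              {2: (1, 0, 1), 3: (0, None, 0)})    # {3}
```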
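Results 1) and 4) both rely on a significance measure for candidate features and a greedy forward search. The sketch below uses the classical positive-region significance, sig(a, B, D) = (|POS_{B∪{a}}(D)| - |POS_B(D)|) / |U|, as a stand-in under the same assumptions as above (None for "*", the standard tolerance relation). It is not the entropy and granularity based evaluation function of Chapter 5, and the function names, stopping rule and toy data are assumptions.

```python
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple

Row = Tuple[Optional[Hashable], ...]  # None stands in for a missing ("*") value


def tolerant(x: Row, y: Row, features: Iterable[int]) -> bool:
    """x and y agree on every listed feature, a missing value matching anything."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in features)


def pos_size(rows: Sequence[Row], decisions: Sequence, features: Sequence[int]) -> int:
    """|POS_B(D)|: number of objects whose tolerance class carries a single decision."""
    return sum(
        all(decisions[j] == decisions[i]
            for j, y in enumerate(rows) if tolerant(x, y, features))
        for i, x in enumerate(rows)
    )


def significance(rows, decisions, selected: List[int], a: int) -> float:
    """sig(a, B, D) = (|POS_{B+{a}}(D)| - |POS_B(D)|) / |U|."""
    gain = pos_size(rows, decisions, selected + [a]) - pos_size(rows, decisions, selected)
    return gain / len(rows)


def greedy_forward_selection(rows, decisions) -> List[int]:
    """Start from the empty set and add the most significant feature until the
    positive region equals that of the full feature set."""
    all_features = list(range(len(rows[0])))
    target = pos_size(rows, decisions, all_features)
    selected: List[int] = []
    while pos_size(rows, decisions, selected) < target:
        remaining = [a for a in all_features if a not in selected]
        selected.append(max(remaining, key=lambda a: significance(rows, decisions, selected, a)))
    return selected


# Toy incomplete decision table (hypothetical data).
rows = [(1, 0, None), (1, None, 1), (0, 0, 1), (0, 1, 0)]
decisions = ["yes", "yes", "no", "no"]
selected = greedy_forward_selection(rows, decisions)   # [0]
```

Because the positive region can only grow as features are added, the loop always terminates; like any greedy search, it is not guaranteed to return a minimum-size subset. On the toy table, feature 0 alone already yields the same positive region as the full feature set, so the search stops after one step.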
Keywords/Search Tags: Data mining, Dynamic incomplete data, Feature selection, Granular computing, Rough set theory