Granulation-mechanism-based Efficient Rough Feature Selection Algorithm

Posted on:2014-02-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Wang

Full Text:PDF

GTID:1228330401463043

Subject:Computer application technology

Abstract/Summary:

At present, data mining is regarded as a significant approach to knowledge discovery in the information society; it aims to transform data into useful information. With the rapid development of information technology, including the internet and databases, both the size and the dimensionality of data sets are increasing at an unprecedented rate, which has ushered in the era of "large-scale data with high dimension". The scale and high dimensionality of these data pose big challenges for traditional data mining algorithms, and the search for efficient and effective data mining algorithms has quickly become a global issue in many areas.

Feature selection is an important data preprocessing technique in data mining. However, existing feature selection algorithms usually have low computational efficiency, especially when dealing with large-scale data sets. In this dissertation, efficient feature selection for large-scale data sets is studied systematically on the basis of rough set theory. The main contributions are as follows.

1. Based on the idea of decomposition and fusion, an efficient framework for feature selection is constructed. Following the idea of sample estimation, two key steps are discussed. One is decomposition, which means decomposing a big granule into a family of small granules whose distribution is similar to that of the large one. The other is fusion, which means fusing the estimates obtained from all the small granules to generate a final feature subset of the large data set. The framework provides new ways of analyzing big data.

2. By employing the framework, two efficient rough feature selection algorithms are developed, one for nominal data and the other for hybrid data. They are obtained by embedding a typical algorithm for nominal data and a typical algorithm for hybrid data in the framework, respectively. The two developed algorithms can find an effective result efficiently, especially for large-scale data sets. Experiments further illustrate the effectiveness of the two developed algorithms and the framework.

3. For dynamic data sets, group incremental mechanisms, dimension incremental mechanisms and updating mechanisms of three representative information entropies are introduced. Considering that there are three situations of data updating in databases, the corresponding mechanisms of the three information entropies are proven by analyzing the changes of elementary granules and the granular space in dynamic data sets.

4. On the basis of these mechanisms, three efficient rough feature selection algorithms are proposed for dynamic data sets: a group incremental feature selection algorithm, a dimension incremental feature selection algorithm, and a feature selection algorithm for data sets with varying data values. Both theoretical analysis and experiments illustrate the effectiveness and efficiency of the three algorithms. In addition, the main ideas can be extended to the fusion of two or even multiple data sets. It is our hope that this study provides new approaches to the fusion of multi-source data sets.

In this dissertation, on the basis of analyzing the limitations of existing feature selection algorithms for large-scale data sets, several efficient rough feature selection algorithms are introduced. Experiments further illustrate that these algorithms are effective and efficient. Hence, the work in this dissertation makes an important contribution to knowledge discovery for large-scale data sets.
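
The decomposition and fusion framework in contribution 1 can be illustrated with a minimal Python sketch for nominal data. The abstract does not specify the sampling scheme, the embedded selection algorithm, or the fusion rule, so everything here is an assumption made for illustration: the function names (dependency, greedy_reduct, decompose, fuse, select_features) are hypothetical, the small granules are drawn by a random partition, the embedded algorithm is a plain forward greedy search on the rough set dependency degree (positive-region ratio), and fusion is a simple vote over the per-granule feature subsets.

```python
import random
from collections import defaultdict

def dependency(rows, labels, features):
    """Rough set dependency degree: the fraction of samples lying in
    equivalence classes (induced by `features`) that carry a single label."""
    label_sets = defaultdict(set)
    sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        label_sets[key].add(y)
        sizes[key] += 1
    consistent = sum(n for key, n in sizes.items() if len(label_sets[key]) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Assumed embedded algorithm: forward greedy search that keeps adding the
    feature with the largest dependency gain until the dependency of the full
    feature set is reached."""
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    selected = []
    while dependency(rows, labels, selected) < target:
        best = max((f for f in all_features if f not in selected),
                   key=lambda f: dependency(rows, labels, selected + [f]))
        selected.append(best)
    return selected

def decompose(rows, labels, k, seed=0):
    """Assumed decomposition: a random partition into k small granules, so that
    each granule roughly follows the distribution of the whole data set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    return [([rows[i] for i in idx[g::k]], [labels[i] for i in idx[g::k]])
            for g in range(k)]

def fuse(reducts, min_votes):
    """Assumed fusion rule: keep every feature chosen in at least
    `min_votes` of the per-granule feature subsets."""
    votes = defaultdict(int)
    for reduct in reducts:
        for f in reduct:
            votes[f] += 1
    return sorted(f for f, v in votes.items() if v >= min_votes)

def select_features(rows, labels, k=4, seed=0):
    """Decompose, select on each small granule, then fuse the estimates."""
    granules = decompose(rows, labels, k, seed)
    reducts = [greedy_reduct(r, y) for r, y in granules]
    return fuse(reducts, min_votes=(k + 1) // 2)

if __name__ == "__main__":
    # Toy nominal data: the label equals feature 0, feature 1 is irrelevant.
    rows = [(i % 2, (i // 2) % 2) for i in range(40)]
    labels = [r[0] for r in rows]
    print(select_features(rows, labels, k=4))
```

Each small granule is processed independently, so the cost of the embedded algorithm is paid on subsets of size n/k rather than on the full data set; the voting threshold in fuse is one of several plausible ways to combine the per-granule estimates and is not taken from the dissertation.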

Keywords/Search Tags:

Large-scale data with high dimension, Dynamic data, Hybrid data, Rough set, Information entropy, Information granularity, Multi-granulation, Feature selection

Related items

1	Granulation Mechanism And Data Modeling For Complex Data
2	Researches Of Rough Set Model And Feature Selection For Numerical Data
3	Research On Information Granulation Algorithm For High-Dimensional Mixed And Class Overlapping Data
4	Granulation Modeling Approaches And Its Applications For Multi-feature Integration
5	Research And Application Of Information Granulation Based On Rough Clustering Under The Framework Of The Principle Of Justifiable Granularity
6	Model And Algorithm Of Analyzing Data Based On Rough Set Theory
7	Research On Feature Selection Algorithms Using Information Granulation
8	Research On The Neighborhood Multi-granulation Rough Set Model And Algorithm Oriented Mixed Data
9	Research On Multi-Granularity Information Fusion Method For Multi-Source Data
10	Research On Feature Selection Algorithm In Data With Large Scale And High Dimension Based On Evolutionary Multi-Objective Optimization