Efficient Feature Selection Algorithm Based On Rough Set

Posted on:2018-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:J P Zhang

GTID:2348330521951759

Subject:Systems Engineering

Abstract/Summary:

Data mining is the process of extracting useful information from large amounts of data by means of certain algorithms. It is one of the important tools of artificial intelligence in today's information society. With the rapid development of the Internet, and especially of database and network technology, the amount of data produced by the information technology industry is growing explosively, and its dimensionality is increasing rapidly as well. There is no doubt that the age of "massive high-dimensional" data has arrived. The massive, high-dimensional nature of such data sets poses great new challenges for traditional data mining techniques, and the search for fast and efficient data mining algorithms has become an active research field worldwide.

Feature selection is a commonly used data preprocessing technique. Developing more efficient feature selection techniques for large-scale data sets is both a research hotspot and a difficulty in the study of feature selection. To this end, this paper takes rough set theory as its background, analyzes and studies efficient feature selection for large-scale data sets, and obtains the following research results:

1. Based on the theory of information entropy and drawing on some of the core concepts of rough set theory, a feature selection algorithm is proposed for data sets whose data are dynamically updated. It can process a whole group of data whose values have changed at one time. The algorithm is designed by analyzing and proving how the complementary information entropy changes as data values are dynamically updated, and by drawing on the attribute reduction strategy of rough set theory, which yields an efficient feature selection algorithm based on complementary entropy. Experimental results show that the new algorithm is feasible.

2. Based on the idea of semi-supervised learning, a semi-supervised feature selection algorithm based on the clustering hypothesis is proposed for the "small mark problem" in data mining, in which only a small number of data are labeled. Using the labeled data as seeds, the algorithm clusters the unlabeled data and assigns labels to them, selects some of the data in each class of unlabeled data as representatives, and combines these representatives with the original labeled data to form a new data set. It then uses information entropy as the measure of feature importance, which yields a semi-supervised rough feature selection algorithm based on information entropy. Experimental results show that the new algorithm is feasible and efficient.

In this paper, the limitations of existing feature selection algorithms in dealing with massive high-dimensional data sets are systematically analyzed. Based on rough set theory, a rough feature selection algorithm for dynamic data sets and a semi-supervised feature selection algorithm for data sets with a small number of labeled data are proposed. The related theory and experimental results further verify the feasibility and efficiency of the new algorithms. The main research contents and results of this paper therefore provide new processing techniques and research ideas for knowledge discovery in massive high-dimensional data sets.
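
As an illustration of the kind of entropy-guided attribute reduction the abstract refers to, the following is a minimal sketch in Python. It is not the thesis's algorithm: it is a plain, non-incremental greedy forward selection, and it assumes the conditional complementary entropy definition commonly used in the rough set literature, E(D|B) = sum over i, j of (|Xi ∩ Yj| / |U|) * ((|Xi| - |Xi ∩ Yj|) / |U|), where the Xi are the equivalence classes induced by the attribute subset B and the Yj are the decision classes. The function names, the `tol` parameter, and the toy table are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def conditional_complementary_entropy(rows, labels, attrs):
    """Assumed definition of E(D | B) for the attribute index list `attrs`."""
    n = len(rows)
    # Partition the objects by their values on `attrs` (equivalence classes Xi)
    # and count how many objects of each decision class Yj fall into each Xi.
    blocks = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        blocks[key][label] += 1
    total = 0.0
    for counts in blocks.values():
        block_size = sum(counts.values())      # |Xi|
        for overlap in counts.values():        # |Xi ∩ Yj|
            total += (overlap / n) * ((block_size - overlap) / n)
    return total


def greedy_reduct(rows, labels, tol=1e-12):
    """Greedy forward selection (illustrative, not the thesis's method).

    Repeatedly adds the attribute that lowers E(D | B) the most, until the
    selected subset reaches the entropy of the full attribute set.
    """
    all_attrs = list(range(len(rows[0])))
    target = conditional_complementary_entropy(rows, labels, all_attrs)
    selected = []
    current = conditional_complementary_entropy(rows, labels, selected)
    while current - target > tol:
        remaining = [a for a in all_attrs if a not in selected]
        best = min(
            remaining,
            key=lambda a: conditional_complementary_entropy(
                rows, labels, selected + [a]
            ),
        )
        selected.append(best)
        current = conditional_complementary_entropy(rows, labels, selected)
    return selected


if __name__ == "__main__":
    # Toy decision table (made up): the label is determined by attribute 0.
    rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    labels = ["n", "n", "y", "y"]
    print(greedy_reduct(rows, labels))  # prints [0]
```

In this sketch the entropy is recomputed from scratch for every candidate subset. The dynamic-data algorithm summarized in result 1 is described as avoiding exactly that kind of recomputation by updating the complementary entropy as data values change; that update mechanism is not reproduced here.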

Keywords/Search Tags:

Rough sets, Information entropy, Dynamic data sets, Feature selection, Semi-supervised learning

Related items

1	Researches Of Rough Set Model And Feature Selection For Numerical Data
2	Research On Feature Selection Based On F-neighborhood Rough Sets
3	Research On Data Stream Classification Based On Granular Computing And F-Rough Sets Extension
4	Research And Application On Feature Selection Based On Extending Of Rough Set
5	Research On Semi-supervised Feature Selection Algorithm For Categorical Data
6	Extended Graded Rough Sets Models Based On Covering
7	A Semi-Supervised Feature Dimension Analysis Method Based On Entropy
8	Research On Feature Selection Algorithm Based On Rough Sets
9	Mixed Data Mining Methods Based On Rough Sets Theory
10	Designing Of A Construction And Evolutionary Algorithm Of Self Sets Based On Rough Sets Theory