
Efficient Feature Selection Algorithm Based On Rough Set

Posted on: 2018-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: J P Zhang
Full Text: PDF
GTID: 2348330521951759
Subject: Systems Engineering
Abstract/Summary:
Data mining is the process of extracting useful information from large amounts of data by means of algorithms, and it is one of the important tools of artificial intelligence in today's information society. With the rapid development of the Internet, and of database and network technology in particular, the amount of data produced by the information technology industry is growing explosively, and its dimensionality is rising just as quickly. The era of "massive high-dimensional" data has undoubtedly arrived. The scale and dimensionality of such data sets pose new and serious challenges to traditional data mining techniques, and the search for fast, efficient data mining algorithms has become an active research field worldwide.

Feature selection is a commonly used data preprocessing technique. Developing more efficient feature selection methods for large-scale data sets is both a research hotspot and a difficult problem. To this end, this thesis takes rough set theory as its background, studies efficient feature selection for large-scale data sets, and obtains the following results:

1. Based on information entropy theory, a feature selection algorithm is proposed for data sets whose values are updated dynamically. Drawing on core concepts of rough set theory, it can handle a group of objects whose values change at the same time. The thesis analyzes and proves how the complementary information entropy changes as data values are updated, and, borrowing the solution strategy of attribute reduction in rough set theory, designs an efficient feature selection algorithm based on complementary entropy. Experimental results show that the new algorithm is feasible.

2. Based on the idea of semi-supervised learning, a semi-supervised feature selection algorithm built on the cluster assumption is proposed for the "small label" problem in data mining, in which only a small portion of the data is labeled. Using the labeled data as seeds, the algorithm clusters the unlabeled data and assigns labels to it. It then selects some objects from each class of newly labeled data as representatives and combines them with the original labeled data to form a new data set. With information entropy as the measure of feature importance, a semi-supervised rough feature selection algorithm based on information entropy is designed. Experimental results show that the new algorithm is feasible and efficient.

This thesis systematically analyzes the limitations of existing feature selection algorithms when dealing with massive high-dimensional data sets. Based on rough set theory, it proposes a rough feature selection algorithm for dynamic data sets and a semi-supervised feature selection algorithm for data sets with only a small number of labeled objects. The theoretical analysis and experimental results further support the feasibility and efficiency of the new algorithms. The main research contents and results of this thesis therefore provide new processing techniques and research ideas for knowledge discovery in massive high-dimensional data sets.
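The complementary-entropy criterion mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's dynamic-update algorithm, whose details the abstract does not give. It is a plain forward greedy attribute reduction on a static decision table, assuming the commonly used definition of conditional complementary entropy, E(D|B) = sum over i, j of |Xi ∩ Yj| * (|Xi| - |Xi ∩ Yj|) / |U|^2, where the Xi are the equivalence classes induced by the attribute subset B and the Yj are the decision classes. The function names, the tolerance parameter, and the toy table are all hypothetical.

```python
from collections import Counter, defaultdict


def conditional_complementary_entropy(rows, labels, attrs):
    """E(D|B) for the attribute index list `attrs` (assumed definition, see lead-in)."""
    n = len(rows)
    # Group objects into equivalence classes by their values on `attrs`,
    # and count how often each decision label occurs inside each class.
    blocks = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        blocks[key][label] += 1
    total = 0
    for counts in blocks.values():
        size = sum(counts.values())
        # |Xi ∩ Yj| * |Xi \ Yj| summed over the decision classes Yj
        total += sum(c * (size - c) for c in counts.values())
    return total / (n * n)


def greedy_reduct(rows, labels, tol=1e-12):
    """Add attributes one at a time until E(D|selected) matches E(D|all attributes)."""
    all_attrs = list(range(len(rows[0])))
    target = conditional_complementary_entropy(rows, labels, all_attrs)
    selected = []
    current = conditional_complementary_entropy(rows, labels, selected)
    while current - target > tol:
        remaining = [a for a in all_attrs if a not in selected]
        # Pick the attribute whose addition lowers the entropy the most.
        best = min(
            remaining,
            key=lambda a: conditional_complementary_entropy(
                rows, labels, selected + [a]
            ),
        )
        selected.append(best)
        current = conditional_complementary_entropy(rows, labels, selected)
    return selected


# Hypothetical toy decision table: the label is (a0 OR a1); a2 is constant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
labels = [0, 1, 1, 1, 0]
print(greedy_reduct(rows, labels))
```

On this toy table the constant third attribute never reduces the entropy, so it is not selected. A dynamic variant, as described in the abstract, would update the entropy incrementally when data values change instead of recomputing it from scratch.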
Keywords/Search Tags: Rough sets, Information entropy, Dynamic data sets, Feature selection, Semi-supervised learning