
Research and Implementation of a Large Data Set Mining Algorithm Based on Rough Set

Posted on: 2011-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2178360302993842
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer technology, sensor technology, and the Internet, effective tools now exist for generating, transmitting, storing, and retrieving data. As the rate and volume of captured data grow, data streams of every kind are recorded on various storage media. The rapid growth in the number of instances, attributes, and classes produces high-dimensional data sets, which pose serious challenges to the robustness and scalability of many machine learning algorithms, including decision tree classification.

This thesis first explains the background and significance of the work, then discusses the principles and theory underlying decision tree classification and rough sets. We introduce rough set theory into both the preparation of the training set and the construction of the decision tree model, focusing on reducing the size of large data sets and improving the attribute selection measure used at each node. The main contributions are as follows:

1. Existing data set compression algorithms are overly complex and do not take the reduction of instance counts seriously. We propose a space partition algorithm for large data sets based on attribute purity, which borrows the notion of clustering and uses entropy as the purity measure for partitioning: the smaller the entropy, the purer the resulting subset, i.e., the greater the internal similarity (homogeneity) of the subset (see the entropy sketch after this list).

2. In general, some information may be lost after partitioning, so a major consideration is how to retain the important information. We propose RLDS (a reduction algorithm for large data sets) based on attribute-purity partitioning and representative instance extraction: it locates the central instance of each subset by Euclidean distance, finds the k nearest neighbors of that central instance, and takes these two components together as the reduction of the training set (see the second sketch after this list). Complexity and information-theoretic analysis show that the time complexity is much lower than that of classical rough set reduction, and that the algorithm quickly finds a reduction that approximates the simplest set of the original large data set.

3. We propose a novel measure, attribute classification value, for selecting the splitting attribute at each node based on rough sets, together with a decision tree construction algorithm, ACVS (attribute classification value selection), combined with the RLDS reduction algorithm for large data sets. ACVS treats object pairs with different condition attribute values but the same class as a compensative factor to expand the discernibility matrix; the measure function of attribute classification value defined on this extended discernibility matrix is used to select the attribute at each node and measures an attribute's contribution to classification more comprehensively (see the discernibility-matrix sketch after this list). RLDS serves as the core method for optimizing the training set.

4. A decision tree classification model is designed and implemented. We evaluate the algorithm's performance on several UCI data sets, summarize the experiments, analyze the remaining problems, and propose goals and directions for future research.
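The abstract does not give the partition procedure in detail, so the following is a minimal Python sketch of the core idea in contribution 1: using Shannon entropy as the attribute purity measure when splitting a data set. The function names and the data layout (records as dicts, class labels as a parallel list) are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels; lower means purer."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def partition_by_attribute(records, labels, attr):
    """Split (record, label) pairs on one attribute's values and report
    each subset together with its entropy (attribute purity)."""
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], ([], []))
        groups[rec[attr]][0].append(rec)
        groups[rec[attr]][1].append(lab)
    return {value: (recs, entropy(labs))
            for value, (recs, labs) in groups.items()}

# Example: the subset with the smallest entropy is the most homogeneous.
records = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}]
labels = ["no", "no", "yes"]
print(partition_by_attribute(records, labels, "outlook"))
```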
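For contribution 2, the abstract states only that RLDS keeps the central instance of each subset (found by Euclidean distance) and its k nearest neighbors. A minimal sketch of that representative-extraction step follows; the function names and the brute-force distance computation are assumptions for illustration, and the real RLDS presumably also carries the class labels along.

```python
import numpy as np

def reduce_subset(X, k):
    """Keep the central instance of a subset (minimum total Euclidean
    distance to the others) together with its k nearest neighbours."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    center = int(np.argmin(dists.sum(axis=1)))
    neighbours = np.argsort(dists[center])[1:k + 1]  # skip the centre itself
    keep = np.concatenate(([center], neighbours))
    return X[keep]

def rlds_reduction(subsets, k):
    """Union of the per-subset representatives approximates a reduction
    of the original training set."""
    return np.vstack([reduce_subset(np.asarray(S, dtype=float), k)
                      for S in subsets])

# Example with two partitioned subsets of 2-D instances.
subsets = [[[0, 0], [0, 1], [1, 0], [5, 5]], [[9, 9], [9, 8], [8, 9]]]
print(rlds_reduction(subsets, k=2))
```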
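The exact ACVS measure and its compensative factor cannot be reconstructed from the abstract, so the sketch below shows only the classical rough-set discernibility matrix on which contribution 3 builds, plus a naive per-attribute score as a stand-in for the attribute classification value. Both function names and the scoring formula are hypothetical.

```python
def discernibility_matrix(objects, decisions, attrs):
    """Classical rough-set discernibility matrix: for every pair of objects
    with different decisions, record the condition attributes whose values
    differ between the two objects."""
    matrix = {}
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if decisions[i] != decisions[j]:
                matrix[(i, j)] = {a for a in attrs
                                  if objects[i][a] != objects[j][a]}
    return matrix

def attribute_value(matrix, attr):
    """A naive importance score (not the thesis's measure): credit an
    attribute for every pair it discerns, weighted inversely by how many
    attributes share the credit."""
    return sum(1.0 / len(cell) for cell in matrix.values()
               if cell and attr in cell)

objects = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}]
decisions = ["yes", "no", "no"]
m = discernibility_matrix(objects, decisions, ["a", "b"])
print({a: attribute_value(m, a) for a in ["a", "b"]})
```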
Keywords/Search Tags: large data set mining, rough set, decision tree, attribute purity, discernibility matrix, attribute classification value