Research On Data Mining Algorithm Based On Rough Sets

Posted on:2010-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:X J Wang

GTID:2178360278497089

Subject:Computer application technology

Abstract/Summary:

With the widespread application of database technology, the amount of data stored in databases is growing rapidly. Knowledge discovery and data mining were proposed to find patterns and models in these data that help people make better decisions. Data mining is the most critical step in knowledge discovery, as well as its main technical difficulty, and it is a very active research area. Rough set theory, introduced by the Polish mathematician Z. Pawlak, is a powerful mathematical tool for analyzing uncertain and vague knowledge. As a newer focus of research in artificial intelligence, rough sets can effectively handle the representation and inference of incomplete and uncertain knowledge. These features make rough set theory especially well suited to data mining, and its validity has been confirmed by successful applications in many scientific and engineering domains in recent years.

The decision tree is the most widely used model in classification. A univariate decision tree tests only a single attribute at each node, which causes the following problems: relations among attributes are ignored; some subtrees appear repeatedly in the tree; and some attributes are tested many times along a single path. To overcome these defects, multivariate learning methods have been proposed, which can test several attributes simultaneously at one node. Such methods construct new, more relevant attributes and revise or remove independent attributes. The key problem is the criterion for selecting and testing the attributes at a node. Preprocessing of massive data is also a critical technique.

This dissertation studies the theory of multivariate decision trees. The main research results are as follows:

1. A new concept, the similarity degree of attribute importance, is presented. Attribute importance is integrated as a weight into the traditional similarity formula. This overcomes the limitation of considering only the quantitative change in distance while ignoring attribute importance. The measure also agrees with practice and is simple to compute.

2. The data are preprocessed to make data mining more effective. The attributes are reduced with the classical discernibility-matrix reduction algorithm to lower the dimensionality. The similarity degree between each pair of data objects is then calculated, and objects whose similarity degree exceeds the threshold are placed in the same group. One object is selected from each group to form a new data sample, which reduces redundancy.

3. An attribute selection criterion based on the importance of attribute sets is proposed, with at most two attributes tested at each node. It overcomes the bias of traditional decision tree algorithms when selecting test attributes. Computing time is reduced, the height of the decision tree is compressed, and the rules are more comprehensive.

4. The concept of the relative generalization of one equivalence relation with respect to another is introduced and used to construct multivariate tests that avoid overfitting the data.

Based on this work, a rough-set-based algorithm for multivariate decision trees is put forward. The multivariate decision tree is compared with the univariate one through an example, and several multivariate decision trees are compared with one another. Examples and experiments verify that the algorithm is advantageous.
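The importance-weighted similarity and the grouping step from results 1 and 2 can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract gives no formulas, so the sketch assumes the standard rough-set significance sig(a) = gamma(C, d) - gamma(C - {a}, d), a weighted matching coefficient for categorical attributes, and a greedy grouping rule. The toy decision table and every function name are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def partition(rows, attrs):
    """Equivalence classes (sets of row indices) of the indiscernibility relation on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def dependency(rows, cond, dec):
    """gamma(cond, dec): fraction of rows lying in the positive region of dec w.r.t. cond."""
    dec_blocks = partition(rows, [dec])
    pos = sum(len(b) for b in partition(rows, cond)
              if any(b <= d for d in dec_blocks))
    return Fraction(pos, len(rows))


def significance(rows, cond, dec):
    """Assumed importance measure: drop in dependency when one attribute is removed."""
    full = dependency(rows, cond, dec)
    return {a: full - dependency(rows, [c for c in cond if c != a], dec)
            for a in cond}


def weighted_similarity(x, y, weights):
    """Assumed form: share of total importance carried by the attributes on which x and y agree."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all attribute weights are zero")
    return sum(w for a, w in weights.items() if x[a] == y[a]) / total


def select_representatives(rows, weights, threshold):
    """Greedy grouping: keep a row only if it is not similar enough to an already kept row."""
    reps = []
    for row in rows:
        if not any(weighted_similarity(row, r, weights) > threshold for r in reps):
            reps.append(row)
    return reps


# Hypothetical toy decision table: condition attributes a, b, c and decision d.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 2, "b": 0, "c": 0, "d": 0},
    {"a": 2, "b": 1, "c": 0, "d": 1},
]
sig = significance(rows, ["a", "b", "c"], "d")
print(sig)                                         # importance of a, b, c
print(weighted_similarity(rows[4], rows[5], sig))  # rows agree on a and c only
print(len(select_representatives(rows, sig, 0.7)))
```

In this sketch the grouping ignores the decision attribute, so two objects with different decisions can land in the same group; a real preprocessing step would probably need to guard against that. Exact fractions are used only to keep the toy numbers easy to check by hand.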

Keywords/Search Tags:

data mining, rough sets theory, multivariate decision tree, similarity degree of attribute importance, relative generalization

Related items

1	The Data Mining Algorithm Based On Rough Sets
2	Research And Implementation On Larger Data Sets Mining Algorithm Based On Rough Set
3	Research On Similarity Rough Set Theory Based On Pansystems
4	Research Of Decision Tree Algorithm Based On Rough Sets And Gray Theory
5	Research On Attribute Reduction Algorithms Based On Rough Sets Theory
6	A Study Of Optimizing Data Mining Algorithms Based On Decision Tree
7	Research On Attribute Importance Measure Theory And Method Based On Data Coordination
8	The Research Of Attribute Reduction And Minimum Rule Sets Acquisition In Decision Rough Set Theory
9	Rough Set Theory In The Decision Tree
10	Attribute Reduction Algorithm Of Neighborhood Rough Sets And Its Application In Classifier