Research And Application Of Rough Set On Data Preprocessing

Posted on:2008-09-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Liu

Full Text:PDF

GTID:2178360242467190

Subject:Software engineering

Abstract/Summary:

As historical data accumulates on a massive scale, people increasingly want to extract latent, valuable information (useful knowledge) from massive, disordered, and noisy data in order to guide later decision-making. This poses an unprecedented challenge to our capacity for intelligent information processing. Rough set theory, developed in the 1970s, is a mathematical tool for handling vagueness and uncertainty, and it is an important method of intelligent information processing. Because of these advantages it has attracted growing attention, and data mining methods based on rough set theory have recently become popular. This thesis studies data preprocessing methods based on rough set theory and proposes improvements to existing classical algorithms.

First, the thesis analyzes the ROUSTIDA algorithm in depth. Because the algorithm fills missing attribute values according to the similarity relation between objects, it cannot fill a missing value when the object with the missing value is similar to two objects that are not similar to each other. After the algorithm finishes, another filling algorithm is still needed to fill the remaining missing values. The improved algorithm retains the good filling performance of the original and adds the ability to separate noisy data, so it has a wider scope of application than the classical algorithm.

Second, the thesis revises the attribute discretization method based on attribute importance. The classical algorithm performs a large number of comparisons to decide whether a candidate break point is a final break point, which wastes a lot of time when there are many break points. To address this, the thesis studies the difference information between attribute values reflected in the discernibility (discriminable) matrix and proposes a new discretization method that simplifies the set of candidate points and narrows the search scope, improving running time by nearly 30%.

Finally, both algorithms are implemented, and the system's running results show that the improved algorithms have better practical value than the originals.
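The limitation of similarity-based filling described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's ROUSTIDA implementation or its improved version. It assumes a simple tolerance relation (two rows are similar if they agree wherever both values are known), and the function names `similar` and `fill_by_similarity` and the sample table are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]  # None marks a missing attribute value


def similar(a: Row, b: Row) -> bool:
    """Assumed tolerance relation: rows agree wherever both values are known."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def fill_by_similarity(rows: List[Row]) -> List[Row]:
    """Fill missing values from similar rows; leave conflicting cases unfilled."""
    table = [list(r) for r in rows]
    changed = True
    while changed:  # repeat until a full pass fills nothing new
        changed = False
        snapshot = [list(r) for r in table]
        for i, row in enumerate(snapshot):
            neighbours = [r for j, r in enumerate(snapshot)
                          if j != i and similar(row, r)]
            for k, value in enumerate(row):
                if value is not None:
                    continue
                # Known values that similar rows offer for attribute k.
                candidates = {r[k] for r in neighbours if r[k] is not None}
                if len(candidates) == 1:  # unambiguous, so fill it
                    table[i][k] = candidates.pop()
                    changed = True
                # Zero or several candidates: the value stays missing.
    return table


rows = [
    ["a", "x", None],   # similar only to the next row
    ["a", "x", "p"],
    [None, "y", None],  # similar to both rows below, which disagree
    ["b", "y", "q"],
    ["c", "y", "r"],
]
filled = fill_by_similarity(rows)
```

In this sample the first row receives its missing value from its single similar row, while the third row stays incomplete because its two similar rows are not similar to each other. That is the case the abstract says must be handed to another filling algorithm.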

Keywords/Search Tags:

Data Fill, Attribute Discretization, Discriminable Matrix

Related items

1	Bayesian Classification Algorithm Based On Attribute Discretization And Its Application
2	Research On Method Of Attribute Discretization In Data Mining
3	Research On Discretization Algorithm Based On Class-attribute Association
4	VPRS Based Approaches For Discretization Of Continuous Attributes And Data Preprocessing
5	A Discretization Algorithm And Its Parallelization For Unbalanced Data
6	Researched For Continuous Valued Attribute Reduction Algorithm Based On Rough Theory
7	The Data Mining Algorithm Based On Rough Sets
8	The Knowledge Discovery Of Attribute Partial-ordered Theory Based On The Discretization Of Continuous Attribute
9	Application Of Rough Set And Svm In Discretization Of Continuous Attribute
10	Application Of Rough Set And SVM In Discretization Of Continuous Attribute