
Research And Application Of Rough Set On Data Preprocessing

Posted on: 2008-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Liu
Full Text: PDF
GTID: 2178360242467190
Subject: Software engineering
Abstract/Summary:
As historical data accumulate on a massive scale, people increasingly want to mine latent, valuable information (useful knowledge) from large, disordered, and noisy data sets to guide later decision making. This poses an unprecedented challenge to our capacity for intelligent information processing. Rough set theory, a mathematical tool for handling ambiguity and uncertainty that emerged in the 1970s, is an important method of intelligent information processing. Because of these strengths it has attracted growing attention, and data mining methods based on rough set theory have recently become popular. This thesis studies data preprocessing methods based on rough set theory and proposes improvements to two classical algorithms.

First, the thesis analyzes the ROUSTIDA algorithm in depth. Because ROUSTIDA fills missing attribute values according to the similarity relation between objects, it cannot fill a missing value when two objects that are not similar to each other are both similar to the object with the missing value. When the algorithm terminates, it must therefore still rely on another filling algorithm to complete the remaining values. The improved algorithm keeps the good filling performance of the original and adds the ability to separate out noisy data, which gives it a wider range of application than the classical algorithm.

Second, the thesis revises the attribute discretization method based on attribute importance. The classical algorithm performs a large number of comparisons to decide whether each candidate cut point is a final cut point, which wastes considerable time when there are many cut points. To address this, the thesis studies the difference information between attribute values reflected in the discernibility (discriminable) matrix and proposes a new discretization method that simplifies the set of candidate cut points and narrows the search scope. Running time improves by nearly 30%.

Finally, both algorithms are implemented, and the running results show that the improved algorithms have better practical value than the originals.
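
The limitation of ROUSTIDA described in the abstract can be illustrated with a small sketch. The code below is not the thesis's implementation, and it does not attempt the improved algorithm, whose details the abstract does not give. It only assumes the commonly described classical behaviour: two objects count as similar when they agree on every attribute that both have, and a missing value is filled only when all similar objects that know the value agree on it. The names `similar`, `roustida_fill`, and the sample table are hypothetical, chosen for illustration.

```python
from typing import List, Optional

Row = List[Optional[str]]  # None marks a missing attribute value


def similar(a: Row, b: Row) -> bool:
    """Assumed similarity test: objects agree wherever both values are known."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def roustida_fill(table: List[Row]) -> List[Row]:
    """Sketch of similarity-based filling; conflicting values stay missing."""
    rows = [list(r) for r in table]
    changed = True
    while changed:
        changed = False
        # Work from a snapshot so every fill in a pass uses the same state.
        snapshot = [list(r) for r in rows]
        for i, row in enumerate(snapshot):
            neighbours = [
                snapshot[j]
                for j in range(len(snapshot))
                if j != i and similar(row, snapshot[j])
            ]
            for k, value in enumerate(row):
                if value is not None:
                    continue
                # Known values for attribute k among the similar objects.
                candidates = {n[k] for n in neighbours if n[k] is not None}
                if len(candidates) == 1:
                    rows[i][k] = candidates.pop()
                    changed = True
                # Zero candidates or a conflict: the value is left missing.
    return rows


table = [
    ["a", "x", None],  # similar to both of the next two objects
    ["a", None, "1"],
    ["a", None, "2"],  # not similar to the previous object (attribute 2 differs)
    ["b", "y", None],
    ["b", "y", "3"],
]
filled = roustida_fill(table)
```

In this sample, the first object is similar to two objects that are not similar to each other, so its third attribute remains `None` after the procedure stops. This is the case in which, according to the abstract, the classical algorithm needs help from another filling method.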
Keywords/Search Tags: Data Fill, Attribute Discretization, Discriminable Matrix