Research And Application Of Rough Set On Data PreProcessing Of Knowledge Discovery

Posted on: 2015-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: C J Chen
GTID: 2298330452950801
Subject: Computer application technology
Abstract/Summary:
In knowledge discovery for real application fields, the data acquired for mining systems are often incomplete, i.e. some values are missing, because of limited data collection capacity, damage to storage media, or other unknown causes. This incompleteness introduces noise and uncertainty into the data model used for data mining, so the mining results can become conflicting and chaotic, which seriously affects both the process and the results of data mining and knowledge discovery. Rough set theory, a mathematical tool for dealing with ambiguity and uncertainty, was first introduced by Pawlak. In data processing, rough set theory offers a certain degree of objectivity and universality. This thesis studies how to handle missing data values during data preprocessing by applying rough set theory, and puts forward a joint processing model that combines rough set theory with the frequent item sets used in association rule algorithms.

Firstly, the thesis introduces several methods of filling missing values in an incomplete information system and analyzes their respective advantages and disadvantages. It pays particular attention to two methods: the ROUSTIDA algorithm, which is based on rough set theory and has drawn attention domestically, and the Closest Fit algorithm from abroad. On that basis, the thesis proposes RSF, a processing algorithm for incomplete information systems that combines quantitative tolerance rough sets with attribute reduction. The RSF algorithm clearly improves both the accuracy with which the similarity between objects with missing values and candidate filling objects is described and the computational complexity.
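To make the idea of similarity-based filling concrete, the following is a minimal sketch of filling missing values from the most similar compatible object in an incomplete information system. It is not the RSF, ROUSTIDA, or Closest Fit algorithm from the thesis: the names `similarity` and `fill_by_similarity`, the use of `None` as the missing-value marker, and the scoring rule (count of agreeing known attributes, with any disagreement treated as a conflict) are all assumptions made for illustration.

```python
from typing import List, Optional

MISSING = None  # assumed marker for a missing attribute value

Row = List[Optional[str]]


def similarity(x: Row, y: Row) -> int:
    """Count attributes on which x and y agree; return -1 if they conflict.

    A missing value is treated as compatible with any value, in the spirit
    of a tolerance relation. Two known but different values are a conflict.
    """
    agree = 0
    for a, b in zip(x, y):
        if a is MISSING or b is MISSING:
            continue
        if a != b:
            return -1
        agree += 1
    return agree


def fill_by_similarity(table: List[Row]) -> List[Row]:
    """Return a copy of table with missing cells filled where possible.

    Each missing cell takes the value of the non-conflicting object that
    agrees with the row on the most known attributes. Cells with no such
    object stay missing. Ties go to the first candidate encountered.
    """
    filled = [row[:] for row in table]
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            best_score, best_value = 0, MISSING
            for k, other in enumerate(table):
                if k == i or other[j] is MISSING:
                    continue
                score = similarity(row, other)
                if score > best_score:
                    best_score, best_value = score, other[j]
            filled[i][j] = best_value
    return filled


if __name__ == "__main__":
    table = [
        ["sunny", "hot", None],
        ["sunny", "hot", "no"],
        ["rainy", "cool", "yes"],
        [None, "cool", "yes"],
    ]
    for row in fill_by_similarity(table):
        print(row)
```

Counting agreeing attributes gives a graded similarity rather than a yes/no tolerance test, which is one plausible reading of "quantitative tolerance"; the thesis's actual measure may differ.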
Experiments show that the RSF method achieves higher filling accuracy than the ROUSTIDA algorithm and lower computational complexity than the Closest Fit algorithm.

Previous algorithms for incomplete information systems ignore the importance of the candidate filling objects within the information system. The thesis therefore puts forward a method that uses the frequent item sets from association rule mining to fill missing values. This method is simple and can improve the filling precision for missing data. Because it cannot fill every missing value in a dataset, the thesis finally puts forward a combined processing model, called 'FI-RSF', which joins the RSF algorithm with the frequent item sets of the dataset. In this model, the frequent item set method fills values first, and the RSF algorithm then fills the remaining missing values that the first step could not handle.

Finally, data sets from the UCI Machine Learning Repository are selected for experiments. The results show that the FI-RSF method has higher filling accuracy than the RSF algorithm, and that the prediction accuracy improves as the default support for frequent item sets is lowered.
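The two-stage idea described above (frequent item sets first, then a second filler for whatever remains) could be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's FI-RSF implementation: `frequent_pairs` and `fill_fi_then_fallback` are hypothetical names, only 2-item sets are mined, support is counted over all rows, and the second stage is passed in as a `fallback` callable standing in for an RSF-style filler.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Row = List[Optional[str]]
Item = Tuple[int, str]  # (attribute index, value)
Fallback = Callable[[List[Row], int, int], Optional[str]]


def frequent_pairs(table: List[Row], min_support: float) -> Dict[Tuple[Item, Item], int]:
    """Count co-occurring (attribute, value) pairs and keep the frequent ones.

    min_support is a fraction of the number of rows. Missing cells (None)
    contribute no items.
    """
    counts: Counter = Counter()
    for row in table:
        items = [(j, v) for j, v in enumerate(row) if v is not None]
        counts.update(combinations(items, 2))
    threshold = min_support * len(table)
    return {pair: c for pair, c in counts.items() if c >= threshold}


def fill_fi_then_fallback(table: List[Row], min_support: float, fallback: Fallback) -> List[Row]:
    """Fill missing cells from frequent pairs first, then call fallback.

    For a missing cell on attribute j, look for frequent pairs whose other
    item matches a known value in the same row, and take the value from the
    pair with the highest count. If no pair applies, use fallback(table, i, j).
    """
    pairs = frequent_pairs(table, min_support)
    filled = [row[:] for row in table]
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best_count, best_value = 0, None
            for (a, b), count in pairs.items():
                # Orient the pair so that `target` is the item on attribute j.
                if a[0] == j:
                    target, known = a, b
                elif b[0] == j:
                    target, known = b, a
                else:
                    continue
                if row[known[0]] == known[1] and count > best_count:
                    best_count, best_value = count, target[1]
            if best_value is None:
                best_value = fallback(table, i, j)
            filled[i][j] = best_value
    return filled


if __name__ == "__main__":
    table = [["a", "x"], ["a", "x"], ["a", None], ["b", None]]
    # Placeholder second stage: marks cells the first stage could not fill.
    print(fill_fi_then_fallback(table, 0.5, lambda t, i, j: "?"))
```

In this sketch, lowering `min_support` admits more item sets, so the first stage fills more cells and fewer are left to the fallback.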
Keywords/Search Tags: Knowledge Discovery, Incomplete Information System, Rough Set, Data Filling, Frequent Item Set