Research Of Preprocessing In Rough Set Based On Similar Prediction

Posted on: 2012-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Jiang
GTID: 2218330368981942
Subject: Computer software and theory
Abstract/Summary:
As data mining technology matures, information industries are emerging in large numbers and the Internet is developing rapidly in daily life. The amount of information people need is growing exponentially. In practice, traditional data analysis and data query methods cannot meet these urgent demands, because the potential knowledge remains hidden in the data. As a relatively new mathematical tool, rough set theory does not require any additional information or prior knowledge from outside the data. Because of this distinctive feature, rough set theory has gradually become one of the most important theories in knowledge discovery in databases (KDD). Classical rough set theory cannot handle source data with missing information, so the data must be preprocessed before a data mining algorithm is applied; how to preprocess data effectively is therefore an important problem at present.

This thesis takes the direct padding method and the non-treatment method as data preprocessing approaches in rough sets. First, it reviews the features and disadvantages of the major existing padding algorithms, such as redundancy in the information system, the requirement of a prior probability distribution, and the lack of any handling of sparsity. Based on similarity computation, collaborative filtering is used to deal with sparse information tables; combining this technique with the Direction-area padding algorithm yields an improved null-value estimation method for rough sets based on similar prediction. Second, entropy and mutual information are introduced as a dual-feature weight to describe the properties of the information table, so that the padding values reflect those properties. Last, for the multi-value and non-existing null-value cases, a multi-value incomplete information system and a limited tolerance relation based on existing null values are used to deal with these problems in attribute reduction.

The improved algorithm is verified as effective by simulation: it handles sparse data well, and its accuracy and mean absolute error are better than those of the original method when the information table is sparse. The worked instances also show that handling multi-value and non-existing null values is feasible in attribute reduction.
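To illustrate the general idea of estimating null values from similar objects, here is a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the similarity formula, the Direction-area padding steps, or the entropy and mutual-information weights, so the `similarity` measure (share of matching values over attributes known in both objects), the weighted vote in `fill_nulls`, and the toy table are all assumptions chosen for illustration.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a null value in the information table


def similarity(x, y):
    """Fraction of matching values over attributes known in both objects."""
    shared = [(a, b) for a, b in zip(x, y) if a is not MISSING and b is not MISSING]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def fill_nulls(table):
    """Return a copy of table with each null replaced by a similarity-weighted vote.

    Similarities are computed on the original table, so earlier fills do not
    influence later ones. A cell with no similar donor stays MISSING.
    """
    filled = [list(row) for row in table]
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            votes = defaultdict(float)
            for k, other in enumerate(table):
                if k == i or other[j] is MISSING:
                    continue
                # Each donor votes for its own value, weighted by similarity.
                votes[other[j]] += similarity(row, other)
            if votes and max(votes.values()) > 0:
                filled[i][j] = max(votes, key=votes.get)
    return filled


# Hypothetical toy information table: rows are objects, columns are attributes.
table = [
    ["high", "yes", "flu"],
    ["high", "yes", None],
    ["low", "no", "healthy"],
    ["low", None, "healthy"],
]
filled = fill_nulls(table)
```

In this toy table the second object agrees with the first on both known attributes, so its null is filled from the first object; the fourth object is filled from the third in the same way. A weighting scheme such as the dual-feature weight described in the abstract would replace the uniform per-attribute match count inside `similarity`.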
Keywords: rough set, data preprocessing, null value, similarity, multi-value