Research On Incomplete Information System Data Mining

Posted on: 2011-11-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Tian
Full Text: PDF
GTID: 1118360305955704
Subject: Computer application technology
Abstract/Summary:
Because data may be missing or access to real data may be restricted, data mining often has to work with an incomplete information system, that is, an information system in which some attribute values are unknown and the real data cannot be obtained. Rough set theory is a mathematical approach to the analysis of uncertain and vague data. It can deal effectively with imprecise, inconsistent, and incomplete information and can discover hidden knowledge. To study data mining and knowledge discovery in incomplete information systems, this dissertation studies a general rough set theory based on the weak fuzzy similarity relation and rough set models based on the valued similarity relation. It also studies privacy-preserving data mining techniques and algorithms for incomplete information systems. The research work is summarized as follows:

1. Extensions of rough set theory to incomplete information systems are the theoretical foundation for data mining in such systems. In the rough set based on the tolerance relation, a missing value is treated as equal to any known attribute value. In the rough set based on the similarity relation, a missing value is treated as nonexistent. In the rough set based on the limited tolerance relation, a missing value does exist and can be compared; however, the relation remains limited when two objects do not have the same attribute values even though neither value is missing. In light of these shortcomings and the lack of supporting theory, we propose a general rough set based on the weak fuzzy similarity relation. Its properties and its objectivity in handling objects in incomplete information systems are studied and examined. It is proved that the weak fuzzy similarity relation is a more general binary relation.

2. Knowledge mining in incomplete information systems based on the tolerance relation or the similarity relation cannot accurately describe the difference between two similar objects and therefore cannot discover knowledge accurately. We therefore present an approach to knowledge mining based on the valued similarity relation, which objectively reflects the inherent relationships among objects in incomplete information systems. First, by computing the degree of similarity between the attribute values of each pair of objects, we can accurately identify the upper and lower approximations of each object relative to a concept set. Second, once the user selects an appropriate similarity threshold, we can find the set of objects that meet the threshold by computing the upper and lower approximations. Finally, we can precisely determine the knowledge rules that satisfy the conditions. Experimental results show that this is a valid model for knowledge discovery in incomplete information systems.

3. Privacy-preserving data mining algorithms for incomplete information systems are studied. Existing algorithms include the MASK algorithm, based on randomized transition strategies; the PARD algorithm, based on an attribute transfer probability matrix; and the RRPH algorithm, based on randomized response with partial hiding. In light of the shortcomings of these algorithms, we propose an effective privacy-preserving association rule mining method, the partial randomized response based on probability matrix (PRRPM). The PRRPM algorithm is explored and its validity is examined through theoretical analysis and experiments. The experimental results show that it has advantages in accuracy, privacy, complexity, and applicability.
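
The threshold-based approximation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the similarity degree, so the one used here is an assumption made for illustration only: known values must match exactly, and each attribute with a missing value (written `*`) contributes a factor of 1 divided by the size of that attribute's domain. The names `similarity_degree`, `similarity_class`, and `approximations`, and the toy data, are hypothetical and are not taken from the dissertation.

```python
MISSING = "*"  # marker for an unknown attribute value (an assumption of this sketch)

def similarity_degree(x, y, domain_sizes):
    """Assumed similarity degree in [0, 1] between two objects.

    Known values must be equal; an attribute where at least one value is
    missing contributes 1 / |domain|. This is NOT the dissertation's definition.
    """
    degree = 1.0
    for a, (vx, vy) in enumerate(zip(x, y)):
        if vx == MISSING or vy == MISSING:
            degree *= 1.0 / domain_sizes[a]
        elif vx != vy:
            return 0.0
    return degree

def similarity_class(i, objects, domain_sizes, threshold):
    """Indices of objects whose similarity to object i meets the threshold.

    Object i is always included so that the relation stays reflexive.
    """
    return {
        j for j, y in enumerate(objects)
        if j == i or similarity_degree(objects[i], y, domain_sizes) >= threshold
    }

def approximations(objects, concept, domain_sizes, threshold):
    """Lower and upper approximations of `concept` (a set of object indices)."""
    lower, upper = set(), set()
    for i in range(len(objects)):
        cls = similarity_class(i, objects, domain_sizes, threshold)
        if cls <= concept:   # class lies entirely inside the concept
            lower.add(i)
        if cls & concept:    # class overlaps the concept
            upper.add(i)
    return lower, upper

# Toy incomplete information system with two attributes, each with two values.
objects = [
    ("high", "yes"),
    ("high", MISSING),
    ("low", "yes"),
    ("low", "no"),
]
domain_sizes = [2, 2]
concept = {0, 2}

print(approximations(objects, concept, domain_sizes, 0.5))
print(approximations(objects, concept, domain_sizes, 0.6))
```

With a threshold of 0.5, objects 0 and 1 fall into the same similarity class, so object 0 is only in the upper approximation; raising the threshold to 0.6 separates them and the lower and upper approximations coincide. This shows how the user-selected threshold changes which objects are treated as indiscernible.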
Keywords/Search Tags: Incomplete information systems, Data mining, Rough sets, Privacy preserving