Research On Identification Of Implicit Privacy Dimension In High Dimension Data

Posted on: 2015-08-22  Degree: Master  Type: Thesis
Country: China  Candidate: H F Ba  Full Text: PDF
GTID: 2308330479989748  Subject: Computer Science and Technology
Abstract/Summary:
Data publishers generally hold the original raw data but often lack the ability to apply data mining techniques, while data analysts often suffer from a lack of data. Some data publishers worry that releasing data without any protective measures will lead to privacy disclosure. Yet once privacy preservation techniques are applied, the distorted data may inevitably affect the later data mining process.

In the era of big data, privacy has become a challenging issue that attracts a great amount of research effort. Traditional privacy-preserving algorithms focus on preventing sensitive data from being associated with a specific person via a manually assigned set of features. However, how to determine such a set of features is seldom studied. This question urgently needs study, since given a huge volume of data it is impossible to manually find the complete set of features from which private data can be deduced.

In this paper, we first study privacy-preserving issues theoretically and propose the Implicit Privacy Feature Set (IPFS) algorithm to find the complete inferring privacy feature set, which is tuned by a threshold. The Key Implicit Privacy Feature Set (KIPFS) algorithm is designed to find the key inferring privacy feature set. The key inferring privacy features can be applied to various privacy protection techniques, thus protecting individuals' privacy. To evaluate the effectiveness and efficacy of the proposed approach, two state-of-the-art algorithms, k-anonymity and t-closeness, are implemented as baselines, and their revised versions built on KIPFS are used for the performance comparison.

Experimental results show that integrating KIPFS into both algorithms achieves better performance in terms of efficiency and data quality. Furthermore, the impact of privacy protection on the data distribution can be minimized. Therefore, the quality of the data mining results can be guaranteed.
Keywords/Search Tags:privacy preserving, implicit privacy dimension, data mining, data publishing
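
The abstract does not give the details of IPFS or KIPFS, so the following is only a minimal sketch of the two ideas it mentions: selecting features that can infer a sensitive attribute subject to a threshold, and checking k-anonymity over the selected features. The scoring rule (normalized mutual information), the function names `inferring_features` and `is_k_anonymous`, and the sample records are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of hashable values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equal-length columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def inferring_features(records, sensitive, threshold):
    """Hypothetical stand-in for a threshold-tuned feature search:
    keep features whose mutual information with the sensitive column,
    divided by the sensitive column's entropy, reaches `threshold`."""
    if not records:
        return []
    target = [r[sensitive] for r in records]
    h_target = entropy(target)
    if h_target == 0:  # a constant sensitive column cannot be inferred further
        return []
    selected = []
    for name in records[0]:
        if name == sensitive:
            continue
        column = [r[name] for r in records]
        if mutual_information(column, target) / h_target >= threshold:
            selected.append(name)
    return selected


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values
    occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Made-up sample data: "zip" and "age" determine "disease", "color" does not.
records = [
    {"zip": "100", "age": "30s", "color": "red", "disease": "flu"},
    {"zip": "100", "age": "30s", "color": "blue", "disease": "flu"},
    {"zip": "200", "age": "40s", "color": "red", "disease": "cancer"},
    {"zip": "200", "age": "40s", "color": "blue", "disease": "cancer"},
]

features = inferring_features(records, "disease", threshold=0.5)
print(features)                              # ['zip', 'age']
print(is_k_anonymous(records, features, 2))  # True
```

In this sketch the threshold plays the role the abstract assigns to it: lowering it admits more features into the selected set, and the selected set is then what an anonymization step such as k-anonymity would generalize or suppress.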