
Data Privacy Preserving Approach Research For Clustering Analysis

Posted on: 2015-12-30 | Degree: Master | Type: Thesis
Country: China | Candidate: J Qiu
GTID: 2428330488498772 | Subject: Software engineering
Abstract/Summary:
With the arrival of the big data era, data volumes are growing explosively, which places higher demands on data storage and data mining technologies. The dominant big data processing model has different data owners store their data in a data center, where it is analyzed centrally. However, privacy disclosure is the main problem restricting the application of this model. Clustering, a common data mining approach, faces the risk of privacy disclosure while mining the potential value of data. Privacy-preserving research for clustering analysis therefore has significant theoretical and practical value. The main research in this thesis covers the following two aspects.

First, for the problem of privacy-preserving clustering, this thesis proposes a fine-grained data obfuscation algorithm based on sensitive feature utility, named SFU-FDO. Drawing on information entropy theory and the k-nearest-neighbor idea, the algorithm first defines the class distribution entropy and the neighbor distribution entropy of sensitive features. It then builds a clustering utility model and uses it to analyze the clustering utility of each sensitive feature from these two entropies. According to the clustering utility of each sensitive feature, a hybrid obfuscation strategy is applied to obfuscate the data. Theoretical analysis and comparative experiments show that, compared with existing algorithms, this algorithm effectively improves clustering accuracy while still preserving data privacy well.

Second, when data obfuscation methods are used, the distribution of the data set can become incomplete: some samples end up with missing features, which hampers clustering analysis. For this problem, this thesis proposes a privacy-preserving clustering algorithm based on feature utility, named FCM-FUNI. The algorithm first defines a new utility distance measure, which is used to compute distances to the k nearest neighbors. Each missing feature of an incomplete sample is then converted into an interval value determined by the values of that feature among the sample's k nearest neighbors. Finally, the interval fuzzy c-means algorithm is used to perform the clustering. Comparative experiments show that this algorithm achieves better clustering accuracy than existing algorithms.
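The abstract names the ingredients of SFU-FDO (class distribution entropy, neighbor distribution entropy, a clustering utility model and a hybrid obfuscation strategy) but does not give their formulas. The following is a minimal Python sketch of one plausible reading, offered only as an illustration. Assumptions: the labels are either known classes or a preliminary clustering; class distribution entropy is the size-weighted label entropy over equal-width bins of one feature; neighbor distribution entropy is the mean label entropy of each sample's k nearest neighbors on that feature; utility is one minus a weighted mean (`alpha`) of the two normalized entropies; and the hybrid step adds light Gaussian noise to high-utility features and permutes low-utility ones. All function names and parameters are hypothetical, not the thesis's definitions.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of the SFU-FDO ideas; the entropy definitions, the
# utility formula and the hybrid strategy are assumptions, not the thesis's.


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def class_distribution_entropy(values, labels, n_bins=3):
    """Assumed definition: bin one feature into equal-width intervals and take
    the size-weighted mean of the class-label entropy inside each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(y)
    n = len(values)
    return sum(len(ys) / n * entropy(ys) for ys in bins.values())


def neighbor_distribution_entropy(values, labels, k=2):
    """Assumed definition: mean entropy of the labels of each sample's
    k nearest neighbours, measured on this single feature."""
    total = 0.0
    for i, v in enumerate(values):
        ranked = sorted((abs(v - w), j) for j, w in enumerate(values) if j != i)
        total += entropy([labels[j] for _, j in ranked[:k]])
    return total / len(values)


def clustering_utility(values, labels, k=2, n_bins=3, alpha=0.5):
    """Hypothetical utility in [0, 1]: low entropy means the feature separates
    groups well, so utility = 1 - weighted mean of the normalised entropies."""
    max_h = math.log2(len(set(labels))) or 1.0
    h_class = class_distribution_entropy(values, labels, n_bins) / max_h
    h_neigh = neighbor_distribution_entropy(values, labels, k) / max_h
    return 1.0 - (alpha * h_class + (1 - alpha) * h_neigh)


def hybrid_obfuscate(column, utility, threshold=0.5, noise_scale=0.05, seed=0):
    """Hypothetical hybrid strategy: high-utility features get light Gaussian
    noise (neighbourhoods roughly preserved); low-utility features are
    randomly permuted across records."""
    rng = random.Random(seed)
    if utility >= threshold:
        spread = (max(column) - min(column)) or 1.0
        return [v + rng.gauss(0.0, noise_scale * spread) for v in column]
    shuffled = list(column)
    rng.shuffle(shuffled)
    return shuffled
```

For example, on the feature values [1.0, 1.1, 1.2, 5.0, 5.1, 5.2] with labels [0, 0, 0, 1, 1, 1], `clustering_utility` returns 1.0, so under this sketch the feature would only receive light noise; with the interleaved labels [0, 1, 0, 1, 0, 1] the utility drops to about 0.21 and the column is permuted instead. Permutation is chosen here only because it keeps each column's value distribution while breaking its link to individual records; the thesis may use different obfuscation primitives.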
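For FCM-FUNI, the abstract describes three steps: a utility distance used for the k-neighbor search, conversion of each missing feature into an interval taken from the k neighbors' values, and interval fuzzy c-means. It does not give the formulas. The Python sketch below is one possible reading under stated assumptions. Missing values are encoded as `None`. `partial_distance` is a hypothetical stand-in for the utility distance (a weighted Euclidean distance over co-observed features). The interval is taken as [min, max] of the neighbors' values. `interval_fcm` implements one common interval FCM variant whose distance sums the squared differences of lower and upper bounds. None of these names or choices come from the thesis.

```python
import math

# Hypothetical sketch of the FCM-FUNI pipeline. partial_distance is only a
# stand-in for the thesis's "utility distance", whose definition is not given.


def partial_distance(a, b, weights=None):
    """Weighted Euclidean distance over features observed (not None) in both
    samples, rescaled by the fraction of features actually compared."""
    weights = weights or [1.0] * len(a)
    shared = [f for f in range(len(a)) if a[f] is not None and b[f] is not None]
    if not shared:
        return math.inf
    s = sum(weights[f] * (a[f] - b[f]) ** 2 for f in shared)
    return math.sqrt(s * len(a) / len(shared))


def to_intervals(data, k=2, weights=None):
    """Return one (lower, upper) pair of bound vectors per sample. Observed
    values become degenerate intervals [x, x]; a missing value becomes
    [min, max] of that feature over the k nearest neighbours that observe it
    (assumes at least one such neighbour exists)."""
    result = []
    for i, row in enumerate(data):
        lo, hi = [], []
        for f, x in enumerate(row):
            if x is not None:
                lo.append(x)
                hi.append(x)
                continue
            donors = sorted(
                (partial_distance(row, other, weights), j)
                for j, other in enumerate(data)
                if j != i and other[f] is not None
            )[:k]
            vals = [data[j][f] for _, j in donors]
            lo.append(min(vals))
            hi.append(max(vals))
        result.append((lo, hi))
    return result


def interval_fcm(intervals, c=2, m=2.0, iters=50, eps=1e-12):
    """Fuzzy c-means on interval data, using the squared distance
    ||lo - v_lo||^2 + ||hi - v_hi||^2 between a sample and a prototype."""
    n, d = len(intervals), len(intervals[0][0])
    # Simple deterministic start: the first c samples serve as prototypes.
    protos = [(list(lo), list(hi)) for lo, hi in intervals[:c]]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_q (d_ij / d_iq)^(1 / (m - 1)),
        # where d is the squared interval distance (eps avoids division by 0).
        for i, (lo, hi) in enumerate(intervals):
            dist = [
                sum((lo[f] - plo[f]) ** 2 + (hi[f] - phi[f]) ** 2 for f in range(d)) + eps
                for plo, phi in protos
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[q]) ** (1.0 / (m - 1)) for q in range(c))
        # Prototype update: membership^m weighted means of both bounds.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            plo = [sum(w[i] * intervals[i][0][f] for i in range(n)) / total for f in range(d)]
            phi = [sum(w[i] * intervals[i][1][f] for i in range(n)) / total for f in range(d)]
            protos[j] = (plo, phi)
    return u, protos
```

As a usage example, with `data = [[1.0, 1.0], [8.0, 8.0], [1.2, None], [0.9, 1.1], [8.1, 7.9], [None, 8.2]]`, `to_intervals(data, k=2)` turns the missing second feature of `[1.2, None]` into the interval [1.0, 1.1] and the missing first feature of `[None, 8.2]` into [8.0, 8.1]. `interval_fcm` then assigns the highest memberships so that samples 0, 2 and 3 fall in one cluster and samples 1, 4 and 5 in the other. Keeping a missing value as an interval, rather than imputing a single number, lets the clustering step see how uncertain that value is.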
Keywords/Search Tags: Privacy-preserving clustering, sensitive feature, clustering utility, k-neighbor, clustering accuracy