Data Privacy Preserving Approach Research For Clustering Analysis

Posted on:2015-12-30

Degree:Master

Type:Thesis

Country:China

Candidate:J Qiu

Full Text:PDF

GTID:2428330488498772

Subject:Software engineering

Abstract/Summary:

With the arrival of the big data era, data volumes are growing explosively, placing higher demands on the data storage and data mining technologies used to process big data. In the prevailing big data processing model, different data owners store their data in a data center, where it is analyzed centrally. However, privacy disclosure is the main problem restricting the application of this model. Clustering, a common data mining approach, faces the risk of privacy disclosure when mining the potential value of data. Research on privacy preservation for clustering analysis therefore has important theoretical significance and application value. The main research in this thesis covers the following two aspects.

Firstly, for the problem of privacy-preserving clustering, this thesis proposes a fine-grained data obfuscation algorithm based on sensitive feature utility, named the SFU-FDO algorithm. The algorithm first defines the class distribution entropy and the neighbor distribution entropy of sensitive features, based on information entropy theory and the k-neighbor idea. A clustering utility model is then built, and the algorithm uses this model to analyze the clustering utility of sensitive features in terms of these two entropies. According to the clustering utility of the sensitive features, a hybrid obfuscation strategy is used to obfuscate the data. Theoretical analysis and comparative experimental results show that, compared with existing algorithms, this algorithm effectively improves clustering accuracy while still ensuring data privacy.

Secondly, applying data obfuscation can leave the data set incomplete, with missing features that are detrimental to clustering analysis. For this problem, this thesis proposes a feature utility based privacy-preserving clustering algorithm, named the FCM-FUNI algorithm. The algorithm first defines a new utility distance measure, which is used to compute k-neighbor distances. The missing features of incomplete samples are then converted to interval values derived from the corresponding feature values of their k neighbors. Finally, the interval fuzzy c-means algorithm is used to perform clustering. Comparative experimental results show that this algorithm achieves better clustering accuracy than existing algorithms.
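The conversion of missing features to k-neighbor interval values can be illustrated with a minimal sketch. This is not the FCM-FUNI algorithm itself: the abstract does not give the utility distance, so a rescaled partial Euclidean distance is assumed here as a stand-in, and the function names `partial_distance` and `to_intervals` are hypothetical. Missing features are represented as `None`.

```python
import math

def partial_distance(a, b):
    """Euclidean distance over the features present in both samples,
    rescaled by the share of usable features. This is an ASSUMED
    stand-in for the thesis's utility distance, which is not given."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def to_intervals(samples, k):
    """Turn every feature into an interval (low, high).
    A known value v becomes (v, v). A missing value becomes the range
    spanned by that feature among the k nearest samples that have it,
    or None when no other sample has the feature."""
    result = []
    for i, sample in enumerate(samples):
        row = []
        for j, value in enumerate(sample):
            if value is not None:
                row.append((value, value))
                continue
            # Candidate neighbors: other samples where feature j is known.
            candidates = [
                (partial_distance(sample, other), other[j])
                for m, other in enumerate(samples)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda c: c[0])
            values = [v for _, v in candidates[:k]]
            row.append((min(values), max(values)) if values else None)
        result.append(tuple(row))
    return result

# Illustrative data only: the last sample is missing its second feature.
samples = [(1.0, 2.0), (1.5, 2.5), (3.0, 4.0), (5.0, 9.0), (1.0, None)]
intervals = to_intervals(samples, k=2)
```

With these illustrative samples, the two nearest neighbors of the incomplete sample are the first two rows, so its missing feature becomes the interval spanned by their values. The resulting interval-valued data would then be passed to an interval fuzzy c-means step, which is not shown here.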

Keywords/Search Tags:

Privacy-preserving clustering, sensitive feature, clustering utility, k-neighbor, clustering accuracy

Related items

1	Research On Privacy-preserving Clustering Based On Differential Privacy
2	Research On Privacy Preserving Data Publishing For Multi-sensitive Attribute Based On Clustering
3	Research On K-medoids Clustering Algorithm Under Privacy Protection Model
4	Research On Clustering Algorithms In Differential Privacy
5	Privacy-preserving Research Of The Distributed Clustering Algorithm Based On Density
6	Research On Key Technology Of Privacy Preserving For Network Security
7	Research On Privacy Preserving Distributed Clustering Algorithm
8	The Research Of The Grid Based Privacy Preserving Clustering Algorithm
9	Research On Privacy Preserving K Nearest Neighbor Classification Algorithm
10	Research On Privacy Preserving In Social Networking Based On Graph Modification And Clustering