Research On K-medoids Clustering Algorithm Under Privacy Protection Model

Posted on: 2018-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
GTID: 2358330542962928
Subject: Software Engineering Theory
Abstract/Summary:
The pervasive impact of business computing has made information technology an indispensable part of daily operations for enterprises, and many of them have accumulated large amounts of data in different ways to support decision making. As one of the IT services that enterprises need most, data mining serves as a data analysis tool that reveals hidden information and knowledge to researchers and decision makers. The technology overcomes limitations encountered with new types of datasets and is becoming an important tool for scholars seeking information in data. However, privacy leakage has become an increasingly serious problem in the digital age: unlawful profit-making organizations may release or misuse personal private information without users' consent or notice. To secure privacy during knowledge discovery and to promote the development of data mining technology, scholars put forward the concept of privacy-preserving data mining (PPDM), whose aim is to extract unknown or valuable information while protecting users' sensitive information.

As one of the most important tasks in data mining, clustering analysis has many practical applications in fields such as medicine, biology, and finance, and is also a starting point for dealing with data-collection problems. It plays an essential role in data mining and attracts many researchers. There are relatively few studies of privacy-preserving clustering algorithms. This thesis takes the k-medoids clustering algorithm as its research object and uses the differential privacy model to propose the DPk-medoids clustering algorithm, which performs clustering analysis in a privacy-preserving setting. It also puts forward an effective approach to the problem of low utility of the clustering results. The main contents are as follows:

(1) The privacy-preserving model and the basics of data mining are analyzed in detail, in two parts. First, the thesis outlines the background and significance of the topic, analyzes the current state of privacy preservation and PPDM, and uses the CiteSpace tool to discuss the countries and institutions that make major contributions. Second, it introduces in detail the relevant theory and concepts of privacy preservation and clustering technology, and sets out the definitions of differential privacy, the error preserving privacy model, and noise-addition techniques. Finally, it analyzes performance evaluation functions for privacy-preserving clustering algorithms, providing theoretical support for the algorithms and the experimental analysis.

(2) A DPk-medoids clustering algorithm based on the differential privacy model is proposed. To perturb the real data values, the algorithm uses the Laplace mechanism to add noise, protecting users' sensitive information, and the noise-adding process is proved to satisfy differential privacy. Experimental results on UCI datasets show, first, that the DPk-medoids clustering algorithm can be applied to datasets of different scales and dimensions and prevents personal privacy from leaking to some extent, and second, that it preserves the utility of the clustering results.

(3) The EPPk-medoids clustering algorithm is proposed. Taking into account the characteristics of error preserving privacy, the EPPk-medoids clustering algorithm uses the bootstrap method to protect users' data, and an attack model for privacy-preserving cluster analysis is given. The experimental results on UCI and constructed datasets show that the EPPk-medoids algorithm not only keeps the data secure but also achieves good clustering results. Compared with the DPk-medoids clustering algorithm, the EPPk-medoids clustering algorithm has considerable room to improve utility at the same privacy-preserving level; moreover, it adds less noise, which improves the accuracy of the clustering analysis.
Keywords/Search Tags:Differential privacy, Error preserving privacy, Data mining, K-medoids clustering, Utility analysis
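
The idea in item (2), perturbing data values with Laplace noise before running k-medoids, can be illustrated with a minimal sketch. This is not the thesis's DPk-medoids procedure, whose details the abstract does not give. The function names `laplace_perturb` and `k_medoids`, the Manhattan distance, the toy data, and the choice of sensitivity (the number of attributes, assuming every attribute is scaled to [0, 1]) are all assumptions made for illustration.

```python
import numpy as np


def laplace_perturb(data, epsilon, sensitivity, rng):
    """Add i.i.d. Laplace noise with scale sensitivity / epsilon to every value."""
    scale = sensitivity / epsilon
    return data + rng.laplace(loc=0.0, scale=scale, size=data.shape)


def k_medoids(data, k, rng, max_iter=100):
    """Plain alternating k-medoids: assign points, then re-pick each medoid."""
    n = data.shape[0]
    # Pairwise Manhattan distances, shape (n, n).
    dist = np.abs(data[:, None, :] - data[None, :, :]).sum(axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # The new medoid minimizes the total distance to its cluster members.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels


# Toy usage (hypothetical data, already scaled to [0, 1] per attribute).
rng = np.random.default_rng(0)
data = rng.random((60, 2))
# Assumed L1 sensitivity: one record can change by at most 1 in each attribute.
noisy = laplace_perturb(data, epsilon=1.0, sensitivity=data.shape[1], rng=rng)
medoids, labels = k_medoids(noisy, k=3, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger protection but lower clustering utility, which is the trade-off that the abstract's utility analysis addresses.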