
The Design And Implementation Of Big Data Anonymity System

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhang
GTID: 2348330542998136
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of digital technology and the widespread adoption of the mobile Internet, more and more information about people's lives and work is digitized and collected by huge numbers of mobile terminals and intelligent devices, resulting in explosive growth in the amount of data on the Internet. The big data era brings enormous commercial value, and industries of all kinds are now committed to big data mining and analysis, which in turn raises a series of privacy issues. Publishing or sharing raw data without anonymization clearly leads to disclosure of user privacy. In recent years, data privacy protection has attracted considerable attention, driving the development of privacy protection models and techniques. However, with growing data volumes, increasingly diverse data types, and increasingly complex storage systems, anonymization models and techniques designed for a single computing node can hardly meet privacy protection demands in big data settings. To address these demands and problems, this thesis proposes a big data anonymity system that offers flexible configuration, supports multiple data sources, and encapsulates multiple anonymization algorithms.

The main work of this thesis consists of the following parts.

First, the thesis introduces the research background and current status, including the basic concepts of data anonymity, traditional anonymization methods, and the widely used k-anonymity and l-diversity principles.

Second, building on this background and on existing research, the thesis designs a distributed k-anonymity algorithm and a distributed l-diversity algorithm based on the Spark framework. For the distributed implementation of the k-anonymity principle, it proposes a heuristic key-based distribution algorithm and a binary k-clustering algorithm, and then a quick binary k-clustering algorithm that reduces time complexity. For the distributed implementation of the l-diversity principle, it proposes a quick binary l-diversity algorithm and a partition-based l-diversity algorithm driven by the distribution of the sensitive attribute.

Third, the thesis designs and implements the modules of the big data anonymity system: a data interface module, a task scheduling module, an anonymity rule management module, and an anonymity algorithm module. The data interface module applies role-based permission (role-purview) control to data owners and data consumers and supports several table mapping methods, such as SparkSQL and Hive. The anonymity algorithm module encapsulates multiple column-based traditional methods, the quick binary k-clustering algorithm, and the sensitive-attribute-distribution-based partition l-diversity algorithm to support different scenarios.

Finally, the thesis presents a performance comparison and analysis of the distributed k-anonymity and l-diversity algorithms, and completes module tests and functional tests to demonstrate the system's availability.
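As a minimal, single-node illustration of the two principles named above (not the thesis's Spark implementation), the Python sketch below checks a table whose quasi-identifiers have already been generalized. Records that share the same quasi-identifier values form an equivalence class; k-anonymity requires every class to hold at least k records, and the distinct variant of l-diversity (assumed here, since the abstract does not say which variant is used) requires at least l distinct sensitive values in every class. The function names, column names and toy records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows by their (already generalized) quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in quasi_identifiers)
        classes[key].append(row)
    return classes


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    return all(len(group) >= k
               for group in equivalence_classes(rows, quasi_identifiers).values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l_value):
    """True if every equivalence class has at least l_value distinct sensitive values."""
    return all(len({row[sensitive] for row in group}) >= l_value
               for group in equivalence_classes(rows, quasi_identifiers).values())


# Hypothetical released table: ages generalized to ranges, zip codes to prefixes.
released = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "gastritis"},
    {"age": "30-39", "zip": "148**", "disease": "gastritis"},
    {"age": "30-39", "zip": "148**", "disease": "gastritis"},
]
qi = ["age", "zip"]

print(is_k_anonymous(released, qi, k=3))                             # True
print(is_distinct_l_diverse(released, qi, "disease", l_value=2))     # False
```

The table is 3-anonymous, yet the second class reveals that everyone in it has gastritis, which is exactly the homogeneity problem that l-diversity is meant to catch.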
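The abstract does not describe how the binary k-clustering or quick binary l-diversity algorithms split the data. As a hedged stand-in, the sketch below uses a Mondrian-style recursive median cut, a different and well-known technique: a group is cut in two along its widest (normalized) numeric quasi-identifier, and the cut is kept only if both halves still hold at least k records and, optionally, at least l distinct sensitive values. It runs on one node in plain Python. `binary_partition`, `generalize` and the sample records are hypothetical, and the input table is assumed to be non-empty and to satisfy the constraints as a whole.

```python
def binary_partition(rows, quasi_identifiers, k, sensitive=None, l_value=1):
    """Recursively cut rows in two; returns the list of leaf groups."""
    # Width of each numeric QI over the whole table, used to normalize spans
    # so attributes with large raw ranges (e.g. zip codes) do not always win.
    full_width = {a: (max(r[a] for r in rows) - min(r[a] for r in rows)) or 1
                  for a in quasi_identifiers}

    def allowed(group):
        # A half must keep >= k records and, if a sensitive attribute is
        # given, >= l_value distinct sensitive values (distinct l-diversity).
        if len(group) < k:
            return False
        if sensitive is not None and len({r[sensitive] for r in group}) < l_value:
            return False
        return True

    def split(group):
        spans = {a: (max(r[a] for r in group) - min(r[a] for r in group)) / full_width[a]
                 for a in quasi_identifiers}
        attr = max(spans, key=spans.get)
        if spans[attr] == 0:
            return [group]                      # all QI values equal: nothing to separate
        ordered = sorted(group, key=lambda r: r[attr])
        mid = len(ordered) // 2                 # positional median cut
        left, right = ordered[:mid], ordered[mid:]
        if allowed(left) and allowed(right):
            return split(left) + split(right)
        return [group]                          # cut would violate k (or l): keep as leaf

    return split(rows)


def generalize(partitions, quasi_identifiers):
    """Replace each QI value with the min-max range of its group."""
    released = []
    for group in partitions:
        ranges = {a: f"{min(r[a] for r in group)}-{max(r[a] for r in group)}"
                  for a in quasi_identifiers}
        for r in group:
            released.append({**r, **ranges})
    return released


# Hypothetical raw records with numeric quasi-identifiers.
records = [
    {"age": 23, "zip": 13053, "disease": "flu"},
    {"age": 25, "zip": 13068, "disease": "cancer"},
    {"age": 27, "zip": 13067, "disease": "flu"},
    {"age": 28, "zip": 13053, "disease": "gastritis"},
    {"age": 35, "zip": 14853, "disease": "cancer"},
    {"age": 38, "zip": 14850, "disease": "flu"},
    {"age": 39, "zip": 14853, "disease": "gastritis"},
    {"age": 41, "zip": 14850, "disease": "flu"},
]
qi = ["age", "zip"]

parts = binary_partition(records, qi, k=2, sensitive="disease", l_value=2)
print([len(p) for p in parts])                  # [2, 2, 2, 2]
stricter = binary_partition(records, qi, k=2, sensitive="disease", l_value=3)
print([len(p) for p in stricter])               # [4, 4]
print(generalize(parts, qi)[0])                 # {'age': '23-25', 'zip': '13053-13068', 'disease': 'flu'}
```

Raising `l_value` from 2 to 3 stops the recursion one level earlier, so the same records end up in coarser groups. The positional median cut is a relaxed variant: records with equal values may land in different halves, so generalized ranges can overlap. Every group still holds at least k records, but this differs from strict Mondrian partitioning.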
Keywords/Search Tags: anonymization, distributed system, data release, k-anonymity, l-diversity