The Design And Implementation Of Big Data Anonymity System

Posted on:2019-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Zhang


GTID:2348330542998136

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of digital technology and the widespread adoption of the mobile Internet, more and more information about people's lives and work is digitized and collected by vast numbers of mobile terminals and intelligent devices, resulting in explosive growth in the amount of data on the Internet. The era of big data brings enormous commercial value. Today, many industries are committed to big data mining and analysis, which in turn raises a series of privacy issues: publishing or sharing raw data without anonymization would clearly disclose users' private information. In recent years, data privacy protection has attracted considerable attention, driving the development of privacy protection models and techniques. However, with growing data volumes, increasingly diverse data types, and ever more complex storage systems, anonymization models and techniques that run on a single computing node can hardly meet privacy protection demands in big data settings. In view of these demands and problems, this thesis proposes a big data anonymity system that offers flexible configuration, supports multiple data sources, and encapsulates multiple anonymization algorithms.

The main work of this thesis consists of the following parts. First, it introduces the research background and the current state of research, including the basic concepts of data anonymity, traditional anonymization methods, and the widely used k-anonymity and l-diversity principles. Second, building on this background and on existing research, it designs a distributed k-anonymity algorithm and a distributed l-diversity algorithm based on the Spark framework. For the distributed implementation of the k-anonymity principle, it proposes a heuristic key-based distribution algorithm and a binary k-clustering algorithm, and then a quick binary k-clustering algorithm that reduces time complexity. For the distributed implementation of the l-diversity principle, it proposes a quick binary l-diversity algorithm and a partition l-diversity algorithm based on the distribution of sensitive attributes. Third, it designs and implements the modules of the big data anonymity system: a data interface module, a task scheduling module, an anonymity rule management module, and an anonymity algorithm module. The data interface module applies role-based permission control to data owners and data consumers and supports different table mapping methods, such as SparkSQL and Hive. The anonymity algorithm module encapsulates several column-based traditional methods, the quick binary k-clustering algorithm, and the sensitive attribute distribution based partition l-diversity algorithm to support different scenarios.

Finally, the thesis presents a performance comparison and analysis of the distributed k-anonymity and l-diversity algorithms, and reports module tests and functional tests that demonstrate the availability of the system.
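The abstract names binary k-clustering without giving its steps. The sketch below is one plausible single-machine reading, written under our own assumptions and not the thesis's Spark implementation: all quasi-identifiers are numeric, a group is bisected around two far-apart seeds until it holds fewer than 2k records, each resulting cluster's quasi-identifiers are generalized to [min, max] ranges, and the released table is then checked against k-anonymity and distinct l-diversity. The function names (`binary_k_cluster`, `generalize`, `is_k_anonymous`, `is_distinct_l_diverse`) and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumption: a record is (numeric quasi-identifier values, sensitive value).
Record = Tuple[Tuple[float, ...], str]


def _sq_dist(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Squared Euclidean distance between two quasi-identifier vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def binary_k_cluster(records: List[Record], k: int) -> List[List[Record]]:
    """Hypothetical binary k-clustering: recursively bisect until a cluster has < 2k records.

    Every returned cluster holds between k and 2k - 1 records.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    if len(records) < 2 * k:
        return [records]  # cannot be split into two halves of size >= k
    # Pick two far-apart seeds with a cheap two-pass farthest-point heuristic.
    start = records[0][0]
    seed_a = max(records, key=lambda r: _sq_dist(r[0], start))[0]
    seed_b = max(records, key=lambda r: _sq_dist(r[0], seed_a))[0]
    # Order by relative closeness to seed_a, then cut at the midpoint so both halves have >= k records.
    ordered = sorted(records, key=lambda r: _sq_dist(r[0], seed_a) - _sq_dist(r[0], seed_b))
    mid = len(ordered) // 2
    return binary_k_cluster(ordered[:mid], k) + binary_k_cluster(ordered[mid:], k)


def generalize(cluster: List[Record]) -> List[Tuple[Tuple[Tuple[float, float], ...], str]]:
    """Replace each quasi-identifier with the cluster's [min, max] range; keep sensitive values."""
    dims = len(cluster[0][0])
    ranges = tuple(
        (min(r[0][d] for r in cluster), max(r[0][d] for r in cluster)) for d in range(dims)
    )
    return [(ranges, sensitive) for _, sensitive in cluster]


def is_k_anonymous(table, k: int) -> bool:
    """Every generalized quasi-identifier combination occurs at least k times."""
    counts = Counter(qi for qi, _ in table)
    return all(c >= k for c in counts.values())


def is_distinct_l_diverse(table, l_div: int) -> bool:
    """Every equivalence class contains at least l_div distinct sensitive values."""
    classes: Dict[object, Set[str]] = {}
    for qi, sensitive in table:
        classes.setdefault(qi, set()).add(sensitive)
    return all(len(values) >= l_div for values in classes.values())


if __name__ == "__main__":
    # Hypothetical sample: (age, zip code) as quasi-identifiers, diagnosis as the sensitive attribute.
    records = [
        ((25, 10001), "flu"), ((27, 10002), "cancer"), ((31, 10005), "flu"), ((33, 10007), "hiv"),
        ((45, 20001), "flu"), ((47, 20003), "cold"), ((52, 20004), "cancer"), ((55, 20009), "flu"),
    ]
    clusters = binary_k_cluster(records, k=2)
    released = [row for c in clusters for row in generalize(c)]
    print(is_k_anonymous(released, 2), is_distinct_l_diverse(released, 2))  # → True True
```

Cutting each split at the midpoint is what keeps every leaf between k and 2k - 1 records, so k-anonymity holds by construction. l-diversity does not follow from it: the sample happens to pass with l = 2, but other inputs may not, which is why the thesis treats l-diversity with separate algorithms.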

Keywords/Search Tags:

anonymization, distributed system, data release, k-anonymity, l-diversity

Related items

1	Research On Anonymization Privacy Protection Techniques Based On Clustering
2	Design And Implementation Of T-Closeness Based Big Data Anonymization System
3	Design And Implementation Of Streaming Big Data Anonymization System
4	Research On Anonymization Technique Based Privacy Preserving Method On Facial Image
5	Approaches For Privacy Preserving Based On Individual Correlation In Continuous Data Release
6	Research On Anonymity Algorithm For Incomplete Medical Data Based On L-diversity
7	Research On Data Anonymization Techniques For Data Publishing
8	The Research And Implementation Of Full-Domain Anonymization Algorithm Based On Cloud Platform
9	For Dynamic Data Set Re-release Of Privacy Protection
10	Privacy Protection For Dynamic Continuous Publishing Of Structured Relational Data