Design And Implementation Of T-Closeness Based Big Data Anonymization System

Posted on:2020-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:H X Shao

Full Text:PDF

GTID:2428330572973636

Subject:Computer Science and Technology

Abstract/Summary:

The release and sharing of data in the era of big data has promoted the development of science and brought convenience to people's lives. However, directly releasing or sharing unprocessed data can easily lead to the disclosure of personal privacy. In recent years, privacy-preserving data publishing has received widespread attention from industry and academia. Data anonymization is one of the most widely used privacy protection technologies: by randomly mapping and generalizing data, it prevents the published data from being associated with any specific individual. T-Closeness is an effective data anonymization model. Compared with K-anonymity, L-diversity and other data anonymization models, it can resist semantic attacks and probabilistic attacks and provides stronger privacy protection.

In the big data scenario, the centralized T-Closeness algorithm has low desensitization efficiency and is limited by the memory of a single machine, which makes it difficult to meet the desensitization demand of massive data. Research on distributed T-Closeness algorithms with high desensitization efficiency is therefore particularly important. This thesis focuses on the design of distributed T-Closeness algorithms for big data scenarios. Based on three typical existing T-Closeness algorithms, three efficient distributed T-Closeness algorithms are proposed. The three proposed algorithms are implemented on the Spark distributed computing framework, and their efficiency and scalability are tested. The experimental results show that the three distributed T-Closeness algorithms designed and implemented in this thesis have high efficiency and good scalability.

Based on the three proposed distributed T-Closeness algorithms, this thesis designs and implements a big data desensitization system. First, a requirements analysis of the system is carried out, and its functional and non-functional requirements are proposed. Then the system design is introduced, including module division, module interaction, interface design and data table design. The system comprises five modules (Web front-end module, permission management module, generalization tree automatic configuration module, desensitization algorithm module and infrastructure module), which together support distributed desensitization, data permission management, desensitization data download, generalization tree automatic configuration and other functions. Next, the thesis introduces the implementation of the system, including technology selection, workflow and the implementation of each sub-module. Finally, functional and non-functional tests verify that the implemented big data anonymization system is consistent with the design expectations.
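
To make the T-Closeness requirement mentioned in the abstract concrete, the following is a minimal single-machine sketch of a T-Closeness check. It is not one of the thesis's distributed algorithms, and all function and field names are hypothetical. It assumes a categorical sensitive attribute with equal ground distance between values, in which case the Earth Mover's Distance used by the T-Closeness model reduces to the variational distance (half the L1 distance between the two distributions). A table satisfies T-Closeness if, in every equivalence class, the distribution of the sensitive attribute is within distance t of its distribution over the whole table.

```python
from collections import Counter


def distribution(values):
    """Return the empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two categorical distributions.

    Assumption: all sensitive values are equally far apart, so this
    equals the Earth Mover's Distance with equal ground distance.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(records, quasi_identifier, sensitive, t):
    """Check every equivalence class against the overall distribution.

    records: list of dicts; quasi_identifier and sensitive are hypothetical
    field names chosen for this illustration.
    """
    overall = distribution([r[sensitive] for r in records])
    # Group sensitive values by their (already generalized) quasi-identifier.
    groups = {}
    for r in records:
        groups.setdefault(r[quasi_identifier], []).append(r[sensitive])
    return all(
        variational_distance(distribution(values), overall) <= t
        for values in groups.values()
    )


# Illustrative data: ages already generalized into ranges.
records = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "cancer"},
    {"age": "30-39", "disease": "flu"},
    {"age": "30-39", "disease": "flu"},
]
print(satisfies_t_closeness(records, "age", "disease", t=0.3))
```

In a distributed setting such as Spark, the grouping step would be the natural place to parallelize, but how the thesis partitions the work is not stated in the abstract.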

Keywords/Search Tags:

T-Closeness model, anonymization, distributed, big data

Related items

1	Design And Implementation Of Streaming Big Data Anonymization System
2	Research And Implementation Of Data Anonymized Privacy Protection Method
3	An Enhanced T-Closeness Privacy-preserving Method
4	Research On Data Anonymization Techniques For Data Publishing
5	The Research And Implementation Of Full-Domain Anonymization Algorithm Based On Cloud Platform
6	Anonymization-based Research On Privacy Preserving Data Publishing In ERP Systems
7	Research On T-closeness Privacy Protection Model With The Support Of Rough Set And Cluster
8	Anonymization techniques for large and dynamic data sets
9	The Analysis And Evaluation On Relation Closeness In Linked Data
10	Private data outsourcing using anonymization