
Design And Implementation Of T-Closeness Based Big Data Anonymization System

Posted on: 2020-06-21 | Degree: Master | Type: Thesis
Country: China | Candidate: H X Shao | Full Text: PDF
GTID: 2428330572973636 | Subject: Computer Science and Technology
Abstract/Summary:
The release and sharing of data in the era of big data has promoted the development of science and brought convenience to people's lives. However, directly releasing or sharing unprocessed data can easily lead to the disclosure of personal privacy. In recent years, privacy-preserving data publishing has received widespread attention from industry and academia. Data anonymization is one of the most widely used privacy protection technologies: by randomly mapping and generalizing data, it ensures that the published data cannot be associated with any specific individual. T-Closeness is an effective data anonymization model. Compared with K-anonymity, L-diversity and other data anonymization models, it can resist semantic attacks and probabilistic attacks and so provides stronger privacy protection.

In the big data scenario, the centralized T-Closeness algorithm has low desensitization efficiency and is limited by the memory of a single machine, so it struggles to meet the desensitization demand of massive data. Research on distributed T-Closeness algorithms with high desensitization efficiency is therefore particularly important. This thesis focuses on the design of distributed T-Closeness algorithms for big data scenarios. Based on three typical existing T-Closeness algorithms, it proposes three efficient distributed T-Closeness algorithms. The three proposed algorithms are then implemented on the Spark distributed computing framework, and their efficiency and scalability are tested. The experimental results show that the three distributed T-Closeness algorithms designed and implemented in this thesis are efficient and scale well.

Based on the three proposed distributed T-Closeness algorithms, this thesis designs and implements a big data desensitization system. The thesis first carries out a requirements analysis of the big data desensitization system and sets out the system's functional and non-functional requirements. It then presents the system design, including module division, module interaction, interface design and data table design. The system includes five modules: a Web front-end module, a permission management module, a generalization tree automatic configuration module, a desensitization algorithm module and an infrastructure module. Together they support distributed desensitization, data permission management, desensitized data download, automatic generalization tree configuration and other functions. Next, the thesis describes the implementation of the system, including technology selection, workflow and the implementation of each sub-module. Finally, functional and non-functional tests verify that the implemented big data anonymization system meets the design expectations.
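
To make the T-Closeness condition mentioned in the abstract concrete, here is a minimal single-machine sketch in Python. It is not one of the thesis's distributed algorithms and does not use Spark. It only checks whether an already generalized table satisfies T-Closeness, under the assumption that the sensitive attribute is categorical with equal ground distance between values, in which case the Earth Mover's Distance reduces to half the L1 distance between the two distributions. All function names, field names and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Return the empirical distribution of a non-empty list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions.

    Assumption: equal ground distance between categorical values,
    for which the Earth Mover's Distance takes this form.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def check_t_closeness(records, quasi_ids, sensitive, t):
    """Check T-Closeness on a non-empty, already generalized table.

    Records sharing the same quasi-identifier values form one
    equivalence class. The table passes if every class's sensitive
    value distribution is within distance t of the overall one.
    Returns (passes, worst_distance).
    """
    overall = distribution([r[sensitive] for r in records])
    classes = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_ids)
        classes[key].append(r[sensitive])
    worst = max(
        variational_distance(distribution(vals), overall)
        for vals in classes.values()
    )
    return worst <= t, worst


if __name__ == "__main__":
    # Hypothetical generalized records, for illustration only.
    table = [
        {"age": "20-29", "zip": "100**", "disease": "flu"},
        {"age": "20-29", "zip": "100**", "disease": "cancer"},
        {"age": "30-39", "zip": "200**", "disease": "flu"},
        {"age": "30-39", "zip": "200**", "disease": "flu"},
    ]
    print(check_t_closeness(table, ["age", "zip"], "disease", t=0.3))
```

In a distributed setting, the per-class distributions could be computed by grouping records on the quasi-identifier key across partitions, but how the thesis partitions and generalizes the data is not specified in the abstract.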
Keywords/Search Tags:T-Closeness model, anonymization, distributed, big data