Design And Implementation Of Streaming Big Data Anonymization System

Posted on: 2021-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Shi
Full Text: PDF
GTID: 2518306308972979
Subject: Computer Science and Technology
Abstract/Summary:
Big data holds great value and has been called the "new oil" of the 21st century. However, it often contains a large amount of sensitive personal information, and releasing or sharing it without processing can lead to serious leakage of private information. Data desensitization processes data according to desensitization rules in order to protect private information. Anonymization is one of the commonly used desensitization methods: it applies operations such as generalization and suppression to the data so that private information is not leaked.

Most existing anonymization algorithms are either designed for static data or are centralized algorithms for data streams. Streaming big data is characterized by a large volume of data per unit time, and a centralized data stream anonymization algorithm is limited by the computing capacity of a single node, so it struggles to desensitize that volume of data per unit time. Research on distributed anonymization algorithms for streaming big data therefore has both theoretical significance and practical value. Building on CASTLE, a classic centralized data stream anonymization algorithm, this thesis proposes two distributed anonymization algorithms for streaming big data: a grid-based distributed anonymization algorithm and a VP-Tree-based distributed anonymization algorithm. Both algorithms are implemented on Flink, a distributed stream computing framework, and their efficiency and data utility are tested. The experimental results show that, compared with the baseline algorithm, the two proposed algorithms can desensitize streaming big data and run more efficiently while preserving data utility.

Based on the proposed distributed anonymization algorithms, this thesis also designs and implements a streaming big data desensitization system. The system consists of four modules: a front-end interaction module, an authority management module, a desensitization algorithm module, and a data management module. Its functions include user management, streaming big data desensitization, and subscription to desensitized streaming big data. Functional and non-functional tests show that the system's functional modules meet the design expectations and that its performance meets application demand.
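
As an illustration of the generalization and suppression operations mentioned in the abstract, the following is a minimal sketch, not the thesis's algorithm. It assumes numeric quasi-identifier attributes and a cluster of records that has already been grouped together; the function name `generalize_cluster`, the parameter `k`, and the sample attribute names are hypothetical. Each quasi-identifier is generalized to the [min, max] range of the cluster, and a cluster with fewer than `k` records is suppressed.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


def generalize_cluster(
    cluster: List[Record], quasi_ids: List[str], k: int
) -> Optional[List[Record]]:
    """Generalize a cluster of records, or suppress it if it is too small.

    Hypothetical helper for illustration only: quasi-identifiers are assumed
    to be numeric, and clustering is assumed to have happened upstream.
    """
    # Suppression: a cluster smaller than k cannot hide any individual
    # among at least k records, so nothing is released.
    if len(cluster) < k:
        return None

    # Generalization: replace each quasi-identifier value with the
    # (min, max) range covering every record in the cluster.
    ranges = {
        q: (min(r[q] for r in cluster), max(r[q] for r in cluster))
        for q in quasi_ids
    }

    # Other attributes are left unchanged; quasi-identifiers are overwritten
    # with the shared range, so records in the cluster become
    # indistinguishable on those attributes.
    return [{**r, **ranges} for r in cluster]


sample = [
    {"age": 23, "zip": 10001, "disease": "flu"},
    {"age": 31, "zip": 10009, "disease": "cold"},
]
released = generalize_cluster(sample, ["age", "zip"], k=2)
suppressed = generalize_cluster(sample, ["age", "zip"], k=3)
```

With `k=2` both sample records are released with the same `age` and `zip` ranges, while with `k=3` the cluster is withheld entirely. A stream algorithm such as CASTLE additionally has to decide when to release a cluster under a delay constraint, which this sketch does not model.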
Keywords/Search Tags: anonymization, data desensitization, distributed, streaming big data