| With the spread and development of information technology, many industries have accumulated large amounts of data. Because these data usually contain a great deal of personal information, releasing or analyzing them directly risks disclosing individuals' privacy. Differential privacy, a more recent privacy protection model, defends against attackers with arbitrary background knowledge and effectively addresses the privacy threats that arise during data release and analysis. This dissertation designs and implements a data release and algorithm evaluation system based on differential privacy, in order to reconcile data release with privacy protection in the context of big data. The research contents are as follows. First, to process massive numerical data, the Spark distributed computing framework is chosen so that the data can be processed promptly and efficiently. According to the data dimension and the release requirements, two data processing algorithms are designed to preprocess the raw data and obtain the original count values of the data to be released. Second, to avoid leaking sensitive information during release, the system adopts a non-interactive protection framework and applies four differential privacy data release algorithms with different release strategies; the release results are displayed visually. Finally, following the standard criteria for measuring differential privacy algorithms, the privacy protection algorithms are evaluated in terms of algorithm error and performance. In summary, the system meets the needs of processing and releasing large-scale numerical data in the context of big data. It provides data analysts and data owners with a visual platform for data release and algorithm evaluation based on differential privacy, helping them select a suitable differential privacy algorithm that improves data utility while ensuring that sensitive information in the data is not leaked. |
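
The abstract does not name the four release algorithms, so the following is only a minimal sketch of the general idea, assuming the standard Laplace mechanism as a baseline: noise is added to the original count values before release, and the error of the release is then measured. The function names (`release_counts`, `mean_absolute_error`), the sample counts, and the choice of sensitivity 1 (one record changes one count by at most 1) are illustrative assumptions, not the dissertation's implementation.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Hypothetical helper: add Laplace noise to each count (non-interactive release)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # a smaller epsilon means more noise
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_absolute_error(true_counts, noisy_counts):
    """Hypothetical error metric: average absolute deviation per count."""
    pairs = zip(true_counts, noisy_counts)
    return sum(abs(t - n) for t, n in pairs) / len(true_counts)


# Illustrative original count values, e.g. the output of a preprocessing step.
true_counts = [120, 45, 300, 8]
noisy = release_counts(true_counts, epsilon=1.0, seed=0)
error = mean_absolute_error(true_counts, noisy)
print(noisy, error)
```

Running the same release with several values of `epsilon` and comparing the resulting errors gives a rough picture of the privacy and utility trade-off that an evaluation of this kind is meant to expose.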