Research And Implementation Of Differential Privacy Protection Technology In MapReduce Environment

Posted on: 2021-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: X X Wang  Full Text: PDF
GTID: 2428330614463876  Subject: Computer technology
Abstract/Summary:
With the rapid development of cloud computing and big data technology, MapReduce on the Hadoop platform has been widely adopted. Combining data mining algorithms with MapReduce makes it possible to extract valuable information more conveniently and efficiently. In data mining, classification algorithms play a key role in predicting and classifying data, and the random forest algorithm is a typical representative that is widely used in many fields. However, the classification results of a single decision tree, together with the corresponding count information, may leak users' private information. In the MapReduce environment, a random forest algorithm that satisfies differential privacy should protect that information while keeping classification accuracy high. To address the security problems currently faced by the MapReduce distributed environment, this thesis proposes a differential privacy protection algorithm for that environment and designs a feature selection scheme combined with the exponential mechanism. Three ways of allocating the privacy budget are used: arithmetic (equal-difference) allocation, geometric (equal-ratio) allocation, and uniform allocation. With them, the classification accuracy and running efficiency of the proposed algorithm are improved and the amount of computation is greatly reduced. The main research contents are as follows:

(1) Massive data processed in a distributed environment is susceptible to leakage and malicious analysis. To address this problem, a random forest algorithm that satisfies differential privacy, DPMRRF, is proposed for the MapReduce distributed computing framework. The algorithm uses the MapReduce framework of the Hadoop platform to allocate the data set to the Map nodes by randomly extracting records, and starts Map sub-tasks to integrate the data. In the Reduce sub-tasks, the exponential mechanism is used to select and update attributes. In addition, random noise is added to the leaf nodes so that the classification results satisfy differential privacy. Experimental results show that the algorithm achieves better classification accuracy while preserving data availability.

(2) A feature selection scheme based on the exponential mechanism is designed. Using Fayyad's boundary point principle, the selection of split thresholds for continuous attributes is improved, and invalid and repeated calculations for discrete attributes are eliminated. In this scheme, the feature attributes are evaluated in one pass, and the discrete attributes are evaluated together with them. Candidate splits of continuous and discrete attribute values are scored, and the exponential mechanism selects among them with probabilities determined by those scores. The results are then compared under the three privacy budget allocation modes: arithmetic, geometric, and uniform. Using the MapReduce distributed parallel framework of the Hadoop big data platform, this thesis constructs an effective differentially private random forest model that reduces the amount of computation, avoids premature exhaustion of the privacy budget, allocates the privacy budget reasonably, improves classification accuracy, and shortens the running time of the algorithm.

(3) A prototype system for differential privacy protection in a distributed environment is designed and implemented. The differential privacy algorithm module and the noise parameter module are implemented, and the workflow of each module is explained in detail. Experiments were performed on the prototype system using the data set. The results show that the prototype system can analyze the data while effectively ensuring its security.
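The three building blocks named in the abstract (privacy budget allocation across tree levels, attribute selection with the index, i.e. exponential, mechanism, and noise added to leaf counts) can be illustrated with a minimal Python sketch. It is not the DPMRRF implementation, and it does not use Hadoop. The function names, the direction of the arithmetic and geometric weights (growing toward the leaves), the default ratio of 2, the use of Laplace noise, and the count sensitivity of 1 are all assumptions made for illustration.

```python
import math
import random


def allocate_budget(total, levels, mode="uniform", ratio=2.0):
    """Split a total privacy budget over tree levels; the shares sum to total.

    Assumption: arithmetic and geometric weights grow with depth. The
    abstract does not state the direction or the common ratio.
    """
    if mode == "uniform":
        weights = [1.0] * levels
    elif mode == "arithmetic":
        weights = [float(i + 1) for i in range(levels)]  # 1, 2, ..., levels
    elif mode == "geometric":
        weights = [ratio ** i for i in range(levels)]  # 1, r, r^2, ...
    else:
        raise ValueError("unknown allocation mode: " + mode)
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Return index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity))."""
    top = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng):
    """Add Laplace noise with scale 1/epsilon to each leaf class count
    (assumes each record changes one count by at most 1)."""
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]


if __name__ == "__main__":
    rng = random.Random(42)
    per_level = allocate_budget(1.0, levels=4, mode="geometric")
    # Hypothetical quality scores for three candidate split attributes.
    chosen = exponential_mechanism([0.12, 0.40, 0.33], per_level[0], 1.0, rng)
    leaf = noisy_counts([30, 5], per_level[-1], rng)
    print(per_level, chosen, leaf)
```

Under these assumptions, the growing allocations give deeper levels a larger share of the budget, so leaf counts receive less noise than they would under a uniform split of the same total.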
Keywords/Search Tags: MapReduce, Differential Privacy Protection, Random Forest, Privacy Budget