Font Size: a A A

Research On Differentially Private Classification And Recommendation Algorithms

Posted on:2020-06-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:X Q LiuFull Text:PDF
GTID:1368330602961107Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Today is the era of data explosion.The rapid development of network techniques,sensing techniques and storage techniques makes the collection and acquisition of massive data unprecedentedly easy,which greatly promotes the development of data mining tech-niques.Research institutions,business organizations,and individuals are able to discover knowledge from data through data mining techniques very easily.However,improper use of data often brings serious pri'vacy disclosure problems,causing both legal and moral disputes.In addition,the sharing of data is restricted,hindering the development of data mining technologies.In recent years,privacy preservation in data mining has become a heated topic.The core task of privacy preserving data mining is to design data mining models,which will not infringe the accuracy too much under the premise of data privacy.In other words,it aims to seek the balance between algorithm privacy and accuracy.In this thesis,we summarize and analyze the existing privacy preservation techniques.We introduce the mainstream privacy preservation techniques,including data encryption,data anonymization,and data perturbation.We also make a contrast on the implementa-tion approaches and main characteristics of these techniques.In particular,aiming at the privacy leakage problem in traditional data mining applications,the advanced differential privacy preservation model is introduced in this thesis.Differential privacy provides a robust design that does not depend on the background knowledge of the attackers,and provides rigorous mathematical proof.It can effectively protect the data privacy in the process of data mining.In the classification and recommendation algorithms involved in this thesis,we study the implementation mechanisms and key points of realizing differen-tial privacy in the corresponding data mining applications.We also study the calculation and allocation problems of privacy budget to guarantee that the proposed algorithms can effectively preserve the accuracy while satisfying differential privacy.In this way,we achieve the goal of accurate data mining with privacy preservation.The major works in this thesis can be summarized with the following aspects:(1)A differentially private classification algorithm based on decision tree induction and the ensemble strategy.Decision tree classification is one of the most impor-tant classification algorithms in the field of data mining.In the process of decision tree construction,the branch structure and leaf nodes are derived through statis-tical evaluation.Focusing on the privacy leakage in the process of decision tree construction,this thesis analyzes the count queries in the tree model,and clarifies the shortcomings of implementing differential privacy directly based on the Laplace mechanism.Under the premise of satisfying differential privacy,we propose an algorithm to use the maximal class label to measure the importance of branch at-tributes,thus to greatly reduce the privacy budget cost.In addition,we design a privacy budget allocation strategy based on the number of branch attributes to balance the signal-to-noise ratio,in order to solve the problem of intensive queries and accumulation of privacy budget in the evaluation process of branch attributes.Aiming at the instability problem of private single tree model,we further enhance the model through ensemble strategy.Experiments show that the proposed algorithm can guarantee the accuracy of classification under the premise of differential privacy.(2)A differentially private classification algorithm based on Transductive Support Vec-tor Machine.In the scenario where labeled data is insufficient,we design a privacy preserving classification algorithm through training on both labeled and unlabeled data.Different from the perturbation based privacy preserving Support Vector Machine classification,the proposed algorithm puts the focus of privacy preserva-tion on the classification results of unlabeled data,instead of the training process.Based on the theory of Transductive Support Vector Machines,the algorithm is de-signed to construct an approximately correct label pool to provide label assignment candidates.We measure the consistency between the candidate label assignment in the pool with the authoritative assignment,and make the algorithm differen-tially private through randomly sampling the final result based on the exponential mechanism.With the approximately correct requirements,the utility of the final classification results can be guaranteed.Experiments indicate that the proposed algorithm can classify the unlabeled data accurately.(3)A differentially private recommendation algorithm based on auto-encoders.Rec-ommender systems predict user preference based on history rating data.In most applications,the privacy preservation of user data is implemented through introduc-ing perturbation to the objective function and then optimize.However,the rating data is fairly sparse and the granularity of user rating vector is very large,causing huge sensitivity in the evaluation of perturbation scale.Therefore,the perturbed model accuracy can not be guaranteed adequately.This thesis studies the recon-struction of sparse user rating vector based on the theory of auto-encoders in deep learning.In the process of model training,collaborative filtering recommendation is achieved by minimizing the error between all reconstructed user ratings and the original ones.Considering the problem of privacy preservation in the model com-putation,the Gaussian mechanism is introduced to perturb the gradients in the process of parameter solving.Thus the recommender algorithm is made differen-tially private.Experiments indicate that the proposed algorithm can accomplish the recommendation task with high accuracy.
Keywords/Search Tags:Privacy preserving data mining, differential privacy, perturbation, clas-sification, unlabeled data, recommendation, privacy budget, deep learning
PDF Full Text Request
Related items