
Research On Privacy-preserving Collaborative Ensemble Learning

Posted on: 2018-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2348330536968736
Subject: Engineering
Abstract/Summary:
Ensemble learning is a critical tool in big data analysis: it learns a series of rules and combines them to solve otherwise intractable problems such as classification and regression. A central challenge arises when multiple parties collaboratively build an integrated model: how can the model's performance and personal privacy be guaranteed simultaneously? If each party builds its local ensemble model independently, limited data sets and computing power mean the resulting models cannot satisfy the requirements of big data mining. The traditional solution, in which the parties directly share their data sets or the knowledge extracted from their ensemble models, may disclose users' privacy.

This thesis proposes a collaborative ensemble learning framework, together with concrete algorithms, under a differential privacy constraint as a feasible solution to this problem. The main idea is that each party constructs a local ensemble classifier under differential privacy, so that the classifiers themselves protect personal privacy; a semi-honest central agent then integrates these local models and distributes the integrated model back to the parties. The integration function considers both accuracy and data size, so that different local models receive different weights in the integrated model. Under this framework, the thesis presents collaborative random forests under differential privacy (CRFsDP) and collaborative adaptive boosting under differential privacy (CAdaBoostDP). Theoretical analysis and extensive experimental results show that the solution achieves a good balance between privacy and utility, and that personalized privacy budget settings do improve the performance of the integrated model.

Building on the proposed solution, the thesis studies the privacy issue in click-through-rate (CTR) prediction for online advertising. CTR prediction is at the core of computational advertising and is a fundamental problem for advertising recommendation, product positioning, and user profiling. First, an obfuscation strategy is designed, which simply adds noisy records to the data set. Second, the two methods, obfuscation and differential privacy, are compared on the real KDD CUP 2012 data set. The experimental results show that the differential privacy strategy yields a better-performing integrated model and that the amount of added noise is easier to control. Finally, a CTR prediction system is designed and implemented under the differential privacy framework, and its effect is simulated in a realistic scenario. The CTR prediction study further illustrates that the proposed solution is applicable to practical problems.
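To make the integration step concrete, the following is a minimal sketch of how a semi-honest central agent might combine locally trained, differentially private classifiers using weights derived from each party's reported accuracy and data size. The weighting formula, function names, and data layout here are illustrative assumptions, not the thesis's actual integration function.

```python
import numpy as np

def integrate_local_models(models, accuracies, data_sizes):
    """Combine local classifiers into a weighted-vote ensemble.

    models      : list of fitted classifiers, each exposing predict(X)
    accuracies  : validation accuracy reported by each party
    data_sizes  : number of training records held by each party

    The weighting scheme below (accuracy times normalized data size)
    is an illustrative assumption, not the formula from the thesis.
    """
    sizes = np.asarray(data_sizes, dtype=float)
    accs = np.asarray(accuracies, dtype=float)
    weights = accs * (sizes / sizes.sum())
    weights = weights / weights.sum()

    def predict(X):
        # Weighted majority vote over the local models' predictions.
        n = len(X)
        votes = [dict() for _ in range(n)]
        for model, w in zip(models, weights):
            for i, label in enumerate(model.predict(X)):
                votes[i][label] = votes[i].get(label, 0.0) + w
        return np.array([max(v, key=v.get) for v in votes])

    return predict
```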
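The obfuscation baseline described above simply injects fabricated records into the training data before the model is built. Below is a hedged sketch of such a strategy, assuming numeric features and binary labels for simplicity; the thesis does not specify how the noisy records for the KDD CUP 2012 data are generated, so the uniform sampling here is an assumption.

```python
import numpy as np

def obfuscate(X, y, noise_ratio=0.1, rng=None):
    """Obfuscation baseline: append randomly generated records to the data set.

    X, y        : original feature matrix and binary labels (numpy arrays)
    noise_ratio : fraction of fake records to add, relative to len(X)

    The uniform sampling of fake records within observed feature ranges
    is an illustrative assumption, not the thesis's construction.
    """
    rng = rng or np.random.default_rng()
    n_fake = int(noise_ratio * len(X))
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_fake = rng.uniform(lo, hi, size=(n_fake, X.shape[1]))
    y_fake = rng.integers(0, 2, size=n_fake)
    return np.vstack([X, X_fake]), np.concatenate([y, y_fake])
```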
Keywords/Search Tags: Ensemble Learning, Differential Privacy, Random Forests, Adaptive Boosting, Click-through-rate Prediction