
Research On Privacy-preserving Collaborative Ensemble Learning

Posted on: 2018-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2348330536968736
Subject: Engineering
Abstract/Summary:
Ensemble learning is a critical tool in big data analysis: it learns a series of rules and combines them to solve otherwise intractable problems such as classification and regression. A central challenge arises when multiple parties collaboratively build an integrated model: how can the model's performance and personal privacy be guaranteed simultaneously? If each party builds its local ensemble model independently, limited data sets and computing power mean the resulting models cannot satisfy the requirements of big data mining. The traditional solution, in which the parties directly share their data sets or the knowledge extracted from their ensemble models, may disclose users' privacy.

This thesis proposes a collaborative ensemble learning framework, together with concrete algorithms, under a differential privacy constraint as a feasible solution to this problem. The main idea is that each party constructs a local ensemble classifier under differential privacy, so that the classifiers themselves protect personal privacy; a semi-honest central agent then integrates these local models and distributes the integrated model back to the parties. The integration function considers both accuracy and data size, so that different local models receive different weights in the integrated model. Under this framework, the thesis presents collaborative random forests under differential privacy (CRFsDP) and collaborative adaptive boosting under differential privacy (CAdaBoostDP). Theoretical analysis and extensive experimental results show that the solution achieves a good balance between privacy and utility, and that personalized privacy budget settings do improve the performance of the integrated model.

Building on the proposed solution, the thesis studies the privacy issue in click-through-rate (CTR) prediction for online advertising. CTR prediction is at the core of computational advertising and is a fundamental problem for advertising recommendation, product positioning, and user profiling. First, an obfuscation strategy is designed, which simply adds noisy records to the data set. Second, the two methods, obfuscation and differential privacy, are compared on the real KDD CUP 2012 data set. The experimental results show that the differential privacy strategy yields a better-performing integrated model and that the amount of added noise is easier to control. Finally, a CTR prediction system is designed and implemented under the differential privacy framework, and its effect is simulated in a realistic scenario. The CTR prediction study further illustrates that the proposed solution is applicable to practical problems.
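To make the integration step concrete, the following is a minimal sketch of how a semi-honest central agent might combine locally trained, differentially private classifiers using weights derived from each party's reported accuracy and data size. The weighting formula, function names, and data layout here are illustrative assumptions, not the thesis's actual integration function.

```python
import numpy as np

def integrate_local_models(models, accuracies, data_sizes):
    """Combine local classifiers into a weighted-vote ensemble.

    models      : list of fitted classifiers, each exposing predict(X)
    accuracies  : validation accuracy reported by each party
    data_sizes  : number of training records held by each party

    The weighting scheme below (accuracy times normalized data size)
    is an illustrative assumption, not the formula from the thesis.
    """
    sizes = np.asarray(data_sizes, dtype=float)
    accs = np.asarray(accuracies, dtype=float)
    weights = accs * (sizes / sizes.sum())
    weights = weights / weights.sum()

    def predict(X):
        # Weighted majority vote over the local models' predictions.
        n = len(X)
        votes = [dict() for _ in range(n)]
        for model, w in zip(models, weights):
            for i, label in enumerate(model.predict(X)):
                votes[i][label] = votes[i].get(label, 0.0) + w
        return np.array([max(v, key=v.get) for v in votes])

    return predict
```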
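The obfuscation baseline described above simply injects fabricated records into the training data before the model is built. Below is a hedged sketch of such a strategy, assuming numeric features and binary labels for simplicity; the thesis does not specify how the noisy records for the KDD CUP 2012 data are generated, so the uniform sampling here is an assumption.

```python
import numpy as np

def obfuscate(X, y, noise_ratio=0.1, rng=None):
    """Obfuscation baseline: append randomly generated records to the data set.

    X, y        : original feature matrix and binary labels (numpy arrays)
    noise_ratio : fraction of fake records to add, relative to len(X)

    The uniform sampling of fake records within observed feature ranges
    is an illustrative assumption, not the thesis's construction.
    """
    rng = rng or np.random.default_rng()
    n_fake = int(noise_ratio * len(X))
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_fake = rng.uniform(lo, hi, size=(n_fake, X.shape[1]))
    y_fake = rng.integers(0, 2, size=n_fake)
    return np.vstack([X, X_fake]), np.concatenate([y, y_fake])
```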
Keywords/Search Tags: Ensemble Learning, Differential Privacy, Random Forests, Adaptive Boosting, Click-through-rate Prediction