Research On Principal Component Analysis Algorithm Under Differential Privacy

Posted on:2021-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Xu

Full Text:PDF

GTID:2428330614963742

Subject:Information security

Abstract/Summary:

PDF Full Text Request

With the continuous development of Big Data,the data stored in various information systems is becoming more and more abundant,increasing the difficulty of data analysis and processing.Principal Component Analysis(PCA)is a standard data analysis method,which can be used to reduce the data dimensionality.More specifically,it projects the original high-dimensional data to the space of principal components composed by the eigenvectors of the covariance matrix of the data to get low-dimensional data,which can represent most of information of the original data.PCA simplifies the data,making data easier to use while saving on the computational complexity of the algorithm.For example,face recognition is much faster when first projecting the data into lower dimension.Financial and medical data often deal with private or sensitive information.If machine learning tasks or data mining algorithms work directly on the original data,the outputs of these algorithms will leak private information,which may pose potential threats to individuals.Therefore,privacy preservation has become an urgent problem that needs to be solved.Differential Privacy(DP)is an effective and provable privacy protection model,it adds noise to query results and analysis results to protect privacy data.Differential privacy is supported by solid mathematical theory,and the parameter is used to quantify the level of privacy protection,so that the privacy protection levels provided by data sets processed under different parameters are comparable,which makes up for the shortcomings of traditional privacy protection models.Differential privacy can guarantee that no matter how strong the attacker's background knowledge is,it is still impossible to infer the information of a specific data record.There are two approaches to making approximate PCA while satisfying differential privacy: input perturbation and output perturbation.Input perturbation adds noise to the data before computing the PCA,while output perturbation adds noise to the output of PCA.Both of two approaches can effectively simplify data and preserve the data privacy,however there are few studies on their performance.At the same privacy protection level,better performance means better data utility.In view of this problem,this thesis apply Laplace mechanism to propose two differential privacy principle component analysis algorithms Laplace Input Perturbation(LIP)and Laplace Output Perturbation(LOP).We also offer two utility evaluation criterias i.e.,noise magnitude and approximation error to evaluate the performance of two algorithms.We compare the security and utility of the two algorithms from the perspective of theoretical proof and experimental verification,it is found that the algorithm LIP provides higher utility under the same level of privacy protection.Aiming at the practical utility of differential privacy principal component analysis and privacy protection of support vector machines,two algorithms DPPCA-SVM and PCA-DPSVM are proposed.Both algorithms can achieve fast classification while reducing the risk of sample leakage in the dataset.,But in terms of availability,DPPCA-SVM classification accuracy is higher.Finally,through experiments on three data sets,it is verified that the two high-quality algorithms LIP and DPPCA-SVM proposed in the thesis have stronger utility than existing algorithms.

Keywords/Search Tags:

Differential Privacy, Principal Component Analysis, Support Vector Machine, Input Perturbation, Output Perturbation

PDF Full Text Request

Related items

1	Feature Selection Based On Differential Privacy
2	Research On Some Problesm Of Support Vector Machine Learing Algorithm
3	A Study Of Privacy-Preserving Data Mining Based On Multiplicative Perturbation
4	Research On Theories And Algorithms Of Gradient Perturbation And Alternating Minimization For Differential Privacy Protection
5	Research On Face Recognition Method Based On Principal Component Analysis And Multi-core Support Vector Machine
6	Differential Privacy Based Principal Component Analysis Algorithm Design
7	Study On Detection Method Of Solder Joint Based On Principal Component Analysis And Support Vector Machine
8	Research And Design On Privacy-Preserving Data Mining Approaches And Algorithms
9	Fuzzy Support Vector Machine
10	Research On Deep Learning Methods To Support Privacy Protection