Font Size: a A A

Research On Principal Component Analysis Algorithm Under Differential Privacy

Posted on:2021-04-08Degree:MasterType:Thesis
Country:ChinaCandidate:Y H XuFull Text:PDF
GTID:2428330614963742Subject:Information security
Abstract/Summary:PDF Full Text Request
With the continuous development of Big Data,the data stored in various information systems is becoming more and more abundant,increasing the difficulty of data analysis and processing.Principal Component Analysis(PCA)is a standard data analysis method,which can be used to reduce the data dimensionality.More specifically,it projects the original high-dimensional data to the space of principal components composed by the eigenvectors of the covariance matrix of the data to get low-dimensional data,which can represent most of information of the original data.PCA simplifies the data,making data easier to use while saving on the computational complexity of the algorithm.For example,face recognition is much faster when first projecting the data into lower dimension.Financial and medical data often deal with private or sensitive information.If machine learning tasks or data mining algorithms work directly on the original data,the outputs of these algorithms will leak private information,which may pose potential threats to individuals.Therefore,privacy preservation has become an urgent problem that needs to be solved.Differential Privacy(DP)is an effective and provable privacy protection model,it adds noise to query results and analysis results to protect privacy data.Differential privacy is supported by solid mathematical theory,and the parameter is used to quantify the level of privacy protection,so that the privacy protection levels provided by data sets processed under different parameters are comparable,which makes up for the shortcomings of traditional privacy protection models.Differential privacy can guarantee that no matter how strong the attacker's background knowledge is,it is still impossible to infer the information of a specific data record.There are two approaches to making approximate PCA while satisfying differential privacy: input perturbation and output perturbation.Input perturbation adds noise to the data before computing the PCA,while output perturbation adds noise to the output of PCA.Both of two approaches can effectively simplify data and preserve the data privacy,however there are few studies on their performance.At the same privacy protection level,better performance means better data utility.In view of this problem,this thesis apply Laplace mechanism to propose two differential privacy principle component analysis algorithms Laplace Input Perturbation(LIP)and Laplace Output Perturbation(LOP).We also offer two utility evaluation criterias i.e.,noise magnitude and approximation error to evaluate the performance of two algorithms.We compare the security and utility of the two algorithms from the perspective of theoretical proof and experimental verification,it is found that the algorithm LIP provides higher utility under the same level of privacy protection.Aiming at the practical utility of differential privacy principal component analysis and privacy protection of support vector machines,two algorithms DPPCA-SVM and PCA-DPSVM are proposed.Both algorithms can achieve fast classification while reducing the risk of sample leakage in the dataset.,But in terms of availability,DPPCA-SVM classification accuracy is higher.Finally,through experiments on three data sets,it is verified that the two high-quality algorithms LIP and DPPCA-SVM proposed in the thesis have stronger utility than existing algorithms.
Keywords/Search Tags:Differential Privacy, Principal Component Analysis, Support Vector Machine, Input Perturbation, Output Perturbation
PDF Full Text Request
Related items