Research On Data Mining Privacy Preserving Method Based On Random Perturbation

Posted on: 2021-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: J C Shan
GTID: 2518306230978239
Subject: Software engineering
Abstract/Summary:
With the rapid development of big data and analysis technology, large amounts of data are collected and used for data mining tasks. However, releasing data directly for mining or sharing inevitably risks disclosing users' sensitive information and invading their privacy. Privacy-preserving data mining (PPDM) is therefore particularly important for applications based on big data. Perturbation-based methods are a practical and efficient form of PPDM: they protect privacy mainly by modifying the values of sensitive attributes. The trade-off between data utility and privacy, however, remains an open problem. This thesis surveys recent perturbation techniques, summarizes their advantages and disadvantages, and proposes two perturbation-based privacy protection methods.

The first is the random range noise perturbation method, which randomly selects the attributes to perturb for each data record and generates noise according to the range of each selected attribute. Unlike traditional methods, which perturb the same attributes across the whole data set, it perturbs different attributes in different records. The thesis also improves the statistical normalization method used to calculate the perturbation interval, and evaluates the approach with three commonly used classifiers: support vector machine (SVM), Bayesian classification (Bayesian), and classification and regression tree (CART). Experimental results show that, compared with uniform noise perturbation (UND) and improved non-negative matrix factorization (*NMF), the method greatly improves data utility while still protecting data privacy.

The second is a new PPDM method based on PP-SVM random perturbation, which uses the SVM machine learning algorithm for data mining. The experiments use the WBC and JHI data sets from the UCI machine learning repository, and security is assessed with the privacy protection parameters of machine learning. The experiments show that this random-perturbation-based method loses no accuracy and protects the privacy of the data sets in both binary and multi-class SVM models.

The first method applies to a variety of classifier models, whereas the second applies to SVM models. Both protect data privacy well. The second, SVM-based method achieves higher prediction accuracy than the first, while the first is applicable to a wider range of models.
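
The per-record idea behind the first method can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the noise distribution, the number of attributes perturbed per record, or the improved statistical normalization used for the perturbation interval. The function name `random_range_perturb`, the parameters `k` and `alpha`, and the choice of uniform noise bounded by a fraction of each attribute's range are all assumptions made for illustration.

```python
import numpy as np

def random_range_perturb(X, k=1, alpha=0.1, seed=None):
    """Perturb k randomly chosen attributes in each record (illustrative only).

    Assumptions not taken from the thesis: noise is uniform in
    [-alpha * range_j, +alpha * range_j], where range_j = max_j - min_j of
    attribute j, and exactly k attributes are perturbed per record.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the number of attributes")

    # Range of each attribute over the whole data set.
    col_range = X.max(axis=0) - X.min(axis=0)

    Y = X.copy()
    mask = np.zeros((n, d), dtype=bool)  # records which cells were perturbed
    for i in range(n):
        # A different random subset of attributes is chosen for every record.
        cols = rng.choice(d, size=k, replace=False)
        bound = alpha * col_range[cols]
        Y[i, cols] += rng.uniform(-bound, bound)
        mask[i, cols] = True
    return Y, mask
```

Calling `random_range_perturb(X, k=2, alpha=0.1, seed=0)` on a numeric array returns the perturbed copy together with a boolean mask of the perturbed cells; cells outside the mask are left unchanged, which is the contrast with methods that perturb the same attributes in every record.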
Keywords/Search Tags: Privacy preserving, Data mining, Random perturbation, PPDM, PP-SVM