Research On Naive Bayesian Classification Method Based On Differential Privacy

Posted on:2022-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:W R Tang

GTID:2518306344951229

Subject:Computer Software and Application of Computer

Abstract/Summary:

In recent years, with the rapid development of the Internet, social networks and e-commerce have grown ever faster. People communicate through social software and shop online on e-commerce platforms such as Taobao, JD.com, and Pinduoduo. While the Internet brings great convenience, privacy leakage incidents also occur from time to time. Many enterprises and organizations have collected large amounts of user data in their respective fields, and these data often contain valuable information. Enterprises and organizations hand this information to data miners for analysis, in order to extract the value behind the data and guide further decision-making. However, when untrusted data miners access the data directly, there is a risk of privacy disclosure. It is therefore of great significance to protect data privacy during data mining.

The Naive Bayes classification algorithm is one of the most widely used classification algorithms in data mining because of its simplicity and effectiveness. It needs to access parameters of the data such as count values, mean values, and standard deviations. When untrusted data miners access these parameters directly, inference-based attacks become very likely. To address the privacy leakage problem of Naive Bayes classification, this thesis combines it with differential privacy protection and proposes a Naive Bayes classification method based on differential privacy. To address the low classification efficiency of that method on high-dimensional datasets, a differentially private Naive Bayes classification method based on the Haar wavelet transform is proposed, combining differential privacy protection with the Haar wavelet transform, which is widely used in noise reduction and compression.

The main research work of this thesis is as follows:

(1) This thesis introduces the problem of privacy leakage in data mining and the methods of privacy protection in data mining. Data mining, the Naive Bayes classification algorithm, differential privacy protection, the wavelet transform, and standardization methods are introduced in detail. The research status of privacy protection methods for data mining is also comprehensively analyzed.

(2) To solve the problem of privacy leakage in the Naive Bayes classification algorithm, a new Naive Bayes classification method based on differential privacy, NBDP, is proposed. For categorical attributes, the method satisfies differential privacy by adding Laplace noise to the count values. For numerical attributes, the method first assumes that they follow a Gaussian, Laplace, or lognormal distribution, and adds Laplace noise to the count value, mean value, standard deviation, scale, and other parameters of the data. The probabilities that the item to be classified belongs to each category are then calculated from the noise-added parameters, and the category of the item is obtained. The effectiveness of the method is evaluated by experiments on two real datasets from the UCI database and on datasets synthesized with MATLAB. The experimental results show that the method not only protects data privacy but also retains high data utility.

(3) To solve the problem that the Naive Bayes classification algorithm based on differential privacy has low classification utility on high-dimensional datasets, the NBDP-NHWT1, NBDP-NHWT2, NBDP-NHWT3, and NBDP-NHWT4 algorithms are proposed. Each applies a different standardization transformation to the original dataset and then applies a Haar wavelet transform with a specified decomposition stop level to the standardized data. By retaining the non-zero approximation coefficients at the specified decomposition stop level, the dimension of the data is reduced and a reduced result set is obtained; the noisy data is then used to train the Naive Bayes classifier, and finally the category of the item to be classified is obtained. The experimental results show that the NBDP-NHWT3 algorithm, which applies Z-score standardization to the original dataset followed by the Haar wavelet transform, noise addition, Naive Bayes classification, and other operations, achieves higher classification accuracy and F1-measure than the NBDP algorithm.
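
The two building blocks named in the abstract, Laplace noise on count values and Haar approximation coefficients at a chosen decomposition level, can be illustrated with a minimal sketch. This is not the thesis's NBDP or NBDP-NHWT code. The function names, the sensitivity of 1 for counts, the clamping of negative counts to zero, and the requirement that the row length be divisible by 2^level are all assumptions made for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(labels, epsilon, rng):
    """Count each label, then perturb every count with Laplace noise.

    Assumption: one record changes one count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    scale = 1.0 / epsilon
    # Clamping at zero is post-processing (an assumption of this sketch).
    return {k: max(v + laplace_noise(scale, rng), 0.0) for k, v in counts.items()}


def haar_approximation(row, level):
    """Return the Haar approximation coefficients of `row` after `level` steps.

    Each step replaces adjacent pairs (a, b) with (a + b) / sqrt(2),
    halving the dimension of the row.
    """
    coeffs = list(row)
    for _ in range(level):
        if len(coeffs) % 2 != 0:
            raise ValueError("row length must be divisible by 2**level")
        coeffs = [
            (coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
            for i in range(0, len(coeffs), 2)
        ]
    return coeffs


if __name__ == "__main__":
    rng = random.Random(0)
    print(noisy_counts(["a", "a", "b"], epsilon=1.0, rng=rng))
    print(haar_approximation([1.0, 1.0, 1.0, 1.0], level=2))
```

In this sketch an 8-dimensional row reduced with level 2 becomes a 2-dimensional row, which is the kind of dimension reduction the abstract describes before noise is added and the classifier is trained.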

Keywords/Search Tags:

data mining, privacy protection, naive bayes classification, differential privacy, wavelet transform

Related items

1	Image Data Publishing Method Based On Differential Privacy
2	Real-time Data Privacy Protection With Adaptive w-event Differential Privacy
3	Research On Classification Method Of Providing Differential Privacy Protection
4	Research On Data Publishing And Mining Method Based On Differential Privacy
5	Research And Implementation Of Data Release Technology Based On Differential Privacy
6	Research On Privacy Protection Scheme Based On Random Perturbation
7	Research And Application On Privacy Protection For Data Mining
8	Research On Privacy Protection Algorithm For Association Rules Mining
9	Research On Frequency Estimation And Frequent Itemset Mining For Local Differential Privacy Protection
10	Research On Differential Privacy Protection For Data Release