
Research On Naive Bayesian Classification Method Based On Differential Privacy

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: W R Tang
Full Text: PDF
GTID: 2518306344951229
Subject: Computer Software and Application of Computer
Abstract/Summary:
In recent years, with the rapid development of the Internet, social networks and e-commerce have grown ever faster. People communicate through social software and shop online on e-commerce platforms such as Taobao, JD.com, and Pinduoduo. While the Internet brings people a great deal of convenience, privacy leaks also occur from time to time. Many enterprises and organizations have collected large amounts of user data in their respective fields, and these data often contain valuable information. Enterprises and organizations hand this information to data miners for analysis in order to extract the value behind the data and guide their further decision-making. However, when untrusted data miners access these data directly, there is a risk of privacy disclosure. It is therefore of great significance to protect data privacy during data mining. The Naive Bayes classification algorithm is one of the most widely used classification algorithms in data mining because it is simple and effective. It needs to access parameters of the data such as count values, mean values, and standard deviations. However, when untrusted data miners access these data directly, inference-based attacks become very likely. To address the privacy leakage caused by Naive Bayes classification, this thesis combines it with differential privacy protection technology and proposes a Naive Bayes classification method based on differential privacy. To address the low classification efficiency of this method on high-dimensional datasets, a differentially private Naive Bayes classification method based on the Haar wavelet transform is proposed, combining differential privacy protection with the Haar wavelet transform, which is widely used in noise reduction and compression.

The main research work of this thesis is as follows:

(1) This thesis introduces the problem of privacy leakage in data mining and the methods of privacy protection in data mining. Data mining, the Naive Bayes classification algorithm, the differential privacy protection method, the wavelet transform, and standardization methods are introduced in detail. The research status of privacy protection methods for data mining is also comprehensively analyzed.

(2) To solve the problem of privacy leakage in the Naive Bayes classification algorithm, a new Naive Bayes classification method based on differential privacy, NBDP, is proposed. For categorical attributes, the method satisfies differential privacy by adding Laplace noise to the count values. For numerical attributes, the method first assumes that they follow a Gaussian, Laplace, or lognormal distribution. Laplace noise is added to the count, mean, standard deviation, scale, and other parameters of the data, the probability that the item to be classified belongs to each category is calculated from the noisy parameters, and the category of the item is finally obtained. The effectiveness of the method is evaluated by experiments on two real datasets from the UCI database and on datasets synthesized with MATLAB. The experimental results show that the method not only protects data privacy but also retains high data utility.

(3) To address the low classification utility of the differentially private Naive Bayes classification algorithm on high-dimensional datasets, the NBDP-NHWT1, NBDP-NHWT2, NBDP-NHWT3, and NBDP-NHWT4 algorithms are proposed. Each algorithm applies a different standardization transformation to the original dataset and then applies a Haar wavelet transform with a specified decomposition stop level to the standardized data. By retaining the non-zero approximation coefficients at the specified decomposition stop level, the dimension of the data is reduced and the reduced result set is obtained. Noise is then added, and the noisy data are used to train the Naive Bayes classifier. Finally, the category of the item to be classified is obtained. The experimental results show that the NBDP-NHWT3 algorithm, which applies Z-score standardization to the original dataset and then performs the Haar wavelet transform, noise addition, Naive Bayes classification, and other operations on the transformed data, achieves higher classification accuracy and F1-measure than the NBDP algorithm.
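The categorical-attribute step described in (2), adding Laplace noise to count values before computing class probabilities, can be illustrated with a minimal sketch. This is not the thesis's NBDP implementation: the function names, the even split of the privacy budget across queries, the assumed sensitivity of 1 per count table, and the clamping of noisy counts to a small positive value are all assumptions made for illustration.

```python
import numpy as np

def noisy_counts(counts, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to every count (sensitivity assumed to be 1)."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Assumption: clamp to a small positive value so probabilities stay valid.
    return np.maximum(counts + noise, 1e-6)

def train_dp_categorical_nb(X, y, n_values, n_classes, epsilon, rng):
    """Build noisy class counts and noisy per-feature (class, value) count tables."""
    n_features = X.shape[1]
    # Assumption: split the budget evenly over the class-count query
    # and one count-table query per feature.
    eps_each = epsilon / (n_features + 1)
    class_counts = np.bincount(y, minlength=n_classes).astype(float)
    class_counts = noisy_counts(class_counts, eps_each, rng)
    feature_tables = []
    for j in range(n_features):
        table = np.zeros((n_classes, n_values[j]))
        np.add.at(table, (y, X[:, j]), 1.0)  # count (class, value) pairs
        feature_tables.append(noisy_counts(table, eps_each, rng))
    return class_counts, feature_tables

def predict(model, x):
    """Return the class with the highest posterior under the noisy counts."""
    class_counts, feature_tables = model
    log_post = np.log(class_counts / class_counts.sum())
    for j, table in enumerate(feature_tables):
        cond = table[:, x[j]] / table.sum(axis=1)
        log_post += np.log(cond)
    return int(np.argmax(log_post))

# Toy data (hypothetical): two binary attributes, two classes.
rng = np.random.default_rng(0)
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]])
y = np.array([0, 0, 1, 1, 0, 1])
model = train_dp_categorical_nb(X, y, n_values=[2, 2], n_classes=2,
                                epsilon=1.0, rng=rng)
label = predict(model, np.array([0, 1]))
```

A smaller epsilon gives stronger privacy but noisier counts, so the predicted label for small datasets like this toy example can change from run to run.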
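The dimension-reduction step described in (3), standardizing the data and then keeping only the Haar approximation coefficients at a chosen decomposition level, might look roughly like the following. The abstract does not give the exact procedure, so this is a sketch under assumptions: the function names are hypothetical, rows of odd width are zero-padded before each level, and the orthonormal Haar scaling of 1/sqrt(2) is used.

```python
import numpy as np

def z_score(X):
    """Standardize each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

def haar_approximation(X, level):
    """Keep only the Haar approximation coefficients after `level` decompositions.

    Each level halves the number of columns (attributes) per record.
    """
    A = np.asarray(X, dtype=float)
    for _ in range(level):
        if A.shape[1] % 2 == 1:
            # Assumption: pad with a zero column when the width is odd.
            A = np.hstack([A, np.zeros((A.shape[0], 1))])
        # Orthonormal Haar low-pass filter: scaled sum of adjacent pairs.
        A = (A[:, 0::2] + A[:, 1::2]) / np.sqrt(2.0)
    return A

# Toy data (hypothetical): 2 records with 4 numerical attributes each.
X = np.array([[1.0, 3.0, 5.0, 7.0],
              [2.0, 2.0, 2.0, 2.0]])
reduced = haar_approximation(z_score(X), level=1)  # shape (2, 2)
```

In a differentially private pipeline, noise would then be added to the statistics computed from `reduced` before training the classifier. Because fewer attributes remain, fewer noisy parameters are needed, which is one plausible reason for the utility gain the abstract reports.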
Keywords/Search Tags:data mining, privacy protection, naive bayes classification, differential privacy, wavelet transform