Statistical Inference Of Multiple Response Variable Regression Model Based On Double Privacy Protection

Posted on:2024-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhao

Full Text:PDF

GTID:2530306923974239

Subject:Probability theory and mathematical statistics

Abstract/Summary:

With the rapid development of the mobile internet, big data, and other network technologies, people generate more and more data in daily life, and much of it contains personal information such as location and online purchases. Cloud computing provides a storage platform for these data and allows their potential value to be exploited, but once the data are outsourced to a public cloud, the data holder faces a substantial risk that personal privacy will be compromised. During data mining it is therefore important to adopt privacy protections that safeguard the sensitive information of data holders and ensure that data processors comply with data security laws. Among currently popular approaches, differential privacy is widely used to protect sensitive information during data processing, but to data holders it can still be a black-box operation, which leads them to distrust data processors.

To strengthen the trust of data holders in data processors, this paper proposes a double privacy protection method, which first applies differential privacy to the statistical algorithm and then generates synthetic data from its output by plug-in sampling. In the context of a multiple response variable regression model, the regression coefficient matrix and the covariance matrix are then estimated from the synthetic data. Theoretical results establish the distributions of specific estimators computed from the synthetic data and yield two exact inference methods, one based on the mean synthetic covariance (MSC) and one on a combination of the mean synthetic covariance and the cross synthetic covariance (MSC_CSC); that is, statistical tools for hypothesis testing are provided.

A simulation study of the two exact inference methods under double privacy protection finds that the estimated coverage probability of the confidence region for the regression coefficient matrix A is approximately 0.94. This supports the validity of the statistical inference and confirms that the proposed double privacy protection method can provide useful information for statistical analysis while protecting the privacy of the original data. Finally, the paper discusses an application to the 2000 U.S. Current Population Survey public-use data, showing that the proposed inference method remains valid and that the risk of privacy breach is lower than under single protection, in which synthetic data are generated by plug-in sampling alone.

The paper proceeds in three steps. First, it argues for double privacy protection at a theoretical level, starting from the preliminary conjecture that combining differential privacy with plug-in sampling reduces the risk of privacy breach compared with single protection. Second, it establishes two exact inference methods based on the likelihood principle. Third, it confirms the validity of the proposed statistical inference through simulations, showing that data processed with double privacy protection can still be analysed statistically, and it evaluates the risk of privacy breach on real data, finding that double protection does provide a higher level of protection than single protection.
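The two-stage idea in the abstract (perturb the fitted model, then release plug-in synthetic responses) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the thesis's method: the abstract does not state which differential privacy mechanism is used, so the Laplace noise below is purely illustrative and its `scale` parameter is hypothetical and not calibrated to any sensitivity, meaning the sketch carries no formal privacy guarantee. The function names are likewise invented for illustration, and the MSC and MSC_CSC inference procedures are not reproduced.

```python
import numpy as np

def fit_mvreg(X, Y):
    """Least-squares fit of Y = X A + E, rows of E assumed N(0, Sigma)."""
    n, p = X.shape
    A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ A_hat
    Sigma_hat = resid.T @ resid / (n - p)  # unbiased residual covariance
    return A_hat, Sigma_hat

def perturb_estimates(A_hat, Sigma_hat, scale, rng):
    """Illustrative noise step (assumption): add Laplace noise to the estimates.

    `scale` is a hypothetical knob, NOT derived from a sensitivity bound,
    so this does not by itself provide epsilon-differential privacy.
    """
    A_noisy = A_hat + rng.laplace(0.0, scale, size=A_hat.shape)
    noise = rng.laplace(0.0, scale, size=Sigma_hat.shape)
    S = Sigma_hat + (noise + noise.T) / 2.0  # keep the matrix symmetric
    # Clip eigenvalues so the noisy covariance stays positive definite.
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 1e-8, None)
    return A_noisy, (V * w) @ V.T

def plugin_sample(X, A, Sigma, m, rng):
    """Draw m synthetic response matrices Y* = X A + E*, rows of E* ~ N(0, Sigma)."""
    n, q = X.shape[0], A.shape[1]
    L = np.linalg.cholesky(Sigma)
    return [X @ A + rng.standard_normal((n, q)) @ L.T for _ in range(m)]
```

A typical call chain would be `fit_mvreg` on the confidential data, `perturb_estimates` on the result, and `plugin_sample` to produce the released data sets; an analyst would then refit the regression on each synthetic data set and combine the results.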

Keywords/Search Tags:

Privacy protection, Synthetic data, Differential privacy

Related items

1	Research On The Protection Method Of Sensitive Information In Meteorological Dat
2	Research On Privacy Protection Of Weighted Social Network Data Publishing Based On Differential Privacy And Closeness Centrality
3	Research Of Data Modeling And Algorithm Based On The Privacy Protection
4	A Personalized Privacy Protection Anonymous Method For Data Publishing
5	The Game Model Of Differential Privacy And Its Application
6	Research On The Theory And Algorithm Of Non-convex Regularized Optimization For Differential Privacy Protectio
7	Local Differential Privacy Protection Technology Of Social Network Based On Hierarchical Random Graph
8	Publishing Triangle Counting Histogram In Social Networks Based On Differential Privacy
9	Research On Privacy Protection Method Based On K-Anonymity In The Social Network
10	A Privacy Protection And Integrity Verification On Counting Query Scheme For Genomic Data