
A Study Of Statistical Inference On Massive Data Based On Differential Privacy

Posted on: 2024-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: J S Song
Full Text: PDF
GTID: 2556306923475434
Subject: Applied statistics

Abstract/Summary:
In recent years, thanks to the development of information technology, industries such as finance, technology, healthcare, education and construction have been actively pursuing digital transformation, which inevitably depends on massive data. Massive data, known as the "fourth paradigm" of scientific research, represents a new milestone in the development of science and technology, bringing great changes to people's lives and exerting a profound influence on social and economic development. At the same time, traditional data processing methods and analysis workflows must be improved in order to extract valuable information from such vast amounts of data. The recent enactment of three laws (the Cyber Security Law, the Data Security Law and the Personal Information Protection Law) reflects China's growing concern about data security, and privacy protection has likewise become a topic of intense interest in academia.

There are various approaches to protecting privacy, such as cryptography, anonymisation and random perturbation. One of the most widely used is the Differential Privacy (DP) mechanism proposed by Dwork et al. in 2006. Built on the idea of random perturbation, it has a rigorous mathematical definition and an axiomatic representation, and it offers a principled trade-off between privacy protection and data accuracy, making data usable while keeping individual records invisible. In recent years, differential privacy has gradually become a de facto standard for privacy protection. Although many algorithms have been proposed and deployed within this framework, statistical inference on privacy-preserved data remains a challenge: traditional statistical analysis methods are difficult to apply because of the uncertainty in the variance of the added noise and in the distribution of the estimators.

In this paper, a new differential privacy-preserving mechanism is proposed in the context of massive data, and a general statistical inference framework, including parametric hypothesis testing and confidence interval estimation, is constructed on this basis. Given the scale of massive data, traditional statistical computation methods inevitably incur excessive computational cost and make the goals of statistical analysis hard to achieve. The Bag of Little Bootstraps (BLB) resampling method produces robust results in statistical inference on large data sets and greatly improves computational efficiency, but it does not take the privacy of the original data into account. Therefore, this paper improves an existing differential privacy algorithm and combines it with the BLB method to obtain a new differential privacy mechanism, enabling statistical analysis of aggregate parameters without exposing individual private data. At the same time, to address both the heterogeneity of the noise variance under the differential privacy mechanism and the uncertainty in the distribution of the estimators, we use the central limit theorem under nonlinear expectation theory to construct the corresponding test statistic and propose a hypothesis testing method. A simulation study demonstrates the good performance of the proposed inference procedure. The massive-data differential privacy mechanism proposed here satisfies privacy protection requirements without compromising subsequent statistical inference, and provides a useful reference for data sharing and statistical analysis.
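The combination of BLB subsampling with differentially private noise described above can be sketched roughly as follows. This is an illustrative sketch only: the function name, the subset-size exponent 0.6, the clipping bound, and the placement of the Laplace noise are assumptions for exposition, not the thesis's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def blb_private_mean(x, eps=1.0, s=20, r=50, clip=5.0):
    """BLB estimate of a mean with per-subset Laplace noise (sketch).

    Data are clipped to [-clip, clip] so that the sensitivity of the
    mean is bounded, which is what calibrates the Laplace noise scale.
    """
    x = np.clip(x, -clip, clip)
    n = len(x)
    b = int(n ** 0.6)  # little-bootstrap subset size, n^gamma with gamma = 0.6
    estimates = []
    for _ in range(s):
        sub = rng.choice(x, size=b, replace=False)
        # r multinomial resamples of nominal size n from each little subset
        boot = [np.mean(rng.choice(sub, size=n, replace=True)) for _ in range(r)]
        est = np.mean(boot)
        # Laplace mechanism: sensitivity of a clipped mean over n points is 2*clip/n
        est += rng.laplace(scale=2 * clip / (n * eps))
        estimates.append(est)
    return float(np.mean(estimates))
```

Each subset releases only a noised aggregate statistic, so no individual record leaves the subset; the final estimate averages the s private aggregates, which is the general shape of combining BLB with a perturbation mechanism.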
Keywords/Search Tags:differential privacy, massive data, BLB algorithm, asymptotic normality, statistical inference