
Research On Privacy Preserving Big Data Publishing Technology

Posted on: 2019-04-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yan
Full Text: PDF
GTID: 1368330596453882
Subject: Control theory and control engineering
Abstract/Summary:
The rapid development of the mobile Internet and the widespread use of smart terminals have led to a continuous increase in the digitization of personal information, ushering in the era of big data. Big data plays a significant role in promoting technological development across industries and in improving the service capabilities of data resources. However, it also poses serious challenges to personal privacy. The multi-source, heterogeneous, and dynamically published nature of big data strengthens the correlation between different data sources, which easily leads to the disclosure of private information and the failure of privacy protection methods. Such failures not only damage users' reputation, property, and personal safety, but may even threaten national information security. Research on privacy preserving data publishing is therefore essential to the secure application, further development, and use of big data.

This dissertation addresses privacy protection during big data publishing and analyzes the associated privacy risk factors and uncertainties. An indicator system and evaluation methods for privacy risk situation assessment are designed to form a "pre-warning system" for the privacy preserving publishing of big data. A simplified method for representing associations in big data and a quasi-identifier attribute identification algorithm are proposed, which help to determine the set of key attributes on which privacy protection operations are performed. Privacy preserving publishing algorithms are then designed separately for static and dynamic big data, according to their different characteristics. Compared with existing similar algorithms, the proposed algorithms significantly improve both privacy protection and performance. The research work of the dissertation includes the following aspects:

(1) Research on privacy risk situation assessment of big data publishing. Based on the characteristics and application modes of big data release, privacy risks, privacy assets, privacy threats, and privacy vulnerabilities are defined for the big data publishing environment, and a three-level privacy risk situation assessment index system is established. A privacy risk situation assessment method is then designed based on the theory of set pair analysis, and a least squares partial weighting method based on partial connection numbers is proposed, which eliminates the influence of uncertainty factors on weight allocation. Case analysis and comparison experiments show that the proposed assessment method better reflects the status and development trend of privacy risk indicators, and can dynamically track and evaluate the privacy risk status and risk factors of a big data release system.

(2) Research on relevance representation of big data and identification of quasi-identifier attributes. Multi-source heterogeneous big data have concealed and complex relevance that easily leads to the disclosure of private information. To address this problem, a graph-based method for representing associations between big data entities is designed, which is further abstracted into an attribute graph by linking the data to be published with previously published data and external knowledge. The roles of quasi-identifiers within the attribute graph are analyzed and defined. From the perspective of the independence of sets, the problem of determining quasi-identifier attributes is converted into the problem of finding the cut vertices of the attribute graph. Furthermore, a quasi-identifier partitioning algorithm based on cut vertices is designed, which determines the set of key attributes needed to prevent linking attacks and to carry out privacy protection operations. Compared with existing quasi-identifier identification methods, the proposed algorithm achieves a better partitioning effect and lower computational complexity.

(3) Research on fuzzy publishing technology for privacy protection of static big data. The traditional k-anonymity model suffers from high computational complexity, large information loss, and the difficulty of determining the value of k. To address these problems, a fuzzy semantics publishing method for static big data is proposed. A fuzzy publishing algorithm based on the set-pair cloud model and a semantic generalization tree are designed for numerical sensitive attributes and categorical sensitive attributes, respectively. Parameters such as fuzzy semantic distinction and generalized reserve degree are defined to reflect the relationship between the published information and the original information. Comparison experiments carried out on the Ali Cloud platform show that the proposed algorithm has lower computational complexity and yields published data with better availability than other privacy preserving fuzzy semantics and clustering algorithms.

(4) Research on differential privacy decomposition technology for privacy protection of dynamic location big data. To address the uncertainty of the spatial index structure and of privacy budget allocation in the dynamic publishing of statistical location big data, a hierarchical hybrid decomposition algorithm based on the differential privacy model is proposed. Dynamic location big data are sampled and smoothed by continuously publishing data snapshots at regular time intervals. The location big data snapshots taken at different sampling times are spatially clustered by an adaptive density grid partitioning algorithm. A heuristic quad-tree partitioning method based on regional uniformity and a corresponding dynamic privacy budget allocation strategy are proposed, which not only solve the problem of determining the stop condition for top-down space decomposition, but also balance the impact of noise error and uniformity assumption error on query accuracy after publishing. Comparison experiments carried out on the Ali Cloud platform show that the proposed algorithm offers clear advantages in regional query accuracy and operating efficiency.
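
As an illustration of the cut-vertex idea in part (2), the following is a minimal sketch of finding cut vertices (articulation points) in an undirected attribute graph with a standard depth-first search. It is not the dissertation's partitioning algorithm, which the abstract does not specify. The function name `cut_vertices`, the attribute names, and the edges between them are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def cut_vertices(edges):
    """Return the set of cut vertices of an undirected graph given as edge pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc = {}      # discovery order of each vertex in the DFS
    low = {}       # lowest discovery order reachable from the vertex's subtree
    found = set()  # cut vertices collected so far
    counter = [0]

    def dfs(node, parent):
        disc[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: the subtree can reach an earlier vertex.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # A non-root vertex is a cut vertex if some child subtree
                # cannot reach above it without passing through it.
                if parent is not None and low[nxt] >= disc[node]:
                    found.add(node)
        # The DFS root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            found.add(node)

    for vertex in list(graph):
        if vertex not in disc:
            dfs(vertex, None)
    return found


# Hypothetical attribute graph: an edge means two attributes can be linked
# through published data or external knowledge (assumed for illustration).
attribute_edges = [
    ("name", "zip"),
    ("zip", "birth_date"),
    ("birth_date", "name"),
    ("zip", "voter_id"),
    ("voter_id", "disease"),
]
print(cut_vertices(attribute_edges))
```

In this toy graph, removing `zip` or `voter_id` disconnects the sensitive attribute `disease` from the identifying attributes, so these are the attributes a cut-vertex-based method would flag as candidates for protection.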
Keywords/Search Tags:Big Data, Data Publishing, Privacy Protection, Risk Situation Assessment, Set Pair Analysis, Differential Privacy