Research On Key Technologies Of Privacy Protection And Compliance On Big Data Platform

Posted on: 2022-08-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Yang
GTID: 1488306734471804
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of big data technology has promoted its deep integration and innovative application across industries. More and more companies and organizations use the storage and processing capabilities of big data platforms to mine the potential value of their data. As the amount of data they collect and use grows, big data platforms bear high data security risks. At present, privacy protection in the data collection stage and data compliance in the data processing stage on big data platforms still face the following challenges:

(1) For the privacy protection of data collected in real time, the dynamic and continuous nature of streaming data defeats the techniques that existing static data anonymization algorithms use to reduce information loss, and missing values introduce additional information loss.
(2) Data compliance analysis for big data platforms can effectively reduce data abuse, but there is a lack of structured compliance rules that describe the scope and purpose of data use and can support automated compliance analysis.
(3) Data audit components on big data platforms can provide audit logs, but lack data compliance analysis capabilities.

This thesis studies key technologies of privacy protection and data compliance on big data platforms from three aspects: the balance between privacy protection and availability of streaming data, the automatic generation of data compliance rules, and the compliance verification of data processing tasks. Specifically, the main research work and innovations of this thesis consist of the following three parts.

(1) A clustering-based anonymization algorithm for incomplete streaming data

Because streaming data is continuous and potentially unbounded, it is impossible to scan the global data multiple times to reduce information loss, as static data anonymization does. The timeliness of streaming data also places higher requirements on the output delay of the anonymization algorithm. In addition, traditional anonymization algorithms rarely consider the missing values found in real environments, and existing methods for handling missing values lose a lot of information, which reduces the availability of the anonymized data. To solve these problems, the thesis proposes a clustering-based anonymization algorithm for incomplete streaming data. In this method, the streaming data is continuously anonymized and output under a count-based sliding window and time constraints, and a cluster reuse mechanism is adopted to anonymize newly arrived data with less information loss. Furthermore, a distance calculation method for missing data, based on the two dimensions of attribute set and attribute value, is put forward to support the clustering of missing data, and a generalization method based on Maybe match is proposed to alleviate the additional information loss introduced by missing values. Experiments on several public data sets show that the proposed anonymization algorithm has lower information loss and better availability of the output data. Therefore, the research work of this thesis has practical application value in the privacy protection of streaming data.

(2) An automatic extraction method for data compliance rules in privacy policies

The unstructured compliance information in a privacy policy is not machine-readable and therefore cannot be used directly for automated compliance analysis. In particular, data use purposes are expressed in many different ways in privacy policies, and methods based on syntactic features cannot efficiently identify all purposes in a sentence. To address these problems, a method combining syntactic and semantic analysis is proposed to automatically extract purpose-aware rules from privacy policies. First, a purpose-aware rule is proposed to formally describe the data use statements in a privacy policy. Second, the concepts of explicit purpose and implicit purpose are proposed based on syntactic and semantic analysis, and the two kinds of purposes are identified in sentences by template matching and by a semantic role labeling model, respectively. Because the accuracy of the semantic role labeling model is low when it is transferred to the privacy policy domain, a domain adaptation method is used to retrain the model with a small number of manually annotated samples, which significantly improves the model's recognition of implicit purposes. Experimental results show that the retrained model based on domain adaptation increases the recall rate of implicit purpose extraction by 13%, and the F1 value of the purpose-aware rule extraction implemented with the model reaches 91%. The method proposed in the thesis is the first to effectively extract purpose-aware rules from privacy policy documents, which provides the rule source for privacy compliance analysis.

(3) A graph matching based data compliance verification method

Data security policies on big data platforms only restrict access to limited data resources, and do not restrict data association or data use purpose. In addition, big data platforms lack native compliance analysis capabilities. To address these problems, a data compliance analysis model based on graph matching is proposed. In this model, a data rule graph, based on a directed acyclic graph, is proposed to describe the data processing rules, and a data processing graph is proposed to describe the data processing itself. Analyzing whether data processing complies with the rules is then naturally transformed into a graph matching problem. In addition, a hierarchical refinement model is proposed to handle matching between elements of different granularity in the two graphs. The compliance verification method is implemented on Atlas, and experiments on the TPC-DS benchmark show that the method can effectively analyze the compliance of data processing tasks with three kinds of compliance requirements. The research work of this thesis lays a foundation for data compliance analysis on big data platforms.

To sum up, the research results on key technologies of privacy protection and data compliance on big data platforms presented in this thesis can provide effective privacy protection for streaming data while maintaining high data availability, and can realize data compliance analysis on big data platforms. They are of great significance for improving data security protection and the means of data compliance checking on big data platforms.
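
As a rough illustration of the first contribution, the count-based sliding window and the clustering of incomplete records might be sketched as below. This is not the thesis's algorithm: the distance formula, the range-based generalization, and every name in the code (`distance`, `generalize`, `anonymize_stream`, `k`, `window_size`) are assumptions made for this sketch, and the time constraint and the cluster reuse mechanism are left out. A missing value is simply published under its cluster's range, which is only one possible reading of the Maybe match idea.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[Optional[float], ...]              # None marks a missing value
Generalized = List[Optional[Tuple[float, float]]]  # one (low, high) range per attribute


def distance(a: Record, b: Record) -> float:
    """Hypothetical distance over two dimensions: attribute set and attribute value."""
    pairs = list(zip(a, b))
    # Attribute-set dimension: share of attributes present in exactly one record.
    set_gap = sum((x is None) != (y is None) for x, y in pairs) / len(pairs)
    # Attribute-value dimension: mean absolute gap over attributes present in both.
    shared = [(x, y) for x, y in pairs if x is not None and y is not None]
    value_gap = sum(abs(x - y) for x, y in shared) / len(shared) if shared else 0.0
    return set_gap + value_gap


def generalize(cluster: List[Record]) -> Generalized:
    """Replace each attribute by the range of its known values (None if all are missing)."""
    ranges: Generalized = []
    for column in zip(*cluster):
        known = [v for v in column if v is not None]
        ranges.append((min(known), max(known)) if known else None)
    return ranges


def _emit_cluster(buffer: List[Record], k: int) -> Iterator[Generalized]:
    """Group the oldest buffered record with its k-1 nearest neighbours and publish them."""
    seed = buffer.pop(0)
    neighbours = sorted(buffer, key=lambda r: distance(seed, r))[: k - 1]
    for r in neighbours:
        buffer.remove(r)
    ranges = generalize([seed, *neighbours])
    for _ in range(k):  # one generalized row per record in the cluster
        yield ranges


def anonymize_stream(stream: Iterable[Record], k: int, window_size: int) -> Iterator[Generalized]:
    """Count-based sliding window: emit one cluster whenever the buffer is full."""
    if window_size < k:
        raise ValueError("window_size must be at least k")
    buffer: List[Record] = []
    for record in stream:
        buffer.append(record)
        if len(buffer) >= window_size:
            yield from _emit_cluster(buffer, k)
    while len(buffer) >= k:  # end of stream: flush what can still form a cluster
        yield from _emit_cluster(buffer, k)
    # Fewer than k records remain: this sketch suppresses them.


stream = [(25, 50), (27, None), (60, 90), (62, 88)]
for row in anonymize_stream(stream, k=2, window_size=4):
    print(row)
# prints [(25, 27), (50, 50)] twice, then [(60, 62), (88, 90)] twice
```

In this toy run the record with the missing second attribute is clustered with its nearest complete neighbour and inherits that cluster's range instead of being dropped, which is the kind of information loss the abstract says the thesis aims to reduce.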
Keywords/Search Tags: Big data platform, Streaming data, Privacy protection, Compliance analysis, Privacy policy