Research And Implementation Of Structured Privacy Data Desensitization System

Posted on: 2020-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2428330611999662
Subject: Computer technology
Abstract/Summary:
Improper use of personal data that carries private information, such as behavior or status, may cause that information to leak. How to define, discover, and desensitize private data is therefore an urgent problem in the processing of personal data. This thesis studies the forms in which private information exists in structured data, proposes methods for identifying sensitive information in structured data, and investigates data desensitization techniques at different granularities. A flexible, comprehensive, and efficient structured data desensitization system is implemented on the Spark platform to meet these desensitization needs.

Firstly, to address the primary problem of how private information exists in personal data, the thesis analyzes the structural characteristics of the data and the background knowledge of a hypothesized attacker, and identifies three forms in which private information exists in structured data: the single-attribute-column form, the multi-attribute-column form, and the overall form.

Secondly, because manually defined private information depends too heavily on subjective experience, a double-Bloom-filter-based identifier recognition method is proposed for private information in the single-attribute-column form, and an infrequent attribute column mining algorithm is proposed for private information in the multi-attribute-column form.

Thirdly, the thesis studies desensitization techniques for each form of private information. For the single-attribute-column form, common desensitization techniques such as masking are implemented, and the encryption and hash methods are modified to ensure data integrity. For the multi-attribute-column form, with K-Anonymity as the core, methods that automatically construct generalization hierarchies are designed for string, numeric, and enum attribute columns respectively, and a non-global greedy generalization algorithm is proposed. For the overall form, histogram publishing is used so that the desensitization of structured private data meets the requirements of the differential privacy framework. To measure the loss of data validity after desensitization, four methods for measuring data loss are proposed.

Finally, a structured privacy data desensitization system was designed and implemented to address the above problems. Considering the volume of data to be desensitized, the desensitization techniques are implemented on the Spark distributed computing platform. To preserve the business attributes of desensitized data and improve desensitization efficiency, the concept of a desensitization template is proposed, and a template management function is provided for the convenience of users. Desensitization templates can be shared to improve desensitization efficiency and security. The sensitive information identification, data desensitization, and template management functions were tested, and the tests showed that the system met its design goals.
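As a rough illustration of the K-Anonymity idea mentioned in the abstract, the following Python sketch generalizes a numeric column and a string column and then checks whether every combination of quasi-identifier values occurs at least k times. It is not the thesis's non-global greedy generalization algorithm or its Spark implementation. The function names, the sample records, the fixed generalization levels, and the choice of `age` and `zip` as quasi-identifiers are all assumptions made for this example.

```python
from collections import Counter


def generalize_age(age, width):
    """Map a numeric age to an inclusive range label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep):
    """Keep the first `keep` characters of a string and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical records; 'age' and 'zip' are the assumed quasi-identifiers.
records = [
    {"age": 23, "zip": "150001", "disease": "flu"},
    {"age": 27, "zip": "150006", "disease": "asthma"},
    {"age": 31, "zip": "150080", "disease": "flu"},
    {"age": 38, "zip": "150090", "disease": "diabetes"},
]

# Apply one fixed generalization level to each quasi-identifier column.
generalized = [
    {**r, "age": generalize_age(r["age"], 10), "zip": generalize_zip(r["zip"], 4)}
    for r in records
]
```

With these sample records, `is_k_anonymous(records, ["age", "zip"], 2)` is false because every raw row is unique, while `is_k_anonymous(generalized, ["age", "zip"], 2)` is true because the generalized rows fall into two groups of two. A real system would pick the generalization level per column by searching a hierarchy rather than fixing it by hand as done here.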
Keywords/Search Tags: privacy protection, sensitive information, data masking, anonymization