Research And Implementation Of Data Desensitization System For Preserving Statistical Characteristics Of Sensitive Data

Posted on: 2020-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: X Meng
GTID: 2428330602451376  Subject: Engineering
Abstract/Summary:
In the rapidly developing mobile Internet environment, user data is growing explosively, and industry attention to data keeps increasing. Releasing real user data for researchers to analyze and mine has contributed significantly to scientific research, but it has also become a channel through which large amounts of private user information leak. Once such sensitive information is leaked, it can cause personal trouble, such as large volumes of spam, text messages, and telephone calls, and it may seriously damage personal reputation and threaten personal and property safety. For instance, a user's location and activity may be tracked and exploited after disclosure, which is unacceptable on legal and moral grounds. It is therefore necessary to apply privacy protection mechanisms to sensitive information in a timely manner. This places the conflict between data privacy protection and data availability squarely before us. To extract valuable information from massive data and thereby improve social productivity, the most urgent problems are protecting the privacy of user data and keeping the data usable after desensitization, so the research and development of data desensitization algorithms has become a top priority. Privacy protection and data usability trade off against each other, and both must be addressed before data is released.

This thesis introduces the overall framework of a data desensitization system that retains the statistical characteristics of sensitive data. It first studies methods for assessing disclosure risk and develops related tools on that basis; it then studies and implements, in response to the privacy leakage problem, a data desensitization system that retains the statistical characteristics of the data. For a K-anonymous data set, the privacy leak risk assessment tool implemented in this thesis can detect the data size, K-anonymity level, L-diversity level, and the attribute columns that belong to HIPAA identifiers; calculate the re-identification risk of the data set under three specific attack models; and find the index and content of the specific records with the greatest risk and with 1-diversity. For structured data of numeric types, the two desensitization schemes implemented in this thesis retain the mean and variance and the inner product, Euclidean distance, and first-order and second-order sums, respectively. For structured data of label types, the scheme implemented in this thesis retains the statistical characteristics of frequency and percentage.

For the geographic information data now in wide use, this thesis proposes an attack algorithm that uses a third-party path planning API to attack a location K-anonymous data set, and shows that the attack can effectively exploit the security vulnerabilities of such a data set to obtain sensitive information. Finally, the thesis proposes two enhanced K-anonymous location privacy schemes, which both resist this attack and retain the K-anonymity protection level of the originally desensitized data set. A large number of experimental results demonstrate the reliability of the two enhanced schemes.
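As an illustration of the kind of checks the risk assessment tool is described as performing, the following minimal Python sketch computes the K-anonymity level, the L-diversity level, and the riskiest record indices of a small table. It is not the thesis's implementation: the function name `anonymity_levels`, the column names, and the sample records are hypothetical, and the risk figure shown (1 divided by the size of the smallest equivalence class) is one common worst-case measure, chosen here as an assumption because the abstract does not name its three attack models.

```python
from collections import defaultdict


def anonymity_levels(records, quasi_identifiers, sensitive):
    """Return (k, l, max_risk, riskiest_indices) for a list of dict records.

    Hypothetical helper for illustration only.
    """
    # Group record indices into equivalence classes by quasi-identifier values.
    classes = defaultdict(list)
    for idx, row in enumerate(records):
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(idx)

    # K-anonymity level: size of the smallest equivalence class.
    k = min(len(members) for members in classes.values())

    # L-diversity level (distinct variant): fewest distinct sensitive
    # values found in any equivalence class.
    l = min(
        len({records[i][sensitive] for i in members})
        for members in classes.values()
    )

    # Assumed worst-case re-identification risk: 1 / smallest class size.
    max_risk = 1 / k

    # Records that sit in a smallest class carry that maximum risk.
    riskiest = sorted(
        i for members in classes.values() if len(members) == k for i in members
    )
    return k, l, max_risk, riskiest


# Made-up, already generalized sample data.
sample = [
    {"age": "20-29", "zip": "7100**", "disease": "flu"},
    {"age": "20-29", "zip": "7100**", "disease": "cold"},
    {"age": "30-39", "zip": "7101**", "disease": "flu"},
    {"age": "30-39", "zip": "7101**", "disease": "flu"},
    {"age": "30-39", "zip": "7101**", "disease": "asthma"},
]

k, l, max_risk, riskiest = anonymity_levels(sample, ["age", "zip"], "disease")
print(k, l, max_risk, riskiest)
```

With this sample the smallest equivalence class has two records, so the table is 2-anonymous, and every class contains two distinct diseases, so it is also 2-diverse under the distinct-value definition.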
Keywords/Search Tags: big data security, K-anonymous, risk assessment, data desensitization, statistical characteristics, track privacy