Privacy Protection For Dynamic Continuous Publishing Of Structured Relational Data

Posted on:2023-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Hua

Full Text:PDF

GTID:2558307079988299

Subject:Software engineering

Abstract/Summary:

With the rapid development of information technology, massive amounts of data are continuously collected, stored, and released. In the era of big data, more and more platforms, institutions, and individuals take part in data release and information sharing. However, very large-scale data mining and analysis make the disclosure of personal private information an increasingly serious problem, and as data is released continuously over time, more personal information is disclosed. Research on privacy protection for continuous data release is still in its infancy, and more effective privacy protection methods are urgently needed to ensure the security of data release.

First, the thesis studies the quasi-identifier, the main object of anonymization in continuous data publishing. A quasi-identifier is a set of attributes that can be used to determine the identity of a specific entity in structured data, and it provides an inference path for query attacks. Whether the quasi-identifier is selected correctly and completely determines the success or failure of privacy protection. To obtain a correct and complete quasi-identifier for a single structured relational data set, a method for solving the quasi-identifier based on functional dependencies is proposed. According to semantic relationships and publishing requirements, the attributes of the relational data to be published are first classified into identification attributes, privacy attributes, sensitive attributes, and non-sensitive attributes. Next, the dependencies between the identifying attributes and the other attributes in the relational schema are mined from the semantics and the instance data of the relation, which yields the quasi-identifier. Finally, the quasi-identifier algorithm is implemented in Python, and experiments are run on three real data sets from UCI. The results show that the average number of records per equivalence class partitioned on the solved quasi-identifier is reduced by 8% compared with other methods, and the average probability of privacy disclosure is reduced by about 3%. The quasi-identifier solved by this method therefore performs better when applied in anonymity models.

Second, combined with the l-diversity principle, a privacy protection method based on a distinguish rule is designed for dynamic continuous data publishing. The previously released data is first integrated into the data to be released at the current time through data consolidation, and each record is marked according to its update operation. Equivalence classes are then divided using the data released at the previous time and record similarity, with the non-updated data and the updated data processed in turn. Similar records are placed in the same equivalence class as far as possible through clustering, and modified records are constrained by the distinguish rule, so that the set of privacy attribute values of the equivalence class containing a modified record is either identical to or completely different from that of the published version. This resists the comparison attack. Anonymization uses the anatomy method to publish data, which reduces the information loss caused by generalization. Finally, experiments are carried out on the Adult data set published multiple times, and the results show that the privacy disclosure probability is less than 1/l while the information loss does not exceed 15%. This indicates that the anonymous data obtained by this method has high data availability and a low risk of privacy disclosure.
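As a rough illustration of the quantities mentioned in the abstract, the following Python sketch checks whether a functional dependency holds in instance data, partitions records into equivalence classes on a candidate quasi-identifier, and computes the worst-case disclosure probability for comparison against 1/l. It is not the thesis's algorithm: the function names, the toy records, and the choice to measure disclosure as the highest relative frequency of a sensitive value within a class are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch breaks the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def equivalence_classes(rows, quasi_identifier):
    """Group records that share the same values on the quasi-identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in quasi_identifier)].append(row)
    return list(groups.values())


def max_disclosure_probability(classes, sensitive):
    """Highest relative frequency of any sensitive value inside one class."""
    worst = 0.0
    for group in classes:
        counts = Counter(row[sensitive] for row in group)
        worst = max(worst, max(counts.values()) / len(group))
    return worst


# Hypothetical toy records; attribute names are illustrative only.
rows = [
    {"zip": "100", "age": 30, "city": "A", "disease": "flu"},
    {"zip": "100", "age": 30, "city": "A", "disease": "cold"},
    {"zip": "200", "age": 40, "city": "B", "disease": "flu"},
    {"zip": "200", "age": 40, "city": "B", "disease": "asthma"},
]

classes = equivalence_classes(rows, ["zip", "age"])
average_class_size = len(rows) / len(classes)
worst_case = max_disclosure_probability(classes, "disease")
```

With l = 2, the toy data gives a worst-case disclosure probability equal to 1/l, so a stricter bound such as the one reported in the abstract would require more diverse classes than this example contains.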

Keywords/Search Tags:

Data publishing, Privacy preserving, Anonymization, Quasi-identifier, l-diversity

Related items

1	Research On Several Key Problems Related To Anonymity Data In The K-anonymity Privacy-preserving Model
2	Anonymization-based Research On Privacy Preserving Data Publishing In ERP Systems
3	Research On Anonymization Based Privacy Preserving Method On Geosocial Networks
4	Models And Methods For Privacy-Preserving Data Publishing
5	Research On Privacy-preserving Data Publishing Methods And Their Applications
6	Research On Several Problems Related To Privacy-preserving Microdata Publishing
7	Research On Anonymization Technique Based Privacy Preserving Method On Facial Image
8	Research On Anonymity Techniques For Personalization Privacy-preserving Data Publishing
9	Research On Anonymity Techniques For Privacy-Preserving Data Publishing
10	Research On Privacy Preserving Methods For Data Publication