
A Study Of Privacy-preserving Method In Data Publishing

Posted on: 2009-07-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wei
Full Text: PDF
GTID: 1118360275471010
Subject: Computer software and theory

Abstract/Summary:
The number and variety of data collections containing person-specific information have grown exponentially with advances in computer technology, network connectivity, and disk storage capacity. Data holders need to release such person-specific data for data mining tasks, but concerns about individual privacy prevent its dissemination. Publishing data about individuals without revealing sensitive information has therefore become a widespread problem.

The objective of privacy protection is to prevent adversaries from inferring individuals' sensitive information with high confidence. In the process of data publishing, we assume that the data publisher always knows which attributes are sensitive. To prevent the disclosure of individuals' sensitive information, the publisher withholds the original data and publishes an anonymized version in its place. On the other hand, since the goal of data publishing is to support data analysis and research, the publisher should guarantee that the anonymized data permits accurate analysis. It is therefore important to find a trade-off between privacy protection and the utility of the anonymized data in privacy-preserving data publishing.

Privacy-preservation methods usually divide the records of the original data into equivalence classes, each of which must satisfy a certain privacy principle. The diversity of sensitive values within an equivalence class determines the quality of privacy preservation. The privacy-preservation method based on de-clustering uses the idea of de-clustering to partition records with distinct sensitive values into the same equivalence class, and releases the quasi-identifier (QI) attributes directly in order to retain as much information from the original data as possible.
At the same time, the average protection expectation takes both the number of records and the number of distinct sensitive values in an equivalence class into account to measure the quality of privacy preservation. The study shows that the de-clustering-based method not only ensures data utility but also provides stronger privacy protection.

Besides the identifiers and quasi-identifiers obtained from external databases, adversaries can glean other background knowledge with which to infer target individuals' sensitive information. Unfortunately, the publisher cannot know what background knowledge an adversary holds, so in the process of data publishing the publisher can only assume that adversaries possess background knowledge that arises in practice and can be handled effectively. For a new type of background knowledge, a privacy principle against background knowledge requires the records in each equivalence class to satisfy certain constraints, which prevents adversaries from using that knowledge to infer target individuals' sensitive information with high confidence. The study shows that this principle effectively defends against privacy attacks based on background knowledge while still permitting accurate data analysis.

The contents of the original data may change through insertions, deletions, and updates, and the publisher should re-publish the data when the content changes. Adversaries can then infer target individuals' sensitive information with high confidence by combining multiple releases of the original data.
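The de-clustering idea described above can be sketched as a greedy, bucketization-style partition that always draws records from the currently largest sensitive-value groups, so each equivalence class holds distinct sensitive values. The function names, the `class_size` parameter, and the scoring formula below are illustrative assumptions, not the dissertation's exact algorithm or measure.

```python
from collections import defaultdict

def declustering_partition(records, sensitive_key, class_size):
    """Greedy sketch of de-clustering: place records with DISTINCT
    sensitive values into the same equivalence class.

    records: list of dicts; sensitive_key names the sensitive attribute.
    class_size: target number of distinct sensitive values per class.
    """
    # Group records by their sensitive value.
    groups = defaultdict(list)
    for r in records:
        groups[r[sensitive_key]].append(r)

    classes = []
    # Repeatedly draw one record from each of the `class_size` largest
    # remaining groups, so every class contains distinct sensitive values.
    while sum(len(g) for g in groups.values()) >= class_size:
        largest = sorted(groups, key=lambda v: len(groups[v]),
                         reverse=True)[:class_size]
        eq_class = [groups[v].pop() for v in largest]
        for v in largest:
            if not groups[v]:
                del groups[v]
        classes.append(eq_class)
    return classes

def average_protection_expectation(classes, sensitive_key):
    """Hypothetical score combining the two quantities the abstract
    mentions: the average over classes of
    (#distinct sensitive values / #records in the class).
    The dissertation's actual formula may differ."""
    ratios = [len({r[sensitive_key] for r in c}) / len(c) for c in classes]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Because each class draws at most one record per sensitive value, the ratio above is 1.0 whenever the partition succeeds, which is the property the de-clustering method aims for.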
The privacy principle for data re-publication uses substitution to guarantee that the signature of a record at a later publishing timestamp contains the signature of the same record at an earlier publishing timestamp, which in turn prevents adversaries from inferring target individuals' sensitive information with high confidence by combining multiple releases.

In practice, because of differences among the individuals in the data, differences in how the data is understood, and so on, the data to be published may contain multiple sensitive attributes. Multiple sensitive attributes raise additional challenges: for example, an adversary can combine the correlations among sensitive attributes with background knowledge about some of them to infer target individuals' sensitive information. Existing privacy principles share the common assumption that the published data has only one sensitive attribute, so they have difficulty handling privacy disclosure when publishing data with multiple sensitive attributes. The privacy principle for publishing such data prevents this inference by locally changing the correlations among the sensitive attributes.

Conventional privacy principles are designed for single-table data and cannot handle privacy disclosure in social network data publishing, because the individuals in a social network are related to one another and possess structural properties, whereas the records in a single table are independent. Therefore, the privacy principle for social network data publishing uses structure anonymization and label anonymization to prevent adversaries from identifying a target individual in the anonymized network through that individual's structural properties and label information.
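The re-publication condition above can be checked mechanically once a signature is defined. The sketch below assumes a common formalization (a record's signature is the set of sensitive values in its equivalence class, and a release maps record IDs to those value lists); the dissertation's exact definitions may differ.

```python
def signature(release, record_id):
    """Signature of a record in one release: the set of sensitive values
    appearing in the record's equivalence class.

    release: dict mapping record_id -> list of sensitive values in that
    record's equivalence class (an assumed representation).
    """
    return frozenset(release[record_id])

def containment_holds(earlier, later):
    """Check the re-publication principle: for every record present in
    both releases, the later signature must contain the earlier one, so
    intersecting the two releases never narrows an adversary's candidate
    set below the earlier signature."""
    return all(signature(earlier, rid) <= signature(later, rid)
               for rid in earlier if rid in later)
```

A publisher would run such a check before releasing a new version: if containment fails for any surviving record, an adversary intersecting the two signatures could eliminate candidate sensitive values and sharpen the inference.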
Keywords/Search Tags: Privacy preservation, Privacy principle, Data publication, Data anonymization, Linking attack, Attribute disclosure