
Research On Some Key Technologies Of Privacy Preserving In Database Security

Posted on: 2012-01-11 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Zhao
GTID: 1228330368497228 | Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of Internet and information technology, database-backed information systems are now widely used in information infrastructure in fields such as economics, finance, and medicine. More and more personal information is collected, stored, and released by organizations and institutions (for example, statistical departments, hospitals, and insurance companies), and much of it is used for industry collaboration and data sharing. However, the personal private information held in these database systems faces growing security threats because it can be accessed easily in the new network environment.

Privacy disclosure has become a major obstacle to information sharing. How to protect data that contain sensitive information effectively, and how to reconcile the authenticity of released data with the safe release of private information, is therefore a major challenge for privacy preserving technology in database security.

Research on privacy preserving in database security focuses on protecting sensitive personal information. The typical solution is to modify the data to some degree so that the modified data both prevent the disclosure of personal private information and retain the accuracy of the original data and of queries over them. Different individuals also have different privacy requirements. In data publishing, most existing anonymization strategies provide only table-level security granularity. They cannot define different sensitive information within the same table, cannot reconcile the differing importance of data across applications, and cannot satisfy the demand for dynamically defined sensitive information.
Therefore, this work studies an anonymity model that can dynamically assign sensitive information and applies it to personal privacy preserving to meet the requirements of specific applications and individuals, so that the result both prevents leakage of private information and maximizes the availability of the data.

In the field of privacy preserving in database security, existing techniques are mainly designed for publishing data with a single sensitive attribute. Because multiple sensitive attributes are correlated, these techniques do not carry over to data with multiple sensitive attributes, and applying them directly will inevitably lead to disclosure of private information. In many practical applications, however, released data contain multiple sensitive attributes. For example, a table of patients' diagnostic records may also contain medical costs and home addresses, all of which are personal private information. Multiple sensitive attributes often appear in the same table, and the sensitive values in each tuple all correspond to the same individual. In another case, some attributes do not themselves contain private information but are clearly related to attributes that do, which readily creates an inference channel. Both cases directly threaten the security of private information. Compared with privacy preserving for datasets with a single sensitive attribute, techniques for datasets with multiple sensitive attributes face greater challenges because of the complexity of such datasets.
Thus, preventing privacy disclosure in datasets with multiple sensitive attributes is an important research subject with practical application value.

Multiple versions of the same data published at different times are linked to one another, which easily forms an inference channel that an attacker can exploit. Re-publishing datasets therefore causes privacy leakage. Dataset re-publication covers both datasets with a single sensitive attribute and datasets with multiple sensitive attributes. The assumptions made by existing privacy protection techniques for re-publication of dynamic databases differ considerably from the data changes that actually occur in applications. For example, when one individual's sensitive values change frequently while another's change little, existing techniques cannot handle these different patterns of change effectively. In addition, existing techniques hide data or add dummy data, which greatly reduces the accuracy of the data. Because re-publishing datasets with multiple sensitive attributes involves both multiple sensitive attributes and re-publication, it is more likely to leak private information than any other form of publication. There has been no related research on re-publication of datasets with multiple sensitive attributes, and current methods do not take the resulting information leakage into account. Re-publishing datasets with multiple sensitive attributes is also more challenging than re-publishing datasets with a single sensitive attribute.
Therefore, research on privacy protection for dataset re-publication has significant theoretical and practical value for advancing privacy protection technology in database security.

This thesis studies privacy preserving in database security and proposes novel solutions to several challenges in current research. The research addresses three issues. The first is how to dynamically assign sensitive information effectively and thereby protect personal information, which includes the study of anonymity models, clustering, and anonymization algorithms. The second is how to preserve privacy for multiple sensitive attributes, which involves key techniques such as single-dimensional sequel set division, multidimensional division, and greedy algorithms. The third is how to solve the privacy disclosure caused by re-publishing datasets. For this, the thesis separately studies privacy rules and algorithms suitable for re-publication of datasets with a single sensitive attribute and with multiple sensitive attributes, as well as bucket-based partition technology. The research results and contributions of this thesis consist mainly of the following four aspects:

1) This thesis proposes a novel anonymity model that can dynamically assign sensitive information. To meet individuals' different needs for quasi-identifier (QI) attributes and sensitive attributes, and to avoid the large information loss caused by excessive generalization, the thesis uses individually customized sensitivity to set up a sensitive attribute hierarchy, applies generalization to sensitive attributes, and integrates local generalization with multidimensional techniques to further improve the efficiency of anonymization.
For algorithm design, the thesis proposes the CBM algorithm and the D-KAC anonymization algorithm to implement the dynamic anonymity model. Experiments compare privacy protection, data availability, and algorithm efficiency. The results show that the proposed model and algorithms meet personal privacy requirements and protect the defined information, and that the algorithm achieves better execution efficiency.

2) This thesis proposes a novel anonymity model, (α,β,κ)-anonymity, for privacy preserving with multiple sensitive attributes. To avoid disclosure of sensitive information in published datasets with multiple sensitive attributes, the thesis analyzes the homogeneity attack and the background knowledge attack in detail, builds the (α,β,κ)-anonymity model using a rule for classifying sensitive attributes, and ensures diversity among the values of the multiple sensitive attributes. For algorithm design, the thesis proposes the (α,β,κ)-anonymity algorithm, which implements the model using top-down multidimensional division and single-dimensional sequel set division. A series of experiments compares the proposed algorithms with other algorithms, and the results show that the proposed algorithms are superior in information loss, degree of privacy preservation, and running time.

3) This thesis proposes a novel privacy rule, m-correlation, for re-publication of datasets with a single sensitive attribute. The rule guarantees that sensitive attribute values are indistinguishable within the QI groups of successively published datasets, and it eliminates the inference channel caused by re-publication by partitioning the tuples according to their sensitive attributes and introducing confusion.
Moreover, the method solves the privacy leakage that arises when data are updated, added, deleted, and then re-published. The thesis proposes the m-correlation algorithm to realize the rule. Experimental results show that the m-correlation rule and algorithm can efficiently re-publish datasets with a single sensitive attribute and produce data with higher accuracy.

4) To eliminate the inference channel caused by re-publishing datasets with multiple sensitive attributes, resist various background knowledge attacks, and reduce the risk of such re-publication, this thesis proposes a novel privacy rule, MDR, for re-publication of datasets with multiple sensitive attributes. The MDR rule requires the values of each sensitive attribute within a QI group to be diverse and adds diversity among the sensitive attributes. The thesis also gives a novel algorithm based on bucket partition technology to reconstruct and divide the data in the table. Experiments comparing the MDR algorithm with other algorithms show that it is superior in data availability and execution time.

In summary, this thesis presents an extensive and in-depth study of privacy preserving technologies in database security, proposes several novel solutions, and resolves several difficulties that existing techniques could not overcome. Experiments on real datasets demonstrate the effectiveness and efficiency of the proposed techniques.
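
The idea that every sensitive attribute should take diverse values inside each QI group can be illustrated with a generic check. The following is a minimal Python sketch and not the thesis's MDR rule, (α,β,κ)-anonymity model, or bucket partition algorithm: it assumes a simple "at least l distinct values per group" notion of diversity, and the function names, the parameters k and l, and the sample records are all hypothetical.

```python
from collections import defaultdict


def group_by_qi(rows, qi_attrs):
    """Group records that share the same (already generalized) QI values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in qi_attrs)
        groups[key].append(row)
    return dict(groups)


def is_k_anonymous(rows, qi_attrs, k):
    """True if every QI group contains at least k records."""
    return all(len(g) >= k for g in group_by_qi(rows, qi_attrs).values())


def is_diverse_per_attribute(rows, qi_attrs, sensitive_attrs, l):
    """True if, in every QI group, each sensitive attribute has >= l distinct values.

    Assumption: "distinct value count" stands in for diversity here; the
    thesis's own rules are not specified in the abstract.
    """
    for group in group_by_qi(rows, qi_attrs).values():
        for attr in sensitive_attrs:
            if len({row[attr] for row in group}) < l:
                return False
    return True


# Hypothetical released table: age and zip are generalized QI attributes,
# disease and cost are two sensitive attributes.
released = [
    {"age": "30-39", "zip": "100**", "disease": "flu", "cost": "low"},
    {"age": "30-39", "zip": "100**", "disease": "asthma", "cost": "high"},
    {"age": "40-49", "zip": "200**", "disease": "flu", "cost": "low"},
    {"age": "40-49", "zip": "200**", "disease": "cancer", "cost": "low"},
]

qi = ["age", "zip"]
single_ok = is_diverse_per_attribute(released, qi, ["disease"], 2)
multi_ok = is_diverse_per_attribute(released, qi, ["disease", "cost"], 2)
```

In this sample the table passes when only `disease` is treated as sensitive, but fails once `cost` is also considered, because the second QI group has a single cost value. This mirrors the abstract's point that a release that is safe for one sensitive attribute may still leak information when several are present.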
Keywords/Search Tags: Database Security, Privacy Preserving, Clustering, Generalization, Dynamic Anonymity, Multiple Sensitive Attributes, Re-publication of Datasets