
Research On Publishing Sensitive Data Based On Anonymity In Digital Libraries

Posted on: 2012-04-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Luo
GTID: 1228330368497234
Subject: Control theory and control engineering
Abstract/Summary:
With the continuous development of information technology, the resources and innovative services of digital libraries are becoming increasingly rich, and users' privacy is becoming an increasingly prominent issue. In data sharing and data analysis, anonymization techniques for privacy-preserving data publishing offer good applicability, generality and practicality while fully respecting users' privacy. This supports full use of the data and information sharing, and thus improves library services. However, application data in digital libraries has domain-specific characteristics: both the privacy-protection requirements and the forms the data takes are diverse. After analyzing existing anonymity models and anonymization techniques, this thesis argues that current anonymous data publishing techniques cannot solve the privacy problems of releasing sensitive data under the various scenarios found in digital libraries. It therefore studies several key techniques of anonymous data publishing for sensitive data in digital libraries. The main work is as follows:

(1) Research on a sensitive data publishing framework based on domain knowledge, and its application
To address current challenges in protecting sensitive data, a data publishing architecture based on domain knowledge is proposed to meet application requirements, and its modules are introduced. An adaptive, personalized data publishing algorithm is then given. Through its adaptive mechanism, the framework aims to meet both the needs of different data applications and the privacy-protection needs of different data owners.
The adaptive data publishing algorithm generalizes the quasi-identifier attributes and the sensitive attribute together, producing an anonymized table that meets the privacy-protection requirements while reducing information loss, that is, keeping the released data as accurate as possible.

(2) Research on personalized anonymous data publishing based on generalization
Building on recent developments in anonymity, this thesis proposes a personalized data publishing model for releasing sensitive data in digital libraries, considered from the perspective of both individuals and sensitive attribute values: the (P,alpha,k)-anonymity model, together with a generalization-based algorithm. The model takes into account the privacy of both special users and public users. First, after reviewing related work and several existing personalized anonymity principles, the thesis defines personalized privacy constraints with several parameters and proposes the (P,alpha,k)-anonymity model. Second, it proposes TopDown-LA, a heuristic generalization algorithm, and explains the local encoding and specialization techniques it uses, which ensure that the algorithm obtains a minimal k-generalization and maximizes the accuracy of the anonymized table; the complexity and accuracy of the algorithm are then analyzed. Finally, experiments on real data verify the feasibility of the heuristic algorithm. The results show that it fully meets the needs of personalized privacy protection, incurs less information loss than Basic Incognito and Mondrian, and has good execution performance.
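The (P,alpha,k)-anonymity model is defined in the thesis itself, and this abstract does not specify its personalization parameter P. As a hedged illustration of the two constraints suggested by its name, the sketch below checks the standard (alpha,k)-anonymity condition from the literature, not the thesis's own model: every equivalence class of quasi-identifier values must contain at least k records, and no sensitive value may account for more than a fraction alpha of any class. The table, the attribute names and the function `is_alpha_k_anonymous` are hypothetical.

```python
from collections import Counter, defaultdict


def is_alpha_k_anonymous(rows, qi, sensitive, alpha, k):
    """Check standard (alpha, k)-anonymity of a table given as a list of dicts.

    rows      -- records whose quasi-identifiers are already generalized
    qi        -- names of the quasi-identifier attributes
    sensitive -- name of the sensitive attribute
    (Illustrative helper; not the thesis's (P,alpha,k) model.)
    """
    # Group records into equivalence classes by their quasi-identifier values.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in qi)
        classes[key].append(row[sensitive])
    for values in classes.values():
        # k constraint: each equivalence class must hold at least k records.
        if len(values) < k:
            return False
        # alpha constraint: the most frequent sensitive value may not exceed alpha.
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) > alpha:
            return False
    return True


# Hypothetical borrowing records: ages coded into ranges, zip codes truncated.
table = [
    {"age": "20-29", "zip": "100**", "topic": "medicine"},
    {"age": "20-29", "zip": "100**", "topic": "law"},
    {"age": "20-29", "zip": "100**", "topic": "history"},
    {"age": "30-39", "zip": "200**", "topic": "medicine"},
    {"age": "30-39", "zip": "200**", "topic": "medicine"},
    {"age": "30-39", "zip": "200**", "topic": "physics"},
]

print(is_alpha_k_anonymous(table, ["age", "zip"], "topic", alpha=0.5, k=3))
```

With alpha=0.5 this prints False: in the second class two of the three records share "medicine" (2/3 > 0.5). With alpha=0.7 the same table passes. The sketch only verifies a table; it does not perform the generalization search that TopDown-LA carries out.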
(3) Research on identity-reserved data publishing
This thesis introduces three identity-reserved anonymity principles and focuses on two data publishing methods: clustering-based anonymization and a lossy decomposition method, ID Anatomy. In most cases, analysis of the data released by digital libraries must preserve users' identities and also respect each user's individual privacy requirements. For such cases, the thesis first considers data in which multiple records correspond to a single individual. It analyzes in particular how such data can lead to privacy violations involving sensitive values, and proposes three specific identity-reserved anonymity principles. It then describes a clustering-based algorithm that uses weighted hierarchical distance to measure information loss, and analyzes its complexity. It also introduces ID Anatomy, a lossy decomposition method that publishes the quasi-identifier attributes and the sensitive attributes in two separate tables derived from the original relation, relying on the lossy join between them to protect privacy; the algorithm is guaranteed to meet both the privacy and the utility requirements. Finally, experiments compare the original methods with the identity-reserved anonymization methods in several respects and confirm the validity of the approach.

(4) Research on graph data publishing
This thesis presents a new clustering-based safety grouping strategy for graph data and two anonymous data publishing algorithms. First, it analyzes the privacy-preserving publishing problem for complex interaction data in digital libraries, and implements an incremental knowledge query model based on the background knowledge exploited in graph attacks.
Second, after establishing a bipartite graph model and related definitions, it discusses how to integrate graph anonymization with data anonymization, and introduces several bipartite graph publishing methods, such as basic anonymous publishing, the list approach and the partitioning approach. Then, drawing on the latest research results, it introduces a new clustering-based safety grouping strategy that improves the utility of the released bipartite graph, and compares the implementation strategies of the CKG and KGC algorithms, with particular attention to the information loss caused by graph generalization and to the description of super-nodes. Finally, experiments show that the clustering-based safety grouping strategy provides privacy protection for individuals and, to some extent, increases the utility of the anonymized graph data.

Every algorithm in this thesis is analyzed in detail for performance and is run on real data sets from digital libraries or on integrated data sets. The experimental results and performance analysis show that, compared with related algorithms, the proposed methods perform well and adapt better to different scenarios.
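The safety grouping strategy in item (4) partitions the nodes of a bipartite graph into groups. A minimal sketch, assuming the safety condition as it is commonly stated in the safe-grouping literature rather than the thesis's own definition: the nodes on one side (here, users) are partitioned into groups of at least k nodes, and the grouping is safe if no node on the other side (here, books) is linked to two members of the same group, that is, no two users in a group share a neighbor. The node names, the edges and the function `is_safe_grouping` are hypothetical, and the clustering that builds the groups (as in CKG or KGC) is not reproduced.

```python
from collections import defaultdict


def is_safe_grouping(edges, groups, k):
    """Check whether a grouping of the left-hand nodes is safe.

    edges  -- (left, right) pairs of the bipartite graph, e.g. (user, book)
    groups -- list of sets that partition the left-hand nodes
    k      -- minimum group size (assumed parameter)
    """
    group_of = {}
    for gid, members in enumerate(groups):
        # Every group must contain at least k nodes.
        if len(members) < k:
            return False
        for node in members:
            group_of[node] = gid
    # For each right-hand node, remember which left-hand groups it touches.
    touched = defaultdict(set)
    for left, right in set(edges):  # set() ignores duplicate edges
        gid = group_of[left]
        if gid in touched[right]:
            # Two members of the same group share this neighbor: unsafe.
            return False
        touched[right].add(gid)
    return True


edges = [("u1", "b1"), ("u2", "b2"), ("u3", "b1"), ("u4", "b2")]
safe = [{"u1", "u2"}, {"u3", "u4"}]
unsafe = [{"u1", "u3"}, {"u2", "u4"}]

print(is_safe_grouping(edges, safe, k=2))
print(is_safe_grouping(edges, unsafe, k=2))
```

The first call prints True and the second prints False, because in the second grouping u1 and u3 share book b1. A grouping that passes this check can be published with group labels in place of node identities, which limits what an attacker can infer about which user interacted with which item.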
Keywords/Search Tags: digital library, data publishing, k-anonymity, personalized anonymity, identity-reserved, bipartite graph, safety grouping, clustering