| With the rapid development of network technologies, network applications and terminals of all kinds keep emerging. When people use these applications and terminals, a large amount of data containing user information is generated. This user information includes general information such as gender and age, as well as sensitive information such as disease diagnosis records, location records, and records of special commodity purchases. For some institutions and companies, such data has great research significance and commercial value. For example, hospitals can identify the complications associated with certain diseases by analyzing patients' diagnosis data, and so diagnose and treat diseases more effectively; some e-commerce businesses analyze users' purchase records to discover the purchasing interests of different users and recommend products to them accurately. The importance of data in the information age is therefore particularly prominent, and researchers are constantly developing new databases to record user information more accurately and comprehensively. In particular, different types of non-relational databases (such as MongoDB) have been developed in recent years. In such databases, hierarchical markup and data-serialization languages (such as XML, JSON, and YAML) are often used to describe the data, so this type of data is also called hierarchical data. Hierarchical data can clearly represent the structural information of data and so has a higher research value than relational data. When the data is collected by the relevant agencies, it must be properly protected before being shared with third parties for research purposes. Otherwise, publishing the data may cause serious privacy leakage. Privacy-preserving data publishing has therefore been a hot topic in information security research. However, research in this direction has focused mainly on traditional relational data, and there is very little research on hierarchical data. Due to the importance of 
hierarchical data, it is urgent to study corresponding privacy protection models and anonymization algorithms that solve the privacy protection problem in publishing hierarchical data. In this paper, we study the problems in the current l-diversity anonymization method for hierarchical data, analyze why these problems arise in the anonymized data, and propose a privacy protection model for hierarchical data together with a corresponding anonymization approach. The approach is used to solve the homogeneity attack problem in current hierarchical data privacy protection methods. The main research work of this paper is as follows: (1) This paper summarizes and analyzes the current state of research on privacy protection methods for traditional relational data and for hierarchical data. It points out that the traditional privacy protection models and methods for publishing relational data cannot be directly applied to hierarchical data. It then analyzes in detail the homogeneity attack problem in current privacy protection methods for hierarchical data. (2) A multi-level privacy-preserving model for hierarchical data is proposed: the (?_i~h,k)-anonymity model, which is used to solve the privacy leakage caused by homogeneity attacks in current privacy protection methods for hierarchical data. First, the model uses the idea of fuzzy set theory to classify the sensitive attribute values of hierarchical data, and then filters the data records in each equivalence class according to the parameters ?_i~h so that the number of sensitive attribute values at different levels in the equivalence class does not exceed k*?_i~h. This threshold increases the degree of difference between the sensitive attribute values in the equivalence class, which effectively prevents the privacy leakage caused by homogeneity attacks. (3) Based on the proposed privacy protection 
model of hierarchical data, a corresponding implementation algorithm is designed. A detailed description of the components of the algorithm and the implementation details of each module are also given. Then, the security of the proposed model and the complexity of the algorithm are analyzed. Finally, the characteristics of the hierarchical data publishing scenario are introduced, and a system framework for multi-level privacy-preserving publishing of hierarchical data and the software architecture of the proposed anonymization algorithm are presented. (4) By measuring the information loss, equivalence class dissimilarity, and execution time on anonymized datasets, we evaluate our algorithm and existing anonymization methods for hierarchical data in terms of data utility, security, and efficiency. The experimental results show that our method is far superior to existing hierarchical data anonymization methods in terms of data utility and security, while its efficiency is very close to that of the existing anonymization methods for hierarchical data. |
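
The level constraint described in item (2) can be illustrated with a minimal sketch. It is not the algorithm from this work. It assumes one possible reading of the constraint: each sensitive value has already been assigned a sensitivity level, and within one equivalence class the number of records at a given level must not exceed k times that level's threshold parameter. The function name `satisfies_level_bounds`, the level labels, the threshold values, and the sample diagnoses are all hypothetical.

```python
from collections import Counter


def satisfies_level_bounds(sensitive_values, level_of, thresholds, k):
    """Check one equivalence class against per-level upper bounds.

    Assumed reading (not confirmed by the source): for every sensitivity
    level, the number of records in the class whose sensitive value falls
    at that level must be at most k * thresholds[level].
    """
    # Count how many records of the class fall at each sensitivity level.
    counts = Counter(level_of[value] for value in sensitive_values)
    return all(counts[level] <= k * thresholds[level] for level in counts)


# Hypothetical level assignment and thresholds, for illustration only.
level_of = {"HIV": "high", "cancer": "high", "flu": "low", "cold": "low"}
thresholds = {"high": 0.5, "low": 1.0}
k = 4

skewed_class = ["HIV", "cancer", "HIV", "flu"]   # three high-level records
mixed_class = ["HIV", "flu", "cold", "flu"]      # one high-level record

print(satisfies_level_bounds(skewed_class, level_of, thresholds, k))
print(satisfies_level_bounds(mixed_class, level_of, thresholds, k))
```

Under these assumed values the bound for the high level is 4 * 0.5 = 2 records, so the first class, with three high-level records, violates it, while the second class stays within both bounds. A class dominated by highly sensitive values is the kind of class that a homogeneity attack exploits.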