| With the rapid development of information technology, massive amounts of data are continuously collected, stored and released. In the era of big data, more and more platforms, institutions and individuals participate in data release and information sharing. However, very large-scale data mining and analysis make the disclosure of personal private information an increasingly serious problem, and as data are released continuously over time, more personal information is disclosed. At present, research on privacy protection for continuous data release is still in its infancy, and more effective privacy protection methods are urgently needed to ensure the security of data release. Firstly, the main object of anonymization in continuous data publishing, namely the quasi-identifier, is studied. A quasi-identifier is a set of attributes that can be used to determine the identity of a specific entity in structured data, and it can provide an inference path for query attacks. Whether the quasi-identifier is selected correctly and completely determines the success or failure of privacy protection. To address the problem of finding a correct and complete quasi-identifier for single structured relational data, a quasi-identifier solution method based on functional dependency is proposed. First, according to the semantic relationships and publishing requirements, the attributes of the relational data to be published are classified into identification attributes, privacy attributes, sensitive attributes and non-sensitive attributes. Next, the dependencies between the identification attributes and the other attributes in the relational schema are mined from the semantics and the instance data of the relational data, so as to obtain the quasi-identifier. Finally, the quasi-identifier algorithm is implemented in Python, and experiments are carried out on three real UCI data sets. The results show that the average number of records per equivalence class divided on the solved
quasi-identifier is reduced by 8% compared with other methods, and the average probability of privacy disclosure is reduced by about 3%. Therefore, the quasi-identifier solved by this method performs better when applied in anonymity models. Secondly, combined with the l-diversity principle, a privacy protection method based on a distinguish rule is designed for dynamic continuous data publishing. First, the previously released data are integrated into the data to be released at the current time through data consolidation, and each record is marked according to its update operation. The data released at the previous time and the similarity between records are then used to divide the equivalence classes, and the non-updated data and the updated data are processed in turn. Following the idea of clustering, similar records are placed in the same equivalence class as far as possible, and the modified records are constrained by the distinguish rule. As a result, the set of privacy attribute values of the equivalence class containing a modified record is either consistent with or completely different from that in the published version, which successfully resists the comparison attack. Anonymization uses the anatomy method to publish the data, which reduces the information loss caused by generalization. Finally, experiments are carried out on the Adult data set published multiple times, and the results show that the privacy disclosure probability is less than 1/l while the information loss does not exceed 15%. This shows that the anonymous data obtained by this method have high data availability and a low risk of privacy disclosure. |
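
The quantities reported above (equivalence classes divided on a quasi-identifier, the l-diversity principle, and the privacy disclosure probability) can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this work. It assumes the frequency-based form of l-diversity commonly paired with the anatomy method, in which no sensitive value accounts for more than 1/l of the records in a class. The function names, attribute names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifier):
    """Group records that share the same values on all quasi-identifier attributes."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[attr] for attr in quasi_identifier)
        groups[key].append(rec)
    return dict(groups)


def disclosure_probability(group, sensitive):
    """Share of the most frequent sensitive value in one equivalence class.

    This is the best chance an attacker has of guessing the sensitive value
    of a record known to belong to the class.
    """
    counts = Counter(rec[sensitive] for rec in group)
    return max(counts.values()) / len(group)


def satisfies_l_diversity(group, sensitive, l):
    """Assumed frequency-based l-diversity: no sensitive value exceeds 1/l of the class."""
    counts = Counter(rec[sensitive] for rec in group)
    return max(counts.values()) * l <= len(group)


# Hypothetical toy records, not taken from the experiments described above.
records = [
    {"age": "30-39", "zip": "100**", "disease": "flu"},
    {"age": "30-39", "zip": "100**", "disease": "asthma"},
    {"age": "40-49", "zip": "200**", "disease": "flu"},
    {"age": "40-49", "zip": "200**", "disease": "diabetes"},
]

classes = equivalence_classes(records, ["age", "zip"])
for key, group in classes.items():
    print(key,
          satisfies_l_diversity(group, "disease", 2),
          disclosure_probability(group, "disease"))
```

Under this assumed definition, a class that satisfies l-diversity has a disclosure probability of at most 1/l, which is the kind of bound the experiments above are measured against.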