
Research And Application Of Sensitive Attribute Recognition Method For Data Publishing

Posted on: 2021-08-29 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Chen
GTID: 2518306458986609 | Subject: Computer technology
Abstract/Summary:
In the era of big data, data has become an important element of social activity and a new means of production that drives social development. Government departments, research institutions, and other data collectors publish the data they collect to big data platforms for analysis and sharing, for purposes such as information sharing, open government, and scientific research. Because published data may involve personal privacy, the data publisher applies privacy protection to a data set before releasing it. Even so, the processed data may still contain sensitive data that, once linked with an external database, allows personal private information to be recovered. If sensitive data is not accurately identified and protected, personal private information may be disclosed when the data set is subjected to privacy attacks such as background-knowledge attacks and linkage attacks. With frequent information security incidents and increasingly strict protection requirements, protecting personal private information has become a top priority. However, as the variety of data sets grows, the relationships between data become increasingly complex, and recognizing sensitive data for data publishing has become an urgent problem. To address it, this thesis proposes a method for recognizing and classifying sensitive attributes based on information entropy and association mining. The study's main contributions and results are as follows.

(1) A sensitive attribute recognition method for data publishing. Information entropy and maximum discrete entropy are used to quantify the sensitivity of each attribute, and a sensitive attribute set is built by clustering the sensitivity values. The Apriori algorithm and mutual information are then used to analyze the correlations between attributes, construct an attribute dependency graph, and mine the attributes associated with the sensitive attributes, which completes the recognition of sensitive attributes. Experimental results show that the method can be applied to sensitive attribute recognition in data publishing under both trusted and untrusted modes, does not require a sensitive data dictionary, and takes the correlations between attributes into account.

(2) A sensitive attribute classification method for data publishing. Based on attribute sensitivity and the correlations between attributes, a sensitivity level evaluation method is proposed to grade sensitive attributes by level. In addition, data tables related to personal privacy are collected, the types of sensitive attributes and their element spaces are defined, and the Jaro-Winkler distance is used to compare the similarity between attributes and elements, so that sensitive attributes can be classified by type.

(3) Based on the proposed methods for recognizing, grading, and classifying sensitive attributes, a sensitive attribute recognition system is designed and implemented. The system is described in terms of requirements analysis, outline design, detailed design, implementation, and testing. Testing shows that the system can recognize and classify sensitive attributes for data publishing and present statistics on sensitive attributes as charts.
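The entropy-based sensitivity step in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this sketch assumes that sensitivity is an attribute's Shannon entropy normalized by the maximum discrete entropy log2(N) of a column with N records, and that "clustering sensitivity" means a one-dimensional two-means split whose higher cluster forms the sensitive attribute set. The names `sensitivity` and `high_cluster` and the sample records are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of one column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sensitivity(values):
    """Hypothetical sensitivity score: H(A) / log2(N).

    Assumption: log2(N) is used as the maximum discrete entropy, i.e. the
    entropy a column of N records reaches when every value is distinct, so
    the score lies in [0, 1] and a unique identifier scores 1.0.
    """
    n = len(values)
    if n < 2:
        return 0.0
    return entropy(values) / math.log2(n)


def high_cluster(scores, iters=50):
    """1-D two-means on sensitivity scores; returns the names in the higher cluster."""
    lo, hi = min(scores.values()), max(scores.values())
    if lo == hi:
        return set()  # all scores equal: no separation between attributes
    high = set()
    for _ in range(iters):
        # Assign each attribute to the nearer centre (ties go to the low cluster).
        new_high = {a for a, s in scores.items() if abs(s - hi) < abs(s - lo)}
        if new_high == high:
            break  # assignments stable: converged
        high = new_high
        low = set(scores) - high
        hi = sum(scores[a] for a in high) / len(high)
        lo = sum(scores[a] for a in low) / len(low)
    return high


# Hypothetical sample table: attribute name -> column of 8 record values.
records = {
    "id": ["001", "002", "003", "004", "005", "006", "007", "008"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "zip": ["10001", "10001", "10001", "10001", "10002", "10002", "10003", "10003"],
    "disease": ["flu", "flu", "cold", "cold", "hiv", "cancer", "diabetes", "asthma"],
}
scores = {name: sensitivity(col) for name, col in records.items()}
print(scores)
print(high_cluster(scores))
```

With these eight records the scores are 1.0 (id), about 0.83 (disease), 0.5 (zip) and about 0.33 (gender), so the high cluster is {id, disease}. Normalizing by log2(N) rather than by log2 of the number of distinct values is a deliberate choice in this sketch: it ranks near-unique columns, which are the most identifying, above evenly spread low-cardinality columns.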
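The association-mining step in contribution (1), which builds an attribute dependency graph and finds the attributes associated with sensitive ones, can be sketched with mutual information alone. This is a simplified illustration, not the thesis's procedure: the Apriori frequent-itemset step is omitted, the edge threshold of 0.5 bits is an arbitrary assumption, and treating every attribute reachable from a sensitive attribute as "associated" is a hypothetical interpretation. The function names and sample table are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """I(X;Y) in bits, from the empirical joint distribution of two columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))); the 1/n factors simplify to c*n/(cx*cy).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def dependency_graph(table, threshold):
    """Undirected graph: an edge joins two attributes whose MI reaches the threshold."""
    graph = {name: set() for name in table}
    for a, b in combinations(table, 2):
        if mutual_information(table[a], table[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def associated_attributes(graph, sensitive):
    """Attributes reachable from any sensitive attribute, excluding the sensitive ones."""
    seen, stack = set(sensitive), list(sensitive)
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen - set(sensitive)


# Hypothetical sample: "drug" is fully determined by "disease"; "gender" is independent of both.
table = {
    "disease": ["flu", "flu", "hiv", "hiv", "cancer", "cancer", "cold", "cold"],
    "drug": ["oseltamivir", "oseltamivir", "arv", "arv", "chemo", "chemo", "rest", "rest"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
}
graph = dependency_graph(table, threshold=0.5)
print(associated_attributes(graph, {"disease"}))  # {'drug'}
```

Here I(disease; drug) equals H(disease) = 2 bits and gender shares no information with either column, so "drug" is flagged because it would reveal "disease" even if "disease" itself were suppressed before publishing.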
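The type classification in contribution (2) relies on Jaro-Winkler similarity. The sketch below implements the standard Jaro-Winkler measure (prefix scale p = 0.1, common prefix capped at 4 characters) and uses it to assign an attribute name to the sensitive type whose element space contains the most similar element. The abstract does not specify the element spaces, whether names or values are compared, or the acceptance threshold. The `SENSITIVE_TYPES` dictionary, the decision to compare attribute names, and the 0.85 threshold are all hypothetical.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # max distance for a character match
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = transposed = 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transposed += 1
            k += 1
    t = transposed / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical element spaces: sensitive type -> known attribute names of that type.
SENSITIVE_TYPES = {
    "identity": ["name", "id_number", "passport"],
    "health": ["disease", "diagnosis", "medication"],
    "location": ["address", "zip_code", "city"],
}


def classify(attribute, types, threshold=0.85):
    """Return the type with the most similar element, or None if below the threshold."""
    best_type, best_score = None, 0.0
    for type_name, elements in types.items():
        score = max(jaro_winkler(attribute.lower(), e.lower()) for e in elements)
        if score > best_score:
            best_type, best_score = type_name, score
    return best_type if best_score >= threshold else None


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(classify("diseases", SENSITIVE_TYPES))  # health
print(classify("salary", SENSITIVE_TYPES))  # None
```

Jaro-Winkler suits column names because it rewards a shared prefix, so variants such as "diseases" or "disease_code" still land close to "disease", while unrelated names fall well below the threshold and stay unclassified.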
Keywords/Search Tags: privacy protection, information entropy, association rules, mutual information, information discovery system