
Research Of Privacy Preserving Data Mining Techniques Under Anonymity

Posted on: 2018-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q K Liu
GTID: 2348330536981724
Subject: Computer technology
Abstract/Summary:
In recent years, information technology and data science have grown rapidly and have been widely applied across industries. Although data mining techniques can reveal implicit information and relationships between items to support decision-making, they can also expose sensitive and private information, which poses security threats. Anonymization is one privacy protection technology: it protects users' identities or sensitive information by generating equivalence classes over the quasi-identifiers, so that transactions (or quasi-identifiers) containing sensitive information become indistinguishable. Previous work has extensively studied relational databases, but those methods cannot be directly applied to transactional databases because the dimensionality of transactional databases is much higher than that of relational databases. Moreover, since each identifier may have its own sensitive information, it is inappropriate to set the same sensitive information for all identifiers. The main contributions of this dissertation are as follows.

First, we present a new framework for K-anonymity called PTA, which addresses the high complexity and high information loss of traditional approaches to anonymizing transactional databases. It consists of three modules: a pre-processing module, a Travelling Salesman Problem (TSP) module, and an anonymization module. Together they anonymize the transaction data and guarantee that at least K transactions become identical. The pre-processing module treats all items in a transaction database as quasi-identifiers (QIDs) and encodes the database as bitmaps using the Gray code method. To minimize information loss during anonymization, the TSP module finds a shortest path through the transactions; because the distance between consecutive transactions on this path is minimized, the information loss within each derived segment is also kept small. In the anonymization module, the transactions are grouped along the optimized shortest path, and a majority-voting mechanism is used to find the center point of each group, on which the anonymization is based. A divide-and-conquer approach is also applied in PTA to speed up the anonymization process. Experiments indicate that PTA outperforms state-of-the-art algorithms for anonymizing transactional databases.

Beyond treating the same information as sensitive for every transaction, anonymizing personal sensitive information on an individual basis is an emerging issue. For example, each person may have different diseases and may want them treated as sensitive information that stays hidden both before and after the data mining process. Furthermore, even when sensitive information is hidden, it may still be inferred from other related information in the database. In the second part of this dissertation, an Lnn-means algorithm is therefore presented to handle personalized transactional databases. Lnn-means first transforms the original data into a relational database by hierarchical generalization and matrixization. A k-means clustering technique then groups the transactions by similarity. Finally, the transaction records that incur less information loss and satisfy (L,P)-diversity are used to generate equivalence classes. As a result, the probability that sensitive information can be derived from other related records falls below a given threshold, and (L,P)-diversity is achieved. Traditional L-diversity may suffer from a semantic problem during anonymization, whereas the designed algorithm resolves this problem by applying generalization. The designed algorithm therefore provides stronger protection than both K-anonymity and L-diversity.

In summary, this dissertation addresses the anonymization of transactional databases and of personalized transaction data in privacy-preserving data mining. Extensive experiments show that the proposed framework and algorithm are feasible and effective in meeting the requirements of data anonymization.
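The three-stage idea behind PTA (bitmap encoding, a short path through the transactions, then majority voting within groups) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: all function names are hypothetical, the Gray code step is simplified to a plain item bitmap, a greedy nearest-neighbour heuristic stands in for whatever TSP method the thesis uses, and the divide-and-conquer speed-up is omitted.

```python
from typing import Iterable, List, Sequence, Set


def encode_bitmaps(transactions: Sequence[Iterable[str]],
                   items: Sequence[str]) -> List[List[int]]:
    """Encode each transaction as a 0/1 vector over the item universe.
    (Assumption: a plain bitmap; the Gray code step is not reproduced.)"""
    return [[1 if item in set(t) else 0 for item in items] for t in transactions]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_path(bitmaps: Sequence[Sequence[int]]) -> List[int]:
    """Greedy stand-in for a TSP solver: always move to the closest
    unvisited transaction (ties broken by the lower index)."""
    if not bitmaps:
        return []
    unvisited = set(range(1, len(bitmaps)))
    path = [0]
    while unvisited:
        last = bitmaps[path[-1]]
        nxt = min(unvisited, key=lambda i: (hamming(last, bitmaps[i]), i))
        path.append(nxt)
        unvisited.remove(nxt)
    return path


def majority_vote(group: Sequence[Sequence[int]]) -> List[int]:
    """Center point of a group: a bit is set if more than half the members set it."""
    n = len(group)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*group)]


def anonymize(transactions: Sequence[Iterable[str]], k: int) -> List[Set[str]]:
    """Replace every transaction with the majority-vote center of its group.
    Groups are consecutive runs of k transactions along the path; a short
    trailing run is merged into the previous group."""
    items = sorted(set().union(*transactions))
    bitmaps = encode_bitmaps(transactions, items)
    path = nearest_neighbour_path(bitmaps)
    segments = [path[i:i + k] for i in range(0, len(path), k)]
    if len(segments) > 1 and len(segments[-1]) < k:
        segments[-2].extend(segments.pop())
    result: List[Set[str]] = [set() for _ in transactions]
    for segment in segments:
        center = majority_vote([bitmaps[i] for i in segment])
        center_items = {item for item, bit in zip(items, center) if bit}
        for i in segment:
            result[i] = set(center_items)
    return result
```

Calling `anonymize` with `k = 2` on a handful of small item sets returns one anonymized item set per input transaction, and every distinct output set is shared by at least two transactions. Note that with fewer than `k` input transactions the single group is smaller than `k`, so the guarantee does not hold in that case.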
Keywords/Search Tags: privacy-preserving data mining, K-anonymity, (L,P)-diversity, transaction data