Research Of Privacy Preserving Data Mining Techniques Under Anonymity

Posted on:2018-09-09

Degree:Master

Type:Thesis

Country:China

Candidate:Q K Liu

Full Text:PDF

GTID:2348330536981724

Subject:Computer technology

Abstract/Summary:

In recent years, information technology and data science have grown rapidly and have been widely applied across industries. Data mining techniques can reveal implicit information and relationships between items to support decision-making, but they can also expose sensitive and private information, which creates security threats. Anonymization is one privacy protection technique: it protects users' identities or sensitive information by generating equivalence classes over the quasi-identifiers, so that transactions (or quasi-identifiers) containing sensitive information become indistinguishable. Previous work has focused largely on relational databases and cannot be applied directly to transactional databases, whose dimensionality is much higher. In addition, since each identifier may have its own sensitive information, it is unreasonable to define one uniform set of sensitive information for all identifiers. The main contributions of this dissertation are as follows.

First, we present a new framework for K-anonymity called PTA, which addresses the high complexity and high information loss that arise when anonymizing transactional databases. It consists of three modules: a pre-processing module, a Travelling Salesman Problem (TSP) module, and an anonymization module. Together they anonymize the transaction data and guarantee that at least K transactions become identical. The pre-processing module treats all items in a transaction database as quasi-identifiers (QIDs) and encodes the database as bitmaps using the Gray code method. To reduce information loss during anonymization, the TSP module finds a shortest path through the transactions; because the distance between consecutive transactions on this path is minimized, the information loss within each segment derived from the path is kept small. In the anonymization module, once the optimized shortest path has been found and segmented, a majority-voting mechanism finds the center point of each group, which is then used to anonymize it. A divide-and-conquer approach is also applied in the PTA framework to speed up the anonymization process. Experiments indicate that the PTA framework outperforms state-of-the-art algorithms for anonymizing transactional databases.

Beyond treating the same information as sensitive for every transaction, an emerging issue is how to anonymize each person's sensitive information individually. For example, each person may have particular diseases that he or she regards as sensitive and wants hidden both before and after the data mining process. Moreover, even when sensitive information is hidden, it may still be inferred from other related information in the database. In the second part of this dissertation, an Lnn-means algorithm is therefore presented to handle personalized transactional databases. Lnn-means first transforms the original data into a relational database by hierarchical generalization and matrixization. A k-means clustering technique then groups the transactions by similarity. Finally, the transaction records that incur less information loss and satisfy (L,P)-diversity are used to generate equivalence classes. As a result, the probability that sensitive information can be derived from other related records falls below a given threshold, and (L,P)-diversity is achieved. Traditional L-diversity may also suffer from a semantic problem during anonymization; the designed algorithm solves this problem by applying a generalization technique. The designed algorithm therefore offers stronger protection than both K-anonymity and L-diversity.

In summary, this dissertation addresses the anonymization of transactional databases and of personalized transaction data in privacy-preserving data mining. Extensive experiments show that the proposed framework and algorithm are feasible and effective in meeting the requirements of data anonymization.
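The PTA pipeline described in the abstract (bitmap encoding, a short path through the transactions, segmentation into groups of at least K, and majority voting) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes Hamming distance between bitmaps, replaces the TSP solver with a greedy nearest-neighbour ordering, and omits the Gray code step and the divide-and-conquer speed-up, because the abstract does not specify them. All function names are hypothetical.

```python
from typing import List, Sequence, Set


def encode(transactions: Sequence[Set[str]], items: Sequence[str]) -> List[List[int]]:
    """Encode each transaction as a 0/1 bitmap over the item universe."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bitmaps differ (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))


def greedy_tour(rows: Sequence[Sequence[int]]) -> List[int]:
    """Nearest-neighbour ordering: a cheap stand-in for a real TSP solver."""
    if not rows:
        return []
    order = [0]
    unvisited = set(range(1, len(rows)))
    while unvisited:
        last = rows[order[-1]]
        # Break distance ties by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (hamming(last, rows[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def majority_vote(group: Sequence[Sequence[int]]) -> List[int]:
    """Bitwise majority of a group; ties resolve to 0 (item suppressed)."""
    n = len(group)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*group)]


def anonymize(transactions: Sequence[Set[str]], items: Sequence[str], k: int) -> List[List[int]]:
    """Return bitmaps in which every transaction shares its row with >= k - 1 others."""
    rows = encode(transactions, items)
    if k < 1 or len(rows) < k:
        raise ValueError("need at least k transactions and k >= 1")
    order = greedy_tour(rows)
    # Cut the path into consecutive segments of k transactions.
    segments = [order[i:i + k] for i in range(0, len(order), k)]
    # Fold a short tail into the previous segment so no group is smaller than k.
    if len(segments) > 1 and len(segments[-1]) < k:
        tail = segments.pop()
        segments[-1].extend(tail)
    result: List[List[int]] = [[] for _ in rows]
    for seg in segments:
        center = majority_vote([rows[i] for i in seg])
        for i in seg:
            result[i] = list(center)
    return result


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"d"}, {"d", "e"}, {"a"}]
    for row in anonymize(data, ["a", "b", "c", "d", "e"], k=2):
        print(row)
```

Because neighbouring transactions on the path are similar, the majority vote in each segment changes few bits, which is the intuition behind the low information loss claimed for the framework.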

Keywords/Search Tags:

privacy preserving data mining, K-anonymity, (L,P)-diversity, transaction data

Related items

1	Research On Privacy Preserving Data Publishing Based On Anonymity Models
2	Research On Privacy-preserving Data Publishing Algorithms Based On Different Anonymity Requests
3	Research On Anonymity Models And Algorithms Of Privacy Preserving For Microdata Publishing To Thwarting Similarity Attack
4	Research On Several Key Problems Related To Anonymity Data In The K-anonymity Privacy-preserving Model
5	Research And Implement Of Privacy-preserving Scheme Based On Data Mining
6	Hybrid Methods For Privacy Preserving Data Sharing Techniques On Data Mining Environments
7	Key Technology Of Privacy Preserving Data Publishing Based On Cluster
8	Privacy Preserving Data Mining Based On Rough Set Theory
9	Study On Privacy Preserving Classification Data Mining
10	Research On Privacy-Preserving Data Mining Based On K-anonymity Algorithm