
Research On Multi-Level Sensitive Model And Privacy Preserving Method For Set-Valued Data Publication

Posted on: 2016-02-05; Degree: Master; Type: Thesis
Country: China; Candidate: S M Zhang
GTID: 2308330464454739; Subject: Computer application technology
Abstract/Summary:
With the spread of Internet technology, Internet-based applications such as e-commerce, online social networking, and cloud computing have developed rapidly, and networks are accumulating vast amounts of data of many types. These data support scientific research, business planning, economic analysis, social group analysis, and decision-making, and their use carries enormous scientific, economic, and political value. Driven by data sharing and other interests, publishing data has become a key requirement. However, such data often contain individuals' private information, and releasing them directly can easily lead to privacy leakage, so privacy protection is the basis of data sharing.

Set-valued data, such as e-commerce transactions, patient medical records, and click streams, are an important type of published data. They are sparse, high-dimensional, and large in volume; they have no fixed identifiers; and the sensitive attributes of their transactions are diverse. Consequently, traditional privacy preserving methods designed for relational data are not suitable for set-valued data. Research on anonymizing set-valued data focuses mainly on the anonymity of itemsets, and existing approaches include k-anonymity, (h, k, p)-anonymity, and p-uncertainty. However, k-anonymity makes no distinction between sensitive and non-sensitive items: it uses generalization to make each transaction identical to at least k-1 other transactions in the same group, which causes serious data distortion. Moreover, it cannot resist the homogeneity attack when the transactions in a group share the same sensitive value.
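To make the k-anonymity condition and the homogeneity problem concrete, here is a minimal Python sketch. It assumes the transactions have already been generalized, represents each one as a set of item strings, and takes a hypothetical `sensitive` set naming the sensitive items; the function names are illustrative and do not come from the thesis.

```python
from collections import Counter


def equivalence_groups(transactions):
    """Count identical (already generalized) transactions; each distinct set is one group."""
    return Counter(frozenset(t) for t in transactions)


def is_k_anonymous(transactions, k):
    """True if every transaction is identical to at least k - 1 other transactions."""
    return all(size >= k for size in equivalence_groups(transactions).values())


def homogeneity_leaks(transactions, sensitive):
    """Members of a group are identical, so any sensitive item in the group is
    disclosed for every member. Returns {group: disclosed sensitive items}."""
    return {
        group: group & sensitive
        for group in equivalence_groups(transactions)
        if group & sensitive
    }
```

With the transactions {a, b, hiv}, {a, b, hiv}, {c}, {c}, `is_k_anonymous(data, 2)` returns True, yet `homogeneity_leaks(data, {"hiv"})` reports that the group {a, b, hiv} discloses `hiv` for both of its members: the data are 2-anonymous but still leak the sensitive value.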
(h, k, p)-anonymity and p-uncertainty, for their part, do not consider how the degree of privacy protection should correspond to the sensitivity of each sensitive item. They apply a uniform protection requirement, so some data are excessively suppressed to satisfy the anonymity requirement, which reduces data utility.

This thesis addresses the problems above. First, the privacy protection problem for set-valued data is analyzed in depth. Then the drawbacks of existing privacy models are discussed in detail and specific solutions are given. Finally, to prevent disclosure of identities and sensitive attributes, a new privacy protection model and a corresponding algorithm are proposed that better balance data utility and privacy protection strength.

The main research achievements of the paper are as follows:

(1) The research background and current state of privacy protection for set-valued data are analyzed first. Then k^m-anonymity, k-anonymity, (h, k, p)-anonymity, and p-uncertainty for set-valued data are introduced in detail, and their shortcomings are pointed out. k^m-anonymity assumes that an attacker's background knowledge consists of at most m items and uses top-down generalization to ensure that any set of m or fewer items is contained in at least k transactions. In practice, however, the extent of an attacker's background knowledge is difficult to determine. To address this, k-anonymity assumes that the attacker's background knowledge is arbitrary: by constructing groups of k identical transactions, it ensures that the attacker cannot distinguish among the transactions in a group, which achieves privacy protection. Nevertheless, many set-valued transactions contain no sensitive information, and publishing them does not cause privacy disclosure. Applying k-anonymity over-protects these transactions and loses a large amount of useful information, and the method still cannot resist the homogeneity attack.
The main shortcoming of (h, k, p)-anonymity and p-uncertainty is that they do not differentiate among sensitive values according to their sensitivity.

(2) Based on the characteristics of set-valued data, the paper proposes a sensitivity classification method that assigns a sensitivity level to every sensitive value and sets a different privacy threshold for each level. On this basis, the (p, k, p) privacy protection model is designed. The model assumes that an attacker's background knowledge consists of part of a transaction's non-sensitive items, and a cluster-based method is applied to this partial information to satisfy k-anonymity. Meanwhile, sensitive values are assigned sensitivity levels, sensitive items are checked level by level against the threshold of their level, and sensitive items that exceed their threshold are suppressed. By combining k-anonymity and p-uncertainty, the (p, k, p) model remedies the shortcomings of both. Because it takes into account how the distribution of sensitive items affects data sensitivity, the model improves data utility to some degree, prevents linking attacks, and reduces the risk of sensitive attribute disclosure.

(3) Based on this privacy model, the paper designs a novel lazy cluster-updating (p, k, p)-anonymity algorithm. The algorithm ranks the privacy constraint set by support and uses information loss as its metric: in each round it selects the constraint p with the highest support and clusters the two items whose generalization incurs the minimum information loss, repeating until every p satisfies k-anonymity. At the same time, it checks whether sensitive association rules exceed the threshold p and suppresses the sensitive items that exceed it. Finally, the complexity of the algorithm is analyzed and its feasibility is verified.

(4) The paper evaluates the methods experimentally on three real set-valued data sets, measuring information loss and the algorithm's running time.
The experimental results show that, compared with previously proposed methods such as k-anonymity and (h, k, p)-anonymity, the proposed methods better preserve the original data distribution, effectively reduce information loss, and improve data utility while meeting the privacy requirements.
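The k^m-anonymity condition described in item (1) can be checked directly by counting the support of every itemset of at most m items. The Python sketch below is a minimal illustration under stated assumptions: transactions are sets of string items, only itemsets that actually occur in the data are checked (an itemset with zero support matches no transaction), and the function names are hypothetical. It only verifies the condition; it does not perform the top-down generalization.

```python
from collections import Counter
from itertools import combinations


def km_violations(transactions, k, m):
    """Return {itemset: support} for every itemset of size 1..m with support < k."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple form
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1  # combo occurs at most once per transaction
    return {combo: s for combo, s in support.items() if s < k}


def is_km_anonymous(transactions, k, m):
    """True if every occurring itemset of at most m items is in at least k transactions."""
    return not km_violations(transactions, k, m)
```

For the transactions {a, b}, {a, b}, {a, c} with k = 2 and m = 2, the violations are the itemsets (c) and (a, c), each supported by only one transaction.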
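The level-by-level threshold check in item (2) can be illustrated with sensitive association rules q → s, where q is a set of non-sensitive items known to the attacker and s is a sensitive item; the rule's confidence sup(q ∪ {s}) / sup(q) must not exceed the threshold of s's sensitivity level. The Python sketch below is an illustration under assumptions, not the thesis's procedure: `level_of` and `threshold_of` are hypothetical mappings (a larger level number means more sensitive), antecedents have at most `max_lhs` items, and suppression naively removes s from just enough supporting transactions.

```python
import math
from collections import Counter
from itertools import combinations


def rule_counts(transactions, sensitive, max_lhs=1):
    """Return {(q, s): (sup(q + {s}), sup(q))} for public antecedents q, |q| <= max_lhs."""
    lhs_sup, rule_sup = Counter(), Counter()
    for t in transactions:
        public = sorted(t - sensitive)
        private = t & sensitive
        for size in range(1, max_lhs + 1):
            for q in combinations(public, size):
                lhs_sup[q] += 1
                for s in private:
                    rule_sup[(q, s)] += 1
    return {(q, s): (n, lhs_sup[q]) for (q, s), n in rule_sup.items()}


def allowed(threshold, lhs_support):
    """Largest rule support keeping confidence <= threshold (epsilon absorbs float error)."""
    return math.floor(threshold * lhs_support + 1e-9)


def enforce_level_thresholds(transactions, level_of, threshold_of, max_lhs=1):
    """Suppress sensitive items until every rule q -> s meets the threshold of s's level.

    level_of: hypothetical map sensitive item -> level (larger = more sensitive).
    threshold_of: hypothetical map level -> maximum allowed confidence.
    """
    data = [set(t) for t in transactions]
    sensitive = set(level_of)
    while True:
        violations = [
            (q, s, n - allowed(threshold_of[level_of[s]], d))
            for (q, s), (n, d) in rule_counts(data, sensitive, max_lhs).items()
            if n > allowed(threshold_of[level_of[s]], d)
        ]
        if not violations:
            return data
        # Handle a violation at the most sensitive level first.
        q, s, excess = max(violations, key=lambda v: level_of[v[1]])
        for t in data:
            if excess == 0:
                break
            if s in t and set(q) <= t:
                t.discard(s)  # suppressing s lowers sup(q + {s}); sup(q) is unchanged
                excess -= 1
```

For example, with the transactions {a, hiv}, {a, hiv}, {a, flu}, {a, flu}, a threshold of 0.25 for level 2 (`hiv`), and 0.5 for level 1 (`flu`), one `hiv` occurrence is suppressed while both `flu` occurrences survive, which is the kind of level-dependent treatment that a single uniform threshold cannot provide.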
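Item (3) outlines a greedy loop: select the highest-support privacy constraint that still violates k-anonymity, merge the pair of item clusters with the smallest information loss, and repeat. The Python sketch below is a plain greedy version that recomputes supports from scratch each round, not the lazy cluster-updating algorithm of the thesis. It assumes a hypothetical loss measure in which each generalized item with cluster C costs (|C| - 1) / (|I| - 1), where I is the set of all items; it ignores constraints with zero support, assumes every constraint item occurs in the data, and only merges a cluster from the selected constraint with another cluster.

```python
from itertools import chain


def generalization_loss(transactions, cluster, n_items):
    """Sum over generalized items of (|C| - 1) / (n_items - 1); 0 means no generalization."""
    if n_items <= 1:
        return 0.0
    loss = 0.0
    for t in transactions:
        for c in {cluster[i] for i in t}:
            loss += (len(c) - 1) / (n_items - 1)
    return loss


def support(transactions, cluster, itemset):
    """Number of generalized transactions containing the clusters of every item in itemset."""
    target = {cluster[i] for i in itemset}
    return sum(target <= {cluster[i] for i in t} for t in transactions)


def greedy_item_clustering(transactions, constraints, k):
    """Merge item clusters until every constraint with nonzero support has support >= k."""
    items = sorted(set(chain.from_iterable(transactions)))
    cluster = {i: frozenset([i]) for i in items}
    while True:
        violating = [p for p in constraints if 0 < support(transactions, cluster, p) < k]
        if not violating or len(set(cluster.values())) == 1:
            return cluster
        p = max(violating, key=lambda q: support(transactions, cluster, q))
        best = None
        # Sorting the candidates makes tie-breaking deterministic.
        for a in sorted({cluster[i] for i in p}, key=sorted):
            for b in sorted(set(cluster.values()) - {a}, key=sorted):
                trial = {i: (a | b if c in (a, b) else c) for i, c in cluster.items()}
                cost = generalization_loss(transactions, trial, len(items))
                if best is None or cost < best[0]:
                    best = (cost, trial)
        cluster = best[1]
```

With the transactions {a, b}, {a, b}, {a, c}, {a, d}, the constraints (c), (d), (a, b), and k = 2, the cheapest merge for (c) is with d (loss 2/3, versus 1 for b and 4/3 for a), after which every constraint has support 2 and the loop stops.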
Keywords: set-valued data, privacy protection, data anonymity