Font Size: a A A

Research On Privacy Preserving Approaches In Data Stream Oriented To Information Sharing

Posted on:2014-01-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:J W ZhangFull Text:PDF
GTID:1318330518971255Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years,data stream as a common data form has attracted more and more attention of the data mining researchers,data mining algorithms based on data stream quickly and efficiently through scanning data once has provided to people the growing valuable information,which can support the decisions.However,with the rapid development of data steam mining technology,data privacy and information security have to face a threat inevitably.Data mining knowledge can be used to derive sensitive information from non-sensitive information,or some data mining knowledge may be sensitive information.The research on privacy preserving approaches in data stream oriented to information sharing,on the one hand,provides more effective and efficient technical solutions for the protection of sensitive information in the data stream environment and makes owners share information without apprehension to institutional and individual who need,in order to promote the exchange and sharing of information;on the other hand,emphasizes to reduce the information loss of the non-sensitive information after the protection of sensitive information.So that ensures the quality of the shared information and improves the availability of the shared information.This thesis aims on the security information sharing in data stream environment with ensuring high availability of shared information,then carries out more detailed and effective study on privacy protection technology for the original data stream as well as the association rule knowledge:Firstly,aimed to the problem that traditional k-anonymity approach based on the data generalization technology can not show characteristics of data stream and can't solve the link attack threat for the shared original data stream dynamically,we propose the Top-down specialization tree to generate the quasi-identifier attributes with multiple data types,so that achieve continuous k-anonymity of data stream dynamically through branching and pruning of the tree structure;meanwhile,aimed to the problem that high information loss will be produced accompany with data stream anonymity,we introduce the distribution density threshold of data stream and the delay threshold of data publishing to the process of pruning of the tree,so that some nodes who generate the minimum information loss and satisfy k-anonymous property will be shared;on this basis,we propose the k-anonymity algorithm of data stream(KIDS)based on the sliding window model.At last,experiments show:compared with conventional k-anonymity methods,anonymous data stream algorithm KIDS generates smaller time cost and KIDS in different of parameter k generates more small loss of information,it maintains efficiency of the shared original data stream.Secondly,aimed to the problem that is difficult to keep the security of sensitive rules if those hiding algorithms of sensitive rules based on the static data have been applied to data stream.We propose a frequent pattern tree(IMFP-Tree)based on an improved item header table.On this basis,we propose the sensitive association rules hiding algorithm(HSRDS).Applied the item header table that can solve the problem that the traditional frequent pattern tree(FP-Tree)need to count the support of items,it can't be applied to data stream directly,then because that traditional frequent pattern tree can't display the containment relationship of transactions and data items,so that it is difficult to quickly select the sanitized sensitive transactions,we solve the problem by increasing the node domain ListTi into frequent pattern tree.Further,since the proceeding of data sanitizing is usually accompanied higher information loss,we defines two metric thresholds to select the sanitized data items,so that sensitive rules hiding can generate lower side effect.At last,experiments show:compared with conventional FP-Tree,IMFP-Tree can complete data sanitizing in less time,and compared with algorithms Algo2a and SWA,the algorithm HSRDS generates less information loss,and keep the balance between the usefulness of the data stream and the hiding of sensitive rules effectively.Thirdly,for the situation of data stream sharing,aimed to the problem that the hidden sensitive rules may be leaked again because of isolate attacks after sensitive rules have been hidden by data stream sanitizing,we propose a k-anonymity methodof rules(SRA)to protect the hidden sensitive rules,it completes the twice protection for sensitive rules;Meanwhile,existing k-anonymity algorithms of rules are most based on data sanitizing technology,but sanitizing data again is not suit the characteristics of the data stream,so we combine the time sliding window technique and adopt the mode of appending transactions;Then,in order to further improve the efficiency of the algorithm and to reduce the space complexity of the algorithm,we adopt the prime number encoding method.Aimed to the problem that algorithm SRA generates excessive loss of information when value of k is high,we propose the improved k-anonymity algorithm(ASRA)of the sensitive rules.At last,experiments show:compared with algorithm ARH,algorithms SRA and ASRA can generate less time cost and less information loss,especially when the data volume increases.Meanwhile,algorithm ASRA generates less information loss than algorithm SRA,and ASRA can keep the higher usefulness of mining results.Finally,aimed to the threat of inference attacks that are caused by the correlation of shared association rules in the data stream environment,we propose a fast and efficient blocking algorithm of inference attacks(BIA)with minimize information loss based on the ideas of association rules sanitization.Since the existing inference channels based on frequent pattern is insufficient to define the inference attack channels of association rules,then after analyzing the characteristics of inference attack of association rules,we have defined four possible existence inference attack channels of sensitive association rules:disassembled inference,assembled inference,transitive inference and chain inference.Meanwhile,aimed to the problem that the blocking method of inference attack based on frequent pattern will produce higher loss of information for non-sensitive rules,we propose the blocking method of inference attack based on association rules.At last,experiments show:algorithm BIA is more suitable for data stream environments compared with algorithm DSA and BINFCH.Algorithm BIA can not only block the channel of inference attack of sensitive rules,but also produce the lower rate exposure of sensitive rules and the information loss of non-sensitive rules.
Keywords/Search Tags:Information sharing, Anonymization, Specialization tree, Sensitive rules, Double protection
PDF Full Text Request
Related items