Research On Privacy-Preserved Data Publishing Techniques Of Sequence Data

Posted on: 2013-10-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Shang
Full Text: PDF
GTID: 1228330395489257
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the popularity of sensor networks, RFID, and wireless positioning equipment has driven the production of sequence data to unprecedented volume and complexity. Sequence data has long been considered one of the most important types of data in both nature and human society. Sequence data in finance and e-commerce carries a large amount of private information which, if not appropriately protected, may be exploited for abuse and crime. As a kind of unstructured data, sequence data has many features that traditional relational data lacks, such as the patterns of time series and the ordered, fast, continuous, infinite flow of stream data. These features make the privacy preservation techniques designed for traditional relational data inapplicable to sequence data.

Hence, privacy-preserved data publishing of sequence data is a significant research topic, with wide use in financial analysis, business administration, and location-based services (LBS). However, privacy preservation for sequence data has not yet attracted enough attention, and there is considerable room for improvement in this area. This thesis therefore focuses on privacy models and anonymization algorithms for sequence data that protect its essential features and suit its typical applications.

First, we summarize the existing privacy preservation techniques, highlighting the conventional and advanced k-anonymity models and their applications to sequence data. We analyze the drawbacks of existing work and point out the research challenges, which lead to the research content of this thesis.

We propose a framework for privacy-preserved data publishing of sequence data, and develop the corresponding privacy protection solutions for two applications: pattern matching on time series and stream data dissemination.
We describe the critical techniques, namely the generic definition of patterns and the version derivation technique, in detail.

This thesis is the first study to integrate anonymization into a scalable data dissemination infrastructure. We formulate a bandwidth-constrained, flexible-tree-structure dissemination model for anonymized stream data based on k-anonymity. The resource constraints, the data volume of each anonymity version, and the optimization target are formulated in detail. In addition, we propose two version derivation algorithms suited to resource-constrained applications, based on two well-known k-anonymization algorithms, Mondrian and top-down greedy search. Building on them, we propose two general version derivation algorithms, hierarchy-based derivation and generalized-record-based derivation, which are applicable to all k-anonymity and l-diversity algorithms. We prove that the dissemination plan optimization problem is NP-hard, and then propose two heuristic tree construction strategies.

Toward a more practical stream data dissemination application, we improve and extend the above anonymized stream data dissemination model, and propose a communication-delay-constrained, half-fixed-structure dissemination model for anonymized stream data. The model formulation is compatible with both k-anonymity and l-diversity, and the model features are more complex and practical. We then propose client assignment and plan optimization strategies for this dissemination model.

Finally, we propose a pattern-preserving anonymization method for time-series data. Relying on a very generic definition of patterns, we propose a novel anonymization model called (k,P)-anonymity. This model publishes both the attribute values and the patterns of time series in separate data forms.
We demonstrate that our model can prevent linkage attacks on the published data while effectively supporting a wide variety of queries on the anonymized data. Two algorithms are designed to enforce (k,P)-anonymity on time-series data. We also propose reconstruction techniques to support customized data publishing, which allow the values and pattern representations (PRs) to be published from different subsets of the quasi-identifier (QI) attributes.
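
The abstract does not spell out the formal definition of (k,P)-anonymity, so the following Python sketch is only an illustration under a stated assumption: each published record carries a generalized value range (here called an envelope) and a pattern representation, every envelope group must contain at least k records, and within each group every pattern must be shared by at least P records. The names `PublishedRecord` and `satisfies_kp_anonymity`, and the record layout, are hypothetical and not taken from the thesis.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, NamedTuple


class PublishedRecord(NamedTuple):
    """One anonymized time series (hypothetical layout, not from the thesis)."""
    envelope: Hashable  # generalized value ranges shared by a group
    pattern: Hashable   # pattern representation, e.g. a symbolic string


def satisfies_kp_anonymity(records: Iterable[PublishedRecord], k: int, p: int) -> bool:
    """Check the ASSUMED (k,P) conditions on a published table.

    Assumption 1: records with the same envelope form a group of size >= k.
    Assumption 2: inside each group, every pattern occurs at least p times.
    """
    # Count how often each pattern occurs inside each envelope group.
    groups = defaultdict(Counter)
    for rec in records:
        groups[rec.envelope][rec.pattern] += 1

    for pattern_counts in groups.values():
        if sum(pattern_counts.values()) < k:
            return False  # group too small: value ranges are not k-anonymous
        if any(count < p for count in pattern_counts.values()):
            return False  # some pattern is too rare inside its group
    return True


# Four series share one envelope; two distinct patterns, each used twice.
envelope = ((10, 20), (15, 30))
example = [
    PublishedRecord(envelope, "ab"),
    PublishedRecord(envelope, "ab"),
    PublishedRecord(envelope, "ba"),
    PublishedRecord(envelope, "ba"),
]

print(satisfies_kp_anonymity(example, k=4, p=2))  # True
print(satisfies_kp_anonymity(example, k=4, p=3))  # False
print(satisfies_kp_anonymity(example, k=5, p=2))  # False
```

Under this reading, an attacker who knows a victim's values can narrow the candidates to no fewer than k records, and one who also knows the pattern can narrow them to no fewer than P. The two enforcement algorithms mentioned in the abstract are not reproduced here; the sketch only checks a table that has already been anonymized.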
Keywords/Search Tags: Time-series, Stream, k-anonymity, l-diversity, Generalization, Version Derivation, Pattern, Dissemination