Research On Privacy-Preserved Data Publishing Techniques Of Sequence Data

Posted on:2013-10-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Shang

Full Text:PDF

GTID:1228330395489257

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, the spread of sensor networks, RFID, and wireless positioning equipment has driven the production of sequence data to unprecedented volume and complexity. Sequence data has long been considered one of the most important types of data in both nature and human society. In finance and e-commerce in particular, sequence data carries a large amount of private information which, if not appropriately protected, may be exploited for abuse and crime. As a kind of unstructured data, sequence data has many features that traditional relational data lacks, such as the patterns of time series and the ordered, fast, continuous, unbounded flow of stream data, so privacy preservation techniques designed for relational data are not directly applicable to sequence data.

Hence, privacy-preserving publishing of sequence data is a significant research topic, with wide use in financial analysis, business administration, and location-based services (LBS). However, privacy preservation for sequence data has not yet attracted enough attention, and there is considerable room for improvement. This thesis therefore focuses on privacy models and anonymization algorithms for sequence data that preserve its essential features and suit its typical applications.

First, we summarize the existing privacy preservation techniques and highlight the conventional and advanced k-anonymity models as well as their applications to sequence data. We analyze the drawbacks of existing work and point out the research challenges, which lead to the research content of this thesis.

We propose a framework for privacy-preserving publishing of sequence data and develop corresponding privacy protection solutions for two applications: pattern matching on time series and stream data dissemination.
We describe in detail the two critical techniques, the generic definition of patterns and the version derivation technique.

This thesis is the first study to integrate anonymization into a scalable data dissemination infrastructure. We formulate a bandwidth-constrained, flexible-tree-structure dissemination model for anonymized stream data based on k-anonymity. The resource constraints, the data volume of each anonymity version, and the optimization target are formulated in detail. In addition, we propose two version derivation algorithms suited to resource-constrained applications, based on two well-known k-anonymization algorithms, Mondrian and top-down greedy search. Building on them, we then propose two general version derivation algorithms, hierarchy-based derivation and generalized-record-based derivation, which are applicable to all k-anonymity and l-diversity algorithms. We prove that the dissemination plan optimization problem is NP-hard and propose two heuristic tree construction strategies.

Toward a more practical stream data dissemination application, we improve and extend the above model and propose a communication-delay-constrained, half-fixed-structure dissemination model for anonymized stream data. The model formulation is compatible with both k-anonymity and l-diversity, and its features are more complex and closer to practice. We then propose client assignment and plan optimization strategies for this dissemination model.

Finally, we propose a pattern-preserving anonymization method for time-series data. Relying on a very generic definition of patterns, we propose a novel anonymization model called (k,P)-anonymity. This model publishes both the attribute values and the patterns of time series in separate data forms.
We demonstrate that our model can prevent linkage attacks on the published data while effectively supporting a wide variety of queries on the anonymized data. Two algorithms are designed to enforce (k,P)-anonymity on time-series data. We also propose reconstruction techniques that support customized data publishing, allowing the values and pattern representations (PRs) to be published from different subsets of the quasi-identifier (QI) attributes.
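
For readers unfamiliar with the two baseline privacy models named in the abstract, the following is a minimal sketch of how k-anonymity and distinct l-diversity can be checked on an already generalized table. It is not the thesis's algorithm and does not cover (k,P)-anonymity or version derivation; the function names, attribute names, and sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, qi_attrs):
    """Group records that share the same quasi-identifier (QI) values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[a] for a in qi_attrs)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, qi_attrs, k):
    """True if every QI combination is shared by at least k records."""
    groups = equivalence_classes(records, qi_attrs)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(records, qi_attrs, sensitive_attr, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = equivalence_classes(records, qi_attrs)
    return all(len({r[sensitive_attr] for r in g}) >= l for g in groups.values())


# Hypothetical generalized table: age and zip are the QI attributes,
# disease is the sensitive attribute.
records = [
    {"age": "20-29", "zip": "310**", "disease": "flu"},
    {"age": "20-29", "zip": "310**", "disease": "asthma"},
    {"age": "30-39", "zip": "311**", "disease": "flu"},
    {"age": "30-39", "zip": "311**", "disease": "flu"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
two_diverse = is_distinct_l_diverse(records, ["age", "zip"], "disease", 2)
```

In this sample, each QI combination is shared by two records, so the table is 2-anonymous, but the second group contains only one distinct disease value, so it is not 2-diverse. This is the kind of gap that motivates l-diversity on top of k-anonymity.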

Keywords/Search Tags:

Time-series, Stream, k-anonymity, l-diversity, Generalization, Version Derivation, Pattern, Dissemination

Related items

1	Research On Privacy-preserving Data Publishing Algorithms Based On Different Anonymity Requests
2	Research On Anonymity Models And Algorithms For Privacy-Preservation Data Publishing
3	The Research On Time Series Analysis Techniques
4	Research On Anonymity Models And Algorithms For Resisting The Attack Of Sub-trajectory
5	The Application Of Stream Data Time-Series Pattern Reliance Mining In Stock Market Analysis
6	Research And Implementation Of K-anonymity Privacy Protection Algorithm Based On Local Generalization
7	An Improved Clustering Algorithm For Large-scale Time Series Data
8	Real-Time Interpretation And Optimization Of Stream Time Series In Big Data
9	Hierarchical Clustering Algorithm For Mining Frequent Patterns And Time-series Flow
10	Time Series Classification, Retrieval Methods And Applications