Research Of Olap Technology Over K-Anonymous Data

Posted on:2015-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:J B Zhang

Full Text:PDF

GTID:2268330425981969

Subject:Computer software and theory

Abstract/Summary:


With the development of the Internet, users can quickly and easily access large-scale shared data, some of which contains private information. To protect privacy, data publishers usually anonymize the data, most commonly through k-anonymization. K-anonymous data is a special kind of uncertain data: every possible value within a generalized value has an equal probability of being the "real" value, and likewise every possible world of an uncertain tuple is equally probable. For data that satisfies the k-anonymity constraint, the probability of identifying an individual by joining external data on the quasi-identifiers is at most 1/k, so k-anonymity is an effective privacy protection method. However, the large amount of uncertainty it introduces greatly reduces the usability of the data, which makes knowledge extraction from k-anonymous data a practical problem.

OLAP (On-Line Analytical Processing) is a major means of knowledge discovery. Existing research on OLAP over uncertain data assumes that the possible worlds of a tuple have unequal probabilities. Moreover, traditional uncertain data is less uncertain than k-anonymous data and has far fewer possible worlds. Its uncertainty is also mostly not human-controlled: data gathered by wireless sensor networks, for instance, is inherently uncertain, and the probability distribution describing which of its possible values is the real one is itself uncertain. For these reasons, the accuracy of queries over traditional uncertain data cannot be controlled. When the uncertainty is very large, query accuracy cannot be guaranteed, and there is no quantitative indicator of accuracy either. Methods for OLAP over traditional uncertain data are therefore unsuitable for k-anonymous data. With these problems in mind, we investigate how to perform OLAP on k-anonymous data and mine useful knowledge from it.
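As a rough illustration of the uncertain-data view of k-anonymity described above, the following Python sketch models each generalized quasi-identifier value as the set of concrete values it may stand for. It enumerates a tuple's equally probable possible worlds and computes the worst-case re-identification probability as one over the smallest equivalence-class size, which is at most 1/k when the table is k-anonymous. The table, its values, and the function names (`possible_worlds`, `is_k_anonymous`, `max_reidentification_probability`) are hypothetical illustrations, not taken from the thesis.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def possible_worlds(tuple_qi):
    """Enumerate the possible worlds of one generalized tuple.

    Each quasi-identifier value is the set of concrete values it may stand for.
    Every combination is assumed equally likely, so each world gets
    probability 1 / (number of worlds).
    """
    worlds = list(product(*(sorted(values) for values in tuple_qi)))
    p = Fraction(1, len(worlds))
    return [(world, p) for world in worlds]

def equivalence_class_sizes(table):
    """Count rows that share the same generalized quasi-identifier tuple."""
    return Counter(table)

def is_k_anonymous(table, k):
    return min(equivalence_class_sizes(table).values()) >= k

def max_reidentification_probability(table):
    """Worst-case chance of linking a known individual to a single row.

    An attacker who joins external data on the quasi-identifiers can only narrow
    a person down to their equivalence class, so the chance is 1 / class size;
    the worst case is the smallest class, which is at most 1/k for k-anonymous data.
    """
    return Fraction(1, min(equivalence_class_sizes(table).values()))

# Hypothetical generalized table: ages generalized to decades, ZIP codes to pairs.
age_20s, age_30s = frozenset(range(20, 30)), frozenset(range(30, 40))
zip_130, zip_148 = frozenset({"13053", "13068"}), frozenset({"14850", "14853"})
table = [(age_20s, zip_130)] * 2 + [(age_30s, zip_148)] * 3

print(len(possible_worlds(table[0])))                      # 20
print(is_k_anonymous(table, 2), is_k_anonymous(table, 3))  # True False
print(max_reidentification_probability(table))             # 1/2
```

Exact `Fraction` arithmetic is used so that the equal-probability assumption shows up as exact ratios rather than rounded floating-point values.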
We describe this as digging "gold" out of the "dirt" that others discard. Aggregate queries are the basis of OLAP, and efficient aggregate queries are the key to improving the efficiency of OLAP over k-anonymous data. To make these queries time-efficient, we define the Independent Attribute Set with respect to an aggregate query; using it, we can avoid traversing the possible worlds of a tuple. We also define WITH clause constraints for the different relationships between the Attribute Region and the Attribute Query Region, which extends the capability of aggregate queries, and we then present the properties of these queries.

The basic OLAP operations are built on group-by aggregate queries. Based on level encoding of user-defined dimensions, we present two group-by aggregate query algorithms, one over the fact table and one over the CUBE. We then give two ways of encoding the dimension level tree. This encoding speeds up group-by aggregate queries over the CUBE, and when its data compression ratio is greater than 1, it also reduces storage. This research on OLAP improves the usability of k-anonymous data and makes an important contribution to its application.
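The abstract does not spell out how the Independent Attribute Set is defined, so the Python sketch below only illustrates the general idea of answering an aggregate query without enumerating possible worlds. It assumes that each generalized value is uniformly distributed over its concrete values, as the equal-probability model implies; under that assumption the attributes of a tuple vary independently, so the probability that a tuple satisfies a conjunctive predicate is a product of per-attribute overlap ratios. By linearity of expectation, the expected COUNT is the sum of these per-tuple probabilities. A brute-force enumeration is included only to check the shortcut. The table, predicate, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

def tuple_match_probability(tuple_qi, predicate):
    """Probability that one generalized tuple satisfies a conjunctive predicate.

    `predicate[i]` is the set of concrete values accepted on attribute i, or None
    for no condition. Assuming every concrete combination is equally likely, the
    attributes are independent, so the result is a product of per-attribute
    overlap ratios and no possible world is enumerated.
    """
    return prod(
        Fraction(len(gen & acc), len(gen)) if acc is not None else Fraction(1)
        for gen, acc in zip(tuple_qi, predicate)
    )

def tuple_match_probability_bruteforce(tuple_qi, predicate):
    """Same probability, obtained by enumerating every possible world of the tuple."""
    worlds = list(product(*tuple_qi))
    hits = sum(
        all(acc is None or value in acc for value, acc in zip(world, predicate))
        for world in worlds
    )
    return Fraction(hits, len(worlds))

def expected_count(table, predicate):
    """Expected COUNT(*): by linearity of expectation, the sum of match probabilities."""
    return sum((tuple_match_probability(t, predicate) for t in table), Fraction(0))

# Hypothetical generalized table: ages generalized to decades, ZIP codes to pairs.
age_20s, age_30s = frozenset(range(20, 30)), frozenset(range(30, 40))
zip_130, zip_148 = frozenset({"13053", "13068"}), frozenset({"14850", "14853"})
table = [(age_20s, zip_130)] * 2 + [(age_30s, zip_148)] * 3

# COUNT(*) WHERE age BETWEEN 25 AND 34 AND zip IN ('13053', '14850')
predicate = (frozenset(range(25, 35)), {"13053", "14850"})

for t in table:
    assert tuple_match_probability(t, predicate) == tuple_match_probability_bruteforce(t, predicate)
print(expected_count(table, predicate))  # 5/4
```

The factorized version costs time proportional to the number of attributes per tuple, while enumeration grows with the product of the generalized value sizes, which is the kind of possible-world traversal the abstract says the Independent Attribute Set avoids.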

Keywords/Search Tags:

k-anonymous data, uncertain data, OLAP, aggregate query, level-tree encoding


Related items

1	Cleaning And Top-k Querying Uncertain Data With Aggregate Contraints
2	Research On Probabilistic Aggregate Nearest Neighbor Query Method Over Uncertain Data
3	Queries Of The K-anonymous Data
4	Aggregation Query Research Over Continuous Data Streams
5	Design And Implementation Of Uncertain Big Data Analysis Prototype System
6	Research Of Mining Algorithm On K-Anonymous Data Sets
7	Study On Skyline Query Processing Techniques On Uncertain Data
8	Research Of Mining Algorithm Of K-Anonymous Data Sets Based On Generalization Tree
9	Data Provenance Management And Similarity Query Over Uncertain Data
10	The Research Of Key Processing Techniques Of Uncertain Skyline Query