
Research And Implementation Of Query Privacy Protection For Spatio-temporal Data Based On Spark

Posted on: 2016-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: J T Yao
GTID: 2428330542989423
Subject: Computer application technology
Abstract/Summary:
With the growing informatization of society, a large amount of data is produced every day, and the era of big data is arriving. Data is a valuable resource, and storing and querying it effectively is of great significance for everyday life and production. In recent years, cloud computing and location-based services have developed considerably. Cloud computing provides a flexible computing environment and is an effective support for big data: it can scale quickly and automatically to support query processing over mobile data streams, and it meets requirements such as dynamic change, response time, and processing capacity. Location-based services are also becoming an increasingly common part of people's lives, and as the demand for privacy grows, people pay more attention to it. However, there is still a gap between the strength of privacy protection and the usability of services.

This thesis focuses on the CPIR (computational private information retrieval) algorithm and on cloud platforms, and proposes and optimizes several Spark-based privacy-preserving query algorithms. The traditional CPIR algorithm must scan the whole data space, which requires a large amount of computation, so it is not suitable for big data. This thesis proposes three algorithms: a parallel grouping range privacy query algorithm based on Spark, the PCPIR-V nearest-neighbor privacy query algorithm, and a cache optimization algorithm for PCPIR-V.
(1) The range privacy query algorithm divides the grid into groups to reduce the amount of computation and parallelizes the computation on Spark to improve efficiency. It substantially improves server execution time, client execution time, and communication cost.
(2) PCPIR-V uses two parallel strategies, a row strategy and a bit strategy. The bit strategy divides each row into smaller pieces, which improves efficiency when the grid division is too small.
(3) The cache optimization algorithm for PCPIR-V first clusters the CPIR data, then transforms the data into tuples and caches them in each cluster. Finally, it calculates the result from the cached tuples and the data. It is about 20 percent faster than PCPIR-V.

Although PCPIR-V alleviates CPIR's performance problems, CPIR still needs to scan the whole data space, which is impractical for large data sets. This thesis therefore combines PCPIR-V with the idea of k-anonymity and proposes the KB-CPIR algorithm to improve the efficiency of CPIR on large-scale data sets. KB-CPIR divides the spatio-temporal data into pieces according to calculation and mapping strategies, then computes the CPIR matrix in parallel on Spark. This avoids the trusted third-party server required by the traditional k-anonymity framework and also makes the CPIR calculation more efficient. In the evaluation, KB-CPIR reduced server time by up to a factor of 5 across different data sizes.
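To make concrete why CPIR "scans the whole data space" and why row-wise and chunk-wise parallelism help, here is a minimal Python sketch of the quadratic-residuosity CPIR scheme of Kushilevitz and Ostrovsky, a construction commonly used in grid-based location privacy. The abstract does not say which CPIR variant the thesis uses, so this choice is an assumption. The function names (`client_query`, `server_answer`, `row_product`, and so on), the toy primes, and the use of a thread pool in place of Spark workers are all hypothetical. `row_product`'s column chunking is only one possible reading of the "bit strategy".

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy primes for illustration only; a real deployment uses a modulus of 1024+ bits.
P, Q = 1019, 1031
N = P * Q


def is_qr(x: int) -> bool:
    """Client-side test (needs the secret factor P): is x a quadratic residue mod N?"""
    return pow(x, (P - 1) // 2, P) == 1


def random_unit(rng: random.Random) -> int:
    """Random element of Z_N^* (coprime to N)."""
    while True:
        x = rng.randrange(2, N)
        if math.gcd(x, N) == 1:
            return x


def random_qnr(rng: random.Random) -> int:
    """Non-residue mod both P and Q: Jacobi symbol +1, so it looks like a QR without the factors."""
    while True:
        x = random_unit(rng)
        if pow(x, (P - 1) // 2, P) == P - 1 and pow(x, (Q - 1) // 2, Q) == Q - 1:
            return x


def client_query(num_cols: int, col: int, rng: random.Random) -> list[int]:
    """One value per column: a non-residue for the wanted column, residues elsewhere."""
    return [random_qnr(rng) if j == col else pow(random_unit(rng), 2, N)
            for j in range(num_cols)]


def chunk_product(row, y, lo, hi) -> int:
    """Partial product over columns [lo, hi): y_j for a 1 bit, y_j^2 for a 0 bit."""
    z = 1
    for j in range(lo, hi):
        z = z * (y[j] if row[j] else y[j] * y[j]) % N
    return z


def row_product(row, y, chunk: int = 4) -> int:
    """Hypothetical 'bit strategy' analogue: split a long row into column chunks.
    Modular multiplication is associative, so the partial products can be combined."""
    z = 1
    for lo in range(0, len(row), chunk):
        z = z * chunk_product(row, y, lo, min(lo + chunk, len(row))) % N
    return z


def server_answer(matrix, y, workers: int = 4) -> list[int]:
    """Hypothetical 'row strategy' analogue: rows are independent, so they are mapped in parallel.
    Every cell of the matrix is touched, whatever the query."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_product(row, y), matrix))


def client_decode(answer: list[int], row_idx: int) -> int:
    """The wanted bit is 0 iff the row's product is a quadratic residue."""
    return 0 if is_qr(answer[row_idx]) else 1


if __name__ == "__main__":
    rng = random.Random(42)
    grid = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]  # 6x8 bit matrix
    y = client_query(num_cols=8, col=5, rng=rng)
    answer = server_answer(grid, y)
    assert client_decode(answer, 2) == grid[2][5]
    print("retrieved bit:", client_decode(answer, 2))
```

The server performs one or two modular multiplications for every cell on every query, no matter which cell the client wants. This whole-space work is what dominates CPIR cost. Splitting it by rows or by column chunks does not reduce the total work, but it lets the work run across many workers. Threads are used here only to show that the pieces are independent. Python's GIL limits real CPU speedup, whereas Spark would distribute the rows across executors.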
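The abstract does not spell out KB-CPIR's calculation and mapping strategies. The sketch below illustrates one hypothetical way to combine a k-anonymity-style idea with CPIR without a trusted anonymizer, and it should not be read as the thesis's method. The client, working alone, picks an aligned square block of at least k grid cells that contains its own cell. It reveals only that block, and CPIR then runs over the block instead of the whole grid. The server pre-partitions cell data by block key so that blocks can be processed independently, much like a Spark `keyBy`/`groupByKey`. Here "k" counts indistinguishable cells, not users. The names `block_for` and `partition_by_block` and the power-of-two blocking rule are assumptions.

```python
from collections import defaultdict


def block_for(cell, grid_side: int, k: int):
    """Smallest aligned square block (side a power of two) with >= k cells that contains
    `cell`. The client computes it alone, so no trusted third-party server is needed.
    Assumes grid_side is a power of two; the block never grows past the whole grid."""
    side = 1
    while side * side < k and side < grid_side:
        side *= 2
    cx, cy = cell
    return (cx // side) * side, (cy // side) * side, side


def partition_by_block(records, side: int):
    """Map each (cx, cy, bit) record to its block key, analogous to keyBy + groupByKey in Spark.
    Each block becomes a small side x side bit matrix (row = cy, col = cx within the block)
    that a CPIR server can process independently of all other blocks."""
    blocks = defaultdict(lambda: [[0] * side for _ in range(side)])
    for cx, cy, bit in records:
        blocks[(cx // side, cy // side)][cy % side][cx % side] = bit
    return blocks


if __name__ == "__main__":
    grid_side, k = 64, 20
    bx, by, side = block_for((13, 5), grid_side, k)
    print(bx, by, side)                   # 8 0 8
    print(grid_side ** 2 // side ** 2)    # 64 (times fewer cells scanned per query)
    blocks = partition_by_block([(13, 5, 1), (2, 3, 1)], side)
    # The client reveals block key (bx // side, by // side) and privately queries
    # row cy - by, column cx - bx inside it with CPIR.
    print(blocks[(bx // side, by // side)][5 - by][13 - bx])  # 1
```

With a 64×64 grid and k = 20, the query runs CPIR over an 8×8 block, which is 64 times fewer cells than a full-grid scan. The client's cell stays hidden among the 64 cells of that block. The trade-off is that revealing the block leaks a coarse location, so k sets the balance between privacy and server cost.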
Keywords/Search Tags:privacy protection, parallel computing, Spark, k-anonymization, spatio-temporal data