Privacy Preserving Query Processing Over Unstructured Big Data

Posted on:2020-11-09

Degree:Master

Type:Thesis

Country:China

Candidate:W L Yang

Full Text:PDF

GTID:2428330590458393

Subject:Computer software and theory

Abstract/Summary:

With the widespread dissemination of information in society, big data has become an indispensable product of the times. Its advantages are increasingly evident, which motivates the search for efficient technologies to process it, including data storage, query, and analysis. Among these, big data query plays an important role. However, while we benefit from big data, protecting personal privacy becomes more difficult. Illegal disclosure of sensitive personal information occurs frequently and puts privacy at risk, so privacy preservation urgently needs more attention.

This thesis observes that most research on similarity joins focuses on reducing running time. As an essential operator in data mining and analysis, the similarity join is resource intensive and time consuming, particularly on big data. Data confidentiality must also be ensured, because joining two files may disclose personal information. Based on these considerations, this thesis proposes a MapReduce-based similarity join with differential privacy (hereafter PSJoin). The parallel algorithm is designed to answer similarity join queries both privately and efficiently. Specifically, PSJoin preserves privacy both during the similarity join process and in the published results. A new private global ordering approach addresses disclosure during the join process, and a differentially private similarity function is provided for the algorithm and proved to satisfy differential privacy. Finally, these methods are embedded in the MapReduce framework to address the bottleneck of big data query.

Comprehensive experiments and analysis on large-scale real-world datasets demonstrate that the method effectively prevents privacy leakage in similarity joins. Compared with traditional similarity joins, the privacy-preserving similarity join can further improve query efficiency with suitably fixed parameters, guaranteeing privacy with only minimal accuracy loss in similarity queries while consistently offering good scalability.
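
The abstract does not give the form of the differentially private similarity function, so the following is only a minimal sketch of the general idea, not PSJoin itself. It assumes Jaccard similarity over token sets and the standard Laplace mechanism (noise with scale sensitivity / epsilon). The function names, the `sensitivity` parameter, and the all-pairs loop are illustrative assumptions; the private global ordering and the MapReduce distribution described above are not modeled.

```python
import random
from itertools import product


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| (0.0 if both sets are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_similarity(a, b, epsilon, sensitivity, rng):
    """Jaccard score perturbed by the Laplace mechanism (assumed, not from the thesis).

    Clamping to [0, 1] is post-processing and does not weaken the guarantee.
    """
    score = jaccard(a, b) + laplace_noise(sensitivity / epsilon, rng)
    return min(1.0, max(0.0, score))


def private_similarity_join(left, right, threshold, epsilon,
                            sensitivity=1.0, seed=None):
    """Return index pairs (i, j) whose noisy similarity reaches the threshold.

    sensitivity=1.0 is a trivially safe upper bound, since Jaccard lies in [0, 1].
    Hypothetical helper: compares all pairs in memory, with no candidate pruning.
    """
    rng = random.Random(seed)
    result = []
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        if noisy_similarity(a, b, epsilon, sensitivity, rng) >= threshold:
            result.append((i, j))
    return result


if __name__ == "__main__":
    left = [{"a", "b", "c"}, {"x", "y"}]
    right = [{"a", "b", "d"}, {"x", "y"}, {"q"}]
    # Smaller epsilon means more noise, so the output may differ from the exact join.
    print(private_similarity_join(left, right, threshold=0.4, epsilon=0.5, seed=1))
```

Smaller values of `epsilon` add more noise and therefore trade accuracy for privacy. The sketch perturbs each pairwise score independently and does not account for how the privacy budget composes across many pairs, which a full algorithm would have to address.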

Keywords/Search Tags:

Big Data Query, Privacy Preserving, Similarity Joins, MapReduce, Differential Privacy

Related items

1	Research On Privacy-preserving Spatio-textual Similarity Joins
2	Research On Privacy Preserving Publishing Of Big Location Data Based On Differential Privacy
3	Adaptive Differential Privacy And Its Applications
4	Research On Differential Privacy Preservation For Data Analysis
5	Research On Privacy-preserving Method For Location Data Query
6	Research On The Privacy-preserving Methods For Location Privacy And Query Privacy
7	A Study On User Data Privacy-Preserving Mechanism With Differential Privacy
8	Trajectory Privacy Preserving Based On Statistical Differential Privacy
9	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
10	Feature Selection Algorithm Based On Privacy Preserving